
Budgeted online kernel classifiers for large scale learning.



Abstract

In an environment where new large-scale problems are emerging across disciplines and pervasive computing applications are becoming increasingly common, there is an urgent need for machine learning algorithms that can process growing amounts of data with comparatively modest computing resources, in a computationally efficient way. Previous research has produced many successful learning algorithms that scale linearly or even sub-linearly with sample size and dimension, both in runtime and in space. However, linear or even sub-linear space scaling is often not sufficient, because it implies unbounded growth in memory with sample size. This opens another challenge: how to learn from large, or practically infinite, data sets or data streams using limited memory.

Online learning is an important learning scenario in which a potentially unlimited sequence of training examples is presented one example at a time and can be seen only in a single pass. This is opposed to offline learning, where the whole collection of training examples is at hand. The objective is to learn an accurate prediction model from the training stream. Upon receiving each fresh example from the stream, online learning algorithms typically attempt to update the existing model without retraining.

The invention of the Support Vector Machine (SVM) attracted considerable interest in adapting kernel methods for both offline and online learning. Typical online learning for kernel classifiers consists of observing a stream of training examples and including them as prototypes when specified conditions are met. However, such a procedure can result in unbounded growth in the number of prototypes. In addition to the danger of exceeding physical memory, this also implies unlimited growth in both update and prediction time.

To address this issue, in my dissertation I propose a series of kernel-based budgeted online algorithms, which have constant space and constant update and prediction time. This is achieved by maintaining a fixed number of prototypes under the memory budget. Most previous work on budgeted online algorithms focuses on the kernel perceptron. In the first part of the thesis, I review and discuss these existing algorithms and then propose a kernel perceptron algorithm that maintains the budget by removing the prototype with the minimal impact on classification accuracy. This is achieved by the dual use of cached prototypes for both model representation and validation.
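To make the budgeted online setting concrete, here is a minimal Python sketch of a kernel perceptron kept under a fixed prototype budget. All names are illustrative, not the dissertation's, and the removal rule (discarding the prototype with the smallest weighted kernel influence on the other cached prototypes) is only a cheap proxy for the accuracy-based criterion described above, which validates removal candidates on the cached prototypes themselves.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel between two vectors.
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)

class BudgetedKernelPerceptron:
    """Illustrative sketch: online kernel perceptron under a prototype budget."""

    def __init__(self, budget=100, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.protos = []   # cached prototypes: (example, label) pairs
        self.alphas = []   # their coefficients

    def decision(self, x):
        return sum(a * y * rbf(x, z, self.gamma)
                   for a, (z, y) in zip(self.alphas, self.protos))

    def update(self, x, y):
        # Perceptron rule: add a prototype only when the model errs.
        if y * self.decision(x) <= 0:
            self.protos.append((x, y))
            self.alphas.append(1.0)
            if len(self.protos) > self.budget:
                self._remove_one()

    def _remove_one(self):
        # Discard the prototype whose weighted kernel influence on the
        # remaining prototypes is smallest -- a simplified stand-in for
        # the dissertation's "least impact on accuracy" criterion.
        def influence(i):
            zi = self.protos[i][0]
            return abs(self.alphas[i]) * sum(
                rbf(zi, self.protos[j][0], self.gamma)
                for j in range(len(self.protos)) if j != i)
        worst = min(range(len(self.protos)), key=influence)
        del self.protos[worst], self.alphas[worst]
```

A learner of this shape processes a stream with constant memory: each call to update(x, y) touches at most `budget` prototypes, so both update and prediction time stay bounded regardless of stream length.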
In the second part, I propose a family of budgeted online algorithms in the Passive-Aggressive (PA) style. Budget maintenance is achieved by introducing an additional constraint into the original PA optimization problem, and a closed-form solution is derived for both budget maintenance and model update.

In the third part, I propose a budgeted online SVM algorithm. The proposed algorithm guarantees that the optimal SVM solution is maintained on all the prototype examples at any time. To maximize accuracy, prototypes are constructed to approximate the data distribution near the decision boundary.

In the fourth part, I propose a family of budgeted online algorithms for multi-class classification. The proposed algorithms are based on the recently proposed SVM training algorithm Pegasos. I prove that the gap between budgeted Pegasos and the optimal SVM solution depends directly on the average model degradation due to budget maintenance. Following this analysis, I study greedy multi-class budget maintenance methods based on removal, projection, and merging of support vectors (SVs).

In each of these four parts, the proposed algorithms were experimentally evaluated against state-of-the-art competitors. The results show that the proposed budgeted online algorithms outperform the competing algorithms and achieve accuracy comparable to their non-budgeted counterparts while being extremely computationally efficient.
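As a companion sketch, the following Python code illustrates removal-based budget maintenance on top of kernelized Pegasos, in binary-class form only; the multi-class algorithms and the projection and merging strategies analyzed in the dissertation are not reproduced here. Dropping the SV with the smallest |alpha| is merely the simplest greedy removal rule: for a normalized kernel such as the RBF, |alpha_i| equals the norm of that SV's contribution to the weight vector. All names are hypothetical.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian kernel; k(x, x) = 1, so |alpha_i| measures SV i's weight norm.
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)

class BudgetedKernelPegasos:
    """Illustrative sketch: kernelized Pegasos with removal-based budgeting."""

    def __init__(self, budget=100, lam=0.01, gamma=1.0):
        self.budget, self.lam, self.gamma = budget, lam, gamma
        self.svs, self.alphas = [], []
        self.t = 0

    def decision(self, x):
        return sum(a * rbf(x, z, self.gamma)
                   for a, z in zip(self.alphas, self.svs))

    def update(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)          # Pegasos step size
        violated = y * self.decision(x) < 1.0    # hinge check on current model
        # Regularization shrinks every coefficient each round.
        self.alphas = [(1.0 - eta * self.lam) * a for a in self.alphas]
        if violated:
            self.svs.append(x)
            self.alphas.append(eta * y)
            if len(self.svs) > self.budget:
                # Budget maintenance by removal: drop the SV contributing
                # least to the model (smallest |alpha| under an RBF kernel).
                i = int(np.argmin(np.abs(self.alphas)))
                del self.svs[i], self.alphas[i]
```

The gap result stated in the abstract ties the quality of such a budgeted model to how little each maintenance step perturbs it, which is why the greedy rules (removal, projection, merging) are compared by the model degradation they incur.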

Bibliographic details

  • Author: Wang, Zhuang
  • Affiliation: Temple University
  • Degree grantor: Temple University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pagination: 124 p.
  • Total pages: 124
  • Format: PDF
  • Language: English
