
Budgeted online kernel classifiers for large scale learning.



Abstract

In an environment where new large-scale problems are emerging across disciplines and pervasive computing applications are becoming increasingly common, there is an urgent need for machine learning algorithms that can process growing amounts of data with comparatively modest computing resources, in a computationally efficient way. Previous research has produced many successful learning algorithms that scale linearly or even sub-linearly with sample size and dimension, both in runtime and in space. However, linear or even sub-linear space scaling is often not sufficient, because it implies unbounded growth in memory with sample size. This opens another challenge: how to learn from large, or practically infinite, data sets or data streams using limited memory.

Online learning is an important learning scenario in which a potentially unlimited sequence of training examples is presented one example at a time and can be seen only in a single pass. This is opposed to offline learning, where the whole collection of training examples is at hand. The objective is to learn an accurate prediction model from the training stream. Upon receiving each fresh example from the stream, online learning algorithms typically attempt to update the existing model without retraining.

The invention of the Support Vector Machine (SVM) attracted considerable interest in adapting kernel methods for both offline and online learning. Typical online learning for kernel classifiers consists of observing a stream of training examples and including them as prototypes when specified conditions are met. However, such a procedure can result in unbounded growth in the number of prototypes. In addition to the danger of exceeding physical memory, this also implies unlimited growth in both update and prediction time.

To address this issue, in my dissertation I propose a series of kernel-based budgeted online algorithms, which have constant space and constant update and prediction time. This is achieved by maintaining a fixed number of prototypes under the memory budget. Most previous work on budgeted online algorithms focuses on the kernel perceptron. In the first part of the thesis, I review and discuss these existing algorithms and then propose a kernel perceptron algorithm that maintains the budget by removing the prototype with the minimal impact on classification accuracy. This is achieved by the dual use of cached prototypes for both model representation and validation.
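To make the budgeted online setting concrete, here is a minimal Python sketch of a kernel perceptron kept under a fixed prototype budget. All names are illustrative, not the dissertation's, and the removal rule (discarding the prototype with the smallest weighted kernel influence on the other cached prototypes) is only a cheap proxy for the accuracy-based criterion described above, which validates removal candidates on the cached prototypes themselves.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel between two vectors.
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)

class BudgetedKernelPerceptron:
    """Illustrative sketch: online kernel perceptron under a prototype budget."""

    def __init__(self, budget=100, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.protos = []   # cached prototypes: (example, label) pairs
        self.alphas = []   # their coefficients

    def decision(self, x):
        return sum(a * y * rbf(x, z, self.gamma)
                   for a, (z, y) in zip(self.alphas, self.protos))

    def update(self, x, y):
        # Perceptron rule: add a prototype only when the model errs.
        if y * self.decision(x) <= 0:
            self.protos.append((x, y))
            self.alphas.append(1.0)
            if len(self.protos) > self.budget:
                self._remove_one()

    def _remove_one(self):
        # Discard the prototype whose weighted kernel influence on the
        # remaining prototypes is smallest -- a simplified stand-in for
        # the dissertation's "least impact on accuracy" criterion.
        def influence(i):
            zi = self.protos[i][0]
            return abs(self.alphas[i]) * sum(
                rbf(zi, self.protos[j][0], self.gamma)
                for j in range(len(self.protos)) if j != i)
        worst = min(range(len(self.protos)), key=influence)
        del self.protos[worst], self.alphas[worst]
```

A learner of this shape processes a stream with constant memory: each call to update(x, y) touches at most `budget` prototypes, so both update and prediction time stay bounded regardless of stream length.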
In the second part, I propose a family of budgeted online algorithms in the Passive-Aggressive (PA) style. Budget maintenance is achieved by introducing an additional constraint into the original PA optimization problem, and a closed-form solution is derived for both budget maintenance and model update.

In the third part, I propose a budgeted online SVM algorithm. The proposed algorithm guarantees that the optimal SVM solution is maintained on all the prototype examples at any time. To maximize accuracy, prototypes are constructed to approximate the data distribution near the decision boundary.

In the fourth part, I propose a family of budgeted online algorithms for multi-class classification. The proposed algorithms are based on the recently proposed SVM training algorithm Pegasos. I prove that the gap between budgeted Pegasos and the optimal SVM solution depends directly on the average model degradation due to budget maintenance. Following this analysis, I study greedy multi-class budget maintenance methods based on removal, projection, and merging of support vectors (SVs).

In each of these four parts, the proposed algorithms were experimentally evaluated against state-of-the-art competitors. The results show that the proposed budgeted online algorithms outperform the competing algorithms and achieve accuracy comparable to their non-budgeted counterparts while being extremely computationally efficient.
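As a companion sketch, the following Python code illustrates removal-based budget maintenance on top of kernelized Pegasos, in binary-class form only; the multi-class algorithms and the projection and merging strategies analyzed in the dissertation are not reproduced here. Dropping the SV with the smallest |alpha| is merely the simplest greedy removal rule: for a normalized kernel such as the RBF, |alpha_i| equals the norm of that SV's contribution to the weight vector. All names are hypothetical.

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # Gaussian kernel; k(x, x) = 1, so |alpha_i| measures SV i's weight norm.
    return np.exp(-gamma * np.linalg.norm(np.asarray(x) - np.asarray(z)) ** 2)

class BudgetedKernelPegasos:
    """Illustrative sketch: kernelized Pegasos with removal-based budgeting."""

    def __init__(self, budget=100, lam=0.01, gamma=1.0):
        self.budget, self.lam, self.gamma = budget, lam, gamma
        self.svs, self.alphas = [], []
        self.t = 0

    def decision(self, x):
        return sum(a * rbf(x, z, self.gamma)
                   for a, z in zip(self.alphas, self.svs))

    def update(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)          # Pegasos step size
        violated = y * self.decision(x) < 1.0    # hinge check on current model
        # Regularization shrinks every coefficient each round.
        self.alphas = [(1.0 - eta * self.lam) * a for a in self.alphas]
        if violated:
            self.svs.append(x)
            self.alphas.append(eta * y)
            if len(self.svs) > self.budget:
                # Budget maintenance by removal: drop the SV contributing
                # least to the model (smallest |alpha| under an RBF kernel).
                i = int(np.argmin(np.abs(self.alphas)))
                del self.svs[i], self.alphas[i]
```

The gap result stated in the abstract ties the quality of such a budgeted model to how little each maintenance step perturbs it, which is why the greedy rules (removal, projection, merging) are compared by the model degradation they incur.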

Bibliographic details

  • Author: Wang, Zhuang
  • Affiliation: Temple University
  • Degree grantor: Temple University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pagination: 124 p.
  • Total pages: 124
  • Format: PDF
  • Language: English
