Journal: Evolutionary Intelligence

Efficient recurrent local search strategies for semi- and unsupervised regularized least-squares classification

Abstract

Binary classification tasks are among the most important in the field of machine learning. One prominent approach to such tasks is the support vector machine, which aims to find a hyperplane that separates the two classes well, such that the induced distance between the hyperplane and the patterns is maximized. In general, such classification settings need sufficient labeled data to obtain reasonable models. However, labeled data is often scarce in real-world learning scenarios, while unlabeled data can be obtained easily. For this reason, the concept of support vector machines has also been extended to semi-supervised and unsupervised settings. In the unsupervised case, one aims to find a partition of the data into two classes such that a subsequent application of a support vector machine leads to the best overall result. Similarly, given both a labeled and an unlabeled part, semi-supervised support vector machines favor decision hyperplanes that lie in a low-density area induced by the unlabeled training patterns, while still taking the labeled part of the data into account. The associated optimization problems for both the semi-supervised and the unsupervised case, however, are combinatorial in nature and hence difficult to solve. In this work, we present efficient implementations of simple local search strategies for (variants of) both cases that are based on matrix update schemes for the intermediate candidate solutions. We evaluate the performance of the resulting approaches on a variety of artificial and real-world data sets. The results indicate that our approaches can successfully incorporate unlabeled data. (The unsupervised case was originally proposed by Gieseke F, Pahikkala et al. (2009). The derivations presented in this work are new and include the earlier ones (for the unsupervised setting) as a special case.)
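
The idea of a local search driven by matrix updates can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible instance under stated assumptions: the unsupervised case, a kernel regularized least-squares objective whose optimum for a fixed label vector y is lam * y^T G y with G = (K + lam*I)^-1, single-label flips as the only search move, and a simple balance constraint to rule out one-sided partitions. All names (`rbf_kernel`, `invert`, `local_search`, `max_imbalance`) are hypothetical.

```python
import math
import random


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix for a list of points (each a list of floats)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in X] for p in X]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (adequate for small demos)."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def local_search(K, lam=1.0, max_imbalance=2, seed=0, max_sweeps=100):
    """Single-flip local search over label vectors y in {-1, +1}^n.

    Assumption: for fixed y the regularized least-squares optimum equals
    lam * y^T G y with G = (K + lam*I)^-1, so only G and G*y are tracked.
    """
    n = len(K)
    rng = random.Random(seed)
    G = invert([[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])
    y = [1] * (n // 2) + [-1] * (n - n // 2)
    rng.shuffle(y)
    Gy = [sum(G[i][j] * y[j] for j in range(n)) for i in range(n)]
    obj = lam * sum(y[i] * Gy[i] for i in range(n))
    total = sum(y)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            # Skip flips that would violate the (assumed) balance constraint.
            if abs(total - 2 * y[i]) > max_imbalance:
                continue
            # Objective change of flipping y[i], evaluated in O(1).
            delta = 4.0 * lam * (G[i][i] - y[i] * Gy[i])
            if delta < -1e-12:
                # O(n) update of G*y instead of a full recomputation.
                Gy = [Gy[k] - 2.0 * y[i] * G[k][i] for k in range(n)]
                total -= 2 * y[i]
                y[i] = -y[i]
                obj += delta
                improved = True
        if not improved:
            break
    return y, obj
```

With these assumptions, a call such as `local_search(rbf_kernel(X, gamma=0.5), lam=1.0)` returns a locally optimal labeling and its objective value. The point of the sketch is that a candidate flip is scored from the cached vector G*y without refitting a model, which is the kind of saving the matrix update schemes in the abstract refer to.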