
Selecting Samples and Features for SVM Based on Neighborhood Model



Abstract

Support vector machines (SVMs) are a popular class of learning algorithms with good generalization ability. However, training an SVM on a large sample set is time-consuming, so improving learning efficiency is an important research task. It is known that, although a learning task may offer many candidate training samples, only the samples near the decision boundary influence the classification hyperplane. Finding these samples and training the SVM on them alone may greatly reduce the time and space complexity of training. Based on this observation, we introduce a neighborhood-based rough set model to search for boundary samples. With this model, we divide the sample space into two subsets: the positive region and the boundary samples. We also partition the features into four subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. We train the SVM on the boundary samples in the relevant and indispensable feature subspaces, so the proposed model performs feature selection and sample selection simultaneously. Experiments are performed to test the proposed method. The results show that the model can select very few features and samples for training while classification performance is maintained or improved.
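The split into positive region and boundary samples can be illustrated with a minimal sketch. It assumes a common reading of the neighborhood rough set model that the abstract does not spell out: a sample belongs to the positive region when every sample within distance delta of it has the same class label, and is a boundary sample otherwise. The function name split_by_neighborhood, the Euclidean metric, the value of delta, and the toy data are all hypothetical choices made for illustration, not details taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def split_by_neighborhood(samples, labels, delta):
    """Return (positive, boundary) lists of sample indices.

    Assumption: a sample is in the positive region when all samples
    within distance `delta` of it share its label; otherwise it is
    treated as a boundary sample.
    """
    positive, boundary = [], []
    for i, x in enumerate(samples):
        # The neighborhood of x includes x itself (distance 0).
        consistent = all(
            labels[j] == labels[i]
            for j, y in enumerate(samples)
            if dist(x, y) <= delta
        )
        (positive if consistent else boundary).append(i)
    return positive, boundary


# Hypothetical one-dimensional toy data: the two classes meet near 0.95.
samples = [(0.0,), (0.1,), (0.2,), (0.9,), (1.0,), (1.1,), (2.0,), (2.1,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

positive, boundary = split_by_neighborhood(samples, labels, delta=0.15)
print("positive region:", positive)
print("boundary samples:", boundary)
```

In this sketch only the samples at indices 3 and 4, which sit on either side of the class change, end up as boundary samples. Under the approach described in the abstract, these are the samples that would be passed on to SVM training, while the positive-region samples would be left out. The pairwise loop is quadratic in the number of samples, which is acceptable for a demonstration but would likely need a spatial index on larger data sets.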
