METHOD FOR CLASSIFYING HIGH-DIMENSIONAL IMBALANCED DATA BASED ON SVM

Abstract

A method for classifying high-dimensional imbalanced data based on SVM, comprising two parts. The first part is feature selection: an SVM-BRFE algorithm resamples the boundary to find optimal feature weights, which are used to measure feature importance, select features, and update the training set, and this process is repeated. In the end, the features most conducive to raising the F1 value are retained and the others are removed, so that subsequent training proceeds with as little feature redundancy, as few irrelevant feature combinations, and as low a dimension as possible. This reduces the influence of the high-dimension problem on the imbalance problem and the constraint it places on the SMOTE oversampling algorithm. The second part is data sampling: an improved SMOTE algorithm, the PBKS algorithm, uses the minority-class samples in the boundary automatically partitioned by the SVM as the distance constraints in DHxij of a Hilbert space, replacing the original constraint, and uses a grid method to find the approximate preimage. The method can stably and effectively complete the task of classifying high-dimensional imbalanced data, and can achieve a considerable effect.
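
For orientation, the baseline that the data-sampling part improves on can be sketched as follows. This is a minimal illustration of classic SMOTE interpolation in input space only, not the PBKS algorithm described in the abstract: it has no SVM boundary, no Hilbert-space distance constraint, and no preimage search. The function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE sketch (hypothetical helper, not the patent's PBKS).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative data).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region those samples span. In high dimensions the nearest-neighbour distances this sketch relies on become less informative, which is consistent with the abstract's motivation for reducing the dimension before oversampling.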
