Conference: IEEE/ACIS International Conference on Computer and Information Science

A bi-directional sampling based on K-means method for imbalance text classification

Abstract

This paper studies the imbalanced data classification problem and proposes bi-directional sampling based on clustering (BDSK) as a solution. The algorithm combines the SMOTE over-sampling algorithm with a K-Means-based under-sampling algorithm to address both the within-class and the between-class imbalance problems. It avoids introducing too much noise while also resolving the shortage of samples. Experimental results on the Tan corpus dataset show that the algorithm can effectively improve classification performance on imbalanced data sets, especially in cases where classification performance is heavily affected by class imbalance.
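
The abstract does not spell out the BDSK procedure, so the following is only a minimal sketch of the general idea it describes: under-sample the majority class with K-Means and over-sample the minority class with SMOTE-style interpolation. Every function name, the `target` parameter, and the rule of keeping the sample nearest each centroid are assumptions made for illustration, not the authors' method. The sketch uses only the Python standard library and treats documents as dense feature vectors.

```python
import math
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def undersample_kmeans(majority, target, rng=None):
    """Assumed scheme: cluster the majority class into `target` groups and
    keep, from each non-empty cluster, the real sample nearest its centroid."""
    centroids, labels = kmeans(majority, target, rng=rng)
    kept = []
    for c, centre in enumerate(centroids):
        members = [p for p, l in zip(majority, labels) if l == c]
        if members:
            kept.append(min(members, key=lambda p: math.dist(p, centre)))
    return kept


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style over-sampling: interpolate between a minority sample and
    one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def bidirectional_sample(majority, minority, target, rng=None):
    """Hypothetical wrapper: shrink the majority and grow the minority
    towards a common size `target`."""
    rng = rng or random.Random(0)
    new_major = undersample_kmeans(majority, target, rng=rng)
    n_new = max(0, target - len(minority))
    new_minor = minority + smote(minority, n_new, rng=rng)
    return new_major, new_minor


# Toy 2-D data standing in for document feature vectors.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(10)]
maj, mino = bidirectional_sample(majority, minority, target=40, rng=rng)
print(len(maj), len(mino))
```

In this sketch the under-sampling keeps only real majority samples, one per cluster, so sparse regions of the majority class stay represented, while the over-sampling creates new minority points only on segments between existing minority neighbours. The majority side can return fewer than `target` samples if some clusters end up empty.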
