A bi-directional sampling based on K-means method for imbalance text classification


Abstract

This paper studies the imbalanced data classification problem and proposes bi-directional sampling based on clustering (BDSK). The algorithm combines the SMOTE over-sampling algorithm with a K-Means-based under-sampling algorithm to address both the within-class and the between-class imbalance problems. It avoids introducing too much noise while also resolving the shortage of samples. Experimental results on the Tan corpus dataset show that the algorithm effectively improves classification performance on imbalanced data sets, especially when classification performance is heavily affected by class imbalance.
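The abstract gives no algorithmic detail, so the following is only a minimal sketch of how SMOTE-style over-sampling and K-Means-based under-sampling could be combined. It is not the authors' BDSK implementation. The function names, the midpoint target size, the proportional per-cluster quota, and the "keep the samples nearest each centroid" rule are all assumptions made for illustration. Samples are assumed to be numeric feature tuples (for text, for example TF-IDF vectors), and the minority class needs at least two samples.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def kmeans_undersample(majority, n_target, k=3, seed=0):
    """Keep n_target majority samples, spread over clusters by cluster size.

    Assumption: within a cluster, the samples nearest the centroid are kept.
    """
    centroids, labels = kmeans(majority, k, seed=seed)
    clusters = [[p for p, lab in zip(majority, labels) if lab == j]
                for j in range(k)]
    exact = [len(c) * n_target / len(majority) for c in clusters]
    quota = [math.floor(e) for e in exact]
    # Hand leftover slots to the clusters with the largest fractional share.
    leftover = n_target - sum(quota)
    by_fraction = sorted(range(k), key=lambda j: exact[j] - quota[j], reverse=True)
    for j in by_fraction[:leftover]:
        quota[j] += 1
    kept = []
    for j, cluster in enumerate(clusters):
        cluster.sort(key=lambda p: math.dist(p, centroids[j]))
        kept.extend(cluster[:quota[j]])
    return kept


def smote_oversample(minority, n_new, k=5, seed=0):
    """Append n_new synthetic samples, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return minority + synthetic


def bidirectional_sample(majority, minority, k_clusters=3, k_neighbours=5, seed=0):
    """Shrink the majority class and grow the minority class to a common size.

    Assumption: both classes meet at the midpoint of their original sizes.
    """
    target = (len(majority) + len(minority)) // 2
    return (kmeans_undersample(majority, target, k_clusters, seed),
            smote_oversample(minority, target - len(minority), k_neighbours, seed))
```

With 90 majority and 10 minority samples, this sketch returns 50 samples per class: the majority side contains only original samples, and the minority side contains the 10 originals followed by 40 interpolated ones. Sampling from both directions keeps the number of synthetic points lower than over-sampling alone would need, which is one plausible reading of the abstract's remark about limiting noise.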


Bibliographic details

Source
IEEE/ACIS International Conference on Computer and Information Science, 2016, pp. 1-5 (5 pages)
Authors
Jia Song; Xianglin Huang; Sijun Qin; Qing Song;
Original format: PDF
Keywords
Classification algorithms; Clustering algorithms; Bidirectional control; Text categorization; Artificial intelligence; Partitioning algorithms;


Similar literature


1. Sample cutting method for imbalanced text sentiment classification based on BRC [J]. Suge Wang, Deyu Li, Lidong Zhao. Knowledge-Based Systems. 2013, issue JANa
2. Improving imbalanced scientific text classification using sampling strategies and dictionaries [J]. Lourdes Borrajo, Rubén Romero, Eva Lorenzo Iglesias. Journal of Integrative Bioinformatics. 2011, issue 3
3. Improving imbalanced scientific text classification using sampling strategies and dictionaries [J]. L. Borrajo, R. Romero, E. L. Iglesias. Journal of Integrative Bioinformatics. 2011, issue 3
4. A bi-directional sampling based on K-means method for imbalance text classification [C]. Jia Song, Xianglin Huang, Sijun Qin. IEEE/ACIS International Conference on Computer and Information Science. 2016
5. Combining text-, link-, and classification-based retrieval methods to enhance information discovery on the Web [D]. Yang, Kiduk. 2002
6. Sentimental text mining based on an additional features method for text classification [O]. Ching-Hsue Cheng, Hsien-Hsiu Chen
7. The Research of Imbalanced Data Set of Sample Sampling Method Based on K-Means Cluster and Genetic Algorithm [O]. Yong Yang. 2012
