Conference: IEEE International Conference on Cloud Computing and Big Data Analysis

Feature selection algorithm for hierarchical text classification using Kullback-Leibler divergence

Abstract

Text classification is a simple and effective method and is regarded as a key technology for processing and organizing large amounts of text data. Simple text classification can no longer meet growing user demand, so hierarchical text classification has received extensive attention and has broad application prospects. Hierarchical feature selection is a key technology in automatic hierarchical text classification. Common methods select features for each class in the class hierarchy individually and ignore the correlation between parent and child classes. This paper proposes a feature selection method based on Kullback-Leibler (KL) divergence. The method measures the correlation between a class and its subclasses with KL divergence, calculates the correlation between each feature and each subclass with mutual information, and measures the importance of subclass features with term-frequency probability, in order to select a more discriminative feature set for the parent class node. We applied the hierarchical feature selection method together with SVM classifiers to a hierarchical text categorization task on two corpora. Experiments showed that the proposed algorithm was effective compared with the χ² statistic (CHI), information gain (IG), and mutual information (MI) applied directly to hierarchical feature selection.
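The three ingredients named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the paper combines them, so the scoring rule below (KL weight × term-frequency probability × pointwise mutual information, summed over child classes) is an assumption made only for illustration, as are the function names, the add-one smoothing, and the toy token lists. It should not be read as the authors' algorithm.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Smoothed term-frequency probabilities over a fixed vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """D_KL(p || q) in nats; p and q share the same keys and are non-zero."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def pointwise_mi(term, class_docs, other_docs):
    """log P(t | c) / P(t) from document counts, with add-one smoothing."""
    a = sum(term in doc for doc in class_docs)   # class docs containing term
    b = sum(term in doc for doc in other_docs)   # other docs containing term
    p_t_given_c = (a + 1) / (len(class_docs) + 2)
    p_t = (a + b + 1) / (len(class_docs) + len(other_docs) + 2)
    return math.log(p_t_given_c / p_t)


def score_parent_features(children):
    """Score each term for a parent node from its child classes.

    ASSUMPTION: the weighted sum below is a hypothetical combination,
    not the formula from the paper.
    """
    vocab = sorted({tok for docs in children.values() for doc in docs for tok in doc})
    all_docs = [doc for docs in children.values() for doc in docs]
    parent_dist = term_distribution(all_docs, vocab)
    scores = dict.fromkeys(vocab, 0.0)
    for name, docs in children.items():
        others = [d for other, ds in children.items() if other != name for d in ds]
        child_dist = term_distribution(docs, vocab)
        weight = kl_divergence(child_dist, parent_dist)  # class/subclass relation
        for t in vocab:
            scores[t] += weight * child_dist[t] * pointwise_mi(t, docs, others)
    return scores


def select_features(children, k):
    """Return the k highest-scoring terms for the parent node."""
    scores = score_parent_features(children)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: each child class maps to a list of tokenized documents.
children = {
    "sports": [["ball", "goal", "team"], ["team", "match", "goal"]],
    "politics": [["vote", "party", "election"], ["party", "vote", "team"]],
}
print(select_features(children, k=3))
```

In this toy setting a term concentrated in one child class, such as "goal", scores higher than a term shared across children, such as "team", which is the kind of parent-level discrimination the abstract describes.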