Conference: 2011 Third International Conference on Intelligent Human-Machine Systems and Cybernetics

A Feature Selection Simultaneously Based on Intra-category and Extra-Category for Text Categorization



Abstract

Text categorization is an important means of automatically processing information, the volume of which grows exponentially. However, because of the high dimensionality of text corpora, many sophisticated classifiers cannot be used efficiently and effectively for text categorization. Feature selection has therefore become a research focus in the field. In this paper, we propose a new feature selection method, named SIE, which simultaneously considers the number of documents that contain a feature both inside the category (intra-category) and outside it (extra-category). We compare the proposed method with four well-known feature selection methods using two classification algorithms on two text corpora. The experiments show that the proposed method performs significantly better than information gain, orthogonal centroid feature selection, and Poisson distribution, and achieves accuracy comparable to the χ²-statistic, when Naïve Bayes and support vector machine classifiers are used.
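The abstract does not state the SIE scoring formula, so it is not reproduced here. As a rough illustration of how a document-frequency-based selector uses intra-category and extra-category counts, the sketch below scores terms with the standard χ² statistic, one of the baselines named above. The function names (`doc_frequencies`, `chi_square`, `rank_terms`) and the toy corpus are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def doc_frequencies(docs, labels, category):
    """Count, per term, how many documents contain it inside and outside `category`."""
    intra, extra = Counter(), Counter()
    n_intra = n_extra = 0
    for tokens, label in zip(docs, labels):
        terms = set(tokens)  # document frequency: count each term once per document
        if label == category:
            n_intra += 1
            intra.update(terms)
        else:
            n_extra += 1
            extra.update(terms)
    return intra, extra, n_intra, n_extra


def chi_square(a, b, n_intra, n_extra):
    """Standard chi-square score from a 2x2 term/category contingency table.

    a: documents in the category that contain the term (intra-category)
    b: documents outside the category that contain the term (extra-category)
    """
    c = n_intra - a  # in the category, term absent
    d = n_extra - b  # outside the category, term absent
    n = n_intra + n_extra
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term in all or no documents, or an empty class
    return n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels, category):
    """Return (term, score) pairs, highest chi-square first, ties broken alphabetically."""
    intra, extra, n_intra, n_extra = doc_frequencies(docs, labels, category)
    vocab = set(intra) | set(extra)
    scores = {t: chi_square(intra[t], extra[t], n_intra, n_extra) for t in vocab}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical four-document corpus, already tokenized.
    docs = [["goal", "match"], ["goal", "team"], ["stock", "market"], ["stock", "team"]]
    labels = ["sport", "sport", "finance", "finance"]
    for term, score in rank_terms(docs, labels, "sport"):
        print(f"{term}\t{score:.3f}")
```

Note that χ² is symmetric: a term that appears only outside the category scores as high as one that appears only inside it, because its absence is equally informative. A selector would typically keep the top-k terms from such a ranking before training a classifier.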
