
A New Approach of Feature Selection for Text Categorization


Abstract

This paper proposes a new approach to feature selection for text categorization (TC), based on a measure of independence between features. It first discusses a fundamental hypothesis widely used in probabilistic models for TC: that the occurrences of terms in documents are independent of one another. This basic hypothesis, however, is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new measure of independence between features is designed, and a feature selection algorithm built on it is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent across features, so it satisfies the basic hypothesis to the greatest possible degree. In experiments on the 20 Newsgroups benchmark dataset, the feature subset selected by this method outperforms those selected by traditional TC feature selection methods, which take only relevance into account.
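
The abstract does not define the independence measure or the selection algorithm, so the following is only a minimal sketch of the general idea (prefer features that are relevant to the category and close to independent of the features already chosen), not the paper's method. It assumes mutual information as a stand-in for both relevance and dependence, and a greedy search; the names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(docs, labels, k):
    """Greedily pick k feature indices from binary term-occurrence rows.

    Assumption (not from the paper): score = relevance to the label minus
    the mean dependence on the features selected so far.
    """
    n_features = len(docs[0])
    columns = [[row[j] for row in docs] for j in range(n_features)]
    relevance = [mutual_information(col, labels) for col in columns]

    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            dependence = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - dependence

        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: term 1 duplicates term 0; term 2 is independent of term 0.
# The label is determined jointly by terms 0 and 2.
docs = [
    [0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1],
    [0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1],
]
labels = [2 * row[0] + row[2] for row in docs]
selected = select_features(docs, labels, k=2)
print(selected)
```

On this toy data all three terms are equally relevant on their own, so a relevance-only ranking cannot tell the duplicated term from the independent one. The dependence penalty makes the sketch pick term 0 and then term 2, skipping the duplicate.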
