IEEE International Conference on Computational Intelligence and Computing Research

An Efficient Filtered Classifier for Classification of Unseen Test Data in Text Documents



Abstract

The rapid development of information technology has increased the availability of electronic documents, and automatic classification of e-documents plays an important role in organizing the information held in large data repositories. Many researchers have proposed classification algorithms, but these approaches need to filter the data before classification. With these limitations in mind, we address the problem of classifying unseen data in text documents where the document data distribution is not homogeneous. In this study, we apply a Filtered Classifier to text documents that have passed through an arbitrary filter. The structure of the filter is based on the training and testing data sets, which the filter processes without changing their structure. Fayyad and Irani's discretization method is used as a preprocessing step that discretizes ranges of numerical attributes in the text document data set into nominal attributes, and a C4.5 decision tree classifier performs the classification. Four datasets (CNAE-9, 20 Newsgroups, Twitter, and Reuters-21578) were used to classify unseen test documents and to test the efficiency of the Filtered Classifier. The experiments are described in detail, and the results show improved classification accuracy.
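The filter-then-classify arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `mdl_cut_points`, `DiscretizeFilter`, `MajorityByPattern`, and `FilteredClassifier` are hypothetical, the toy data is invented for illustration, and a simple pattern-lookup classifier stands in for C4.5 to keep the example short. The sketch also assumes that cut points are learned from the training set and then applied unchanged to unseen test rows. The discretizer follows the entropy-based binary splitting with an MDL stopping rule associated with Fayyad and Irani's method.

```python
import bisect
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdl_cut_points(values, labels):
    """Recursively choose cut points for one numeric attribute (MDL stopping rule)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    ys = [y for _, y in pairs]
    base = entropy(ys)
    best = None  # (weighted entropy, split index, threshold)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        w = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or w < best[0]:
            best = (w, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
    if best is None:
        return []
    w, i, threshold = best
    gain = base - w
    k, k1, k2 = len(set(ys)), len(set(ys[:i])), len(set(ys[i:]))
    delta = math.log2(3 ** k - 2) - (
        k * base - k1 * entropy(ys[:i]) - k2 * entropy(ys[i:]))
    if gain <= (math.log2(n - 1) + delta) / n:
        return []  # the split does not pay for its description length
    left = [v for v, _ in pairs[:i]]
    right = [v for v, _ in pairs[i:]]
    return mdl_cut_points(left, ys[:i]) + [threshold] + mdl_cut_points(right, ys[i:])

class DiscretizeFilter:
    """Hypothetical filter: numeric attributes -> nominal bin names."""
    def fit(self, rows, labels):
        self.cuts = [mdl_cut_points([r[j] for r in rows], labels)
                     for j in range(len(rows[0]))]
        return self

    def transform(self, rows):
        return [[f"bin{bisect.bisect_right(c, v)}" for v, c in zip(r, self.cuts)]
                for r in rows]

class MajorityByPattern:
    """Stand-in classifier (NOT C4.5): majority label per nominal pattern."""
    def fit(self, rows, labels):
        self.overall = Counter(labels)
        self.table = {}
        for r, y in zip(rows, labels):
            self.table.setdefault(tuple(r), Counter())[y] += 1
        return self

    def predict(self, rows):
        return [self.table.get(tuple(r), self.overall).most_common(1)[0][0]
                for r in rows]

class FilteredClassifier:
    """Fit the filter on training data, then reuse it as-is on unseen data."""
    def __init__(self, filt, clf):
        self.filt, self.clf = filt, clf

    def fit(self, rows, labels):
        self.filt.fit(rows, labels)
        self.clf.fit(self.filt.transform(rows), labels)
        return self

    def predict(self, rows):
        return self.clf.predict(self.filt.transform(rows))

# Toy data (invented): attribute 0 separates the classes, attribute 1 is noise.
train_X = [[1, 5], [2, 3], [3, 5], [4, 3], [10, 5], [11, 3], [12, 5], [13, 3]]
train_y = ["sports"] * 4 + ["politics"] * 4
model = FilteredClassifier(DiscretizeFilter(), MajorityByPattern()).fit(train_X, train_y)
predictions = model.predict([[2.5, 4], [11.5, 4]])
print(model.filt.cuts, predictions)
```

Because the wrapper owns both the filter and the classifier, unseen rows always pass through the same cut points that were learned during training, so the nominal attribute values seen at prediction time match those the classifier was trained on.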
