
Rough set and ensemble learning based semi-supervised algorithm for text classification



Abstract

Text classification has received increasing attention due to the enormous growth of digital content available online. This paper investigates the design of two-class text classifiers using positive and unlabeled data only. What distinguishes this problem is that no labeled negative examples are available for learning, which makes traditional text classification techniques inapplicable. In this paper, a novel semi-supervised classification algorithm based on tolerance rough sets and ensemble learning is proposed. Tolerance rough set theory is used to approximate the concepts present in the documents and to extract an initial set of negative examples. Then, SVM, Rocchio and Naive Bayes algorithms are used as base classifiers to construct an ensemble classifier, which runs iteratively and exploits the margins between positive and negative data to progressively improve the approximation of the negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the WebKB collection. The experimental results indicate that the proposed method achieves a significant performance improvement.
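
The first stage described in the abstract, using tolerance rough sets to pick an initial set of negative examples from the unlabeled data, can be illustrated with a small sketch. The abstract does not give the definitions, so everything below is an assumption: tolerance classes are built from term co-occurrence counts with a hypothetical threshold `theta`, the upper approximation of a term set is the set of terms whose tolerance class intersects it, and an unlabeled document is treated as an initial negative when its upper approximation shares no term with that of the positive set. The function names and the toy documents are made up for illustration and are not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with in >= theta documents."""
    cooc = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(terms, classes):
    """Return all terms whose tolerance class intersects the given term set."""
    terms = set(terms)
    return {t for t, cls in classes.items() if cls & terms}


def initial_negatives(positive, unlabeled, theta=2):
    """Indices of unlabeled docs whose upper approximation misses the positive concept.

    Assumed selection rule (not taken from the paper): no term overlap at all.
    """
    classes = tolerance_classes(positive + unlabeled, theta)
    pos_concept = set()
    for doc in positive:
        pos_concept |= upper_approximation(doc, classes)
    return [
        i for i, doc in enumerate(unlabeled)
        if not (upper_approximation(doc, classes) & pos_concept)
    ]


# Toy data: documents are lists of terms (hypothetical example).
positive = [["wheat", "grain", "export"], ["grain", "harvest", "wheat"]]
unlabeled = [
    ["wheat", "price", "export"],
    ["grain", "price", "harvest"],
    ["football", "match", "goal"],
    ["match", "goal", "referee"],
]
negatives = initial_negatives(positive, unlabeled, theta=2)
print(negatives)
```

In this toy run the two sports documents are selected as initial negatives, while the two unlabeled documents that share terms with the positive set are left for later iterations. The iterative SVM, Rocchio and Naive Bayes ensemble stage is not sketched here.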

Bibliographic details

  • Source
    Expert Systems with Applications | 2011, Issue 5 | pp. 6300-6306 | 7 pages
  • Author affiliations

    College of Information and Management Science, HeNan Agricultural University, Zhengzhou 450002, China;

    College of Information and Management Science, HeNan Agricultural University, Zhengzhou 450002, China;

    College of Information and Management Science, HeNan Agricultural University, Zhengzhou 450002, China;

    Zhengzhou Commodity Exchange, Zhengzhou 450008, China;

    Department of Computer Science and Engineering, Dalian Nationalities University, Dalian 116600, China;

  • Original format: PDF
  • Language of text: English
  • Keywords

    text classification; rough set; ensemble learning; semi-supervised classification;

