Conference: 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2000), Lyon, France, September 13-16, 2000

Learning from Labeled and Unlabeled Documents: A Comparative Study on Semi-Supervised Text Classification


Abstract

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reducing the need for labeled training documents. This paper compares three commonly applied text classifiers in the light of semi-supervised learning: a linear support vector machine, a similarity-based TF-IDF classifier, and a Naive Bayes classifier. Results on real-world text datasets show that these learners may benefit substantially from using a large number of unlabeled documents in addition to some labeled documents.
