Learning from Labeled and Unlabeled Documents: A Comparative Study on Semi-Supervised Text Classification

Abstract

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reducing the need for labeled training documents. This paper compares three commonly applied text classifiers in the light of semi-supervised learning: a linear support vector machine, a similarity-based tfidf classifier, and a Naive Bayes classifier. Results on real-world text datasets show that these learners may benefit substantially from using a large amount of unlabeled documents in addition to some labeled documents.

Bibliographic details

Source: 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2000), Lyon, France, September 13-16, 2000 | 2000 | pp. 490-497 | 8 pages
Conference location: Lyon (FR)
Author: Carsten Lanquillon
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation [J]. M. A. Wajeed, T. Adilakshmi. International Journal of Intelligent Systems Technologies and Applications, 2012, No. 3/4.

2. Semi-supervised Learning To Classify Evaluative Expressions From Labeled And Unlabeled Texts [J]. Yasuhiro Suzuki, Hiroya Takamura, Manabu Okumura. IEICE Transactions on Information and Systems, 2007, No. 10.

3. Text Classification from Labeled and Unlabeled Documents using EM [J]. Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun. Machine Learning, 2000, No. 2/3.

4. Learning from labeled and unlabeled documents: a comparative study on semi-supervised text classification [C]. Carsten Lanquillon. European Conference on Principles and Practice of Knowledge Discovery in Databases, 2000.

5. Leveraging Label Information in Representation Learning for Multi-Label Text Classification [D]. Wu, Jiayu. 2019.

6. Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals [O]. Hamed Hassanzadeh, Mahnoosh Kholghi, Anthony Nguyen. 2018.

7. Learning from Labeled and Unlabeled Documents: A Comparative Study on Semi-Supervised Text Classification [O]. Carsten Lanquillon. 2000.

