International Symposium on Computational and Business Intelligence

Text classification using combined sparse representation classifiers and support vector machines



Abstract

Text classification is an important task in managing the huge repositories of textual content found in various domains. In this paper, we propose text classifiers based on the sparse representation classifier (SRC) and on support vector machines (SVMs) with frequency-based kernels. We represent a text document by its term-frequency (TF) vector. The sparse representation of an example is obtained using an overcomplete dictionary made up of the TF vectors of all the training documents [1]. We propose to seed the dictionary with the principal components of the TF vector representations of the training documents. SVM-based text classifiers typically use a linear or Gaussian kernel on the TF vector representation of documents. Since the TF representation is a non-negative histogram representation, we propose to build SVM-based text classifiers using frequency-based kernels such as the histogram intersection kernel, the chi-square (χ²) kernel and Hellinger's kernel. We observe that some examples misclassified by one classifier are correctly classified by another. To take advantage of this complementarity, we introduce an approach that combines the classifiers to improve text classification performance. The effectiveness of all the proposed techniques is demonstrated on the 20 Newsgroups corpus.
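
The abstract describes SRC only at a high level. Below is a minimal NumPy sketch that assumes the usual SRC decision rule. The test TF vector is coded over a dictionary whose columns are the L2-normalized training TF vectors. The document is then assigned to the class whose coefficients alone give the smallest reconstruction residual. The abstract does not name a sparse solver. Solving the Lasso objective with ISTA (iterative soft-thresholding) is an assumption, as are the weight `lam` and the helper names `ista_lasso` and `src_predict`. Seeding the dictionary with principal components is not shown, because the abstract does not say how those components are tied to classes.

```python
import numpy as np


def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Minimise 0.5*||y - A x||_2^2 + lam*||x||_1 by iterative soft-thresholding (ISTA).

    Assumed solver for illustration; the source does not specify one.
    """
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient (largest singular value squared)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)   # gradient of the least-squares term
        z = x - grad / L           # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold (L1 proximal step)
    return x


def src_predict(train_tf, train_labels, test_tf, lam=0.01, n_iter=500):
    """Hypothetical SRC classifier over TF vectors (rows = documents, columns = terms)."""
    # Dictionary: one column per training document, each scaled to unit L2 norm
    A = np.asarray(train_tf, dtype=float).T
    A = A / np.maximum(np.linalg.norm(A, axis=0, keepdims=True), 1e-12)
    labels = np.asarray(train_labels)
    classes = np.unique(labels)
    preds = []
    for y in np.asarray(test_tf, dtype=float):
        y = y / max(np.linalg.norm(y), 1e-12)
        x = ista_lasso(A, y, lam, n_iter)
        # Class residual: reconstruct y using only the coefficients belonging to class c
        residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
        preds.append(classes[int(np.argmin(residuals))])
    return np.array(preds)
```

For example, `src_predict([[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 3, 4], [0, 1, 4, 3]], ["sport", "sport", "space", "space"], [[6, 1, 0, 0], [0, 0, 5, 5]])` returns the labels `["sport", "space"]`. ISTA is used here only because it needs nothing beyond NumPy. For a corpus the size of 20 Newsgroups, a dedicated sparse solver such as scikit-learn's `Lasso` or `OrthogonalMatchingPursuit` would be the more practical choice.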

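The abstract names three frequency-based kernels for non-negative TF histograms but does not give their formulas. The following minimal NumPy sketch assumes the common definitions:

- Histogram intersection: K(x, y) = Σ_k min(x_k, y_k)
- Additive chi-square: K(x, y) = Σ_k 2 x_k y_k / (x_k + y_k)
- Hellinger: K(x, y) = Σ_k √(x_k y_k)

The paper may use a different χ² variant, such as the exponential form. The sketch also L1-normalizes each TF vector first; that is an assumption, because the abstract does not mention normalization. The function names are hypothetical.

```python
import numpy as np


def l1_normalize(X):
    """Scale each row (one document's TF vector) to sum to 1; all-zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    s = X.sum(axis=1, keepdims=True)
    return np.divide(X, s, out=np.zeros_like(X), where=s > 0)


# Note: the first two kernels broadcast to an (n, m, V) array, which is fine for a
# sketch but would need chunking or sparse arithmetic for a full-vocabulary corpus.

def histogram_intersection_kernel(X, Y):
    # K[i, j] = sum_k min(X[i, k], Y[j, k])
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)


def chi2_kernel(X, Y):
    # Additive chi-square: K[i, j] = sum_k 2 x_k y_k / (x_k + y_k); terms with 0/0 count as 0
    num = 2.0 * X[:, None, :] * Y[None, :, :]
    den = X[:, None, :] + Y[None, :, :]
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0).sum(axis=2)


def hellinger_kernel(X, Y):
    # K[i, j] = sum_k sqrt(x_k * y_k) = <sqrt(x), sqrt(y)>; requires non-negative inputs
    return np.sqrt(X) @ np.sqrt(Y).T
```

On L1-normalized inputs, all three kernels give 1 for a document compared with itself. A train Gram matrix `K_train = kern(P_train, P_train)` can be passed to scikit-learn's `SVC(kernel="precomputed")` via `fit(K_train, y_train)`. Predictions then use `predict(kern(P_test, P_train))`, whose shape is (n_test, n_train).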