A probabilistic model derived term weighting scheme for text classification

Abstract

Term weighting is a text representation strategy that assigns an appropriate value to each term when the content of a textual document is transformed into a vector in the term space, with the aim of improving text classification performance. Supervised weighting methods, which use information on the membership of training documents in predefined classes, are naturally expected to give better results than unsupervised ones. In this paper, a new weighting scheme is proposed via a matching score function based on a probabilistic model. We introduce a latent variable to indicate whether a term carries text classification information, specify conjugate priors, and exploit the conjugacy by integrating out the latent indicator and the parameters. As a result, non-discriminating terms can be assigned weights close to 0. Experimental results using kNN and SVM classifiers illustrate the effectiveness of the proposed approach on both small and large text data sets. (C) 2018 Published by Elsevier B.V.
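
The idea in the abstract, a latent indicator per term combined with conjugate priors that are integrated out, can be illustrated with a small Beta-Bernoulli sketch. This is not the paper's matching score function, which this record does not give. It assumes a two-class problem, document-level presence/absence of a term, Beta(1, 1) priors, and a prior probability of 0.5 that a term is discriminating. All function names are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of n Bernoulli observations with k successes,
    with the rate integrated out under a conjugate Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    z = exp(x)
    return z / (1.0 + z)


def discriminating_probability(k_pos, n_pos, k_neg, n_neg, prior=0.5):
    """Posterior probability that the latent indicator is 1, i.e. that the
    term has a different occurrence rate in each class.

    k_pos / n_pos: documents containing the term / all documents, class 1.
    k_neg / n_neg: the same counts for class 2.
    prior: assumed prior probability that a term is discriminating.
    """
    # Indicator = 1: each class has its own rate, integrated out separately.
    log_m1 = log_marginal(k_pos, n_pos) + log_marginal(k_neg, n_neg)
    # Indicator = 0: both classes share one rate.
    log_m0 = log_marginal(k_pos + k_neg, n_pos + n_neg)
    log_odds = log(prior) - log(1.0 - prior) + log_m1 - log_m0
    return sigmoid(log_odds)


def term_weight(k_pos, n_pos, k_neg, n_neg):
    """Illustrative weight: the absolute log ratio of smoothed class rates,
    averaged over the indicator (the weight is 0 when the indicator is 0)."""
    p = discriminating_probability(k_pos, n_pos, k_neg, n_neg)
    rate_pos = (k_pos + 1.0) / (n_pos + 2.0)
    rate_neg = (k_neg + 1.0) / (n_neg + 2.0)
    return p * abs(log(rate_pos / rate_neg))


if __name__ == "__main__":
    # A term concentrated in one class versus a term spread evenly.
    print(term_weight(45, 50, 5, 50))
    print(term_weight(25, 50, 25, 50))
```

In this sketch a term that occurs equally often in both classes gets a weight of zero, while a term concentrated in one class keeps nearly its full log-ratio weight, which mirrors the abstract's remark that non-discriminating terms receive weights close to 0.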

Bibliographic record

  • Source
    Pattern recognition letters | 2018, Issue 15 | pp. 23-29 | 7 pages
  • Author affiliations

    Northeast Normal Univ, Sch Comp Sci & Informat Technol, Key Lab Intelligent Informat Proc Jilin Univ, Changchun 130117, Jilin, Peoples R China;

    Dongbei Univ Finance & Econ, Sch Stat, Dalian 116025, Peoples R China;

    Northeast Normal Univ, Sch Comp Sci & Informat Technol, Key Lab Intelligent Informat Proc Jilin Univ, Changchun 130117, Jilin, Peoples R China;

    Northeast Normal Univ, Sch Comp Sci & Informat Technol, Key Lab Intelligent Informat Proc Jilin Univ, Changchun 130117, Jilin, Peoples R China;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    Latent feature selection indicator; Matching score function; Naive Bayes; Supervised term weighting; Text classification;
