Journal of Multimedia

Chinese Short-Text Classification Based on Topic Model with High-Frequency Feature Expansion

Abstract

Short texts differ from traditional documents in being brief and sparse. Feature expansion can ease the high sparseness of the vector space model, but it inevitably introduces noise. To address this problem, this paper proposes a high-frequency feature expansion method based on a latent Dirichlet allocation (LDA) topic model. High-frequency features are extracted from each category to form the feature space, LDA is used to derive latent topics from the corpus, and the resulting topic words are added to each short text. Extensive experiments are conducted on Chinese short messages and news titles, and the proposed method outperforms conventional classification methods on Chinese short-text classification.
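
The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of the expansion step. It assumes that texts are already segmented into tokens and that topic word lists have already been obtained from an LDA model. The function names `high_frequency_features` and `expand`, the parameters `top_k` and `n_words`, the overlap-based topic choice, and all toy data are hypothetical illustrations, not the paper's method or results.

```python
from collections import Counter

def high_frequency_features(docs_by_category, top_k):
    """Return the union of the top_k most frequent tokens of each category."""
    features = set()
    for docs in docs_by_category.values():
        counts = Counter(tok for doc in docs for tok in doc)
        features.update(tok for tok, _ in counts.most_common(top_k))
    return features

def expand(tokens, topics, features, n_words):
    """Append up to n_words topic words that lie in the feature space."""
    # Assumption: pick the topic sharing the most words with the text.
    # This is a simple stand-in for real LDA topic inference.
    best = max(topics, key=lambda words: len(set(words) & set(tokens)))
    extra = [w for w in best if w in features and w not in tokens]
    return tokens + extra[:n_words]

# Toy, hand-written data (hypothetical; not taken from the paper).
docs_by_category = {
    "sports": [["team", "wins", "match"], ["team", "match", "score"]],
    "finance": [["stock", "market", "falls"], ["stock", "market", "bank"]],
}
# Hand-written topic word lists standing in for LDA output.
topics = [["match", "team", "score"], ["market", "stock", "bank"]]

features = high_frequency_features(docs_by_category, top_k=2)
expanded = expand(["team", "news"], topics, features, n_words=2)
print(expanded)  # ['team', 'news', 'match']
```

Restricting the added words to the high-frequency feature space is what limits noise in this sketch: "score" belongs to the matching topic but is dropped because it is not a high-frequency feature of any category.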