Published in: Concurrency, practice and experience

Research on improved text classification method based on combined weighted model

Abstract

Text classification is very important in information retrieval, but traditional text classification models suffer from problems such as the curse of dimensionality in the feature space and a lack of semantic features. To address these problems, this paper proposes an improved TFIDF model combined with the Word2vec model for weighting word vectors. Because the Word2vec model cannot distinguish the importance of words within a text, TFIDF is introduced to weight the Word2vec word vectors, yielding a weighted Word2vec classification model. For data preprocessing, we optimize the traditional StringToWordVector algorithm; the main improvement is the introduction of a new stemming algorithm. The paper first briefly describes the basic steps and algorithms of traditional text classification and then presents the ideas and steps of the improved StringToWordVector algorithm. Finally, the improved algorithm is evaluated on four data sets (WEBO_SINA and three standard UCI data sets). The experimental results show that the improved StringToWordVector algorithm, used together with the combined weighted model, achieves higher classification accuracy, recall, and F1 values than traditional text classification models that use only Word2vec or only TFIDF. The experimental results are satisfactory.
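The abstract does not give the paper's exact weighting formula, so the following is only a minimal sketch of the general idea of weighting word vectors by TFIDF, not the authors' method. It assumes a standard smoothed IDF and a weighted average as the document vector. The function names `tfidf_weights` and `weighted_doc_vector`, the toy corpus, and the two-dimensional `embeddings` table (a stand-in for vectors from a trained Word2vec model) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumption: tf is the relative term frequency and idf is the smoothed
    form ln((1 + N) / (1 + df)) + 1; the paper may use a different variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """Average the word vectors of one document, weighted by tf-idf."""
    total = [0.0] * dim
    weight_sum = 0.0
    for term, weight in doc_weights.items():
        vec = embeddings.get(term)
        if vec is None:
            continue  # skip terms that have no embedding
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known terms: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical toy corpus and embeddings, for illustration only.
docs = [
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
    ["stock", "team", "news"],
]
embeddings = {
    "stock": [0.9, 0.1], "market": [0.8, 0.2], "rises": [0.6, 0.3],
    "team": [0.1, 0.9], "wins": [0.2, 0.8], "match": [0.3, 0.7],
    "news": [0.5, 0.5],
}

doc_vectors = [
    weighted_doc_vector(w, embeddings, dim=2) for w in tfidf_weights(docs)
]
```

Each entry of `doc_vectors` could then be passed to any standard classifier. Words that occur in fewer documents receive larger weights and therefore pull the document vector further toward their own embedding, which is the effect the abstract attributes to adding TFIDF on top of Word2vec.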
