SENTENCE VECTOR CALCULATION METHOD BASED ON CHI-SQUARE TEST, AND TEXT CLASSIFICATION METHOD AND SYSTEM

Abstract

Disclosed are a sentence vector calculation method based on a chi-square test, and a text classification method and system. The method involves: performing word segmentation on the current text and removing stop words to obtain a word segmentation result; calculating a word vector for each word in the word segmentation result; calculating a chi-square value between each word vector and a preset category, and dividing the word vectors into feature words and non-feature words according to the chi-square values; calculating the usage frequency of the feature words in the preset category, assigning a first weight value to the feature words according to that usage frequency, and assigning a second weight value to the non-feature words, the first weight value being greater than the second weight value; and calculating a weighted mean of all word vectors from the word vectors of the feature words and non-feature words and their corresponding weight values, and taking this weighted mean as the sentence vector of the current text. This increases the weight of the sentence vector in the feature dimensions, reduces mutual interference between the word vectors in the text, and greatly improves the accuracy of text classification.
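
The steps in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for Chinese word segmentation, the stop-word list, corpus and 2-d word vectors are toy placeholders (a real system would use pretrained embeddings), the chi-square value is the standard 2x2 document-count statistic of a word against one preset category, and the names `segment`, `chi_square`, `sentence_vector` and `threshold`, as well as the first-weight formula `second weight + 1 + log(1 + usage count)`, are hypothetical choices that simply guarantee the first weight exceeds the second.

```python
import math
from collections import Counter

# Toy stop-word list (assumption); a real system would use a curated list.
STOP_WORDS = {"the", "a", "is", "of", "and"}


def segment(text):
    """Stand-in for word segmentation: lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def chi_square(word, category, docs):
    """Standard 2x2 chi-square statistic of a word against one category.

    docs is a list of (tokens, label) pairs; counts are per document.
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_word, in_cat = word in tokens, label == category
        if has_word and in_cat:
            a += 1  # in category, contains word
        elif has_word:
            b += 1  # other category, contains word
        elif in_cat:
            c += 1  # in category, lacks word
        else:
            d += 1  # other category, lacks word
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0


def sentence_vector(text, category, docs, word_vectors,
                    threshold=1.0, non_feature_weight=1.0):
    """Weighted mean of the text's word vectors with respect to one category."""
    # Usage frequency of each word within the category's documents.
    usage = Counter(w for tokens, label in docs if label == category for w in tokens)
    dim = len(next(iter(word_vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for word in segment(text):
        vec = word_vectors.get(word)
        if vec is None:  # skip words without a vector
            continue
        if chi_square(word, category, docs) >= threshold:
            # Feature word: first weight grows with usage and always exceeds the second.
            weight = non_feature_weight + 1.0 + math.log1p(usage[word])
        else:
            weight = non_feature_weight  # non-feature word: second weight
        total = [t + weight * x for t, x in zip(total, vec)]
        weight_sum += weight
    return [t / weight_sum for t in total] if weight_sum else [0.0] * dim


# Hypothetical labelled corpus and 2-d word vectors, for illustration only.
docs = [(segment(t), label) for t, label in [
    ("the match ended with a late goal", "sports"),
    ("the striker scored a goal", "sports"),
    ("the central bank raised rates", "finance"),
    ("stocks fell as rates rose", "finance"),
]]
word_vectors = {"goal": [1.0, 0.0], "striker": [0.9, 0.1],
                "rates": [0.0, 1.0], "late": [0.5, 0.5]}

print(sentence_vector("a late goal", "sports", docs, word_vectors))
```

With this toy data, both "late" and "goal" pass the threshold for "sports", and "goal" (used in two sports documents) receives the larger weight, so the sentence vector leans further toward the "goal" direction than the plain mean of the two word vectors, (0.75, 0.25), would. Raising `threshold` so that no word qualifies as a feature word reduces the result to that plain mean. The abstract does not say how several preset categories are combined, so this sketch scores against a single category chosen by the caller.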