Tibetan Text Classification Method Based on BiLSTM Model

Abstract

Text classification is a key technology in information retrieval and data mining. It helps deal with cluttered information and locate useful information. This paper proposes a Tibetan text representation method that fuses Word2vec with a TF-IDF weighting based on class frequency variance, and it uses a BiLSTM network model to classify Tibetan text on top of this representation. First, the Tibetan texts to be classified are preprocessed: word segmentation, construction of a basic stop-word list, and word-frequency calculation. Next, each text is represented by combining Word2vec with the TF-IDF algorithm based on class frequency variance, which takes into account both the importance of words and their distribution across classes. Finally, the word vectors are fed into the classification model to train a Tibetan text classifier, and the trained classifier is used to classify unlabeled Tibetan text. Experimental results show that the representation combining Word2vec with class-frequency-variance TF-IDF effectively improves classification performance. The BiLSTM-based Tibetan text classifier reaches an accuracy of 89.03%, significantly better than the RNN and LSTM models.
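The abstract does not give the exact class-frequency-variance formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Tibetan texts are already word-segmented into token lists, and it uses a hypothetical weight tf(t, d) · idf(t) · cfv(t), where cfv(t) is the population variance, across classes, of the fraction of each class's documents that contain t. The function name `cfv_tfidf`, the smoothed IDF, and the toy tokens are illustrative choices.

```python
import math
from collections import Counter
from statistics import pvariance


def cfv_tfidf(docs, labels, stopwords=frozenset()):
    """Return one {token: weight} dict per document.

    Hypothetical weighting (the paper's exact formula is not in the abstract):
        weight(t, d) = tf(t, d) * idf(t) * cfv(t)
    where cfv(t) is the variance, across classes, of the fraction of
    each class's documents that contain t.
    """
    # Stop-word removal on already-segmented token lists.
    docs = [[t for t in doc if t not in stopwords] for doc in docs]
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)

    # Document frequency overall and per class.
    df = Counter()
    df_by_class = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        for t in set(doc):
            df[t] += 1
            df_by_class[label][t] += 1

    # Smoothed IDF (always positive) and class frequency variance.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    cfv = {
        t: pvariance([df_by_class[c][t] / class_sizes[c] for c in classes])
        for t in df
    }

    weights = []
    for doc in docs:
        counts = Counter(doc)  # raw word frequencies
        length = len(doc) or 1
        weights.append({t: (n / length) * idf[t] * cfv[t] for t, n in counts.items()})
    return weights


# Toy usage: two classes, placeholder tokens instead of real Tibetan words.
docs = [["a", "b", "x"], ["a", "b", "y"], ["c", "d", "x"], ["c", "d", "y"]]
labels = ["sport", "sport", "music", "music"]
w = cfv_tfidf(docs, labels)
print(w[0]["x"])  # 0.0: "x" is spread evenly over both classes
```

A word such as `x` that occurs equally often in every class gets a variance of zero and therefore no weight, while words concentrated in one class are boosted. That is one way to combine word importance (TF-IDF) with word distribution (class frequency variance).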
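The abstract also does not say exactly how the weights and the Word2vec vectors are merged before the BiLSTM. One plausible reading, assumed here, is to scale each token's Word2vec vector by its weight and feed the resulting sequence to a bidirectional LSTM whose final forward and backward hidden states are concatenated into a text feature. The sketch below uses only the standard library, random untrained parameters, and made-up 3-dimensional embeddings, and it shows the forward pass only; training with backpropagation and a softmax output layer would normally be done in a deep-learning framework. `weighted_sequence`, `LSTMCell` and `bilstm_encode` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_sequence(tokens, weights, embeddings, dim):
    """Assumed fusion rule: scale each token's Word2vec vector by its weight."""
    zero = [0.0] * dim  # unknown tokens map to a zero vector
    return [[weights.get(t, 0.0) * v for v in embeddings.get(t, zero)] for t in tokens]


class LSTMCell:
    """Minimal LSTM cell with small random (untrained) parameters."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim + 1  # [x_t, h_{t-1}, bias]
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]
            for gate in "ifoc"
        }

    def step(self, x, h_prev, c_prev):
        z = x + h_prev + [1.0]
        pre = {
            gate: [sum(w * v for w, v in zip(row, z)) for row in self.W[gate]]
            for gate in "ifoc"
        }
        i = [sigmoid(a) for a in pre["i"]]
        f = [sigmoid(a) for a in pre["f"]]
        o = [sigmoid(a) for a in pre["o"]]
        g = [math.tanh(a) for a in pre["c"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, seq):
        """Process the sequence and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            h, c = self.step(x, h, c)
        return h


def bilstm_encode(seq, fwd, bwd):
    """Concatenate the final states of a left-to-right and a right-to-left LSTM."""
    return fwd.run(seq) + bwd.run(seq[::-1])


# Toy usage with made-up embeddings and weights.
rng = random.Random(0)
embeddings = {"a": [0.1, 0.2, 0.3], "b": [0.0, -0.1, 0.4], "x": [0.5, 0.5, 0.5]}
weights = {"a": 0.3, "b": 0.3, "x": 0.0}
seq = weighted_sequence(["a", "b", "x"], weights, embeddings, dim=3)
fwd, bwd = LSTMCell(3, 4, rng), LSTMCell(3, 4, rng)
features = bilstm_encode(seq, fwd, bwd)
print(len(features))  # 8: 4 forward + 4 backward hidden units
```

Reading the sequence in both directions lets the final feature depend on context to the left and to the right of each word, which is the usual motivation for preferring a BiLSTM over a one-directional RNN or LSTM.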

Bibliographic Details

Source: International Conference on Artificial Intelligence and Electromechanical Automation, 2020, pp. 27-31 (5 pages)
Authors: Li Jia; Tao Jiang; Jia Hao Meng; TingTing Zhang
Original format: PDF
Keywords: Word2vec; class frequency variance; BiLSTM; Tibetan text classification

Similar Documents

1. Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification [J]. Jianfeng Deng, Lianglun Cheng, Zhuowei Wang. Computer Speech and Language, 2021, issue Jula.

2. Tatt-BiLSTM: Web service classification with topical attention-based BiLSTM [J]. Kang Guosheng, Xiao Yong, Liu Jianxun. Concurrency and Computation: Practice and Experience, 2021, no. 16.

3. A Text Sentiment Classification Modeling Method Based on Coordinated CNN-LSTM-Attention Model [J]. Zhang Yangsen, Zheng Jia, Jiang Yuru. Chinese Journal of Electronics, 2019, no. 1.

4. Tibetan Word Segmentation Method Based on BiLSTM_CRF Model [C]. Lili Wang, Hongwu Yang. International Conference on Asian Language Processing, 2018.

5. Combining text-, link-, and classification-based retrieval methods to enhance information discovery on the Web [D]. Yang, Kiduk. 2002.

6. Sentimental text mining based on an additional features method for text classification [O]. Ching-Hsue Cheng, Hsien-Hsiu Chen.

7. Self-Attention-Based BiLSTM Model for Short Text Fine-Grained Sentiment Classification [O]. Jun Xie, Bo Chen, Xinglong Gu. 2019.
