A Review on Sentiment Analysis Model for Chinese Weibo Text

Abstract

Sentiment analysis of Chinese Weibo text is a complex, multi-stage process. In general it comprises three parts: data cleaning, word segmentation, and feature extraction. Weibo text is unstructured and contains a great deal of non-standard content, so it should be cleaned thoroughly before feature extraction. Because emoticons in Weibo text are very useful for sentiment analysis, data cleaning should remove all non-Chinese content, including content marked with "@" or "#", except emoticons. Word segmentation algorithms fall into three categories: string-matching-based, understanding-based, and statistics-based [1]. For feature extraction, lexicon-based models, machine learning models, and deep learning models are commonly used. A literature search shows that lexicon-based models take the grammatical characteristics of Chinese Weibo text fully into account and handle them programmatically, for example sentiment words, degree adverbs, negation words, and the various Chinese sentence patterns. However, because lexicon-based models generalize poorly, their experimental performance is not good, and they need further improvement. For traditional machine learning, innovation lies in two main areas: combined classifiers (AdaBoost+SVM) and improvements to classical classification algorithms. It is worth noting that the performance of some improved classifiers (SVM, P Naïve Bayes) has not yet been verified on Chinese Weibo classification. For deep learning, innovation currently focuses on the convolution layer and the input attention mechanism. As next steps, YuanHejin suggests introducing ensemble learning and improving the attention mechanism; LuXin argues that recognition of ironic sentences in context on Weibo needs to improve; and GaoWeiju holds that an individual sentiment space should be built for each user in the EMCNN model.
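
The data washing (cleaning) step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that Weibo emoticons appear as bracketed tokens such as `[哈哈]`, that "@" marks a user mention ending at whitespace, that "#...#" delimits a topic tag, and that "Chinese" means the basic CJK Unified Ideographs range (U+4E00 to U+9FFF). The function name `clean_weibo` and all patterns are hypothetical.

```python
import re

# Assumption: emoticons are bracketed tokens such as [哈哈] or [doge].
EMOTICON = re.compile(r"\[[\u4e00-\u9fffA-Za-z]+\]")
# Assumption: a mention is "@" followed by word characters or hyphens.
MENTION = re.compile(r"@[\w\-]+")
# Assumption: a topic tag is wrapped in a pair of "#" characters.
HASHTAG = re.compile(r"#[^#]*#")
# Anything outside the basic CJK Unified Ideographs block.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def clean_weibo(text: str) -> str:
    """Drop mentions, topic tags, and non-Chinese characters; keep emoticons."""
    text = MENTION.sub(" ", text)
    text = HASHTAG.sub(" ", text)

    parts = []
    pos = 0
    # Walk the text emoticon by emoticon so the bracketed tokens survive
    # while everything between them is stripped of non-Chinese characters.
    for match in EMOTICON.finditer(text):
        parts.append(NON_CHINESE.sub("", text[pos:match.start()]))
        parts.append(match.group())
        pos = match.end()
    parts.append(NON_CHINESE.sub("", text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(clean_weibo("@小明 今天天气真好!![哈哈] #北京天气# http://t.cn/abc 666"))
```

Under these assumptions, punctuation, digits, and URLs are removed along with mentions and topic tags. Whether the text inside a topic tag should be kept instead of dropped is a design choice the abstract does not settle.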
