Source: 《管理工程学报》 (Chinese-language journal)

A Study of Text Sentiment Analysis Based on IDSSL

         

Abstract

As social media continues to spread, a large volume of user-generated text has appeared online. The opinions, views, and attitudes expressed in this text are valuable to Internet users and have drawn growing attention, and many supervised text sentiment analysis methods have been proposed to exploit such data. In sentiment analysis, however, a large share of the samples are unlabeled, and learning from many unlabeled samples together with a few labeled ones has become one of the pressing problems in the field. This paper therefore proposes an improved semi-supervised text sentiment analysis method, IDSSL (Improved Disagreement-based Semi-Supervised Learning). Built on the disagreement-based semi-supervised framework, the method first constructs multiple initial classifiers with Random Subspace and then uses unlabeled samples to train the classifiers in a "majority help minority" fashion. Experiments on classic sentiment analysis datasets demonstrate the effectiveness of the proposed method, which also achieved better results than the other semi-supervised learning methods.

With the growing popularity of social media, a large amount of user-generated content is posted on the Internet. These texts contain users' points of view, opinions, and attitudes, which play an important role for Internet users, and researchers are paying increasing attention to them. Consequently, many supervised text sentiment analysis methods have been proposed to make use of this kind of data. However, sentiment analysis involves a large amount of unlabeled data, and how to learn from a large amount of unlabeled data together with a small amount of labeled data has become one of the urgent research problems in the area. This paper therefore proposes an Improved Disagreement-based Semi-Supervised Learning (IDSSL) method for text sentiment analysis, built on the framework of disagreement-based semi-supervised learning.

First, a sentiment analysis model based on disagreement-based semi-supervised learning was constructed. Disagreement-based semi-supervised learning was analyzed theoretically, and the analysis found that the multiple-classifier approach performs better than the original disagreement-based semi-supervised learning method, and that diversity among the classifiers is the key to the multiple-classifier approach. Moreover, the Random Subspace method can produce diverse classifiers in sentiment analysis. We therefore constructed a sentiment analysis model that combines multiple classifiers generated with the Random Subspace method, namely IDSSL. The IDSSL method consists of three steps: (1) multiple initial classifiers are built with the Random Subspace method; (2) the classifiers are trained by the rule of "majority help minority" to utilize the unlabeled instances; and (3) the base classifiers are combined by majority vote.

Second, experiments were carried out on classic sentiment analysis datasets, and the established standard measure in sentiment analysis was adopted to evaluate the performance of the proposed method. IDSSL was compared with several disagreement-based semi-supervised learning methods: Self-training, Co-training, Tri-training, and Co-forest. Self-training, Co-training, Tri-training, and IDSSL used SVM as the base learner. To minimize the influence of variability in the training set, 10-fold cross-validation was performed five times on the sentiment analysis datasets.

Finally, the experimental results demonstrated the effectiveness of the proposed method, which obtained better results than the other semi-supervised learning methods (Self-training, Co-training, Tri-training, and Co-forest). In addition, we discuss the results of the different semi-supervised learning methods, the influence of the label rate on semi-supervised learning methods, and the influence of the add-number on the IDSSL method.
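The three steps named in the abstract (Random Subspace initialization, "majority help minority" training, and majority-vote integration) can be illustrated with a small sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: a nearest-centroid learner stands in for the SVM base learner so that the example needs only the standard library; `add_number` is read here as a per-round cap on pseudo-labeled instances added to each classifier; and the names `CentroidClassifier`, `majority_vote`, and `idssl_sketch`, along with all default parameter values, are hypothetical.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Nearest-centroid learner on a feature subspace (assumed stand-in for SVM)."""

    def __init__(self, features):
        self.features = features  # feature indices of this classifier's subspace
        self.centroids = {}

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(self.features))
            for k, f in enumerate(self.features):
                acc[k] += row[f]
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row):
        def dist(c):
            return sum((row[f] - m) ** 2 for f, m in zip(self.features, self.centroids[c]))
        # sorted() makes tie-breaking deterministic (smallest label wins).
        return min(sorted(self.centroids), key=dist)


def majority_vote(classifiers, row):
    """Step 3: combine the base classifiers by majority vote."""
    votes = Counter(clf.predict_one(row) for clf in classifiers)
    return max(sorted(votes.items()), key=lambda kv: kv[1])  # (label, vote count)


def idssl_sketch(X_lab, y_lab, X_unlab, n_classifiers=5, subspace_size=2,
                 add_number=2, rounds=3, seed=0):
    rng = random.Random(seed)
    n_features = len(X_lab[0])
    # Step 1: one initial classifier per random feature subspace.
    classifiers = [CentroidClassifier(rng.sample(range(n_features), subspace_size))
                   for _ in range(n_classifiers)]
    train = [(list(X_lab), list(y_lab)) for _ in classifiers]
    for clf, (Xi, yi) in zip(classifiers, train):
        clf.fit(Xi, yi)

    pool = list(X_unlab)
    for _ in range(rounds):
        # Step 2: "majority help minority" (assumed interpretation). When a strict
        # majority agrees on a label, the dissenting classifiers receive the
        # instance with the majority label as a pseudo-labeled training example.
        used, added = set(), [0] * n_classifiers
        for idx, row in enumerate(pool):
            preds = [clf.predict_one(row) for clf in classifiers]
            label, count = Counter(preds).most_common(1)[0]
            if count <= n_classifiers / 2 or count == n_classifiers:
                continue  # no strict majority, or nobody disagrees
            for i, pred in enumerate(preds):
                if pred != label and added[i] < add_number:
                    train[i][0].append(row)
                    train[i][1].append(label)
                    added[i] += 1
                    used.add(idx)
        if not used:
            break  # nothing new was learned from the unlabeled pool
        pool = [row for idx, row in enumerate(pool) if idx not in used]
        for clf, (Xi, yi) in zip(classifiers, train):
            clf.fit(Xi, yi)
    return classifiers


# Toy usage with made-up numeric features (real inputs would be text features).
X_lab = [[0, 1, 0, 1], [1, 0, 1, 0], [9, 10, 9, 10], [10, 9, 10, 9]]
y_lab = [0, 0, 1, 1]
X_unlab = [[1, 0.5, 0, 1], [9, 9, 10, 10], [0, 0, 10, 10]]
ensemble = idssl_sketch(X_lab, y_lab, X_unlab)
label, votes = majority_vote(ensemble, [0.5, 0.2, 0.1, 0.3])
```

Restricting each classifier to a different random feature subset is what gives the ensemble the diversity the abstract identifies as essential: without disagreement between classifiers, step 2 has no minority to teach.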
