
Large Scale and Parallel Sentiment Analysis Based on Label Propagation in Twitter Data


Abstract

Sentiment analysis is a promising branch of natural language processing, but it becomes challenging on Twitter data because of the large volume, the rapidly changing language style, and the lack of training data. As a result, traditional lexicon-based approaches and supervised learning methods are difficult to apply. In this paper, we propose a label propagation algorithm over a graph structure to address the last two problems, and we apply GraphX, an API in the Spark framework for graph-parallel computing, to address the first. The results show that the label propagation algorithm is robust and scalable in our parallel implementation. Moreover, our approach, which uses the lexicon together with noisy labels such as emoticons, significantly outperforms the baseline. In future work, we plan to test more algorithms on clusters and to make better use of the social network by adding a community detection step before classification to improve accuracy.
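
To make the idea of label propagation from noisy seeds more concrete, here is a minimal single-machine sketch in plain Python. It is not the authors' GraphX implementation. The graph layout (tweets linked to the words and emoticons they contain), the seed scores, the edge weights, and the names propagate_labels and classify are all assumptions chosen for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread sentiment scores from seed nodes over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) with positive weights
    seeds: dict mapping node -> score in [-1.0, 1.0]; seed scores stay fixed
    Returns a dict mapping every node to its score after the last iteration.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Unlabeled nodes start at 0.0 (no sentiment evidence yet).
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                # Clamp seeds so the known labels are never overwritten.
                updated[node] = seeds[node]
                continue
            total_weight = sum(w for _, w in adjacent)
            # Each unlabeled node takes the weighted mean of its neighbors.
            updated[node] = sum(scores[n] * w for n, w in adjacent) / total_weight
        scores = updated
    return scores


def classify(score):
    """Map a propagated score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical toy graph: tweets connected to the tokens they contain.
edges = [
    ("tweet1", ":)", 1.0),
    ("tweet1", "sunny", 1.0),
    ("tweet2", "sunny", 1.0),
    ("tweet2", "great", 1.0),
    ("tweet3", ":(", 1.0),
    ("tweet3", "delayed", 1.0),
    ("tweet4", "delayed", 1.0),
]
# Seeds: emoticons act as noisy labels, "great" stands in for a lexicon entry.
seeds = {":)": 1.0, "great": 1.0, ":(": -1.0}

scores = propagate_labels(edges, seeds)
for tweet in ("tweet1", "tweet2", "tweet3", "tweet4"):
    print(tweet, classify(scores[tweet]))
```

In this toy graph, tweet4 contains no seed token at all and receives its score only through the shared word "delayed", which is the kind of indirect evidence a graph-based method can exploit when labeled training data is scarce. A parallel version would distribute the per-node update across partitions, which is the role the abstract assigns to GraphX.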
