Conference: International Conference on Information and Communication Systems

Multi-way sentiment classification of Arabic reviews

Abstract

The evolution of the Web and the appearance of new technologies have given Internet users new ways to express their opinions and feelings regarding different aspects of life. Such expressions are written in an unstructured way using natural languages, and they hold a great deal of knowledge about users' opinions and reactions on various subjects. As a result, a new field called Sentiment Analysis (SA) has come into existence to address the complicated task of extracting such opinions or sentiments from the massive pool of unstructured text available online. Traditional works on SA consider only two sentiments: positive and negative. Multi-way SA considers sentiments expressed using a star or ranking system. For example, in a 5-star ranking system, the user's opinion ranges from very negative (1 star) to very positive (5 stars). This version of SA is much harder to handle, which partly explains the limited number of works on it. Moreover, this work focuses on the Arabic language, which is largely understudied compared to English. A new and relatively large Arabic dataset is used: the Large Arabic Book Reviews (LABR) dataset, gathered from an online book reviews website. The objective of this work is to perform baseline experiments on this dataset by applying the Bag-of-Words (BoW) model coupled with the most popular classifiers. We also investigate the effect of stemming and of balancing the dataset. The obtained accuracies are low, confirming the intuition that the multi-way SA problem is very difficult and needs further attention.
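
The baseline setup described in the abstract (Bag-of-Words features, a standard classifier, and a balanced dataset) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classifiers, so multinomial Naive Bayes stands in as one common choice; balancing is shown as random undersampling, which may differ from the authors' method; stemming is omitted; and the toy reviews are English placeholders, not LABR data.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Plain whitespace tokenization. A real Arabic pipeline would add
    # normalization and, optionally, stemming (not shown here).
    return text.lower().split()


def balance(samples, seed=0):
    """Randomly undersample every rating class to the smallest class size.

    Assumption: undersampling is only one possible balancing strategy.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return balanced


class BowNaiveBayes:
    """Multinomial Naive Bayes over Bag-of-Words counts (Laplace smoothing)."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)  # rating -> word frequencies
        self.class_counts = Counter()            # rating -> number of reviews
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.class_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / total_docs)
            for token in tokenize(text):
                if token in self.vocab:
                    score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reviews with star ratings (placeholders, not LABR data).
toy_reviews = [
    ("terrible boring book", 1),
    ("terrible waste", 1),
    ("boring plot", 1),
    ("average story", 3),
    ("average okay read", 3),
    ("wonderful brilliant book", 5),
    ("brilliant wonderful story", 5),
]

balanced = balance(toy_reviews)
model = BowNaiveBayes().fit(balanced)
print(model.predict("wonderful brilliant"))  # prints 5
```

On real multi-way data the rating classes overlap far more than in this toy set (a 3-star and a 4-star review often share most of their vocabulary), which is consistent with the low accuracies the abstract reports.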
