Published in: International Moratuwa Engineering Research Conference

ACTSEA: Annotated Corpus for Tamil Sinhala Emotion Analysis

Abstract

The purpose of text emotion analysis is to detect and classify the emotions expressed in text. In recent years, text emotion analysis studies for English have increased because data for that language are abundant. With the growth of social media, large amounts of data are now also available for regional languages such as Tamil and Sinhala. However, these languages lack the annotated corpora needed for many NLP tasks, including emotion analysis. In this paper, we present a scalable, semi-automatic approach to creating ACTSEA, an annotated corpus for Tamil and Sinhala that supports emotion analysis. We also present an analysis of a sample of the produced data, along with findings intended to benefit the low-resource NLP community. For ACTSEA, data were gathered from Twitter, cleaned, and then annotated manually. In total, we collected 600,280 Tamil and 318,308 Sinhala tweets, making our corpus the largest data collection currently available for these languages.