Source: Journal of Supercomputing

An integrated semi-automated framework for domain-based polarity words extraction from an unannotated non-English corpus

Abstract

Building sentiment analysis resources is a fundamental step before developing any sentiment analysis model, and sentiment lexicons are among the most critical of these resources. However, many non-English languages suffer from a severe shortage of such resources and lexicons. This study proposes an integrated framework for extracting domain-based polarity words from a massive unannotated non-English corpus. The framework consists of three layers: lexicon-based, corpus-based and human-based. The first two layers automatically recognize and extract new polarity words from the corpus using initial seed lexicons. A key advantage of the proposed framework is that it needs only an initial seed lexicon and an unannotated corpus to start the extraction process; because it depends on these seed lexicons, the framework is semi-automated. Experiments on three languages indicate that the proposed framework outperformed existing lexicons, achieving F-scores of 77.8%, 83.8% and 68.6% for the Arabic, French and Malay lexicons, respectively.
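
The abstract does not spell out how the lexicon-based and corpus-based layers work, so the following is only a minimal sketch of one generic way to grow a polarity lexicon from seed words and an unannotated corpus: counting sentence-level co-occurrence with seeds. It is not the authors' method. The function name `expand_lexicon`, the thresholds `min_count` and `min_margin`, and the toy French sentences are all hypothetical and chosen for illustration.

```python
from collections import Counter


def expand_lexicon(sentences, pos_seeds, neg_seeds, min_count=2, min_margin=0.5):
    """Propose new polarity words from seed co-occurrence (illustrative only).

    A non-seed word collects one positive (negative) hit for every positive
    (negative) seed that appears in the same sentence. Words with enough hits
    and a sufficiently one-sided score are returned as candidates.
    """
    pos_hits, neg_hits = Counter(), Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        n_pos = len(tokens & pos_seeds)
        n_neg = len(tokens & neg_seeds)
        for tok in tokens - pos_seeds - neg_seeds:
            pos_hits[tok] += n_pos
            neg_hits[tok] += n_neg

    candidates = {}
    for tok in set(pos_hits) | set(neg_hits):
        p, n = pos_hits[tok], neg_hits[tok]
        total = p + n
        if total < min_count:
            continue  # too little evidence (assumed threshold)
        score = (p - n) / total  # ranges from -1 (negative) to +1 (positive)
        if abs(score) >= min_margin:
            candidates[tok] = "positive" if score > 0 else "negative"
    return candidates


# Toy, made-up corpus and seed lexicons (assumptions, not data from the paper).
sentences = [
    "service excellent et rapide",
    "repas excellent et savoureux",
    "accueil rapide et bon",
    "service mauvais et lent",
    "repas mauvais et fade",
    "accueil lent et horrible",
]
pos_seeds = {"excellent", "bon"}
neg_seeds = {"mauvais", "horrible"}

print(expand_lexicon(sentences, pos_seeds, neg_seeds))
```

On this toy input, "rapide" is proposed as positive and "lent" as negative, while words that co-occur equally with both polarities (such as "et" or "service") are dropped. In a pipeline like the one described, such candidates would plausibly be handed to a human reviewer before entering the lexicon, which is one reason a process of this kind is semi-automated rather than fully automatic.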