Journal: Information Processing & Management

Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features



Abstract

The rapid growth of digital information has raised considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter expose a great deal of information about their users' events, opinions, personalities, and so on. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach comprises several phases: text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) models are trained on these features and evaluated on a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline results.
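
To make the distinction between PI (a yes/no decision) and STS (a graded score) concrete, the following is a minimal sketch that uses only simple lexical features. It is not the authors' method: the feature set (`word_jaccard`, `char3_jaccard`, `len_ratio`), the unweighted average used as the score, and the 0.5 threshold are all illustrative assumptions. In the approach described above, trained MaxEnt and SVR models over lexical, syntactic, and semantic features would take the place of these hand-written rules.

```python
def tokens(text):
    """Whitespace tokenization; a real Arabic pipeline would normalize and stem."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Character n-grams over whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def jaccard(a, b):
    """Jaccard overlap of two collections; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lexical_features(t1, t2):
    """Hypothetical lexical feature set (an assumption, not taken from the paper)."""
    n1, n2 = len(tokens(t1)), len(tokens(t2))
    longest = max(n1, n2)
    return {
        "word_jaccard": jaccard(tokens(t1), tokens(t2)),
        "char3_jaccard": jaccard(char_ngrams(t1), char_ngrams(t2)),
        "len_ratio": 1.0 if longest == 0 else min(n1, n2) / longest,
    }


def similarity_score(features):
    """STS stand-in: unweighted mean of the features, in [0, 1].
    A trained regressor (e.g. SVR) would replace this step."""
    return sum(features.values()) / len(features)


def is_paraphrase(features, threshold=0.5):
    """PI stand-in: threshold the score. The threshold value is an assumption;
    a trained classifier (e.g. MaxEnt) would replace this step."""
    return similarity_score(features) >= threshold
```

For a pair of tweets `t1` and `t2`, `is_paraphrase(lexical_features(t1, t2))` gives the binary PI decision, while `similarity_score(lexical_features(t1, t2))` gives a graded STS-style value. Lexical overlap alone misses paraphrases that share meaning but few surface words, which is one reason to add syntactic and semantic features.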
