Conference: International Workshop on Database and Expert Systems Applications

Topic Identification of Noisy Arabic Texts Using Graph Approaches

Abstract

This paper addresses the problem of automatic topic identification in noisy Arabic texts. Several existing works in this field rely on statistical and machine learning approaches for different text categories. Unfortunately, most of the proposed methods are effective on clean and long texts. In this work, we use an in-house dataset of noisy Arabic texts collected from several Arabic discussion forums covering 6 topics. We propose a graph approach called LIGA for the topic identification task. This approach was first introduced for language identification. We also propose two extensions intended to improve LIGA's performance. Experiments conducted on the Arabic dataset show promising results, reaching an accuracy of about 98%.
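
For orientation: as originally described for language identification, LIGA builds a graph whose nodes are character trigrams and whose edges link consecutive trigrams, with a weight per class on each node and edge. The abstract gives no details of the topic-identification variant or of the two extensions, so the following is only a minimal sketch under stated assumptions. The class name `TrigramGraph`, the per-label normalization, and the toy English strings are illustrative choices, not the authors' implementation or data.

```python
from collections import defaultdict


def trigrams(text):
    """Return the overlapping character trigrams of a string."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


class TrigramGraph:
    """Simplified LIGA-style graph (hypothetical helper, not the paper's code).

    Nodes are character trigrams; edges link consecutive trigrams.
    Each node and edge stores one count per topic label.
    """

    def __init__(self):
        self.node_w = defaultdict(lambda: defaultdict(int))
        self.edge_w = defaultdict(lambda: defaultdict(int))
        self.node_total = defaultdict(int)  # total node count per label
        self.edge_total = defaultdict(int)  # total edge count per label

    def add(self, text, label):
        """Add one labeled training text to the graph."""
        grams = trigrams(text)
        for g in grams:
            self.node_w[g][label] += 1
            self.node_total[label] += 1
        for a, b in zip(grams, grams[1:]):
            self.edge_w[(a, b)][label] += 1
            self.edge_total[label] += 1

    def scores(self, text):
        """Score each label by the matched nodes and edges of the text.

        Assumption: weights are normalized by the per-label totals so that
        labels with more training text are not favored automatically.
        """
        grams = trigrams(text)
        out = defaultdict(float)
        for g in grams:
            for label, w in self.node_w.get(g, {}).items():
                out[label] += w / self.node_total[label]
        for e in zip(grams, grams[1:]):
            for label, w in self.edge_w.get(e, {}).items():
                out[label] += w / self.edge_total[label]
        return dict(out)

    def predict(self, text):
        """Return the best-scoring label, or None if nothing matches."""
        s = self.scores(text)
        return max(s, key=s.get) if s else None
```

A graph would be trained by calling `add(text, topic)` once per labeled forum post and queried with `predict(text)`. Working at the character-trigram level, rather than on whole words, is one plausible reason such a model could tolerate the spelling noise of forum text, though the abstract does not state this.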
