Conference: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Are they Our Brothers? Analysis and Detection of Religious Hate Speech in the Arabic Twittersphere

Abstract

Religious hate speech in the Arabic Twittersphere is a notable problem that calls for automated tools to detect messages that use inflammatory sectarian language to promote hatred and violence against people on the basis of religious affiliation. Distinguishing hate speech from other profane and vulgar language is a challenging task that requires deep linguistic analysis. The richness of Arabic morphology and the limited resources available for the Arabic language make this task even more challenging. To the best of our knowledge, this paper is the first to address the problem of identifying speech that promotes religious hatred on Arabic Twitter. In this work, we describe how we created the first publicly available Arabic dataset annotated for religious hate speech detection and the first Arabic lexicon of terms commonly found in religious discussions, along with scores representing their polarity and strength. We then developed various classification models using lexicon-based, n-gram-based, and deep-learning-based approaches, and we present a detailed comparison of their performance on a completely new, unseen dataset. We find that a simple Recurrent Neural Network (RNN) architecture with Gated Recurrent Units (GRU) and pre-trained word embeddings can adequately detect religious hate speech, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.84.
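To make the lexicon-based approach and the AUROC metric mentioned in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the lexicon entries, their scores, the toy tweets, and the function names `lexicon_score` and `auroc` are all hypothetical placeholders chosen for illustration, and the scoring rule (summing signed term scores) is only one plausible way to use a lexicon with polarity and strength values.

```python
from typing import Dict, Sequence


def lexicon_score(tokens: Sequence[str], lexicon: Dict[str, float]) -> float:
    """Sum the signed scores of every token that appears in the lexicon.

    Assumption: positive scores lean toward the hate class and negative
    scores lean away from it; tokens missing from the lexicon contribute 0.
    """
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC computed as the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical lexicon: placeholder terms with made-up scores.
TOY_LEXICON = {"term_a": 2.0, "term_b": 1.0, "term_c": -1.5}

# Hypothetical tokenized tweets with labels (1 = hate, 0 = not hate).
TOY_TWEETS = [
    (["term_a", "x"], 1),
    (["term_b", "term_c"], 0),
    (["term_b", "y"], 1),
    (["term_b", "term_b"], 0),
]

scores = [lexicon_score(tokens, TOY_LEXICON) for tokens, _ in TOY_TWEETS]
labels = [label for _, label in TOY_TWEETS]
toy_auroc = auroc(labels, scores)
print(f"AUROC on toy data: {toy_auroc:.3f}")
```

The pairwise definition of AUROC used here is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based computation would be the usual choice for larger evaluation sets. The number printed for the toy data has no bearing on the 0.84 reported in the abstract.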