Conference on Empirical Methods in Natural Language Processing

Hate Speech Dataset from a White Supremacy Forum

Abstract

Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic. Due to the massive rise of user-generated web content on social media, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and, particularly, the automation of this task has continuously grown, along with the societal impact of the phenomenon. This paper describes a hate speech dataset composed of thousands of sentences manually labelled as containing hate speech or not. The sentences have been extracted from Stormfront, a white supremacist forum. A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it. The paper also provides a thoughtful qualitative and quantitative study of the resulting dataset and several baseline experiments with different classification models. The dataset is publicly available.