Hate Speech Dataset from a White Supremacy Forum

Abstract

Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic. Due to the massive rise of user-generated web content on social media, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and, particularly, the automation of this task has continuously grown, along with the societal impact of the phenomenon. This paper describes a hate speech dataset composed of thousands of sentences manually labelled as containing hate speech or not. The sentences have been extracted from Stormfront, a white supremacist forum. A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it. The paper also provides a thoughtful qualitative and quantitative study of the resulting dataset and several baseline experiments with different classification models. The dataset is publicly available.

Bibliographic details

Source
Conference on empirical methods in natural language processing | 2018 | xiii 170 p. | 10 pages
Authors
Ona de Gibert; Naiara Perez; Aitor Garcia-Pablos; Montse Cuadros
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets? [J]. Paula Fortuna, Juan Soler-Company, Leo Wanner. Information Processing & Management. 2021, Issue 3
2. AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News & Hate Speech Detection Dataset [J]. Mohamed Seghir Hadj Ameur, Hassina Aliane. Procedia Computer Science. 2021, Issue a
3. Rhetorical Analysis of Hate Speech: Case Study of Hate Speech Related to Ahok’s Religion Blasphemy Case [J]. Kurnia Arofah. MediaTor. 2018, Issue 1
4. Hate Speech Dataset from a White Supremacy Forum [C]. Ona de Gibert, Naiara Perez, Aitor Garcia-Pablos, Montse Cuadros. Second workshop on abusive language online 2018. 2018
5. Online Hate Speech: An Analysis of Stance in White Supremacist and Neutral Websites. [D]. Freeman, Janelle. 2017
6. Must we Defend Nazis?: Why the First Amendment Should Not Protect Hate Speech and White Supremacy [O]. Rodrigo Pablo Pérez. 2020
