
Studying Generalisability Across Abusive Language Detection Datasets


Abstract

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, there is a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.
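The cross-dataset setting described in the abstract can be illustrated with a minimal sketch: train a classifier on one dataset and evaluate it on every other dataset, rather than on a held-out split of the same dataset. The toy data, features (TF-IDF) and classifier (logistic regression) below are illustrative assumptions, not the paper's actual datasets or models.

# Minimal sketch of cross-dataset evaluation for abusive language detection.
# Hypothetical toy data; the paper's actual datasets, features and models differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Two tiny stand-in "datasets" (label 1 = abusive, 0 = non-abusive).
datasets = {
    "dataset_A": (
        ["you are awful", "have a nice day", "I hate you", "great work"],
        [1, 0, 1, 0],
    ),
    "dataset_B": (
        ["what an idiot", "thanks for the help", "nobody likes you", "see you soon"],
        [1, 0, 1, 0],
    ),
}

# Train on each dataset, then test on every other one (cross-dataset setting).
for train_name, (train_x, train_y) in datasets.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_x, train_y)
    for test_name, (test_x, test_y) in datasets.items():
        if test_name == train_name:
            continue
        preds = model.predict(test_x)
        print(f"train={train_name} test={test_name} "
              f"macro-F1={f1_score(test_y, preds, average='macro'):.2f}")

A drop in macro-F1 when moving from in-dataset to cross-dataset testing is the kind of generalisability gap the paper studies; varying the proportion of non-abusive samples in the training set would probe the effect highlighted in the abstract.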
