
Studying Generalisability Across Abusive Language Detection Datasets


Abstract

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, there is a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.
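The cross-dataset setting described in the abstract can be illustrated with a minimal sketch: train a classifier on one dataset and evaluate it on every other dataset, rather than on a held-out split of the same dataset. The toy data, features (TF-IDF) and classifier (logistic regression) below are illustrative assumptions, not the paper's actual datasets or models.

# Minimal sketch of cross-dataset evaluation for abusive language detection.
# Hypothetical toy data; the paper's actual datasets, features and models differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Two tiny stand-in "datasets" (label 1 = abusive, 0 = non-abusive).
datasets = {
    "dataset_A": (
        ["you are awful", "have a nice day", "I hate you", "great work"],
        [1, 0, 1, 0],
    ),
    "dataset_B": (
        ["what an idiot", "thanks for the help", "nobody likes you", "see you soon"],
        [1, 0, 1, 0],
    ),
}

# Train on each dataset, then test on every other one (cross-dataset setting).
for train_name, (train_x, train_y) in datasets.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(train_x, train_y)
    for test_name, (test_x, test_y) in datasets.items():
        if test_name == train_name:
            continue
        preds = model.predict(test_x)
        print(f"train={train_name} test={test_name} "
              f"macro-F1={f1_score(test_y, preds, average='macro'):.2f}")

A drop in macro-F1 when moving from in-dataset to cross-dataset testing is the kind of generalisability gap the paper studies; varying the proportion of non-abusive samples in the training set would probe the effect highlighted in the abstract.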
