Conference: Workshop on Speech and Language Technologies for Dravidian Languages

NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers

Abstract

The increasing accessibility of the internet has facilitated social media usage and encouraged individuals to express their opinions freely. However, it also creates a space for content polluters to disseminate offensive posts. Most such posts are written in a cross-lingual manner and can easily evade online surveillance systems. This paper presents an automated system that can identify offensive text in multilingual code-mixed data. In the task, datasets are provided in three languages (Tamil, Malayalam, and Kannada), each code-mixed with English, and participants are asked to implement a separate model for each language. To accomplish the tasks, we employed two machine learning techniques (LR, SVM), three deep learning techniques (LSTM, LSTM+Attention), and three transformer-based methods (m-BERT, Indic-BERT, XLM-R). Results show that XLM-R outperforms the other techniques for Tamil and Malayalam, while m-BERT achieves the highest score for Kannada. The proposed models achieved weighted F1 scores of 0.76 (Tamil), 0.93 (Malayalam), and 0.71 (Kannada), ranking 3rd, 5th, and 4th, respectively.
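The weighted F1 score reported in the abstract is commonly defined as the average of per-class F1 scores, each weighted by the number of true instances of that class. The following is a minimal sketch of that metric under this standard definition, not the authors' evaluation code; the function name `weighted_f1` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n_true / total) * f1
    return score


# Toy labels invented for illustration; they are not from the shared-task data.
y_true = ["not_offensive", "not_offensive", "not_offensive", "offensive"]
y_pred = ["not_offensive", "not_offensive", "offensive", "offensive"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its support, the majority class dominates the score on imbalanced data, which is worth keeping in mind when comparing results across the three languages.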