NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers


Abstract

The increasing accessibility of the internet has facilitated social media usage and encouraged individuals to express their opinions liberally. Nevertheless, it also creates a place for content polluters to disseminate offensive posts or content. Most such offensive posts are written in a cross-lingual manner and can easily evade online surveillance systems. This paper presents an automated system that can identify offensive text from multilingual code-mixed data. In the task, datasets are provided in three languages (Tamil, Malayalam, and Kannada), each code-mixed with English, and participants are asked to implement separate models for each language. To accomplish the tasks, we employed two machine learning techniques (LR, SVM), three deep learning (LSTM, LSTM+Attention) techniques and three transformer-based methods (m-BERT, Indic-BERT, XLM-R). Results show that XLM-R outperforms the other techniques for Tamil and Malayalam, while m-BERT achieves the highest score for Kannada. The proposed models achieved weighted F1 scores of 0.76 (Tamil), 0.93 (Malayalam), and 0.71 (Kannada), ranking 3rd, 5th, and 4th, respectively.


Bibliographic details

Source
Workshop on Speech and Language Technologies for Dravidian Languages | 2021 | pp. 255-261 | 7 pages
Authors
Omar Sharif; Eftekhar Hossain; Mohammed Moshiul Hoque
Original format: PDF

Similar literature


1. Language identification framework in code-mixed social media text based on quantum LSTM - the word belongs to which language? [J]. Modern Physics Letters B: Condensed Matter Physics, Statistical Physics, Applied Physics. 2020, No. 6

2. AUTOMATIC TEXT SUMMARIZATION OF INDIAN LANGUAGES: A MULTILINGUAL PROBLEM - A REVIEW OF MULTILINGUAL SUMMARIZATION TECHNIQUES [J]. JOVI DSILVA, Dr. UZZAL SHARMA. Journal of Theoretical and Applied Information Technology. 2019, No. 11

3. OFFLangOne@DravidianLangTech-EACL2021: Transformers with the Class Balanced Loss for Offensive Language Identification in Dravidian Code-Mixed text [C]. Suman Dowlagar, Radhika Mamidi. Workshop on Speech and Language Technologies for Dravidian Languages. 2021

4. A Domain Adaptation Approach for Offensive Language Detection with Bidirectional Transformers [D]. Singh, Sumer. 2020

5. Intensity of Multilingual Language Use Predicts Cognitive Performance in Some Multilingual Older Adults [O]. Anna Pot, Merel Keijzer, Kees de Bot. 2018

6. NULI at SemEval-2019 Task 6: Transfer Learning for Offensive Language Detection using Bidirectional Transformers [O]. Ping Liu, Wen Li, Liang Zou. 2019
