HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media

Abstract

This paper describes the system submitted by the HUB team to the DravidianLangTech-EACL2021 shared task on Offensive Language Identification in Dravidian Languages. The shared task targets the detection of offensive content in social media. Among the known tasks related to offensive speech detection, it is the first to address offensive social media comments in Dravidian languages. The task organizers provided code-mixed data sets covering mainly three languages: Malayalam, Kannada, and Tamil. The tasks on the code-mixed data in these three languages can be treated as three separate comment/post-level classification tasks: the Malayalam data set defines a five-category classification task, and the Kannada and Tamil data sets each define a six-category classification task. Based on our analysis of the task description and the data sets, we chose the multilingual BERT model for this task. In this paper, we discuss our fine-tuning methods, models, experiments, and results.

Bibliographic details

Source
Workshop on Speech and Language Technologies for Dravidian Languages | 2021 | pp. 203-209 | 7 pages
Authors
Bo Huang; Yang Bai
Original format
PDF

Similar literature

1. Experimenting Language Identification for Sentiment Analysis of English Punjabi Code Mixed Social Media Text [J]. International Journal of E-Adoption. 2020, No. 1
2. Language identification framework in code-mixed social media text based on quantum LSTM - the word belongs to which language? [J]. Modern Physics Letters, B. Condensed Matter Physics, Statistical Physics, Applied Physics. 2020, No. 6
3. An effective cybernated word embedding system for analysis and language identification in code-mixed social media text [J]. Shekhar Shashi, Sharma Dilip Kumar, Sufyan Beg M.M. International Journal of Knowledge-Based and Intelligent Engineering Systems. 2019, No. 3
4. Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text [C]. Zichao Li. Workshop on Speech and Language Technologies for Dravidian Languages. 2021
5. Detecting Offensive Social Media Text in Nepali Language [D]. Timilsina, Sandesh. 2020
6. Lessons From Neuro-(a)-Typical Brains: Universal Multilingualism Code-Mixing Recombination and Executive Functions [O]. Enoch O. Aboh. 2020
7. Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text [O]. Das Amitava, Gambäck Björn. 2016
