Workshop on Speech and Language Technologies for Dravidian Languages

HUB@DravidianLangTech-EACL2021: Identify and Classify Offensive Text in Multilingual Code Mixing in Social Media


Abstract

This paper describes the system submitted by the HUB team to the DravidianLangTech-EACL2021 shared task on Offensive Language Identification in Dravidian Languages. The shared task addresses the detection of offensive content in social media. Among the known tasks on offensive speech detection, it is the first to target offensive comments posted on social media in Dravidian languages. The task organizers provided a code-mixed data set composed mainly of three languages: Malayalam, Kannada, and Tamil. The tasks on the code-mixed data in these three languages can be treated as three separate comment/post-level classification tasks: the Malayalam task has five categories, and the Kannada and Tamil tasks have six categories each. Based on our analysis of the task description and the data set, we chose the multilingual BERT model for this task. In this paper, we discuss our fine-tuning methods, models, experiments, and results.
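The setup in the abstract can be sketched as one multilingual BERT classifier per language, differing only in the number of output classes. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a checkpoint or a library, so the Hugging Face `transformers` calls, the checkpoint name `bert-base-multilingual-cased`, and the helper `build_classifier` are assumptions made here.

```python
# Number of classes per language, as stated in the abstract.
NUM_LABELS = {"malayalam": 5, "kannada": 6, "tamil": 6}

# Assumption: the abstract only says "multilingual BERT". This public
# checkpoint is one such variant, not necessarily the one the authors used.
CHECKPOINT = "bert-base-multilingual-cased"


def build_classifier(language: str):
    """Hypothetical helper: return (tokenizer, model) for one language's task."""
    # Imported lazily so the label table above is usable without the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    num_labels = NUM_LABELS[language.lower()]
    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    # Adds a freshly initialised classification head with `num_labels` outputs
    # on top of the pretrained encoder; the head still has to be fine-tuned.
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `build_classifier("tamil")` would download the checkpoint and return a six-way classifier. The fine-tuning details (learning rate, epochs, preprocessing) are not given in the abstract and are left out here.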
