Conference: International Conference for Convergence in Technology

Evaluation of Deep Learning Models for Hostility Detection in Hindi Text


Abstract

Social media platforms are a convenient medium for expressing personal thoughts and sharing useful information. They are fast, concise, and able to reach millions. They are an effective place to archive thoughts, share artistic content, receive feedback, promote products, and so on. Despite these advantages, the platforms have also given a boost to hostile posts. Hate speech and derogatory remarks are posted for personal satisfaction or political gain. Hostile posts can have a bullying effect, rendering the entire platform experience hostile. Detecting hostile posts is therefore important for maintaining social media hygiene. The problem is more pronounced in low-resource languages such as Hindi. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further divided into the overlapping classes fake, offensive, hate, and defamation. We evaluate a range of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem. Pre-trained Hindi fastText word embeddings from IndicNLP and Facebook are used in conjunction with the CNN and LSTM models. Two pre-trained multilingual transformer language models, mBERT and IndicBERT, are also used. We show that the BERT-based models perform best. Moreover, the CNN and LSTM models perform competitively with the BERT-based models.
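Because the four hostile classes overlap, a single post can carry several of them at once. The following is a minimal sketch of how such a multi-label target could be encoded and how per-class model scores could be decoded back into labels. It is not the authors' implementation: the function names, the class ordering, and the 0.5 threshold are assumptions made for illustration, and the logits stand in for the output of any of the CNN, LSTM, or BERT models.

```python
import math

# Class names are taken from the abstract; the ordering is an arbitrary choice.
HOSTILE_CLASSES = ["fake", "offensive", "hate", "defamation"]


def encode_labels(tags):
    """Map a collection of tags to a multi-hot vector.

    A non-hostile post carries none of the four tags, so it maps to all zeros.
    """
    return [1 if name in tags else 0 for name in HOSTILE_CLASSES]


def sigmoid(x):
    """Logistic function, applied independently to each class score."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_logits(logits, threshold=0.5):
    """Turn one raw score per class into a list of predicted tags.

    Each class is thresholded on its own, so several tags can fire together.
    The threshold value is an assumption, not taken from the paper.
    """
    probs = [sigmoid(z) for z in logits]
    tags = [name for name, p in zip(HOSTILE_CLASSES, probs) if p >= threshold]
    return tags if tags else ["non-hostile"]


if __name__ == "__main__":
    print(encode_labels({"fake", "hate"}))
    print(decode_logits([2.0, -3.0, 0.1, -1.0]))
```

Using an independent sigmoid per class, rather than a softmax over all classes, is what allows one post to be tagged as both fake and hate, for example.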
