Conference: Workshop on Noisy User-generated Text

EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets

Abstract

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies (such as disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter, and recognizing the informativeness of a tweet can help filter noise from large volumes of data. In this paper, we present our submission for WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using fastText embeddings.