EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets

Abstract

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they are observing in real time. Because of this, more agencies, such as disaster relief organizations and news agencies, are interested in programmatically monitoring Twitter, and recognizing the informativeness of a tweet can therefore help filter noise from large volumes of data. In this paper, we present our submission for WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. Our most successful model is an ensemble of transformers, including RoBERTa, XLNet, and BERTweet, trained in a Semi-Supervised Learning (SSL) setting. The proposed system achieves an F1 score of 0.9011 on the test set (ranking 7th on the leaderboard) and shows significant gains in performance compared to a baseline system using fastText embeddings.
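The abstract does not say how the ensemble combines its members, so the following is only a minimal sketch under stated assumptions: each model is assumed to output a probability that a tweet is informative, the ensemble is assumed to average those probabilities (soft voting), and the reported metric is assumed to be F1 on the positive class. The function names, the 0.5 threshold, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's probability for the positive class, then threshold.

    Assumption: soft voting is used purely for illustration; the source does
    not state the ensembling rule.
    """
    n_models = len(prob_lists)
    n_items = len(prob_lists[0])
    averaged = [sum(model[i] for model in prob_lists) / n_models for i in range(n_items)]
    return [1 if p >= threshold else 0 for p in averaged]


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-tweet probabilities from three classifiers (toy data).
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.4, 0.3, 0.7]
model_c = [0.7, 0.1, 0.7, 0.3]
gold = [1, 0, 1, 1]  # 1 = informative, 0 = uninformative

pred = soft_vote([model_a, model_b, model_c])
score = f1_score(gold, pred)
print(pred, round(score, 4))
```

In this toy run the third tweet is labelled informative even though one model disagrees, which is the kind of smoothing an averaged ensemble provides over any single classifier.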

Bibliographic record

Source: Workshop on noisy user-generated text, 2020, pp. 455-461 (7 pages)
Author: Nickil Maveli
Original format: PDF
