SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup

Abstract

Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by 2.27%-3.75% in terms of F1 scores.
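The abstract describes mixing both the sequences and their token-level labels. The idea can be illustrated with a minimal sketch of generic mixup applied token by token. This is not the authors' implementation: the function names (`seq_mixup`, `sample_lambda`, `keep_if_plausible`), the choice to mix token embeddings, the Beta parameter, and the score thresholds are all assumptions made for illustration, and the abstract does not specify how the discriminator scores a sequence.

```python
import random


def one_hot(label_id, num_labels):
    """Return a one-hot label distribution for a single token."""
    return [1.0 if i == label_id else 0.0 for i in range(num_labels)]


def mix_vectors(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def sample_lambda(alpha=8.0, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


def seq_mixup(emb_a, labels_a, emb_b, labels_b, num_labels, lam):
    """Mix two equal-length sequences position by position.

    emb_a, emb_b: lists of token embedding vectors (assumed representation).
    labels_a, labels_b: lists of integer label ids, one per token.
    Returns mixed embeddings and soft (mixed) label distributions.
    """
    if not (len(emb_a) == len(emb_b) == len(labels_a) == len(labels_b)):
        raise ValueError("both sequences and label lists must have the same length")
    mixed_emb = [mix_vectors(ea, eb, lam) for ea, eb in zip(emb_a, emb_b)]
    mixed_labels = [
        mix_vectors(one_hot(la, num_labels), one_hot(lb, num_labels), lam)
        for la, lb in zip(labels_a, labels_b)
    ]
    return mixed_emb, mixed_labels


def keep_if_plausible(candidate, score_fn, low, high):
    """Hypothetical stand-in for the discriminator: keep a generated
    candidate only if its score falls inside an accepted range."""
    return low <= score_fn(candidate) <= high


# Usage: mix two one-token sequences with a fixed coefficient.
emb, soft = seq_mixup([[1.0, 0.0]], [0], [[0.0, 1.0]], [1], num_labels=2, lam=0.75)
print(emb, soft)
```

In an active learning loop, the mixed pairs that pass the plausibility check would be added to the labeled pool alongside the queried samples before the model is retrained.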

Bibliographic record

Source: Conference on Empirical Methods in Natural Language Processing, 2020, pp. 8566-8579 (14 pages)
Authors: Rongzhi Zhang; Yue Yu; Chao Zhang
Original format: PDF
