Conference on Empirical Methods in Natural Language Processing

Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding



Abstract

Inflectional variation is a common feature of World Englishes such as Colloquial Singapore English and African American Vernacular English. Although comprehension by human readers is usually unimpaired by non-standard inflections, current NLP systems are not yet robust. We propose Base-Inflection Encoding (BITE), a method to tokenize English text by reducing inflected words to their base forms before reinjecting the grammatical information as special symbols. Fine-tuning pre-trained NLP models for downstream tasks using our encoding defends against inflectional adversaries while maintaining performance on clean data. Models using BITE generalize better to dialects with non-standard inflections without explicit training, and translation models converge faster when trained with BITE. Finally, we show that our encoding improves the vocabulary efficiency of popular data-driven subword tokenizers. Since there has been no prior work on quantitatively evaluating vocabulary efficiency, we propose metrics to do so.
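The core idea of the encoding can be sketched in a few lines: each inflected word is split into its base form plus a special symbol carrying the grammatical information. The sketch below is illustrative only; the tiny hand-written lemma table and the tag names (`[3SG]`, `[PL]`, etc.) are stand-ins for the morphological analysis and symbol inventory a real implementation would use.

```python
# Illustrative sketch of Base-Inflection Encoding (BITE):
# reduce inflected words to base forms, then reinject the
# grammatical information as special symbols.
# The lemma table below is a toy stand-in for a real
# morphological analyzer (tag names are hypothetical).
LEMMAS = {
    "ate":    ("eat",   "[PAST]"),
    "eats":   ("eat",   "[3SG]"),
    "eating": ("eat",   "[ING]"),
    "apples": ("apple", "[PL]"),
}

def bite_encode(tokens):
    """Map each token to its base form, appending an
    inflection symbol when the word was inflected."""
    out = []
    for tok in tokens:
        base, tag = LEMMAS.get(tok.lower(), (tok, None))
        out.append(base)
        if tag is not None:
            out.append(tag)
    return out

print(bite_encode(["She", "eats", "apples"]))
# -> ['She', 'eat', '[3SG]', 'apple', '[PL]']
```

Because non-standard and standard inflections of the same word now share a base token, a downstream subword tokenizer sees a smaller, more consistent vocabulary, which is the source of the robustness and efficiency gains the abstract describes.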
