Techniques for correcting linguistic training bias in training data

Abstract

In automated assistant systems, a deep-learning model in the form of a long short-term memory (LSTM) classifier maps questions to classes, each of which has a manually curated answer. A team of experts manually creates the training data used to train this classifier. Relying on human curation often lets linguistic training biases creep into the training data, since every individual has a specific style of writing natural language and uses some words only in specific contexts. Deep models end up learning these biases instead of the core concept words of the target classes. To correct these biases, meaningful sentences are automatically generated using a generative model and then used for training a classification model. For example, a variational autoencoder (VAE) is used as the generative model for generating novel sentences, and a language model (LM) is utilized for selecting sentences based on likelihood.
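
The selection step described in the abstract, keeping only generated sentences that a language model considers likely, can be illustrated with a minimal sketch. The abstract does not say what kind of language model is used or how the likelihood cutoff is chosen, so everything below is an assumption for illustration: a count-based bigram model with add-one smoothing stands in for the LM, the hand-written `candidates` list stands in for VAE output, and the function names (`train_bigram_lm`, `avg_log_likelihood`, `select_sentences`) are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Collect bigram statistics from whitespace-tokenized sentences.

    Assumption: a simple bigram model stands in for the language model
    named in the abstract, which does not specify the LM architecture.
    """
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])                  # counts of each left-hand word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of adjacent word pairs
    return contexts, bigrams, len(vocab)


def avg_log_likelihood(sentence, lm):
    """Average per-token log-probability under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total / (len(tokens) - 1)


def select_sentences(candidates, lm, top_k):
    """Keep the top_k candidates ranked by average log-likelihood."""
    ranked = sorted(candidates, key=lambda c: avg_log_likelihood(c, lm), reverse=True)
    return ranked[:top_k]


# Hypothetical curated questions used to fit the stand-in language model.
corpus = [
    "how do i reset my password",
    "how do i apply for leave",
    "where do i apply for leave",
]
# Hypothetical stand-ins for sentences sampled from a generative model.
candidates = [
    "leave for apply i do how",
    "how do i apply for leave",
]

lm = train_bigram_lm(corpus)
kept = select_sentences(candidates, lm, top_k=1)
print(kept)
```

Averaging the log-probability over tokens keeps longer candidates from being penalized purely for their length; the sentences that survive selection would then be added to the classifier's training set.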
