Techniques for correcting linguistic training bias in training data
Abstract
In automated assistant systems, a deep-learning model in the form of a long short-term memory (LSTM) classifier maps questions to classes, each of which has a manually curated answer. A team of experts manually creates the training data used to train this classifier. Relying on human curation often lets linguistic training biases creep into the training data, because each individual has a particular style of writing natural language and uses some words only in specific contexts. Deep models end up learning these biases instead of the core concept words of the target classes. To correct these biases, meaningful sentences are generated automatically by a generative model and then used to train the classification model. For example, a variational autoencoder (VAE) serves as the generative model that produces novel sentences, and a language model (LM) selects among those sentences based on their likelihood.
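The likelihood-based selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the language model, so a hypothetical add-one smoothed bigram model stands in for it; the `candidates` list stands in for sentences sampled from the VAE; and the function names (`train_bigram_lm`, `avg_log_likelihood`, `select_sentences`) and the top-k cutoff are illustrative assumptions.

```python
import math
from collections import Counter

# Assumption: a toy add-one smoothed bigram LM stands in for the unspecified LM.
def train_bigram_lm(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])  # how often each token precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab

def avg_log_likelihood(sentence, lm):
    """Mean per-token log-probability of a sentence under the bigram LM."""
    contexts, bigrams, vocab = lm
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    total = sum(math.log((bigrams[p] + 1) / (contexts[p[0]] + v)) for p in pairs)
    return total / len(pairs)  # length-normalized so short sentences aren't favored

def select_sentences(candidates, lm, k):
    """Keep the k candidates the LM scores as most likely (most fluent)."""
    return sorted(candidates, key=lambda s: avg_log_likelihood(s, lm), reverse=True)[:k]

# Hypothetical curated training questions for one class.
curated = [
    "how do i reset my password",
    "how can i reset my password",
    "i forgot my password",
    "how do i change my email address",
]
# Placeholder for sentences a VAE might generate; the second is deliberately scrambled.
candidates = [
    "how do i reset my email address",
    "password my reset i how do",
    "how can i change my password",
]

lm = train_bigram_lm(curated)
selected = select_sentences(candidates, lm, k=2)
augmented = curated + selected  # data the classifier would then be retrained on
print(selected)
```

Length-normalizing the log-likelihood keeps the score from simply rewarding shorter sentences; a fixed likelihood threshold could be used instead of a top-k cutoff. In this toy run the scrambled candidate receives the lowest score and is discarded, while the two fluent candidates are kept for augmentation.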