Conference on Computational Linguistics and Speech Processing

Data-intensive Automatic Speech Recognition Based on Machine Learning

Abstract

Because speech is highly variable, data sparseness is unavoidable when constructing automatic speech recognition (ASR) systems, even with a fairly large database. How to train and adapt statistical models using limited amounts of data is therefore one of the most important research issues in ASR. This talk summarizes the major techniques that have been proposed to solve the generalization problem in acoustic model training and adaptation, that is, how to achieve high recognition accuracy for new utterances. One common approach is to control the degrees of freedom in model training and adaptation. These techniques can be classified by whether or not they use a priori knowledge of speech obtained from a speech database, such as one recorded from many speakers. Another approach is to maximize the "margins" between training samples and the decision boundaries. Many of these techniques have also been combined and extended to further improve performance. Although many useful techniques have been developed, we still do not have a gold standard that can be applied to any kind of speech variation and any condition of the speech data available for training and adaptation. We need to focus on collecting rich and effective speech databases that cover a wide range of variations; active learning for automatically selecting data for annotation; cheap, fast, and good-enough transcription; and efficient supervised, semi-supervised, or unsupervised training and adaptation, based on advanced machine learning techniques. We also need to extend current efforts to understand more about human speech processing and the mechanism of natural speech variation.
