Venue: Conference on Computational Linguistics and Speech Processing

Data-intensive Automatic Speech Recognition Based on Machine Learning

Abstract

Because speech is highly variable, data sparseness is unavoidable when constructing automatic speech recognition (ASR) systems, even with a fairly large-scale database. How to train and adapt statistical models using limited amounts of data is therefore one of the most important research issues in ASR. This talk summarizes the major techniques that have been proposed to solve the generalization problem in acoustic model training and adaptation, that is, how to achieve high recognition accuracy for new utterances. One common approach is to control the degrees of freedom in model training and adaptation. These techniques can be classified by whether or not they use a priori knowledge of speech obtained from a speech database, such as one recorded from many speakers. Another approach is to maximize the "margins" between training samples and the decision boundaries. Many of these techniques have also been combined and extended to further improve performance. Although many useful techniques have been developed, we still do not have a gold standard that can be applied to any kind of speech variation and any condition of the speech data available for training and adaptation. We need to focus on collecting rich and effective speech databases covering a wide range of variations; active learning for automatically selecting data for annotation; cheap, fast, and good-enough transcription; and efficient supervised, semi-supervised, or unsupervised training/adaptation based on advanced machine learning techniques. We also need to extend current efforts to understand more about human speech processing and the mechanism of natural speech variation.