Because speech is highly variable, even a fairly large-scale database cannot avoid the data-sparseness problem in constructing automatic speech recognition (ASR) systems. How to train and adapt statistical models using limited amounts of data is therefore one of the most important research issues in ASR. This talk summarizes the major techniques proposed to solve the generalization problem in acoustic model training and adaptation, that is, how to achieve high recognition accuracy on new utterances. One common approach is to control the degrees of freedom in model training and adaptation. These techniques can be classified by whether they use a priori knowledge of speech obtained from a speech database, such as one recorded from many speakers. Another approach is to maximize the "margins" between training samples and the decision boundaries. Many of these techniques have also been combined and extended to further improve performance. Although many useful techniques have been developed, we still have no gold standard that can be applied to every kind of speech variation and every condition of the speech data available for training and adaptation. We need to focus on collecting rich and effective speech databases covering a wide range of variations; active learning for automatically selecting data for annotation; cheap, fast, good-enough transcription; and efficient supervised, semi-supervised, or unsupervised training/adaptation based on advanced machine-learning techniques. We also need to extend current efforts to better understand human speech processing and the mechanisms of natural speech variation.
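The idea of controlling degrees of freedom with a priori knowledge can be illustrated by maximum a posteriori (MAP) adaptation of Gaussian mean vectors, a standard technique in this family. The sketch below is illustrative, not from the talk itself: the function name `map_adapt_means` and the prior weight `tau` are assumed names, and real systems adapt full GMM-HMM parameters with soft occupancy counts rather than hard assignments.

```python
import numpy as np

def map_adapt_means(prior_means, data, assignments, tau=10.0):
    """MAP adaptation of Gaussian mean vectors (illustrative sketch).

    Each adapted mean interpolates between the speaker-independent
    prior mean (a priori knowledge from a multi-speaker database) and
    the sample mean of the adaptation frames assigned to that Gaussian.
    tau controls the effective degrees of freedom: a large tau keeps
    the model near the prior when adaptation data are sparse; a small
    tau lets the (possibly scarce) new data dominate.
    """
    adapted = prior_means.copy()
    for k in range(prior_means.shape[0]):
        frames = data[assignments == k]
        n_k = len(frames)
        if n_k == 0:
            continue  # no adaptation data for this Gaussian: keep the prior
        sample_mean = frames.mean(axis=0)
        adapted[k] = (tau * prior_means[k] + n_k * sample_mean) / (tau + n_k)
    return adapted
```

With only a handful of frames per Gaussian, the update stays close to the prior, which is exactly the restricted-freedom behavior that protects against data sparseness; as adaptation data grow, the estimate smoothly approaches the pure maximum-likelihood sample mean.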