Data-intensive Automatic Speech Recognition Based on Machine Learning


Abstract

Because speech is highly variable, even a fairly large database cannot eliminate the data sparseness problem in building automatic speech recognition (ASR) systems. How to train and adapt statistical models from limited amounts of data is therefore one of the most important research issues in ASR. This talk summarizes the major techniques that have been proposed to solve the generalization problem in acoustic model training and adaptation, that is, how to achieve high recognition accuracy on new utterances. One common approach is to control the degrees of freedom in model training and adaptation. These techniques can be classified by whether or not they use a priori knowledge of speech obtained from a speech database, such as one recorded from many speakers. Another approach is to maximize the "margins" between training samples and the decision boundaries. Many of these techniques have also been combined and extended to further improve performance. Although many useful techniques have been developed, we still do not have a gold standard that can be applied to any kind of speech variation and any condition of the speech data available for training and adaptation. We need to focus on collecting rich and effective speech databases that cover a wide range of variations; on active learning for automatically selecting data for annotation; on cheap, fast, and good-enough transcription; and on efficient supervised, semi-supervised, or unsupervised training and adaptation based on advanced machine learning techniques. We also need to extend current efforts to understand more about human speech processing and the mechanisms of natural speech variation.
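As an illustration of controlling the degrees of freedom with a priori knowledge, the sketch below shows maximum a posteriori (MAP) adaptation of a single Gaussian mean. The abstract does not name this technique; it is used here only as one well-known instance of the idea. The function name `map_adapt_mean` and the prior weight `tau` are assumptions chosen for the example, and a real acoustic model would apply the update per mixture component with soft occupation counts.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean under a conjugate prior.

    prior_mean: mean from a speaker-independent model (the a priori knowledge).
    samples:    adaptation feature vectors from the new speaker.
    tau:        hypothetical prior weight; larger values keep the estimate
                closer to prior_mean when adaptation data are scarce.
    """
    n = len(samples)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation data.
    sums = [sum(x[d] for x in samples) for d in range(dim)]
    # Interpolate between the prior mean and the sample mean.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    few = [[1.0, 2.0]] * 2      # very little adaptation data
    many = [[1.0, 2.0]] * 200   # plenty of adaptation data
    print(map_adapt_mean(prior, few))
    print(map_adapt_mean(prior, many))
```

With only two samples the estimate stays close to the prior mean, while with 200 samples it moves most of the way to the sample mean. The prior weight thus limits how far sparse data can pull the model.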


Bibliographic details

Source: Conference on Computational Linguistics and Speech Processing, 2013, 2 pages
Author: Sadaoki Furui
Original format: PDF
Chinese Library Classification: Program design, software engineering

