
Improving robustness of speech recognition systems.

Abstract

Current Automatic Speech Recognition (ASR) systems still fall well short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria.

Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and this variation arises from a combination of factors including speech style and speaking rate, a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that using vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the task of recognizing gestures from the speech signal.

Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural speech databases, X-ray microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but can also significantly improve the noise robustness of ASR systems.
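The abstract does not spell out the iterative time-warping procedure, so the following is only a minimal sketch of the general idea behind annotation by time warping, not the dissertation's method. It assumes classic dynamic time warping (DTW) over one-dimensional feature tracks and a reference signal that already carries frame-level labels. The function names `dtw_align` and `transfer_labels`, the toy sequences, and the label strings are all hypothetical.

```python
def dtw_align(reference, target):
    """Align two 1-D sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of
    (reference_index, target_index) pairs from start to end.
    """
    n, m = len(reference), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance reference only
                                 cost[i][j - 1])      # advance target only

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def transfer_labels(reference_labels, path, target_len):
    """Copy frame labels from the reference onto the target via the warp path."""
    target_labels = [None] * target_len
    for ref_idx, tgt_idx in path:
        target_labels[tgt_idx] = reference_labels[ref_idx]
    return target_labels


# Toy example: a labeled reference track and a slower, unlabeled rendition.
reference = [0, 1, 2, 1, 0]
reference_labels = ["sil", "open", "peak", "close", "sil"]
target = [0, 0, 1, 2, 2, 1, 0]

total_cost, path = dtw_align(reference, target)
target_labels = transfer_labels(reference_labels, path, len(target))
print(total_cost, target_labels)
```

In a realistic setting the frames would be multi-dimensional feature vectors (for example MFCCs) with a vector distance in place of `abs`, and the alignment would presumably be repeated until the annotation stabilizes; those details are not given in the abstract.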

Bibliographic record

  • Author: Mitra, Vikramjit
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Subject: Engineering, Electronics and Electrical; Physics, Acoustics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 199
  • Original format: PDF
  • Language: English
