
Accent and speaker recognition for advanced automatic speech recognition.


Abstract

The speech signal conveys many levels of information, including linguistic (e.g., text, language, accent/dialect), speaker-specific (e.g., gender, emotion, speaker identity), and environmental (e.g., communication channel, background noise) information. This dissertation focuses on speech-pattern recognition for detecting foreign accent and speaker identity.

The first thesis goal addresses the problem of computer-based automatic accent classification. A phone-based accent classification framework is developed which makes a decision based on the likelihood scores from pre-defined accent classes. Novel spectral trajectory modeling techniques are applied to estimate accent-sensitive acoustic traits over whole phoneme segments, in an effort to capture the spectral evolution of speech better than conventional Hidden Markov Model methods. Integrated feature-space transformations are applied for dimensionality reduction and better discrimination among accent classes. Furthermore, the open-set accent detection problem, which aims to detect native and non-native speech when no pre-defined system model exists for the specific accent, is explored for the first time. Comparable performance is achieved for most open accents using a closed set of four accent models.

The second thesis goal addresses the problem of in-set/out-of-set speaker recognition, where the task is to determine whether a speaker belongs to a group of pre-defined speakers. An effective algorithm is developed which employs spectral-based features within a Gaussian Mixture Model-Universal Background Model (GMM-UBM) framework, enhanced by discriminative adaptation based on modified minimum classification error and minimum verification error criteria. Alternative speaker rejection criteria, based on the distribution of the in-set speaker discriminative score space, are introduced and compared with the conventional log-likelihood ratio test. This represents the first published study addressing in-set speaker recognition.

Finally, the thesis concludes with a demonstration of the proposed algorithms for spoken document retrieval (SDR) using a collection of historical audio materials from the National Gallery of the Spoken Word. Results show that accent classification and in-set speaker recognition can be successfully integrated into an application for rich transcript generation in SDR. Collectively, the advances demonstrated in this research open new directions for future development in automatic accent classification, speaker recognition, robust speech recognition, and next-generation human-computer spoken language technology.
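The conventional log-likelihood ratio test that the abstract uses as its baseline for in-set/out-of-set decisions can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it omits the discriminative adaptation and the alternative rejection criteria, and every name, model parameter, and threshold below (`gmm_loglik`, `in_set_decision`, the toy two-dimensional models) is a hypothetical stand-in chosen only to make the example runnable.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (K,); means, variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                 # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)     # normalizer, (K,)
    )                                                         # (T, K)
    log_weighted = log_comp + np.log(weights)
    # Numerically stable log-sum-exp over the K mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return per_frame.mean()

def in_set_decision(frames, speaker_models, ubm, threshold=0.0):
    """Return (speaker name or None, log-likelihood ratio).

    The best-scoring in-set model is compared against the background
    model; the utterance is rejected as out-of-set when the ratio does
    not exceed the (assumed) threshold.
    """
    scores = {name: gmm_loglik(frames, *model)
              for name, model in speaker_models.items()}
    best = max(scores, key=scores.get)
    llr = scores[best] - gmm_loglik(frames, *ubm)
    return (best if llr > threshold else None), llr

# Toy models (assumed values): each is (weights, means, variances).
ubm = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[4.0, 4.0]]))
speaker_models = {
    "spk_a": (np.array([1.0]), np.array([[2.0, 2.0]]), np.array([[1.0, 1.0]])),
    "spk_b": (np.array([1.0]), np.array([[-2.0, -2.0]]), np.array([[1.0, 1.0]])),
}

in_set_frames = np.array([[2.0, 2.0], [2.5, 1.5]])    # close to spk_a
out_set_frames = np.array([[6.0, -6.0], [6.5, -5.5]])  # far from both speakers

print(in_set_decision(in_set_frames, speaker_models, ubm))
print(in_set_decision(out_set_frames, speaker_models, ubm))
```

In a real system the speaker models would typically be adapted from the background model using enrollment data, and the threshold would be tuned on held-out trials; both are fixed by hand here.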

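Bibliographic record

  • Author

    Angkititrakul, Pongtep

  • Author affiliation

    University of Colorado at Boulder

  • Degree-granting institution: University of Colorado at Boulder
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2004
  • Pagination: 155 p.
  • Total pages: 155
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Radio electronics, telecommunications technology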
