Accent and speaker recognition for advanced automatic speech recognition.

Abstract

The speech signal conveys many levels of information, including linguistic (e.g., text, language, accent/dialect), speaker-specific (e.g., gender, emotion, speaker identity), and environmental information (e.g., communication channels, background noise). This dissertation focuses on speech-pattern recognition for detecting foreign accent and speaker identity information.

The first thesis goal addresses the problem of computer-based automatic speech accent classification. A phone-based accent classification framework is developed which makes a decision based on the likelihood scores from pre-defined accent classes. Novel spectral trajectory modeling techniques are applied to estimate accent-sensitive acoustic traits over whole phoneme segments, in an effort to capture the spectral evolution of speech better than conventional Hidden Markov Model methods. Integrated feature-space transformations are applied for dimensionality reduction and better discrimination among accent classes. Furthermore, the open-set accent detection problem, which aims to detect native and non-native speech when no pre-defined system model exists for the specific accent, is explored for the first time. Comparable performance is achieved for most open accents using a closed set of four accent models.

The second thesis goal addresses the problem of in-set/out-of-set speaker recognition, where we identify a speaker as belonging to a group of pre-defined speakers. An effective algorithm is developed which employs spectral-based features within a Gaussian Mixture Model - Universal Background Model framework, enhanced by discriminative adaptation based on modified minimum classification error and minimum verification error criteria. Alternative speaker rejection criteria based on the distribution of the in-set speaker discriminative score space are introduced and compared with the conventional log-likelihood ratio test. This represents the first published study addressing in-set speaker recognition.

Finally, the thesis concludes with a demonstration of the proposed algorithms for spoken document retrieval (SDR) using a collection of historical audio materials from the National Gallery of the Spoken Word. Results show that accent classification and in-set speaker recognition can successfully be integrated into an application for rich transcript generation in SDR. Collectively, the advances demonstrated in this research open new directions for future development in automatic accent classification, speaker recognition, robust speech recognition, and next-generation human-computer spoken language technology.
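As a rough illustration of the conventional log-likelihood ratio test that the abstract uses as its baseline for in-set/out-of-set speaker recognition, the sketch below scores a few feature frames against per-speaker Gaussian mixture models and a background model, then accepts the best-matching speaker only if the ratio clears a threshold. It is not the thesis's algorithm: the discriminative adaptation and the alternative rejection criteria are omitted, and every name, model parameter, feature value, and the threshold here is a hypothetical toy choice.

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
        top = max(comp)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total / len(frames)

def in_set_decision(frames, speaker_models, ubm, threshold=0.0):
    """Return (best_speaker, llr, accepted) for a set of frames.

    The best in-set speaker is accepted only if its log-likelihood
    ratio against the background model exceeds the threshold.
    """
    ubm_score = gmm_loglik(frames, ubm)
    scores = {name: gmm_loglik(frames, g) for name, g in speaker_models.items()}
    best = max(scores, key=scores.get)
    llr = scores[best] - ubm_score
    return best, llr, llr > threshold

# Toy 2-D models (hypothetical values, chosen only for illustration).
ubm = [(0.5, (-1.0, -1.0), (9.0, 9.0)), (0.5, (1.0, 1.0), (9.0, 9.0))]
speaker_models = {
    "spk_a": [(1.0, (2.0, 2.0), (1.0, 1.0))],
    "spk_b": [(1.0, (-2.0, -2.0), (1.0, 1.0))],
}

# Frames close to spk_a's model versus frames far from every in-set model.
best_a, llr_a, accepted_a = in_set_decision(
    [(2.1, 1.9), (1.8, 2.2)], speaker_models, ubm
)
best_out, llr_out, accepted_out = in_set_decision(
    [(5.0, -5.0), (-5.0, 5.0)], speaker_models, ubm
)
```

With these toy values, the first utterance is attributed to `spk_a` and accepted, while the second is rejected as out-of-set because the background model explains it better than any in-set speaker model does.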


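Bibliographic details

Author: Angkititrakul, Pongtep
Author affiliation: University of Colorado at Boulder
Degree-granting institution: University of Colorado at Boulder
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2004
Pages: 155 p.
Original format: PDF
Language: English
CLC classification: Radio electronics, telecommunications technology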

Similar literature

1. Do We Need STRFs for Cocktail Parties? On the Relevance of Physiologically Motivated Features for Human Speech Perception Derived from Automatic Speech Recognition. [J]. B Kollmeier, M R René Schädler, A Meyer. Advances in Experimental Medicine and Biology, 2013.
2. Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition. [J]. Schuster M, Maier A, Haderlein T. International Journal of Pediatric Otorhinolaryngology, 2006, no. 10.
3. Fractal dimensions of speech sounds: computation and application to automatic speech recognition. [J]. Maragos P, Potamianos A. The Journal of the Acoustical Society of America, 1999, no. 3.
4. Whispered speech speaker recognition. Listening tests versus speaker recognition system. [C]. Krzysztof Goliasz, Michal Luczynski. Audio Engineering Society International Convention, 2017.
5. Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition. [D]. Panchapagesan, Sankaran. 2008.
6. Listening with a foreign-accent: The interlanguage speech intelligibility benefit in Mandarin speakers of English. [O]. Xin Xie, Carol A. Fowler.
7. An accent-independent lexicon for automatic speech recognition. [O]. Van Bael Christophe, King Simon. 2003.
