Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition



Abstract

Bottleneck features (BNFs) generated with a deep neural network (DNN) have proven to significantly boost spoken language recognition accuracy over basic spectral features. However, BNFs are commonly extracted using language-dependent tied-context phone states as learning targets. Moreover, BNFs are less phonetically expressive than the output layer in a DNN, which is usually not used as a speech feature because its very high dimensionality hinders further post-processing. In this article, we put forth a novel deep learning framework to overcome all of the above issues and evaluate it on the 2017 NIST Language Recognition Evaluation (LRE) challenge. We use manner and place of articulation as speech attributes, which lead to low-dimensional “universal” phonetic features that can be defined across all spoken languages. To model the asynchronous nature of the speech attributes while capturing their intrinsic relationships in a given speech segment, we introduce a new training scheme for deep architectures based on a Maximal Figure of Merit (MFoM) objective. MFoM introduces non-differentiable metrics into the backpropagation-based approach, an issue that is elegantly solved in the proposed framework. The experimental evidence collected on the recent NIST LRE 2017 challenge demonstrates the effectiveness of our solution. In fact, the performance of spoken language recognition (SLR) systems based on spectral features is improved by more than 5% absolute in Cavg. Finally, the F1 metric can be brought from 77.6% up to 78.1% by combining the conventional baseline phonetic BNFs with the proposed articulatory attribute features.
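To make the MFoM idea concrete, the sketch below (PyTorch, illustrative only, not the authors' implementation) shows how a non-differentiable figure of merit such as F1 can be replaced by a smooth surrogate so that a multi-label attribute detector, with manner/place-of-articulation style targets, can be trained by ordinary backpropagation. All layer sizes, attribute counts, and function names are assumptions made for illustration.

```python
# Minimal sketch of MFoM-style training: optimize a smooth surrogate of a
# non-differentiable metric (here F1) for a multi-label attribute detector.
# Sizes, names, and the exact surrogate are illustrative assumptions.

import torch
import torch.nn as nn

N_ATTRIBUTES = 14   # assumed number of manner + place-of-articulation labels
FEAT_DIM = 40       # assumed spectral feature dimension (e.g. log-mel)

class AttributeDetector(nn.Module):
    """Small DNN emitting per-frame multi-label attribute scores."""
    def __init__(self, feat_dim=FEAT_DIM, n_attr=N_ATTRIBUTES,
                 hidden=256, bottleneck=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, bottleneck), nn.ReLU(),  # low-dimensional layer
        )
        self.head = nn.Linear(bottleneck, n_attr)

    def forward(self, x):
        return self.head(self.body(x))  # raw scores, one per attribute

def soft_f1_loss(scores, targets, alpha=5.0):
    """Smooth surrogate of (1 - micro F1).

    scores:  (batch, n_attr) raw network outputs
    targets: (batch, n_attr) multi-hot attribute labels in {0, 1}
    alpha:   sharpness of the sigmoid that smooths the hard 0/1 decisions
    """
    probs = torch.sigmoid(alpha * scores)      # smoothed decisions
    tp = (probs * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    fn = ((1.0 - probs) * targets).sum()
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)
    return 1.0 - f1                            # minimizing maximizes soft F1

# Toy usage with random frames and random multi-hot attribute labels.
model = AttributeDetector()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, FEAT_DIM)
y = (torch.rand(32, N_ATTRIBUTES) > 0.7).float()
loss = soft_f1_loss(model(x), y)
loss.backward()
opt.step()
```

In the paper's setting, the low-dimensional hidden layer of such a detector would play the role of the “universal” phonetic features; the surrogate above is only one simple way to smooth F1 for backpropagation, not the exact MFoM formulation evaluated in the article.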

