Improvement of the learning process of the automated speaker recognition system for critical use with HMM-DNN component

Abstract

The article presents the results of adapting the hybrid HMM-DNN speech synthesis model for use in an automated speaker recognition system for critical use (ASRSCU). In particular, it improves the training of the HMM-DNN speech synthesis model by estimating the difference between the posterior probability distributions of all HMM states and the actual posterior probability distribution calculated by the DNN, and by using semantic information in the speaker recognition process. This information is described by the features observed in the sequence of frames into which the input phonogram is divided. The results improved the efficiency of text-dependent speaker recognition when ASRSCU is used in a noisy acoustic environment. The article formulates measures for the structural integration of the HMM-DNN component into ASRSCU and describes the practical aspects of this process. In particular, the choice of the type and normalization method of the vectors of basic informative features at the frame level is substantiated, the number of HMM states and the GMM parameters are determined depending on the parameters of the chosen formation model, and the procedure for interpreting the recognition results is described. The paper also formulates measures to optimize the learning process of an ASRSCU with the HMM-DNN component that will be operated in noisy environments.
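
The abstract does not say which measure is used for the "difference" between the HMM state posteriors and the DNN posteriors. As an illustration only, the sketch below assumes a per-frame Kullback-Leibler divergence between a target distribution over HMM states and the softmax output of a DNN, averaged over frames. The function names (`softmax`, `kl_divergence`, `mean_frame_kl`) and the input layout are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw DNN outputs for one frame into a distribution over HMM states."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same HMM states.

    Assumption: KL divergence stands in for the unspecified difference measure.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def mean_frame_kl(target_posteriors, dnn_logits):
    """Average the per-frame divergence over all frames of one utterance."""
    if len(target_posteriors) != len(dnn_logits):
        raise ValueError("both inputs must cover the same number of frames")
    if not target_posteriors:
        raise ValueError("at least one frame is required")
    per_frame = [
        kl_divergence(p, softmax(z))
        for p, z in zip(target_posteriors, dnn_logits)
    ]
    return sum(per_frame) / len(per_frame)
```

A value near zero means the DNN output already matches the target state distribution for the utterance; larger values indicate frames where the two distributions disagree.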

Bibliographic record

Source
Conference on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments; Society of Photo-Optical Instrumentation Engineers, pp. 1117620.1-1117620.10 (10 pages)
Authors
Mykola M. Bykov; Viacheslav V. Kovtun; Iryna M. Kobylyanska; Waldemar Wójcik; Saule Smailova
Original format: PDF

Similar literature

1. Integration of hidden Markov models in the automated speaker recognition system for critical use [J] . Vjatcheslav V. KOVTUN, Maria S. YUKHIMCHUK, Piotr KISALA, Przeglad Elektrotechniczny . 2019, No. 4
2. English speech sound improvement system based on deep learning from signal processing to semantic recognition [J] . Yucheng Yang, Yibo Yue, International Journal of Speech Technology . 2020, No. 3
3. Mixed deep learning and natural language processing method for fake-food image recognition and standardization to help automated dietary assessment [J] . Simon Mezgec, Tome Eftimov, Tamara Bucher, Public Health Nutrition . 2019, No. 7
4. Improvement of the learning process of the automated speaker recognition system for critical use with HMM-DNN component [C] . Mykola M. Bykov, Viacheslav V. Kovtun, Iryna M. Kobylyanska, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments . 2019
5. Multimodal Sensing and Data Processing for Speaker and Emotion Recognition Using Deep Learning Models with Audio, Video and Biomedical Sensors [D] . Abtahi, Farnaz. 2018
6. Mixed deep learning and natural language processing method for fake-food image recognition and standardization to help automated dietary assessment [O] . Simon Mezgec, Tome Eftimov, Tamara Bucher, -1
7. ANALYSIS OF THE AUTOMATED SPEAKER RECOGNITION SYSTEM OF CRITICAL USE OPERATION RESULTS [O] . O. V. Bisikalo, V. V. Kovtun, M. S. Yukhimchuk, 2019
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. [R] . Hansen, J. H. 2015
