
A Framework for Enhancing Speaker Age and Gender Classification by Using a New Feature Set and Deep Neural Network Architectures



Abstract

Speaker age and gender classification is one of the most challenging problems in speech processing. As technology has developed, identifying a speaker's age and gender has become a necessity for speaker verification and identification systems, with applications such as identifying suspects in criminal cases, improving human-machine interaction, and adapting music for people waiting in a queue. Although many studies have focused on improving feature extraction and classifier design, classification accuracies are still not satisfactory. The key issue in identifying a speaker's age and gender is to generate robust features and to design an in-depth classifier. Age and gender information is concealed in the speaker's speech, which is affected by many factors, such as background noise, speech content, and phonetic divergences.

In this work, different methods are proposed to enhance speaker age and gender classification using deep neural networks (DNNs) as feature extractor and classifier. First, a model for generating new features from a DNN is proposed. The proposed method uses the Hidden Markov Model Toolkit (HTK) to find tied-state triphones for all utterances, which are used as labels for the output layer of the DNN. The DNN, which has a bottleneck layer, is first trained in an unsupervised manner to compute the initial weights between layers; it is then trained and tuned in a supervised manner to generate transformed mel-frequency cepstral coefficients (T-MFCCs). Second, a shared class labels method is introduced among misclassified classes to regularize the weights in the DNN. Third, DNN-based speaker models using the SDC feature set are proposed. A speaker-aware model can capture the characteristics of speaker age and gender more effectively than a model that represents a group of speakers.
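As a rough illustration of how bottleneck features such as the T-MFCCs might be read out of a trained DNN, the sketch below runs one input frame through a small network and returns the activations of the bottleneck layer. It is only a minimal sketch under stated assumptions: the layer sizes, the random placeholder weights, the linear bottleneck, and the names `dense`, `init_layer`, and `bottleneck_features` are hypothetical and are not taken from the thesis, which trains its network on tied-state triphone labels produced with HTK.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used for the non-bottleneck hidden layers."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases, activation=None):
    """Apply one fully connected layer: activation(W x + b)."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(activation(z) if activation else z)
    return out


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real system would load trained ones."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def bottleneck_features(frame, layers, bottleneck_index):
    """Run a frame through the network up to and including the bottleneck layer."""
    h = frame
    for i, (w, b) in enumerate(layers):
        is_bottleneck = i == bottleneck_index
        # Assumption: the bottleneck is linear, the other layers use a sigmoid.
        h = dense(h, w, b, activation=None if is_bottleneck else sigmoid)
        if is_bottleneck:
            return h
    raise ValueError("bottleneck_index is outside the network")


rng = random.Random(0)
# Hypothetical sizes: 39-dim input frame, 64 hidden units, 13-dim bottleneck,
# 64 hidden units, 100 output units (stand-ins for tied-state labels).
sizes = [39, 64, 13, 64, 100]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]

t_mfcc = bottleneck_features(frame, layers, bottleneck_index=1)
print(len(t_mfcc))  # 13
```

The layers after the bottleneck are only needed during training; at feature-extraction time the forward pass stops at the bottleneck, which is why the function returns early.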
In addition, an AGender-Tune system is proposed to classify speaker age and gender by jointly fine-tuning two DNN models: the first model is pre-trained to classify speaker age, and the second is pre-trained to classify speaker gender. Moreover, the new T-MFCC feature set is used as the input to a fusion model of two systems: the first is the DNN-based class model and the second is the DNN-based speaker model. Using the T-MFCCs as input and fusing the final score with the score of a DNN-based class model improved the classification accuracies. Finally, the DNN-based speaker models are embedded into the AGender-Tune system to exploit the advantages of each method for better speaker age and gender classification.

The experimental results on a challenging public database showed the effectiveness of the proposed methods for enhancing speaker age and gender classification, and the methods achieved state-of-the-art results on this database.
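The score fusion of the class model and the speaker model mentioned in the abstract could look roughly like the sketch below. The abstract does not say how the scores are combined, so the weighted linear sum used here is an assumption, and the class labels, score values, and the names `fuse_scores` and `predict` are hypothetical.

```python
def fuse_scores(class_scores, speaker_scores, weight=0.5):
    """Linearly combine per-class scores from two systems.

    `weight` is the share given to the class model; the speaker model
    receives the remainder. Linear fusion is an assumption of this sketch.
    """
    if class_scores.keys() != speaker_scores.keys():
        raise ValueError("both systems must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        label: weight * class_scores[label] + (1.0 - weight) * speaker_scores[label]
        for label in class_scores
    }


def predict(fused_scores):
    """Return the label with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)


# Hypothetical scores for three placeholder classes.
class_model = {"class_a": 0.5, "class_b": 0.3, "class_c": 0.2}
speaker_model = {"class_a": 0.2, "class_b": 0.6, "class_c": 0.2}

fused = fuse_scores(class_model, speaker_model, weight=0.5)
print(predict(fused))  # class_b
```

In this toy case the class model alone would pick `class_a`, while the fused score favors `class_b`; in practice the fusion weight would be tuned on held-out data.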

Bibliographic record

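  • Author

    Abumallouh, Arafat.

  • Author affiliation

    University of Bridgeport.

  • Degree-granting institution: University of Bridgeport.
  • Subject: Artificial intelligence.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 95 p.
  • Total pages: 95
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Agricultural chemistry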

