Journal: Digital Investigation

Forensic speaker recognition: A new method based on extracting accent and language information from short utterances


Abstract

This paper presents a new method for Forensic Speaker Recognition (FSR) based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people from different groups, communities, and origins, because speaking styles and native languages differ. In a multilingual society, forensic experts use AC and LI to narrow the search space for suspect recognition to regional and ethnic groups. In this paper, we use baseline and deep learning methods to automate this process. The baseline methods are the Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector, and Gaussian Mixture Model-Support Vector Machine (GMM-SVM); all use Mel-Frequency Cepstral Coefficients (MFCC) as speech features. The deep learning methods are based on the Convolutional Neural Network (CNN) and the Deep Neural Network (DNN). For CNN, the recently proposed VGGVox and GMM-CNN methods are used, both of which operate on speech spectrograms. For DNN, the x-vectors method is used, which is based on DNN embeddings. The experimental results show that, among the baselines, GMM-SVM gives better FSR performance than GMM-UBM and i-vector. Among the deep learning methods, x-vectors performs better than GMM-CNN and VGGVox, and it also outperforms GMM-SVM. The x-vectors method achieves 80.4% FSR accuracy, rising to 85.4% with AC, 90.2% with LI, and 95.1% when AC and LI are combined. This shows that the proposed method based on AC and LI gives promising results. (C) 2020 Elsevier Ltd. All rights reserved.
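The search-space reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` record, the `rank_suspects` function, the accent and language labels, the three-dimensional toy embeddings, and the use of cosine similarity as the scoring function are all assumptions made for illustration. The sketch only shows the general idea that filtering candidates by predicted accent and language before speaker scoring can change which candidate ranks first.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical enrolled speaker with accent/language labels."""
    name: str
    accent: str
    language: str
    embedding: List[float]  # toy stand-in for a speaker embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_suspects(
    query: List[float],
    candidates: List[Candidate],
    accent: Optional[str] = None,
    language: Optional[str] = None,
) -> List[Tuple[str, float]]:
    """Keep candidates matching the given accent/language, then rank by score."""
    pool = [
        c for c in candidates
        if (accent is None or c.accent == accent)
        and (language is None or c.language == language)
    ]
    scored = [(c.name, cosine(query, c.embedding)) for c in pool]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: labels and vectors are invented for this example.
candidates = [
    Candidate("spk_a", "north", "lang_1", [1.0, 0.0, 0.0]),
    Candidate("spk_b", "south", "lang_1", [0.9, 0.1, 0.0]),
    Candidate("spk_c", "north", "lang_2", [0.0, 1.0, 0.0]),
]
query = [0.8, 0.2, 0.0]

unfiltered = rank_suspects(query, candidates)
filtered = rank_suspects(query, candidates, accent="north", language="lang_1")
print("no filter:", [name for name, _ in unfiltered])
print("AC + LI filter:", [name for name, _ in filtered])
```

In this toy setup the unfiltered ranking places `spk_b` first, while restricting the pool to the assumed accent and language leaves only `spk_a`. In a real system the accent and language labels would come from classifiers with their own error rates, so a hard filter like this one could also exclude the true speaker.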
