Information Communication Technologies Conference

Deep CNN with SE Block for Speaker Recognition



Abstract

This paper presents a structure called SECNN, which combines squeeze-and-excitation (SE) components with a simplified residual convolutional neural network (ResNet). The model takes a time-frequency spectrogram as input and measures the similarity between an utterance embedding and each speaker model by cosine similarity; a speaker model is obtained by averaging the utterance-level embeddings of that enrollment speaker. On the one hand, SECNN mitigates speaker overfitting in speaker verification through regularization techniques and the SE operation. On the other hand, SECNN is a lightweight model with merely 1.5M parameters. Experimental results indicate that SECNN outperforms other end-to-end models such as Deep Speaker, achieving an equal error rate (EER) of 5.55% in speaker verification and an accuracy of 93.92% in speaker identification on the LibriSpeech dataset, and an EER of 2.58% in speaker verification and an accuracy of 95.83% in speaker identification on the TIMIT dataset.
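The two mechanisms the abstract names — per-channel SE recalibration and cosine scoring against averaged enrollment embeddings — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the layer shapes, reduction ratio, and embedding dimension are illustrative assumptions.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map (toy version).

    Squeeze: global average pooling per channel.
    Excitation: bottleneck MLP (ReLU then sigmoid) producing
    per-channel weights that rescale the original map.
    """
    squeezed = feature_map.mean(axis=(1, 2))           # (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)            # ReLU, (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid, (C,)
    return feature_map * weights[:, None, None]

def cosine_score(utterance_emb, speaker_model):
    """Cosine similarity between an utterance embedding and a speaker model."""
    u = utterance_emb / np.linalg.norm(utterance_emb)
    s = speaker_model / np.linalg.norm(speaker_model)
    return float(u @ s)

rng = np.random.default_rng(0)

# Speaker model: average of an enrollment speaker's utterance embeddings
# (5 utterances, 64-dim embeddings -- both numbers are assumptions).
enroll = rng.normal(size=(5, 64))
speaker_model = enroll.mean(axis=0)

# Toy (C, H, W) feature map with reduction ratio r = 4.
fmap = rng.normal(size=(16, 8, 8))
w1 = rng.normal(size=(4, 16)) * 0.1
w2 = rng.normal(size=(16, 4)) * 0.1
out = se_block(fmap, w1, w2)
```

At verification time, an utterance is accepted if `cosine_score` against the claimed speaker's model exceeds a threshold tuned on held-out data; the EER is the operating point where false accepts equal false rejects.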

