Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks

Abstract

Deep speaker embedding models are commonly used as a building block for speaker diarization systems. However, the speaker embedding model is usually trained with a global loss defined on the training data, which can be suboptimal for distinguishing speakers locally within a specific meeting session. In this work, we present the first use of graph neural networks (GNNs) for the speaker diarization problem. A GNN refines the speaker embeddings locally, using the structural information between the speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed from the refined embeddings and the ground-truth adjacency matrix. Spectral clustering is then applied to the refined embeddings. We show that clustering the refined speaker embeddings significantly outperforms clustering the original embeddings on both simulated and real meeting data, and our system achieves a state-of-the-art result on the NIST SRE 2000 CALLHOME database.
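The abstract trains the model by comparing an affinity matrix built from the refined embeddings against the ground-truth adjacency matrix of a session. The sketch below builds both matrices for a single session and measures the difference between them.

It rests on several assumptions that are not taken from the paper:
- cosine similarity is used as the affinity;
- mean squared error stands in for "the difference";
- `refine_once` is a hypothetical, parameter-free neighbour-averaging step. It only stands in for a trained GNN layer and is not the authors' architecture.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_affinity(embs: Sequence[Sequence[float]]) -> Matrix:
    """Pairwise cosine similarities between the segment embeddings of one session."""
    unit = [l2_normalize(e) for e in embs]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def ground_truth_adjacency(labels: Sequence[str]) -> Matrix:
    """1.0 where two segments share a speaker label, 0.0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def refine_once(embs: Sequence[Sequence[float]], threshold: float = 0.9) -> Matrix:
    """Hypothetical parameter-free stand-in for one GNN layer (assumption).

    Each embedding is replaced by the affinity-weighted sum of itself and the
    neighbours whose cosine similarity is at least `threshold`, then
    re-normalized. A trained GNN would use learned weights instead.
    """
    unit = [l2_normalize(e) for e in embs]
    aff = cosine_affinity(embs)
    refined = []
    for row in aff:
        # Keep only strong edges; the self-edge (similarity 1.0) always survives.
        weights = [w if w >= threshold else 0.0 for w in row]
        agg = [sum(w * u[d] for w, u in zip(weights, unit)) for d in range(len(unit[0]))]
        refined.append(l2_normalize(agg))
    return refined


def affinity_loss(embs: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Mean squared difference between the affinity and ground-truth adjacency matrices."""
    aff = cosine_affinity(embs)
    adj = ground_truth_adjacency(labels)
    n = len(labels)
    return sum((aff[i][j] - adj[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Toy session: two segments per speaker, 2-D embeddings (illustrative data only).
embs = [[1.0, 0.2], [0.9, 0.4], [0.2, 1.0], [0.4, 0.9]]
labels = ["A", "A", "B", "B"]
before = affinity_loss(embs, labels)
after = affinity_loss(refine_once(embs), labels)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

On this toy session, the loss drops slightly after one step. The reason is that each embedding is pulled only toward neighbours above the threshold, which here means its same-speaker partner. Actively pushing different speakers apart is the job of the learned GNN trained with this kind of loss.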

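The abstract applies spectral clustering to the refined embeddings but does not say which variant. As a hedged illustration only, the sketch below covers the simplest case, a two-speaker session. It works in three steps:
1. Take a non-negative affinity matrix A, for example cosine similarities between refined embeddings.
2. Form the normalized matrix M = D^(-1/2) A D^(-1/2), where D is the diagonal degree matrix.
3. Split the segments by the sign of M's second eigenvector.

`spectral_bipartition` is a hypothetical helper. Sessions with more speakers would normally use the top k eigenvectors followed by k-means.

```python
import math
import random
from typing import List, Sequence


def spectral_bipartition(aff: Sequence[Sequence[float]], iters: int = 500, seed: int = 0) -> List[int]:
    """Split one session's segments into two clusters (assumes exactly two speakers).

    Finds the second eigenvector of M = D^(-1/2) A D^(-1/2) by power iteration,
    projecting out the known top eigenvector (proportional to D^(1/2) 1).
    Assumes every row of the affinity has a positive sum (e.g. self-affinity 1.0).
    """
    n = len(aff)
    a = [[max(x, 0.0) for x in row] for row in aff]  # spectral clustering needs A >= 0
    deg = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    m = [[inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    # Top eigenvector of M (eigenvalue 1), normalized to unit length.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(x * x for x in top))
    top = [x / top_norm for x in top]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along the top eigenvector.
        dot = sum(x * t for x, t in zip(v, top))
        v = [x - dot * t for x, t in zip(v, top)]
        # Multiply by (M + I) / 2, whose eigenvalues all lie in [0, 1].
        v = [0.5 * (sum(m[i][j] * v[j] for j in range(n)) + v[i]) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Map back with D^(-1/2) and split the segments by sign.
    u = [x * s for x, s in zip(v, inv_sqrt)]
    return [0 if x >= 0 else 1 for x in u]


# Toy affinity matrix: segments 0-1 and 2-3 are strongly connected to each other.
aff = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = spectral_bipartition(aff)
print(clusters)
```

On this toy matrix, segments 0 and 1 land in one cluster and segments 2 and 3 in the other. Which side gets label 0 depends on the sign the iteration settles on. The shift to (M + I) / 2 matters: it keeps every eigenvalue non-negative, so plain power iteration converges to the second-largest eigenvalue of M rather than to a large negative one.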

Bibliographic Record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 7439-7443 | 5 pages
Authors
Jixuan Wang; Xiong Xiao; Jian Wu; Ranjani Ramamurthy; Frank Rudzicz; Michael Brudno
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
Speaker diarization; graph neural networks; deep speaker embedding

Similar Literature

1. Speaker diarization using autoassociative neural networks [J]. S. Jothilakshmi, V. Ramalingam, S. Palanivel. Engineering Applications of Artificial Intelligence. 2009, No. 4-5

2. State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations [J]. Jesus Villalba, Nanxin Chen, David Snyder. Computer Speech and Language. 2020, Mar. issue

3. Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding [J]. Milan Sečujski, Darko Pekar, Siniša Suzić. Journal of Universal Computer Science. 2020, No. 4

4. Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks [C]. Jixuan Wang, Xiong Xiao, Jian Wu. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020

5. Speaker Recognition: Evaluation for GMM-UBM and 3D Convolutional Neural Networks Systems [D]. Mohammad S. Alghamdi. 2019

6. Speaker-dependent multipitch tracking using deep neural networks [O]. Yuzhou Liu, DeLiang Wang

7. Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings [O]. Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec. 2017
