IEEE International Conference on Systems, Man, and Cybernetics

Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context

Abstract

This paper presents a method for identifying one or two concurrent speakers and estimating their azimuths from simultaneous utterances. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related work relies on time-frequency exploitation strategies to extract and process each source's contribution to the received signal. The presented method is trained with one speaker at a time, yet it can exploit a speech segment to identify and localize two speakers. A binaural front-end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) use the ERBFCCs to provide identity information, and a histogram-based analysis of the ILDs provides azimuth information. The method was evaluated in conditions that included overlapping segments, noise, and sound reflections, and its effectiveness was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting about 5 s.
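As a rough illustration of the histogram-based use of ILDs mentioned in the abstract, the following Python sketch computes a broadband ILD per frame and returns the most populated histogram bins. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, frame length, bin width, and ILD range are all hypothetical, and the sketch omits the cochlear filter bank and the mapping from ILD to azimuth, neither of which is specified in the abstract.

```python
import numpy as np


def frame_ild_db(left, right, frame_len=512, eps=1e-12):
    """Per-frame interaural level difference in dB (left relative to right).

    Assumption: broadband energy per frame; the abstract's front-end works
    on cochlear filter outputs, which are not modeled here.
    """
    n_frames = min(len(left), len(right)) // frame_len
    n = n_frames * frame_len
    l = np.asarray(left[:n], dtype=float).reshape(n_frames, frame_len)
    r = np.asarray(right[:n], dtype=float).reshape(n_frames, frame_len)
    e_left = np.sum(l ** 2, axis=1) + eps   # eps avoids log of zero
    e_right = np.sum(r ** 2, axis=1) + eps
    return 10.0 * np.log10(e_left / e_right)


def dominant_ilds(ild_db, n_sources=2, bin_width=1.0, ild_range=(-30.0, 30.0)):
    """Return centers (dB) of the most populated, non-adjacent histogram bins.

    Assumption: each dominant bin corresponds to one source; converting an
    ILD value to an azimuth would need a calibration not given in the source.
    """
    edges = np.arange(ild_range[0], ild_range[1] + bin_width, bin_width)
    counts, edges = np.histogram(ild_db, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    picked = []
    for i in np.argsort(counts)[::-1]:      # bins by decreasing count
        if counts[i] == 0:
            break
        if all(abs(i - j) > 1 for j in picked):
            picked.append(i)
        if len(picked) == n_sources:
            break
    return [float(centers[i]) for i in picked]
```

For example, a two-channel recording passed as `frame_ild_db(left, right)` yields one ILD value per frame, and `dominant_ilds(...)` then reports up to two ILD clusters. With broadband energies the clusters separate cleanly only when the sources dominate different frames; fully overlapping speech would call for per-band ILDs.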
