Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context

Abstract

This paper presents a method for identifying one or two concurrent speakers and estimating their azimuths when their utterances are simultaneous. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related works rely on time-frequency exploitation strategies to extract and treat each source's contribution to the received signal. The presented method is trained with one speaker at a time, yet it can exploit a speech segment to identify and localize two speakers. A binaural front-end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) exploit the ERBFCCs to provide identity information, and a histogram-based exploitation of the ILDs provides azimuth angle information. The method was evaluated in contexts that include overlapping segments in the presence of noise and sound reflections, and its efficiency was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting about 5 s.
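
As a rough illustration of what a histogram-based exploitation of ILDs can look like, the following Python sketch computes one broadband ILD per frame and returns the most populated histogram bins. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract describes a cochlear filterbank front-end, whereas this example uses plain frame energies, and the names `frame_ild_db` and `dominant_ilds`, the frame length, and the bin width are all hypothetical. Mapping an ILD peak to an azimuth would additionally need a calibration learned from single-speaker data, which is not shown.

```python
import numpy as np

def frame_ild_db(left, right, frame_len=512, eps=1e-12):
    """Per-frame interaural level difference in dB (left relative to right).

    Assumption: broadband frame energies stand in for the per-channel
    cochlear-filter outputs mentioned in the abstract.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n_frames = min(len(left), len(right)) // frame_len
    l = left[: n_frames * frame_len].reshape(n_frames, frame_len)
    r = right[: n_frames * frame_len].reshape(n_frames, frame_len)
    e_l = np.sum(l ** 2, axis=1) + eps  # eps avoids log of zero on silence
    e_r = np.sum(r ** 2, axis=1) + eps
    return 10.0 * np.log10(e_l / e_r)

def dominant_ilds(ild_db, n_sources=2, bin_width=1.0, max_abs_db=30.0):
    """Centers of the n_sources most populated ILD histogram bins.

    Simplification: the top bins are taken directly, so two adjacent bins
    of one broad peak could both be returned on real data.
    """
    edges = np.arange(-max_abs_db, max_abs_db + bin_width, bin_width)
    counts, edges = np.histogram(ild_db, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    top = np.argsort(counts)[::-1][:n_sources]
    return np.sort(centers[top])

if __name__ == "__main__":
    # Synthetic check: the left channel is louder for the first half of
    # the frames and quieter for the second half.
    rng = np.random.default_rng(0)
    right = rng.standard_normal(512 * 40)
    left = np.repeat([2.0, 0.5], 512 * 20) * right
    print(dominant_ilds(frame_ild_db(left, right)))
```

With two sources at different azimuths, frames dominated by one or the other tend to cluster around different ILD values, which is why peak picking on the histogram can separate them even when training used one speaker at a time.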

Bibliographic record

Source: IEEE International Conference on Systems, Man, and Cybernetics, 2015, pp. 407-412 (6 pages)
Authors: Karim Youssef; Katsutoshi Itoyama; Kazuyoshi Yoshii
Original format: PDF
Keywords: Speaker identification; binaural inputs; human-machine interaction; localization; robot audition

Similar literature

1. Improved binaural sound localization and tracking for unknown time-varying number of speakers [J]. Ui-Hyun Kim, Hiroshi G. Okuno. Advanced Robotics: The International Journal of the Robotics Society of Japan. 2013, no. 15a16
2. A Binaural Scene Analyzer for Joint Localization and Recognition of Speakers in the Presence of Interfering Noise Sources and Reverberation [J]. May T. Audio, Speech, and Language Processing, IEEE Transactions on. 2012, no. 7
3. A survey on sound source localization in robotics: From binaural to array processing methods [J]. S. Argentieri, P. Danes, P. Soueres. Computer Speech and Language. 2015, no. 1
4. Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context [C]. Karim Youssef, Katsutoshi Itoyama, Kazuyoshi Yoshii. IEEE International Conference on Systems, Man, and Cybernetics. 2015
5. An Attention-based Methodology for Context Identification and Exploitation in Autonomous Robots [D]. Montironi, Maria Alessandra. 2018
6. Floor Covering and Surface Identification for Assistive Mobile Robotic Real-Time Room Localization Application [O]. Michael Gillham, Gareth Howells, Sarah Spurgeon. 2013
7. A Survey on Sound Source Localization in Robotics: from Binaural to Array Processing Methods [O]. Argentieri, Sylvain; Danès, Patrick; Souères, Philippe. 2015
