Source: Sensors (Basel, Switzerland)

Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates

Abstract

This paper presents a novel approach to indoor acoustic source localization with microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source, using the raw audio signal as input and avoiding hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train the network on semi-synthetic data generated from close-talk speech recordings, simulating the time delays and distortion that the signals undergo as they propagate from the source to the microphone array. We then fine-tune the network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach produces networks that significantly outperform existing localization methods based on SRP-PHAT strategies, as well as very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of our CNN method does not depend significantly on the speaker’s gender or on the size of the signal window used.
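
The abstract does not spell out how the semi-synthetic data are generated, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes free-field propagation (no reverberation or noise), a speed of sound of 343 m/s, and a 1/distance amplitude decay; the function name `simulate_array_signals` and its parameters are hypothetical. Each microphone receives a copy of the close-talk signal, delayed by the source-to-microphone travel time and attenuated with distance.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def simulate_array_signals(clean, fs, source_pos, mic_positions):
    """Delay and attenuate a close-talk signal per microphone (free-field model).

    Assumes the source does not coincide with any microphone (distance > 0).
    Returns an array of shape (num_mics, len(clean) + padding).
    """
    clean = np.asarray(clean, dtype=float)
    source_pos = np.asarray(source_pos, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)

    # Source-to-microphone distances in metres, one per microphone.
    distances = np.linalg.norm(mic_positions - source_pos, axis=1)
    delays = distances / SPEED_OF_SOUND  # propagation time in seconds

    # Zero-pad so the circular shift implied by the FFT does not wrap around.
    pad = int(np.ceil(delays.max() * fs)) + 1
    n = len(clean) + pad
    spectrum = np.fft.rfft(clean, n=n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # A linear phase ramp applies a (possibly fractional) delay per microphone.
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    delayed = np.fft.irfft(spectrum[None, :] * phase, n=n, axis=1)

    # 1/distance amplitude decay models spherical spreading.
    return delayed / distances[:, None]
```

Calling `simulate_array_signals(speech, 16000, [1.0, 2.0, 1.5], mic_xyz)` with a mono recording `speech` and an array `mic_xyz` of microphone coordinates would yield one simulated channel per microphone, labelled with the known source position for training. A more realistic generator would also add room reverberation and noise, which this sketch leaves out.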