International Journal of Neural Systems

Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds


Abstract

In this paper, we address the problems of modeling the acoustic space generated by a full-spectrum sound source and using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A nonlinear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound-source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound-source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
