Use of speaker location features in meeting diarization.


Abstract

This thesis proposes several improvements to the correlation-based location features recently used in meeting speaker diarization (answering the question, "Who spoke when?"). The problem of leveraging time delay information is examined for multi-microphone meeting environments, where microphones are placed at unknown, widely spaced, and ad-hoc locations. In addition, conversational speech is challenging because of the many short utterances and speaker overlaps. Finally, assuming no room constraints, the microphone configuration and acoustic environment change from meeting to meeting. Together, these conditions make it impractical to apply standard localization and beamforming techniques. To address these challenges, we first consider what combination of channel pairs and signal processing to use for location information extraction. Initially, we consider all pairs, then de-emphasize low-quality pairs through feature vector dimension reduction. We also develop an approach for fusing speaker ID information as viewed by different physical processes. Two views, a new time delay estimate and multi-band energy ratios, are cues to location; a third is a vector of mel-warped cepstral coefficients (MFCCs), related to vocal tract characteristics. We find that both MFCCs and energy ratios can improve time delay information when jointly transformed using canonical correlation analysis (CCA). Oracle experiments show that the location feature dimension producing the best diarization error varies from meeting to meeting. Therefore, we evaluate automatic methods for determining the output dimension of the feature reduction. In addition, we separately consider reducing the feature dimension by explicitly selecting subsets of channel pairs using estimated signal-to-noise ratio (SNR) and information-theoretic feature selection methods. Location features are also employed to detect speaker overlap, a significant cause of increased speaker diarization error.
First, monaural overlap features are developed for a single-channel beamformer output. These features are then compared to overlap detector features that make use of location information, but neither type provides good performance due to a high degree of variation across meetings. We also develop a simple, nearest-neighbor overlap processing scheme which, when given accurate overlap detection, improves diarization accuracy. Together, these results underscore the need for dynamic models to handle variable room and recording configurations.
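
As a rough illustration of what a correlation-based location feature over all channel pairs might look like, the following minimal sketch estimates one time delay per microphone pair by picking the peak of a plain time-domain cross-correlation. It is not the estimator developed in the thesis: the function names (`estimate_delay`, `pairwise_delays`, `delayed`), the `max_lag` parameter, and the synthetic white-noise source are all assumptions made for the example, and no spectral weighting or dimension reduction is applied.

```python
from itertools import combinations
import random


def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches a delayed copy of ref.

    A positive result means sig lags behind ref. (Hypothetical helper.)
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate ref[n] with sig[n + lag] over the overlapping samples.
        score = 0.0
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(sig):
                score += ref[n] * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pairwise_delays(channels, max_lag):
    """Build a delay feature vector with one entry per channel pair."""
    return [estimate_delay(channels[i], channels[j], max_lag)
            for i, j in combinations(range(len(channels)), 2)]


def delayed(signal, d):
    """Hypothetical helper: delay a signal by d samples, zero-padding the start."""
    return [0.0] * d + signal[:len(signal) - d]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for one speaker heard at three microphones.
    source = [rng.gauss(0.0, 1.0) for _ in range(400)]
    channels = [delayed(source, d) for d in (0, 10, 25)]
    print(pairwise_delays(channels, max_lag=40))
```

With three channels the feature vector has three entries, ordered as pairs (0, 1), (0, 2), (1, 2). The number of pairs grows quadratically with the number of microphones, which is one reason the abstract goes on to discuss dimension reduction and channel-pair selection.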
