Coordination of Speech Recognition Devices in Intelligent Environments with Multiple Responsive Devices

Antonio Benítez-Guijarro; Zoraida Callejas; Manuel Noguera; Kawtar Benghazi

Abstract

Devices with oral interfaces are enabling new interaction scenarios in ambient intelligence settings. Using several such devices in the same environment makes it possible to compare the inputs gathered by each of them and to recognize and process user speech more accurately. However, combining multiple devices raises coordination challenges: when the same voice signal is processed by different speech processing units, their outputs may conflict, so the system must decide which source is the most reliable. This paper presents an approach to ranking several sources of spoken input in multi-device environments so that preference is given to the input with the highest estimated quality. The voice signals received by the devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced for them. Each input is then assigned a single score, which allows the audio sources to be ranked and the best one to be selected for processing by the system. To validate this approach, we performed an evaluation using a corpus of 4608 audio recordings made in a two-room intelligent environment with 24 microphones. The experimental results show that our ranking approach can successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than when a single input is used, in both clean and noisy settings.
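The abstract does not state how acoustic quality and recognition reliability are combined into one score. As a minimal sketch, assuming each device reports an estimated signal-to-noise ratio (SNR) as its acoustic-quality measure and the recognizer returns a confidence in [0, 1] for its top hypothesis, a single score can be formed as a weighted sum and the inputs sorted by it. The names `SpokenInput`, `normalize_snr`, `score` and `rank_inputs`, the 0 to 30 dB normalization range, and the equal weighting are all hypothetical illustration choices, not the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpokenInput:
    device_id: str
    snr_db: float          # hypothetical acoustic-quality estimate (SNR in dB)
    asr_confidence: float  # hypothetical recognizer confidence for the top hypothesis, in [0, 1]
    hypothesis: str


def normalize_snr(snr_db: float, lo: float = 0.0, hi: float = 30.0) -> float:
    """Clamp the SNR to [lo, hi] and rescale it to [0, 1] (assumed range)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(snr_db, lo), hi)
    return (clamped - lo) / (hi - lo)


def score(inp: SpokenInput, w_acoustic: float = 0.5) -> float:
    """Hypothetical single score: weighted sum of acoustic quality and ASR confidence."""
    return w_acoustic * normalize_snr(inp.snr_db) + (1.0 - w_acoustic) * inp.asr_confidence


def rank_inputs(inputs: List[SpokenInput], w_acoustic: float = 0.5) -> List[SpokenInput]:
    """Return inputs ordered best-first; ties are broken by device_id for determinism."""
    return sorted(inputs, key=lambda i: (-score(i, w_acoustic), i.device_id))


if __name__ == "__main__":
    inputs = [
        SpokenInput("mic-kitchen", 8.0, 0.62, "turn on the light"),
        SpokenInput("mic-living", 21.0, 0.91, "turn on the lights"),
        SpokenInput("mic-hall", 3.0, 0.40, "turn off the night"),
    ]
    ranked = rank_inputs(inputs)
    for inp in ranked:
        print(f"{inp.device_id}: {score(inp):.3f} -> {inp.hypothesis!r}")
    # The top-ranked input is the one passed on for further processing.
    print("selected:", ranked[0].hypothesis)
```

With these made-up readings the living-room microphone scores 0.805, ahead of the kitchen (about 0.443) and hall (0.25) microphones, so its hypothesis is selected. A weighted sum is only one plausible fusion rule; the weight could instead be tuned on held-out recordings.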
