Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Confidence estimation and keyword extraction from speech recognition result based on Web information

Abstract

This paper proposes using Web information to estimate confidence scores and to extract keywords from speech recognition results. Spoken document processing has been attracting attention, particularly for information retrieval and video (audiovisual) content systems. For example, confidence scores that indicate how likely a document, or a segment of a document, is to contain recognition errors have been studied. Keyword extraction from recognition results is also well known to be an important issue. For both purposes, this paper employs pointwise mutual information (PMI) between two words. PMI has previously been used to compute a confidence measure for speech recognition, serving as a coherence measure based on word co-occurrence. We propose to improve that method further with a Web query expansion technique that uses term triplets consisting of nouns from the same document. We also apply PMI to keyword estimation by summing the co-occurrence scores between a keyword candidate and each term (sumPMI). The proposed methods were tested on 10 lectures from the Corpus of Spontaneous Japanese (CSJ) and 2 simulated movie dialogues. The experiments show that the estimated confidence score is closely related to recognition accuracy, indicating the effectiveness of our method. In the subjective tests, keywords also received higher sumPMI scores.
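To make the two quantities named in the abstract concrete, here is a minimal sketch of PMI and sumPMI. It assumes the textbook definition PMI(x, y) = log(P(x, y) / (P(x) P(y))) estimated from raw counts. The abstract does not give the exact formula, how Web hit counts are obtained, or how the triplet-based query expansion works, so none of that is modelled here. The function names, the toy counts, and the choice to skip pairs that never co-occur are illustrative assumptions, not the authors' implementation.

```python
import math


def pmi(count_x, count_y, count_xy, total):
    """Textbook PMI (natural log) from raw counts; None if any count is zero."""
    if count_x <= 0 or count_y <= 0 or count_xy <= 0:
        return None
    return math.log(count_xy * total / (count_x * count_y))


def sum_pmi(candidate, terms, counts, pair_counts, total):
    """Sum PMI between a keyword candidate and every other term.

    Assumption: pairs that never co-occur are skipped rather than
    contributing negative infinity.
    """
    score = 0.0
    for term in terms:
        if term == candidate:
            continue
        joint = pair_counts.get(frozenset((candidate, term)), 0)
        value = pmi(counts[candidate], counts[term], joint, total)
        if value is not None:
            score += value
    return score


# Hypothetical counts standing in for Web hit counts (not data from the paper).
total = 1000
counts = {"speech": 100, "recognition": 50, "banana": 100}
pair_counts = {
    frozenset(("speech", "recognition")): 40,
    frozenset(("speech", "banana")): 10,
    frozenset(("recognition", "banana")): 5,
}
terms = list(counts)

# Rank candidates by sumPMI, highest first.
ranked = sorted(
    terms,
    key=lambda t: sum_pmi(t, terms, counts, pair_counts, total),
    reverse=True,
)
print(ranked)
```

In this toy data, "banana" co-occurs with the other two terms only at chance level, so its sumPMI is zero and it ranks last. This matches the intuition that a term poorly connected to the rest of the document is a weaker keyword candidate, and possibly a recognition error.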