Conference: Annual International Conference on Computational Linguistics and Intelligent Text Processing

Combining Word and Phonetic-Code Representations for Spoken Document Retrieval

Abstract

The traditional approach to spoken document retrieval (SDR) uses an automatic speech recognizer (ASR) in combination with a word-based information retrieval method. This approach has shown only limited accuracy, partly because ASR systems tend to produce transcriptions of spontaneous speech with a significant word error rate. To overcome this limitation, we propose a method that uses word and phonetic-code representations together. The idea behind this combination is to reduce the impact of transcription errors when processing some (presumably complex) queries, by representing words with similar pronunciations through the same phonetic code. Experimental results on the CLEF-CLSR-2007 corpus are encouraging: compared with the traditional word-based approach, the proposed hybrid method improved mean average precision by 3% and the number of retrieved relevant documents by 7%.
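To make the idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes American Soundex as the phonetic code (the abstract does not name the coding scheme) and merges a word-level and a code-level cosine similarity by linear interpolation with a hypothetical weight `alpha`. The paper applies the phonetic representation to some queries, so its actual combination may differ. The names `soundex`, `hybrid_score` and the toy transcripts are illustrative only.

```python
import math
import re
from collections import Counter

# American Soundex digit classes; vowels, y, h and w get no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of `word` ("" if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a digit
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return code.ljust(4, "0")


def tokens(text: str) -> list[str]:
    """Lowercase ASCII word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    """Interpolate word-level and Soundex-level similarity (alpha is a hypothetical weight)."""
    q_words, d_words = tokens(query), tokens(doc)
    word_sim = cosine(Counter(q_words), Counter(d_words))
    code_sim = cosine(Counter(map(soundex, q_words)), Counter(map(soundex, d_words)))
    return alpha * word_sim + (1 - alpha) * code_sim


if __name__ == "__main__":
    query = "flour mill"
    # Hypothetical ASR transcripts; "flower" stands in for a misrecognised "flour".
    transcripts = [
        "the flower mill closed in nineteen forty",
        "grain prices rose sharply",
    ]
    ranked = sorted(transcripts, key=lambda d: hybrid_score(query, d), reverse=True)
    for doc in ranked:
        print(f"{hybrid_score(query, doc):.3f}  {doc}")
```

In this toy case the query term "flour" and the misrecognised "flower" share the Soundex code F460, so the first transcript gains score from the phonetic representation even though only "mill" matches at the word level. Setting `alpha=1.0` falls back to the purely word-based baseline.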