IEEE Transactions on Speech and Audio Processing

SpeechFind: Advances in Spoken Document Retrieval for a National Gallery of the Spoken Word

Abstract

Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed, together with a discussion of the critical tasks associated with effective audio information retrieval. These include advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval for text query requests using natural language processing, including document and query expansion. For segmentation, a new evaluation criterion entitled fused error score (FES) is proposed, followed by application of the CompSeg segmentation scheme to DARPA Hub4 Broadcast News (a 30.5% relative improvement in FES) and to NGSW data. Transcript generation is demonstrated for a six-decade portion of the NGSW corpus. Novel model adaptation using structure maximum likelihood eigenspace mapping shows a 21.7% relative improvement. Issues regarding copyright assessment and metadata construction are also addressed for the purposes of a sustainable audio collection of this magnitude. Advanced parameter-embedded watermarking is proposed, with evaluations showing robustness to correlated noise attacks. Our experimental online system entitled “SpeechFind” is presented, which allows for audio retrieval from a portion of the NGSW corpus. Finally, a number of research challenges, such as language modeling and lexicons for changing time periods and speaker trait and identification tracking, as well as new directions, are discussed in order to address the overall task of robust phrase searching in unrestricted audio corpora.