SpeechFind: Advances in Spoken Document Retrieval for a National Gallery of the Spoken Word

Hansen J.H.L.; Huang R.; Zhou B.; Seadle M.; Deller J.R.; Gurijala A.R.; Kurimo M.; Angkititrakul P.

Abstract

Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed with a discussion of critical tasks associated with effective audio information retrieval. These include advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and information retrieval using natural language processing for text query requests that include document and query expansion. For segmentation, a new evaluation criterion entitled fused error score (FES) is proposed, followed by application of the CompSeg segmentation scheme on DARPA Hub4 Broadcast News (30.5% relative improvement in FES) and NGSW data. Transcript generation is demonstrated for a six-decade portion of the NGSW corpus. Novel model adaptation using structure maximum likelihood eigenspace mapping shows a relative 21.7% improvement. Issues regarding copyright assessment and metadata construction are also addressed for the purposes of a sustainable audio collection of this magnitude. Advanced parameter-embedded watermarking is proposed with evaluations showing robustness to correlated noise attacks. Our experimental online system entitled “SpeechFind” is presented, which allows for audio retrieval from a portion of the NGSW corpus. Finally, a number of research challenges such as language modeling and lexicon for changing time periods, speaker trait and identification tracking, as well as new directions, are discussed in order to address the overall task of robust phrase searching in unrestricted audio corpora.
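The abstract reports its gains as relative improvements (30.5% in FES, 21.7% for model adaptation). As a reading aid, the sketch below shows how a relative improvement is conventionally computed for a lower-is-better error score. The function name and the numbers are hypothetical and are not taken from the paper, which does not give the underlying baseline scores here.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction of an error-type score, as a percentage of the baseline.

    Assumes a lower-is-better metric (for example an error rate).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical scores for illustration only, not results from the paper.
baseline = 20.0
improved = 15.0
print(f"{relative_improvement(baseline, improved):.1f}% relative improvement")
```

With these illustrative inputs, an absolute drop of 5 points from a baseline of 20 corresponds to a 25% relative improvement, which is why relative figures can look much larger than the absolute change.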

Bibliographic details

Source
IEEE Transactions on Speech and Audio Processing | 2005, Issue 5 | pp. 712-730 | 19 pages
Authors
Hansen J.H.L.; Huang R.; Zhou B.; Seadle M.; Deller J.R.; Gurijala A.R.; Kurimo M.; Angkititrakul P.
Original format: PDF
Language: English
Chinese Library Classification: Electroacoustic technology and speech signal processing
Keywords
audio signal processing; information retrieval; natural languages; speech recognition; watermarking; CompSeg segmentation scheme; DARPA Hub4 Broadcast News; National Gallery of the Spoken Word; SpeechFind; acoustic background noise; advanced audio segmentation; adv;

Similar literature

1. Word Topic Models for Spoken Document Retrieval and Transcription [J]. Berlin Chen. ACM Transactions on Asian Language Information Processing, 2009, Issue 1.
2. Robust Spoken Document Retrieval Methods for Misrecognition and Out-of-Vocabulary Keywords [J]. Hiromitsu Nishizaki, Seiichi Nakagawa. Systems and Computers in Japan, 2004, Issue 14.
3. News spoken document retrieval by considering out-of-vocabulary keywords [J]. Hiromitsu Nishizaki, Seiichi Nakagawa. 電子情報通信学会技術研究報告. 音声. Speech, 2002, Issue 108.
4. SpeechFind for CDP: Advances in Spoken Document Retrieval for the U.S. Collaborative Digitization Program [C]. Wooil Kim, John H. L. Hansen. IEEE Workshop on Automatic Speech Recognition and Understanding, 2007.
5. Robust spoken document retrieval in multilingual and noisy acoustic environments [D]. Akbacak, Murat. 2009.
6. The Role of Grammatical Category Information in Spoken Word Retrieval [O]. Carolina Palma Duràn, Agnesa Pillon. 2011.
7. SpeechFind: Advances in spoken document retrieval for a national gallery of the spoken word [O]. John H. L. Hansen, Rongqing Huang. 2005.
