Journal: ACM Transactions on Asian Language Information Processing

Word Topic Models for Spoken Document Retrieval and Transcription

Abstract

Statistical language modeling (LM), which aims to capture the regularities of natural language and to quantify the acceptability of a given word sequence, has long been an interesting yet challenging research topic in the speech and language processing community. It has also been applied to information retrieval (IR), where it provides an effective and theoretically attractive probabilistic framework for building IR systems. In this article, we propose a word topic model (WTM) that exploits both the co-occurrence relationships between words and long-span latent topical information for language modeling in spoken document retrieval and transcription. The document, or the search history, is modeled as a whole as a composite WTM that generates each newly observed word. The underlying characteristics of the model and several model structures are investigated extensively, and the performance of WTM is analyzed and verified by comparison with the well-known probabilistic latent semantic analysis (PLSA) model and other models. The IR experiments are performed on the TDT Chinese collections (TDT-2 and TDT-3), and the large vocabulary continuous speech recognition (LVCSR) experiments are conducted on Mandarin broadcast news collected in Taiwan. The experimental results suggest that WTM is a promising alternative to the existing models.
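The abstract does not give the model equations, so the following is only a minimal sketch of one plausible reading of "composite WTM": each history word w_j is assumed to carry its own topic mixture P(T_k | M_wj), and the probability of a new word is the average, over the words in the document or search history, of sum_k P(w | T_k) P(T_k | M_wj). The function name, the uniform weighting by occurrence, and all toy probabilities below are illustrative assumptions, not values or definitions taken from the article.

```python
from typing import Dict, List

# Toy parameters (hypothetical, for illustration only).
# P(w | T_k): word distribution of each latent topic.
P_WORD_GIVEN_TOPIC: Dict[str, Dict[str, float]] = {
    "politics": {"election": 0.5, "vote": 0.4, "goal": 0.1},
    "sports": {"election": 0.1, "vote": 0.1, "goal": 0.8},
}
# P(T_k | M_wj): topic mixture attached to each history word's model.
P_TOPIC_GIVEN_WORD_MODEL: Dict[str, Dict[str, float]] = {
    "ballot": {"politics": 0.9, "sports": 0.1},
    "match": {"politics": 0.2, "sports": 0.8},
}


def word_model_prob(word: str, history_word: str) -> float:
    """P(word | M_history_word) = sum_k P(word | T_k) * P(T_k | M_history_word)."""
    mixture = P_TOPIC_GIVEN_WORD_MODEL[history_word]
    return sum(
        P_WORD_GIVEN_TOPIC[topic].get(word, 0.0) * weight
        for topic, weight in mixture.items()
    )


def composite_prob(word: str, history: List[str]) -> float:
    """Average the per-word models over the history (assumed uniform weights)."""
    if not history:
        raise ValueError("history must contain at least one word")
    return sum(word_model_prob(word, h) for h in history) / len(history)


if __name__ == "__main__":
    history = ["ballot", "ballot", "match"]
    for candidate in ("election", "vote", "goal"):
        print(candidate, composite_prob(candidate, history))
```

Because every per-word model is itself a proper mixture of topic distributions, the composite probabilities over the vocabulary still sum to one. In a real system the topic and mixture parameters would be estimated from data rather than set by hand.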
