Published in: Machine learning for multimodal interaction (conference proceedings)
The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings

Abstract

We describe the IBM systems submitted to the NIST RT06s Speech-to-Text (STT) evaluation campaign on the CHIL lecture meeting data for three conditions: multiple distant microphone (MDM), single distant microphone (SDM), and individual headset microphone (IHM). The system building process is similar to that of the IBM conversational telephone speech recognition system. However, the best models for the far-field conditions (SDM and MDM) proved to be the ones that use neither variance normalization nor vocal tract length normalization. Instead, feature-space minimum phone error discriminative training yielded the best results. Because of the relatively small amount of CHIL-domain data, the acoustic models of our systems are built on publicly available meeting corpora, with maximum a posteriori adaptation applied twice on CHIL data during training: first to the initial speaker-independent model, and subsequently to the minimum phone error model. For language modeling, we utilized meeting transcripts, text from scientific conference proceedings, and spontaneous telephone conversations. On development data, chosen in our work to be the 2005 CHIL-internal STT evaluation test set, the resulting language model provided a 4% absolute gain in word error rate (WER) compared to the model used in last year's CHIL evaluation. Furthermore, the developed STT system significantly outperformed our results from last year, reducing close-talking microphone WER from 36.9% to 25.4% on our development set. In the NIST RT06s evaluation campaign, both the MDM and SDM systems scored well; however, the IHM system did poorly due to unsuccessful cross-talk removal.
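The abstract reports its gains as word error rates, so a small sketch of how such numbers are usually computed may help. The code below is an illustration only, not the evaluation tooling used by the authors: it assumes the common definition of WER as word-level edit distance divided by the number of reference words, and the function names `word_error_rate` and `wer_reduction` are hypothetical. It also shows the difference between an absolute and a relative reduction, using the 36.9% and 25.4% figures quoted above.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(old_wer, new_wer):
    """Return (absolute, relative) reduction between two WER values."""
    absolute = old_wer - new_wer
    relative = absolute / old_wer
    return absolute, relative


# Toy sentences (made up for illustration): one substitution and one deletion
# against a six-word reference.
toy_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# Figures quoted in the abstract, in percent.
abs_gain, rel_gain = wer_reduction(36.9, 25.4)
print(f"toy WER: {toy_wer:.3f}")
print(f"absolute reduction: {abs_gain:.1f} points, relative: {100 * rel_gain:.1f}%")
```

Under these assumptions, the drop from 36.9% to 25.4% is an absolute reduction of 11.5 points, which corresponds to a relative reduction of roughly 31%.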
Bibliographic details

  • Conference location: Bethesda, MD (US)
  • Author affiliation

    IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Programming languages, algorithmic languages
