The IBM Rich Transcription Spring 2006 Speech-to-Text System for Lecture Meetings


Abstract

We describe the IBM systems submitted to the NIST RT06s Speech-to-Text (STT) evaluation campaign on the CHIL lecture meeting data for three conditions: multiple distant microphone (MDM), single distant microphone (SDM), and individual headset microphone (IHM). The system building process is similar to that of the IBM conversational telephone speech recognition system. However, the best models for the far-field conditions (SDM and MDM) proved to be the ones that use neither variance normalization nor vocal tract length normalization. Instead, feature-space minimum phone error discriminative training yielded the best results. Due to the relatively small amount of CHIL-domain data, the acoustic models of our systems are built on publicly available meeting corpora, with maximum a-posteriori adaptation applied twice on CHIL data during training: first to the initial speaker-independent model, and subsequently to the minimum phone error model. For language modeling, we utilized meeting transcripts, text from scientific conference proceedings, and spontaneous telephone conversations. On development data, chosen in our work to be the 2005 CHIL-internal STT evaluation test set, the resulting language model provided a 4% absolute improvement in word error rate (WER) compared to the model used in last year's CHIL evaluation. Furthermore, the developed STT system significantly outperformed our results from last year, reducing WER on close-talking microphone data from 36.9% to 25.4% on our development set. In the NIST RT06s evaluation campaign, both the MDM and SDM systems scored well; however, the IHM system did poorly due to unsuccessful cross-talk removal.
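The abstract reports its gains as absolute WER differences. As a minimal sketch, not the evaluation's actual scoring tool, the following Python computes WER as word-level edit distance divided by reference length and converts the reported 36.9% to 25.4% change into absolute and relative terms. The function names `word_error_rate` and `wer_reduction` are illustrative assumptions, and the sketch ignores the text normalization that official scoring applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(old_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction as a fraction)."""
    absolute = old_wer - new_wer
    return absolute, absolute / old_wer


# Figures taken from the abstract: close-talking WER on the development set.
absolute, relative = wer_reduction(36.9, 25.4)
print(f"{absolute:.1f} points absolute, {relative:.1%} relative")
# prints "11.5 points absolute, 31.2% relative"
```

The relative figure is derived here from the two reported numbers; the abstract itself states only the absolute values.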


Bibliographic details

Source
Machine Learning for Multimodal Interaction | 2006 | pp. 432-443 | 12 pages
Conference location: Bethesda, MD (US)
Authors
Jing Huang; Martin Westphal; Stanley Chen; Olivier Siohan; Daniel Povey; Vit Libal; Alvaro Soneiro; Henrik Schulz; Thomas Ross; Gerasimos Potamianos
Author affiliations

IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A.

Original format: PDF
Language: English
Chinese Library Classification: programming languages, algorithmic languages

