Journal: IEICE Transactions on Information and Systems

Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages


Abstract

Morphemes, obtained by morphological parsing, and statistical sub-words, derived by data-driven splitting, are commonly used as recognition units in speech recognition of agglutinative languages. In this letter, we propose a discriminative approach that selects, for each distinct word type, the splitting result more likely to improve the recognizer's performance. We define and minimize an objective function that involves the unigram language model (LM) probability and the count of misrecognized phones on the acoustic training data. After determining the splitting result for each word in the text corpus, we select the frequent units to build a hybrid vocabulary that includes both morphemes and statistical sub-words. Compared with a system based on statistical sub-words, the hybrid system reduces the letter error rate (LER) on the test set by 0.8%.
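The selection procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the objective function, so everything specific here is an assumption made for illustration: the cost is taken to be the negative log unigram probability of the units plus a weighted phone-error count, the unigram LM uses add-one smoothing, and the names `choose_splits`, `build_vocabulary`, `phone_errors`, and `weight` are hypothetical. The phone-error counts are assumed to come from decoding the acoustic training data, which is outside the scope of the sketch.

```python
import math
from collections import Counter


def unigram_neg_log_prob(units, unit_counts, total):
    """Negative log probability of a unit sequence under a unigram LM.

    Add-one smoothing is an assumption of this sketch, not the paper's choice.
    """
    vocab_size = len(unit_counts)
    return -sum(
        math.log((unit_counts.get(u, 0) + 1) / (total + vocab_size + 1))
        for u in units
    )


def choose_splits(candidates, phone_errors, unit_counts, weight=1.0):
    """Pick, for each word type, the candidate split with the lowest cost.

    candidates:   {word: {"morph": [units...], "stat": [units...]}}
    phone_errors: {word: {"morph": int, "stat": int}}, hypothetical counts of
                  misrecognized phones attributed to each split.
    weight:       hypothetical trade-off between the LM term and the error term.
    """
    total = sum(unit_counts.values())
    chosen = {}
    for word, splits in candidates.items():
        def cost(name):
            lm_term = unigram_neg_log_prob(splits[name], unit_counts, total)
            return lm_term + weight * phone_errors[word][name]
        chosen[word] = min(splits, key=cost)
    return chosen


def build_vocabulary(chosen, candidates, word_counts, size):
    """Count units of the chosen splits over the corpus and keep the most frequent."""
    counts = Counter()
    for word, name in chosen.items():
        for unit in candidates[word][name]:
            counts[unit] += word_counts[word]
    return [unit for unit, _ in counts.most_common(size)]
```

Because each word type contributes either its morpheme split or its statistical split, the resulting vocabulary mixes both kinds of unit, which is what makes it hybrid. The weight and the smoothing would need tuning on held-out data in any real system.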
