Conference: 9th International Conference on Language Resources and Evaluation

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition

Abstract

In this paper we describe an effort to create a corpus and a phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR). The corpus, named TARIC (Tunisian Arabic Railway Interaction Corpus), is a collection of audio recordings and transcriptions of dialogues in the Tunisian Railway Transport Network. The phonetic (or pronunciation) dictionary is an important ASR component that serves as an intermediary between the acoustic models and the language models. The method we propose for generating the phonetic dictionary automatically is rule-based: we define a set of pronunciation rules and a lexicon of exceptions. To measure the performance of these rules, we evaluate the pronunciation dictionary on two types of corpora. The word error rate of the grapheme-to-phoneme mapping is around 9%.
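
The abstract does not list the actual rules or exceptions, so the following is only a minimal sketch of how a rule-based generator with an exception lexicon could be organized. Every rule, phoneme symbol, and lexicon entry here (`RULES`, `EXCEPTIONS`, the transliterated sample words) is a hypothetical placeholder, not the authors' rule set. The error measure is assumed to be the fraction of words whose generated pronunciation differs from a reference pronunciation.

```python
from typing import Dict, List, Tuple

# Hypothetical exception lexicon: words the rules cannot handle correctly.
EXCEPTIONS: Dict[str, List[str]] = {
    "taric": ["t", "a", "r", "i", "k"],
}

# Hypothetical ordered rules (grapheme sequence -> phonemes).
# Multi-letter graphemes come first so they win over single letters.
RULES: List[Tuple[str, List[str]]] = [
    ("sh", ["S"]),
    ("aa", ["a:"]),
    ("a", ["a"]),
    ("b", ["b"]),
    ("i", ["i"]),
    ("k", ["k"]),
    ("r", ["r"]),
    ("s", ["s"]),
    ("t", ["t"]),
]


def g2p(word: str) -> List[str]:
    """Return a phoneme list: exception lookup first, then the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phones: List[str] = []
    i = 0
    while i < len(word):
        for grapheme, phonemes in RULES:
            if word.startswith(grapheme, i):
                phones.extend(phonemes)
                i += len(grapheme)
                break
        else:
            # No rule matched at position i.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def word_error_rate(reference: Dict[str, List[str]]) -> float:
    """Fraction of reference words whose generated pronunciation is wrong."""
    wrong = sum(1 for word, ref in reference.items() if g2p(word) != ref)
    return wrong / len(reference)
```

Under these assumptions, `g2p("kaatib")` applies the rules in order, while `g2p("TARIC")` bypasses them through the exception lexicon. `word_error_rate` takes a mapping from words to hand-checked pronunciations and counts a word as an error whenever the whole phoneme sequence fails to match.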