International Conference on Spoken Language Processing; 2004-10-04 to 2004-10-08; Jeju (KR)
Construct a Multi-Lingual Speech Corpus in Taiwan with Extracting Phonetically Balanced Articles


Abstract

In this paper, we describe the initial stage of constructing a multilingual speech corpus in Taiwan by selecting phonetically balanced scripts. The corpus is intended to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. The first step toward this objective is to construct a multilingual phonetic alphabet, the Formosa Phonetic Alphabet (ForPA). The multilingual lexicons (Formosa Lexicons) are also important components for building the corpus. The speech database, recorded from 2,300 speakers, has recently been completed and is ready for release. It contains about 200 hours of speech and 404,000 utterances.
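
The abstract does not say how the phonetically balanced scripts are extracted. As a rough illustration only, one common approach is a greedy, coverage-based selection: repeatedly pick the candidate script that adds the most phone types not yet covered. The sketch below assumes that approach; the function name `select_balanced`, the toy script ids, and the phone symbols are all hypothetical and are not taken from the paper or from ForPA.

```python
def select_balanced(candidates, max_items):
    """Greedy coverage-based selection (an assumed technique, not the authors').

    candidates: dict mapping a script id to its list of phone symbols.
    Returns (chosen ids in pick order, set of phone types covered).
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_items:
        # Prefer the script that adds the most new phone types;
        # among equal gains, prefer the shorter script.
        best_id = max(
            remaining,
            key=lambda k: (len(set(remaining[k]) - covered), -len(remaining[k])),
        )
        gain = len(set(remaining[best_id]) - covered)
        if gain == 0:
            break  # no remaining script adds a new phone type
        chosen.append(best_id)
        covered |= set(remaining.pop(best_id))
    return chosen, covered


# Hypothetical toy data: script ids and phone symbols are placeholders.
toy_scripts = {
    "a": ["t", "a", "i"],
    "b": ["t", "a"],
    "c": ["k", "o", "t"],
    "d": ["i", "k"],
}
chosen, covered = select_balanced(toy_scripts, max_items=3)
print(chosen, sorted(covered))
```

This sketch only maximizes coverage of phone types. A real selection procedure would likely also weigh phone frequencies or larger units such as syllables, which the abstract does not specify.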
