International Conference on Spoken Language Processing; 4-8 October 2004; Jeju (KR)

Automatic Extraction of Phonetically Rich Sentences from Large Text Corpus of Indian Languages

Abstract

Developing Automatic Speech Recognition and Speech Synthesis systems requires a set of phonetically rich sentences that represents the different speech units of a language. Selecting such a set from a large text corpus without modifying the characteristics of the corpus remains a difficult task. A major concern is deciding on what basis sentences should be chosen so that the set covers all phonetic aspects of the language under study in the smallest possible size. This paper describes a simple process for automatically extracting such a set of sentences from a large text corpus of a given Indian language and presents an algorithm for the process. The process is language independent and works for most Indian languages. The extent of success achieved, in terms of the phonetic richness of the sentences, is also discussed.
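The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of one common way to approach the problem: greedy coverage, where each step picks the sentence that adds the most not-yet-covered units. It should not be read as the paper's method. The function names `char_bigrams` and `select_rich_sentences` are hypothetical, and adjacent-letter pairs stand in for real phonetic units; an actual system for an Indian language would presumably derive units from a grapheme-to-phoneme mapping instead.

```python
from typing import List, Set


def char_bigrams(sentence: str) -> Set[str]:
    """Assumed stand-in for phonetic units: adjacent letter pairs inside each word."""
    units: Set[str] = set()
    for word in sentence.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        units.update(a + b for a, b in zip(letters, letters[1:]))
    return units


def select_rich_sentences(corpus: List[str], max_sentences: int) -> List[str]:
    """Greedily pick sentences that each add the most uncovered units.

    Stops when max_sentences is reached or no remaining sentence
    contributes a new unit. Ties go to the earliest sentence in the corpus.
    """
    unit_sets = {i: char_bigrams(s) for i, s in enumerate(corpus)}
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = set(unit_sets)

    while remaining and len(chosen) < max_sentences:
        # sorted() makes the tie-break deterministic (lowest index wins).
        best = max(sorted(remaining), key=lambda i: len(unit_sets[i] - covered))
        gain = unit_sets[best] - covered
        if not gain:
            break  # nothing left adds coverage
        covered |= gain
        chosen.append(corpus[best])
        remaining.remove(best)

    return chosen


if __name__ == "__main__":
    demo = ["ab cd", "ef", "ab"]
    print(select_rich_sentences(demo, max_sentences=3))
```

The greedy choice keeps the selected set small without guaranteeing the true minimum, which is the usual trade-off for coverage problems of this kind. Because it only ever selects whole sentences from the corpus, the source text itself is left unmodified.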
