Conference: International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON 2010)

A hybrid diphone speech unit and a speech corpus construction technique for a Thai text-to-speech system on mobile devices

Abstract

Most Thai text-to-speech (TTS) systems on personal computers can synthesize speech in real time with acceptable quality. However, when a Thai TTS system is ported to a resource-limited platform such as a mobile device, computation time has to be reduced, and the quality of the synthesized speech drops as a result. Flite_Thai, a unit concatenation synthesizer for Thai, reduces computation time enough to run in real time, but its output is quite unintelligible. In this paper, we aim to select an appropriate speech unit for Flite_Thai in order to improve its intelligibility. We design a new speech corpus that consists of three different speech units: demi-syllable, diphone, and a new speech unit called hybrid diphone. We record this corpus using a nonsense carrier sentence technique, since our focus is on clear articulation of each speech unit. Each carrier sentence contains one speech unit, or a set of similar speech units, without regard to meaning. We compare the quality of speech synthesized from four types of speech units: the diphone from the TsynC corpus, recorded with natural sentences, and the three unit types from the new corpus, recorded with nonsense carrier sentences. In terms of intelligibility, all of the speech units from the new corpus achieved a higher MOS (Mean Opinion Score) than the existing Flite_Thai system, which uses speech units from TsynC. Among the three unit types in the new corpus, demi-syllable obtained the highest score. Although hybrid diphone obtained a higher MOS than both the existing system and the diphone, it still suffers from a similar problem: unsmooth joins between units.
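As a rough illustration of the evaluation metric named in the abstract, a Mean Opinion Score is conventionally the arithmetic mean of listener ratings. The sketch below assumes a 1 to 5 rating scale, which the abstract does not state. The function name `mean_opinion_score`, the system labels, and every rating value are hypothetical placeholders, not data or code from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1 to 5")
    return mean(ratings)


# Placeholder ratings for illustration only; these are NOT results from the paper.
placeholder_ratings = {
    "system_a": [3, 4, 4, 5],
    "system_b": [2, 3, 3, 2],
}

# One MOS per system, then pick the system with the highest score.
scores = {name: mean_opinion_score(r) for name, r in placeholder_ratings.items()}
best = max(scores, key=scores.get)
```

Comparing unit types this way only ranks them by average rating; it says nothing about whether the differences are statistically significant.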
