Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

A novel prosodic-information synthesizer based on recurrent fuzzy neural network for the Chinese TTS system


Abstract

In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on prosodic information generation. New methodologies are developed for constructing fuzzy rules in a prosodic model that simulates human pronunciation rules. The proposed recurrent fuzzy neural network (RFNN) is a multilayer recurrent neural network (RNN) that integrates a self-constructing neural fuzzy inference network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. Compared with conventional neural networks, the SONFIN can always construct itself with an economical network size and a high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred by the first part together with other important features of syllables. A TTS system combined with the proposed method can reproduce not only sandhi rules but also the other prosodic phenomena found in traditional TTS systems. Moreover, the proposed scheme can even discover some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by embedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters, including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are available online for demonstration.
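The abstract does not give the network equations, so the following is only a minimal sketch of the general idea of fuzzy-rule inference inside a recurrent loop, not the authors' RFNN. Everything specific in it is an assumption made for illustration: the Gaussian membership functions, the product firing strength, the normalized weighted-sum output, the single feedback of the previous output, and the hypothetical `RULES` table and function names.

```python
import math

def gaussian(x, center, width):
    """Membership degree of x in a fuzzy set (Gaussian shape is an assumption)."""
    return math.exp(-((x - center) / width) ** 2)

def fire_rules(inputs, rules):
    """Firing strength of each rule: product of its per-input membership degrees."""
    strengths = []
    for rule in rules:
        s = 1.0
        for x, (center, width) in zip(inputs, rule["sets"]):
            s *= gaussian(x, center, width)
        strengths.append(s)
    return strengths

def infer(inputs, rules):
    """Normalized weighted sum of the rule consequents."""
    strengths = fire_rules(inputs, rules)
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    return sum(s * r["out"] for s, r in zip(strengths, rules)) / total

def run_sequence(features, rules):
    """Process syllables in order, feeding the previous output back as a second input."""
    prev = 0.0
    outputs = []
    for f in features:
        prev = infer([f, prev], rules)
        outputs.append(prev)
    return outputs

# Hypothetical rule table: each rule has one fuzzy set (center, width) per input
# (a linguistic feature and the previous output) and a scalar consequent.
RULES = [
    {"sets": [(0.0, 1.0), (0.0, 1.0)], "out": 0.2},
    {"sets": [(1.0, 1.0), (0.5, 1.0)], "out": 0.8},
]

if __name__ == "__main__":
    print(run_sequence([0.0, 1.0, 1.0], RULES))
```

Because each output is a weighted average of the rule consequents, it always stays between the smallest and largest consequent value. The feedback term is what makes the value for one syllable depend on the syllables before it.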
