Venue: Spoken Language Technology Workshop

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Abstract

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks. Explicitly providing prosody features to the TTS model makes the style of synthesized utterances controllable. However, predicting natural and reasonable prosody at inference time is challenging. In this work, we analyzed the behavior of non-autoregressive TTS models under different prosody-modeling settings and proposed a hierarchical architecture in which the prediction of phoneme-level prosody features is conditioned on the word-level prosody features. The proposed method outperforms competing methods in audio quality and prosody naturalness in both objective and subjective evaluations.
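
The hierarchical conditioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which prosody features are used, what the predictors look like, or how the conditioning is wired, so everything here is an assumption made for illustration: the class name `HierarchicalProsodyPredictor`, the single linear layers, the two-dimensional prosody vector, the concatenation used for conditioning, and the `phone_to_word` alignment list.

```python
import random


def init_linear(n_out, n_in, rng):
    """Create small random weights and zero biases for one affine layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply one affine layer to the input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class HierarchicalProsodyPredictor:
    """Hypothetical two-level predictor: words first, then phonemes."""

    def __init__(self, text_dim, prosody_dim=2, seed=0):
        rng = random.Random(seed)
        # Word-level predictor: sees only the word's text encoding.
        self.word_w, self.word_b = init_linear(prosody_dim, text_dim, rng)
        # Phoneme-level predictor: sees the phoneme encoding concatenated
        # with the prosody of the word that contains the phoneme (assumed
        # conditioning mechanism).
        self.ph_w, self.ph_b = init_linear(prosody_dim, text_dim + prosody_dim, rng)

    def predict(self, word_enc, phone_enc, phone_to_word, word_prosody=None):
        # Word-level prosody is predicted unless the caller supplies it,
        # which is one way to control the style explicitly.
        if word_prosody is None:
            word_prosody = [linear(self.word_w, self.word_b, w) for w in word_enc]
        phone_prosody = []
        for enc, word_index in zip(phone_enc, phone_to_word):
            conditioned_input = enc + word_prosody[word_index]  # list concatenation
            phone_prosody.append(linear(self.ph_w, self.ph_b, conditioned_input))
        return word_prosody, phone_prosody


# Toy input: two words, three phonemes; the first two phonemes belong to word 0.
model = HierarchicalProsodyPredictor(text_dim=3)
word_enc = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
phone_enc = [[0.1, 0.0, 0.2], [0.3, 0.1, 0.0], [0.2, 0.2, 0.2]]
phone_to_word = [0, 0, 1]
word_prosody, phone_prosody = model.predict(word_enc, phone_enc, phone_to_word)
```

Passing `word_prosody` explicitly to `predict` overrides the word-level prediction, so the phoneme-level outputs follow the supplied values. A trained system would replace the random linear layers with learned networks.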