Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Abstract

Prosody modeling is an essential component in modern text-to-speech (TTS) frameworks. Explicitly providing prosody features to the TTS model makes it possible to control the style of synthesized utterances. However, predicting natural and reasonable prosody at inference time is challenging. In this work, we analyzed the behavior of non-autoregressive TTS models under different prosody-modeling settings and proposed a hierarchical architecture in which the prediction of phoneme-level prosody features is conditioned on the word-level prosody features. The proposed method outperforms competing methods in audio quality and prosody naturalness in both objective and subjective evaluations.
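
The conditioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not describe the network, so the linear predictor, the function names (`expand_to_phonemes`, `predict_phoneme_prosody`), the weights, and the toy inputs below are all assumptions chosen for illustration. The only point shown is that each phoneme-level prediction receives the prosody value of the word that contains it as an extra input.

```python
from typing import List, Sequence


def expand_to_phonemes(word_values: Sequence[float],
                       phonemes_per_word: Sequence[int]) -> List[float]:
    """Repeat each word-level value once per phoneme in that word."""
    if len(word_values) != len(phonemes_per_word):
        raise ValueError("one phoneme count is required per word")
    expanded: List[float] = []
    for value, count in zip(word_values, phonemes_per_word):
        expanded.extend([value] * count)
    return expanded


def predict_phoneme_prosody(phoneme_feats: Sequence[float],
                            word_prosody: Sequence[float],
                            phonemes_per_word: Sequence[int],
                            w_phone: float,
                            w_word: float,
                            bias: float) -> List[float]:
    """Toy linear phoneme-level predictor conditioned on word-level prosody.

    A real system would use a learned neural predictor; the linear form
    here is a stand-in (assumption) that keeps the example readable.
    """
    conditioning = expand_to_phonemes(word_prosody, phonemes_per_word)
    if len(conditioning) != len(phoneme_feats):
        raise ValueError("phoneme counts do not match the phoneme features")
    return [w_phone * feat + w_word * cond + bias
            for feat, cond in zip(phoneme_feats, conditioning)]


# Two words with 2 and 3 phonemes; all values are made-up toy numbers.
predicted = predict_phoneme_prosody(
    phoneme_feats=[0.0, 1.0, 2.0, 3.0, 4.0],
    word_prosody=[1.0, -0.5],
    phonemes_per_word=[2, 3],
    w_phone=0.5,
    w_word=1.0,
    bias=0.0,
)
print(predicted)  # [1.0, 1.5, 0.5, 1.0, 1.5]
```

In this sketch the word-level values would themselves come from a separate word-level predictor at inference time, which is what makes the arrangement hierarchical.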

Bibliographic details

Source
Spoken Language Technology Workshop, 2021, pp. 446-453 (8 pages)
Authors
Chung-Ming Chien; Hung-yi Lee
Original format: PDF
Keywords
Analytical models; Conferences; Predictive models; Speech synthesis

Similar literature

1. Exploiting Prosody Hierarchy and Dynamic Features for Pitch Modeling and Generation in HMM-Based Speech Synthesis [J]. Hsia C.-C., Wu C.-H., Wu J.-Y. IEEE Transactions on Audio, Speech, and Language Processing. 2010, no. 8
2. Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis [J]. Chung-Hsien Wu, Chi-Chun Hsia, Chung-Han Lee, et al. IEEE Transactions on Audio, Speech, and Language Processing. 2010, no. 6
3. A Statistical Model with Hierarchical Structure for Predicting Prosody in a Mandarin Text-to-Speech System [J]. Ming-Shing Yu, Neng-Huang Pan. Journal of the Chinese Institute of Engineers, Series A. 2005, no. 3
4. Fully-Hierarchical Fine-Grained Prosody Modeling for Interpretable Speech Synthesis [C]. Guangzhi Sun, Yu Zhang, Ron J. Weiss, et al. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020
5. Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system [D]. Yoon, Kyuchul. 2005
6. Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition [O]. Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan
7. Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis [O]. Chung-Ming Chien, Hung-yi Lee. 2021
