Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

RJ Skerry-Ryan; Eric Battenberg; Ying Xiao; Yuxuan Wang; Daisy Stanton; Joel Shor; Ron Weiss; Rob Clark; Rif A. Saurous

Abstract

We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.
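
The abstract does not say how the prosody embedding is fed into Tacotron. As a minimal sketch, assume the reference is summarized into one fixed-length vector and that vector is appended to every text-encoder state. The names `mean_pool` and `condition_on_prosody` are hypothetical, and averaging is only a stand-in for a learned reference encoder.

```python
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Collapse a variable-length sequence of acoustic frames into one vector.

    Assumption: this average is a placeholder for a trained reference encoder.
    """
    if not frames:
        raise ValueError("reference must contain at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def condition_on_prosody(text_states: List[Vector], prosody: Vector) -> List[Vector]:
    """Append the same prosody embedding to every text-encoder state."""
    return [state + prosody for state in text_states]


# Toy data: 3 reference frames of dimension 2, 4 text states of dimension 3.
reference_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
text_states = [[0.1, 0.2, 0.3] for _ in range(4)]

prosody_embedding = mean_pool(reference_frames)
conditioned = condition_on_prosody(text_states, prosody_embedding)
```

Because the embedding has a fixed length regardless of how many reference frames there are, the same vector can in principle be paired with text states from a different utterance, which is the situation the abstract describes when reference and synthesized text differ.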

Bibliographic details

Source: JMLR: Workshop and Conference Proceedings, 2018, No. 2010, 10 pages
Original format: PDF