Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis

Chung-Hsien Wu; Chi-Chun Hsia; Chung-Han Lee; Mai-Chun Lin

Abstract

This paper presents an approach to hierarchical prosody conversion for emotional speech synthesis. The pitch contour of the source speech is decomposed into a hierarchical prosodic structure consisting of sentence, prosodic word, and subsyllable levels. The pitch contour at the higher level is encoded by discrete Legendre polynomial coefficients. The residual, that is, the difference between the source pitch contour and the pitch contour decoded from the discrete Legendre polynomial coefficients, is then used for pitch modeling at the lower level. For prosody conversion, Gaussian mixture models (GMMs) are used at the sentence and prosodic word levels. At the subsyllable level, the pitch feature vectors are clustered with a proposed regression-based clustering method to generate a set of candidate prosody conversion functions. A classification and regression tree (CART) then uses linguistic and symbolic prosody features of the source speech to select the most suitable conversion function. Three small emotional parallel speech databases, covering happy, angry, and sad emotions respectively, were designed and collected for training and evaluation. Objective and subjective evaluations showed that the hierarchical prosodic structure and the proposed regression-based clustering method improve on the GMM-based method for prosody conversion.
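
The level-by-level encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a least-squares fit with NumPy's Legendre basis sampled on [-1, 1] as a stand-in for the paper's discrete Legendre polynomials, and the function names (`encode_contour`, `decode_contour`, `decompose`), the polynomial order, and the segment boundaries are all hypothetical. The GMM conversion, regression-based clustering, and CART selection steps are not shown.

```python
import numpy as np
from numpy.polynomial import legendre


def encode_contour(contour, order=3):
    """Least-squares Legendre coefficients of a contour sampled on [-1, 1].

    Assumption: stands in for the discrete Legendre encoding in the paper.
    The contour should have at least order + 1 samples.
    """
    x = np.linspace(-1.0, 1.0, len(contour))
    return legendre.legfit(x, contour, order)


def decode_contour(coeffs, length):
    """Rebuild a contour of the given length from Legendre coefficients."""
    x = np.linspace(-1.0, 1.0, length)
    return legendre.legval(x, coeffs)


def decompose(f0, levels, order=3):
    """Encode a pitch contour level by level, from highest to lowest.

    `levels` is a hypothetical list of segmentations, e.g. sentence, then
    prosodic words, then subsyllables; each segmentation is a list of
    (start, end) sample indices. At each level every segment of the
    current residual is encoded, and the decoded contour is subtracted so
    that the next level models only what is left.
    """
    residual = np.asarray(f0, dtype=float).copy()
    coeffs_per_level = []
    for segments in levels:
        level_coeffs = []
        for start, end in segments:
            c = encode_contour(residual[start:end], order)
            residual[start:end] -= decode_contour(c, end - start)
            level_coeffs.append(c)
        coeffs_per_level.append(level_coeffs)
    return coeffs_per_level, residual


if __name__ == "__main__":
    # Synthetic 50-frame contour in Hz (illustrative data only).
    x = np.linspace(-1.0, 1.0, 50)
    f0 = 200.0 + 30.0 * x - 20.0 * x**2 + 5.0 * np.sin(12.0 * x)
    levels = [[(0, 50)], [(0, 25), (25, 50)]]  # sentence, two prosodic words
    coeffs, residual = decompose(f0, levels)
    print("sentence-level coefficients:", coeffs[0][0])
    print("remaining residual RMS:", np.sqrt(np.mean(residual**2)))
```

In this sketch the coefficients at each level would be the features passed to a conversion model, and the converted contour would be rebuilt by decoding each level and summing the results.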

Bibliographic record

Source: IEEE Transactions on Audio, Speech, and Language Processing, 2010, no. 6, pp. 1394-1405 (12 pages)
Author affiliation: Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Original format: PDF
Language: English
Keywords: Emotional speech synthesis; hierarchical prosodic structure; prosody conversion; regression-based clustering
