Published in: IEEE Transactions on Audio, Speech, and Language Processing

Hierarchical Prosody Conversion Using Regression-Based Clustering for Emotional Speech Synthesis

Abstract

This paper presents an approach to hierarchical prosody conversion for emotional speech synthesis. The pitch contour of the source speech is decomposed into a hierarchical prosodic structure consisting of sentence, prosodic word, and subsyllable levels. The pitch contour at each higher level is encoded by discrete Legendre polynomial coefficients. The residual, that is, the difference between the source pitch contour and the pitch contour decoded from the discrete Legendre polynomial coefficients, is then used for pitch modeling at the next lower level. For prosody conversion, Gaussian mixture models (GMMs) are used at the sentence and prosodic word levels. At the subsyllable level, the pitch feature vectors are clustered with a proposed regression-based clustering method to generate a set of candidate prosody conversion functions. A classification and regression tree then selects the most suitable conversion function from the linguistic and symbolic prosody features of the source speech. Three small emotional parallel speech databases, covering happy, angry, and sad emotions respectively, were designed and collected for training and evaluation. In objective and subjective evaluations, the hierarchical prosodic structure combined with the proposed regression-based clustering method achieved improved performance over the GMM-based prosody conversion method.
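The hierarchical decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy's least-squares Legendre fit (`legfit`) as a stand-in for the paper's discrete Legendre coding, a polynomial order of 3, and hand-picked prosodic word boundaries. The function names (`encode_contour`, `decode_contour`, `hierarchical_decompose`, `reconstruct`) are hypothetical, and the synthetic pitch contour is made up for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as L


def encode_contour(f0, order=3):
    """Least-squares Legendre coefficients of a contour sampled on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, len(f0))
    return L.legfit(x, f0, order)


def decode_contour(coeffs, n_frames):
    """Evaluate a Legendre series on n_frames evenly spaced points in [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n_frames)
    return L.legval(x, coeffs)


def hierarchical_decompose(f0, word_bounds, order=3):
    """Encode the sentence level, then encode each word from the residual.

    word_bounds is a list of (start, end) frame indices (assumed, not from
    the paper). What remains after both levels is the residual that would
    be passed on to subsyllable-level modeling.
    """
    f0 = np.asarray(f0, dtype=float)
    sent_coeffs = encode_contour(f0, order)
    residual = f0 - decode_contour(sent_coeffs, len(f0))
    word_coeffs = []
    for start, end in word_bounds:
        segment = residual[start:end]
        coeffs = encode_contour(segment, order)
        word_coeffs.append(coeffs)
        residual[start:end] = segment - decode_contour(coeffs, end - start)
    return sent_coeffs, word_coeffs, residual


def reconstruct(sent_coeffs, word_coeffs, residual, word_bounds):
    """Sum the decoded levels and the final residual to recover the contour."""
    f0 = decode_contour(sent_coeffs, len(residual)) + residual
    for (start, end), coeffs in zip(word_bounds, word_coeffs):
        f0[start:end] += decode_contour(coeffs, end - start)
    return f0


# Synthetic 60-frame pitch contour in Hz (illustrative values only).
t = np.linspace(0.0, 3.0, 60)
f0 = 200.0 + 30.0 * np.sin(t) + 5.0 * np.cos(7.0 * t)
bounds = [(0, 20), (20, 45), (45, 60)]  # assumed prosodic word boundaries

sent, words, resid = hierarchical_decompose(f0, bounds)
rebuilt = reconstruct(sent, words, resid, bounds)
```

Because each level models only what the level above left unexplained, summing the decoded levels and the final residual recovers the original contour. In a conversion system along the lines of the abstract, the per-level coefficients would be the features mapped from neutral to emotional speech, rather than the raw pitch values.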
