To construct a speech synthesis system that can produce diverse voices, we have been developing a speaker-independent approach to HMM-based speech synthesis in which statistical average voice models are adapted to a target speaker using a small amount of speech data. In this paper, we incorporate a high-quality speech vocoding method, STRAIGHT, and a parameter generation algorithm that considers global variance into the system to improve the quality of synthetic speech. Furthermore, we introduce a feature-space speaker adaptive training algorithm and a gender-mixed modeling technique to further normalize the average voice model. We build an English text-to-speech system using these techniques and show the performance of the system.