TTS Operating Method for Voice-Conversion Application with Phonetic-Posteriorgram Extractor TTS and Vocoder
Abstract
The present invention relates to a method of operating a voice-conversion application that uses a phonetic posteriorgram (PPG) extractor, TTS, and a vocoder. A voice data extractor extracts the MFCCs and the fundamental frequency (F0) from the voice data; a PPG DNN converts the MFCCs into a phonetic posteriorgram; a speech synthesis DNN converts the posteriorgram into the voice of a desired person; a linear F0 converter matches the pitch of the input voice to the target voice; and a speech reconstruction unit receives the outputs of the linear F0 converter and the speech synthesis DNN and produces the final synthesized speech. The user thereby obtains a high-quality voice-conversion result, which solves the problem that the original speaker's F0 information was not reflected.

That is, in a speech synthesis technique and apparatus that uses speech model encoding for voice conversion, the invention is composed of: a voice data recording unit, comprising a microphone and an ADC, so that speech can be stored in a computer; a voice data extractor that uses a vocoder to extract the MFCCs and F0 from the voice data; a PPG DNN trained to convert MFCCs into a phonetic posteriorgram; a speech synthesis DNN trained to convert the phonetic posteriorgram into the voice of the desired person; a linear F0 converter that applies a linear transformation using the mean and variance of the target speaker's F0, so that the pitch information of the input speech is retained and matched to the target voice before being passed back to the vocoder; and a speech reconstruction unit with a built-in vocoder that receives the outputs of the linear F0 converter and the speech synthesis DNN and outputs the final synthesized speech.
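The linear pitch (fundamental frequency, F0) conversion described above can be illustrated with a mean-variance transform. The following is only a minimal sketch under stated assumptions, not the patented implementation: the abstract does not say whether the transform is applied to F0 in Hz or in the log domain (Hz is assumed here), the use of the source speaker's statistics alongside the target's is an assumption, and the function names `f0_stats` and `convert_f0` are hypothetical. Frames with F0 equal to 0 are treated as unvoiced and left unchanged.

```python
from statistics import fmean, pstdev


def f0_stats(f0):
    """Return (mean, standard deviation) of F0 over voiced frames (F0 > 0)."""
    voiced = [v for v in f0 if v > 0]
    return fmean(voiced), pstdev(voiced)


def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Linearly map voiced F0 values from source to target statistics.

    Assumption: the transform is applied directly in Hz.
    Unvoiced frames (F0 == 0) are passed through as 0.
    """
    # Guard against a flat source contour (zero spread).
    scale = tgt_std / src_std if src_std > 0 else 1.0
    return [tgt_mean + (v - src_mean) * scale if v > 0 else 0.0 for v in f0]


# Usage with made-up numbers: a low-pitched source contour is shifted
# and stretched to a higher-pitched target range.
source_f0 = [0.0, 100.0, 120.0, 140.0, 0.0]
src_mean, src_std = f0_stats(source_f0)
converted = convert_f0(source_f0, src_mean, src_std,
                       tgt_mean=220.0, tgt_std=2.0 * src_std)
```

The shape of the source contour (rises, falls, and voiced/unvoiced pattern) is kept, while its mean and spread are moved to those of the target speaker.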
Therefore, with the present invention, the voice data extractor extracts the MFCCs and F0 from the voice data, the PPG DNN converts the MFCCs into a phonetic posteriorgram, the speech synthesis DNN converts the posteriorgram into the voice of the desired person, the linear F0 converter matches the pitch of the input voice to the target voice, and the speech reconstruction unit receives the outputs of the linear F0 converter and the speech synthesis DNN and outputs the final synthesized speech. As a result, a high-quality voice-conversion result can be obtained, solving the problem that the original speaker's F0 information was not reflected.
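The data flow between the units named above can be sketched as a small pipeline. This is an illustrative wiring only, under the assumption that each unit can be modeled as a plain callable; the class name `VoiceConversionPipeline`, its field names, and the frame representation are hypothetical, and the trained DNNs and the vocoder are stand-ins rather than real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frames = List[List[float]]  # one feature vector per analysis frame


@dataclass
class VoiceConversionPipeline:
    # waveform -> (MFCC frames, per-frame F0)
    extract: Callable[[Sequence[float]], Tuple[Frames, List[float]]]
    # MFCC frames -> phonetic posteriorgram frames
    ppg_dnn: Callable[[Frames], Frames]
    # posteriorgram frames -> target-speaker acoustic features
    synthesis_dnn: Callable[[Frames], Frames]
    # source F0 contour -> F0 contour matched to the target speaker
    convert_f0: Callable[[List[float]], List[float]]
    # (acoustic features, F0 contour) -> output waveform
    vocoder: Callable[[Frames, List[float]], List[float]]

    def run(self, waveform: Sequence[float]) -> List[float]:
        mfcc, f0 = self.extract(waveform)
        ppg = self.ppg_dnn(mfcc)
        target_features = self.synthesis_dnn(ppg)
        target_f0 = self.convert_f0(f0)
        # The reconstruction step needs both branches: spectral content
        # from the synthesis DNN and pitch from the F0 converter.
        return self.vocoder(target_features, target_f0)


# Usage with trivial stand-in functions (no real signal processing).
pipeline = VoiceConversionPipeline(
    extract=lambda w: ([[x] for x in w], [100.0 for _ in w]),
    ppg_dnn=lambda frames: frames,
    synthesis_dnn=lambda frames: [[v[0] * 2] for v in frames],
    convert_f0=lambda f0: [x * 2 for x in f0],
    vocoder=lambda feats, f0: [a[0] + b for a, b in zip(feats, f0)],
)
output = pipeline.run([1.0, 2.0])
```

Keeping the F0 path separate from the posteriorgram path mirrors the description: the posteriorgram carries what is said, while pitch reaches the vocoder through its own branch.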