TTS Operating Method for Voice-Conversion Application with Phonetic-Posteriorgram Extractor TTS and Vocoder


Abstract

The present invention relates to a method of operating a voice-conversion application that uses a phonetic posteriorgram (PPG) extractor, TTS, and a vocoder. A speech data extractor extracts the MFCCs and the fundamental frequency (pitch) from speech data; a PPG DNN converts the MFCCs into a phonetic posteriorgram; a speech synthesis DNN converts the posteriorgram into the voice of the desired person; a linear pitch converter matches the pitch of the input voice to that of the target voice; and a speech reconstruction unit receives the outputs of the linear pitch converter and the speech synthesis DNN and outputs the final synthesized speech. The user thereby obtains a high-quality voice-conversion result, and the problem that the original speaker's pitch information was not reflected is solved. That is, as a speech synthesis technique and apparatus using speech model encoding for voice conversion, the invention comprises: a speech data recording unit, consisting of a microphone and an ADC, that allows speech to be stored on a computer; a speech data extractor that uses a vocoder to extract the MFCCs and pitch from the speech data; a PPG DNN trained to convert MFCCs into a phonetic posteriorgram; a speech synthesis DNN trained to convert the posteriorgram into the voice of the desired person; a linear pitch converter that, so that the pitch of the input voice matches that of the voice to be converted to, applies a linear transformation using the mean and variance of the target speaker's pitch and thereby retains the original speaker's pitch information before it is passed back to the vocoder; and a speech reconstruction unit with a built-in vocoder that receives the outputs of the linear pitch converter and the speech synthesis DNN and outputs the final synthesized speech.
Therefore, by extracting the MFCCs and pitch with the speech data extractor, converting the MFCCs into a phonetic posteriorgram with the PPG DNN, converting the posteriorgram into the desired person's voice with the speech synthesis DNN, matching the pitch of the input voice with the linear pitch converter, and combining the outputs of the linear pitch converter and the speech synthesis DNN in the speech reconstruction unit to output the final synthesized speech, the invention obtains a high-quality voice-conversion result and solves the problem of the original speaker's pitch information not being reflected.
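
The linear pitch conversion step in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation. The function name `convert_f0`, the use of the source speaker's own mean and standard deviation for normalization, and the convention that a value of 0 marks an unvoiced frame are all assumptions made for illustration; the abstract only states that the transformation is linear and uses the mean and variance of the target speaker's pitch.

```python
from statistics import fmean, pstdev


def convert_f0(source_f0, target_mean, target_std):
    """Linearly map voiced source F0 values (Hz) onto the target speaker's statistics.

    Assumption: frames with F0 == 0 are unvoiced and are passed through as 0.0.
    Assumption: the source contour is normalized with its own mean and spread.
    """
    voiced = [f for f in source_f0 if f > 0]
    if not voiced:
        return [0.0 for _ in source_f0]
    src_mean = fmean(voiced)
    src_std = pstdev(voiced)
    # Guard against a constant contour, where the source spread is zero.
    scale = target_std / src_std if src_std > 0 else 1.0
    return [
        (f - src_mean) * scale + target_mean if f > 0 else 0.0
        for f in source_f0
    ]
```

Because the mapping is affine, the relative shape of the source contour (rises and falls around its mean) is kept while its level and range move to those of the target speaker. A common variant applies the same formula to log-F0 instead of F0 in Hz; the abstract does not say which domain is used.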


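Bibliographic data

Publication number: KR20200094493A
Publication date: 2020-08-07
Original format: PDF
Applicant/patentee: 김남형
Application number: KR20190012042
Inventor: 김남형
Filing date: 2019-01-30
Classification: G10L13/033; G10L13/08; G10L19/16
Country: KR
Indexed: 2022-08-21 11:06:14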
