Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis

Abstract

This paper presents prosody-aware subword embedding that takes Japanese intonation systems into account, and applies it to DNN (deep neural network)-based multi-dialect speech synthesis. As speech synthesis for resource-rich languages has improved, research is shifting toward more challenging targets such as Japanese dialects, whose prosodic contexts are still undefined. Conventional prosody-aware word embedding can extract these contexts in an unsupervised, data-driven manner from words and F0 sequences. However, accurate contexts are difficult to generate for unknown words. To solve this problem, we propose prosody-aware subword embedding considering Japanese intonation systems. The unsupervised subword model, which is trained considering both linguistic and acoustic characteristics, can tokenize an unknown word into known subwords suitable for prosody-aware embedding. We also propose a modulation filtering method that considers the moras within each subword to improve embedding accuracy. We apply the methods not only to Japanese speech synthesis but also to Japanese multi-dialect speech synthesis. In the multi-dialect case, we propose subword models shared among dialects and embedding models conditioned on dialect information. The experimental evaluation demonstrates that the proposed multi-dialect methods can improve speech quality in some Japanese dialects.
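The idea of tokenizing an unknown word into known subwords can be illustrated with a minimal sketch. It is not the authors' model: the abstract says their subword model is trained on linguistic and acoustic characteristics, whereas the code below assumes a plain unigram subword model and finds the most probable segmentation with a Viterbi search. The vocabulary, the romanized moras, and every probability in `LOG_PROB` are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical subword vocabulary: each subword is a tuple of moras with an
# invented unigram probability. None of these values come from the paper.
LOG_PROB = {
    ("o", "o", "sa", "ka"): math.log(0.05),
    ("be", "n"): math.log(0.04),
    ("o",): math.log(0.10),
    ("sa",): math.log(0.08),
    ("ka",): math.log(0.08),
    ("be",): math.log(0.06),
    ("n",): math.log(0.10),
}


def tokenize(moras, log_prob):
    """Split a mora sequence into known subwords (unigram Viterbi search)."""
    n = len(moras)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score for moras[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last subword
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = tuple(moras[start:end])
            if piece not in log_prob:
                continue
            score = best[start] + log_prob[piece]
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("word cannot be covered by known subwords")
    # Follow the back-pointers from the end of the word to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        pieces.append(tuple(moras[back[end]:end]))
        end = back[end]
    return pieces[::-1]


# The six-mora word is not in the vocabulary as a whole, so it is treated as
# an unknown word and covered by two known subwords.
unknown_word = ["o", "o", "sa", "ka", "be", "n"]
print(tokenize(unknown_word, LOG_PROB))
```

Working on mora tuples rather than characters keeps subword boundaries aligned with mora boundaries, which is the unit the abstract refers to for modulation filtering. That alignment is a design choice of this sketch, not a detail confirmed by the abstract.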

Bibliographic details

Source: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018, pp. 659-664 (6 pages)
Authors: Takanori Akiyama; Shinnosuke Takamichi; Hiroshi Saruwatari
Original format: PDF
Keywords: Speech synthesis; Modulation; Training; Context modeling; Training data; Feature extraction; Data models
