International Conference on Text, Speech, and Dialogue

LSTM-Based Speech Segmentation for TTS Synthesis

Abstract

This paper describes experiments on speech segmentation for text-to-speech (TTS) synthesis. We used one bidirectional LSTM neural network for framewise phone classification and a second bidirectional LSTM network for predicting the duration of individual phones. The proposed segmentation procedure combines both outputs and finds the optimal speech-phoneme alignment by dynamic programming. We introduced two modifications to increase the robustness of phoneme classification. Experiments were performed on two professional and two amateur voices, and the results were compared with a reference HMM-based segmentation that included additional manual corrections. Preference listening tests showed that the reference and experimental segmentations are equivalent when used in a unit selection TTS system.
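The abstract does not say how the two network outputs are combined, so the following is only a minimal sketch of one plausible dynamic-programming alignment, not the authors' procedure. It assumes framewise log-probabilities from the classifier and a predicted duration in frames for each phone; the function name `align`, the parameters `max_dur` and `dur_weight`, and the squared log-ratio duration penalty are all hypothetical choices made for illustration.

```python
import numpy as np

def align(frame_logp, phones, pred_dur, max_dur=50, dur_weight=1.0):
    """Return the exclusive end frame of each phone in the best alignment.

    frame_logp: (T, P) array of framewise log-probabilities over P phone classes
    phones:     phone class indices in transcript order
    pred_dur:   predicted duration in frames per phone (assumed model output, > 0)
    """
    T, P = frame_logp.shape
    N = len(phones)
    # cum[t, p] = sum of frame_logp[:t, p]; a segment score is a difference of two rows.
    cum = np.vstack([np.zeros((1, P)), np.cumsum(frame_logp, axis=0)])
    best = np.full((N + 1, T + 1), -np.inf)   # best[n, t]: first n phones cover t frames
    back = np.zeros((N + 1, T + 1), dtype=int)  # duration chosen for phone n ending at t
    best[0, 0] = 0.0
    for n in range(1, N + 1):
        p = phones[n - 1]
        for t in range(n, T + 1):
            for d in range(1, min(max_dur, t) + 1):
                prev = best[n - 1, t - d]
                if prev == -np.inf:
                    continue
                acoustic = cum[t, p] - cum[t - d, p]
                # Assumed penalty: squared log-ratio between candidate and predicted duration.
                penalty = dur_weight * np.log(d / pred_dur[n - 1]) ** 2
                score = prev + acoustic - penalty
                if score > best[n, t]:
                    best[n, t] = score
                    back[n, t] = d
    if best[N, T] == -np.inf:
        raise ValueError("no alignment fits: too few frames or max_dur too small")
    # Trace back from the last phone to recover the boundaries.
    ends, t = [], T
    for n in range(N, 0, -1):
        ends.append(t)
        t -= int(back[n, t])
    return ends[::-1]
```

The triple loop costs on the order of N × T × max_dur operations, which is acceptable for a single utterance. In this sketch, `dur_weight` controls how strongly the duration prediction can override the frame classifier.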
