International Conference on Medical Image Computing and Computer-Assisted Intervention; MICCAI 2008

Human Vocal Tract Analysis by in Vivo 3D MRI during Phonation: A Complete System for Imaging, Quantitative Modeling, and Speech Synthesis

Abstract

We present a complete system for image-based 3D vocal tract analysis, ranging from MR image acquisition during phonation and semi-automatic image processing, through quantitative modeling including model-based speech synthesis, to quantitative model evaluation by comparison of recorded and synthesized phoneme sounds. For this purpose, six professionally trained speakers, aged 22-34 years, were examined using a standardized MRI protocol (1.5 T, T1w FLASH, slice thickness 4 mm, 23 slices, acquisition time 21 s). The volunteers performed a prolonged (≥ 21 s) emission of sounds of the German phonemic inventory. Simultaneous audio tape recording was used to verify correct utterance. Scans were acquired in each of the axial, coronal, and sagittal planes. Computer-aided quantitative 3D evaluation included (1) automated registration of the phoneme-specific data acquired in different slice orientations, (2) semi-automated segmentation of oropharyngeal structures, (3) computation of a curvilinear vocal tract midline in 3D by nonlinear PCA, and (4) computation of cross-sectional areas of the vocal tract perpendicular to this midline. For the vowels /a/, /e/, /i/, /o/, /ø/, /u/, /y/, the extracted area functions were used to synthesize phoneme sounds based on an articulatory-acoustic model. For quantitative analysis, recorded and synthesized phonemes were compared, with area functions extracted from 2D mid-sagittal slices serving as a reference. All vowels could be identified correctly based on the synthesized phoneme sounds. The comparison between synthesized and recorded vowel phonemes revealed that the quality of phoneme sound synthesis was improved for phonemes /a/, /o/, and /y/ if 3D instead of 2D data were used, as measured by the average relative frequency shift between recorded and synthesized vowel formants (p < 0.05, one-sided Wilcoxon rank sum test). In summary, the combination of fast MRI followed by 3D segmentation and analysis is a novel approach to examine human phonation in vivo. It unveils functional anatomical findings that may be essential for realistic modeling of the human vocal tract during speech production.
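
The evaluation step described in the abstract (average relative frequency shift between recorded and synthesized formants, compared with a one-sided Wilcoxon rank sum test) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the exact definition of the shift, so it is assumed here to be the mean of |f_syn - f_rec| / f_rec over formants; the function names are hypothetical; the test uses the normal approximation without tie or continuity correction; and all numbers are made-up placeholders, not results from the paper.

```python
import math
from statistics import mean


def relative_formant_shift(recorded_hz, synthesized_hz):
    """Assumed metric: mean of |f_syn - f_rec| / f_rec over paired formants."""
    if len(recorded_hz) != len(synthesized_hz):
        raise ValueError("formant lists must have the same length")
    return mean(abs(s - r) / r for r, s in zip(recorded_hz, synthesized_hz))


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_less(x, y):
    """One-sided Wilcoxon rank sum test, H1: x tends to be smaller than y.

    Returns (z, p) from the normal approximation of the rank sum of x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2))    # P(Z <= z) for standard normal Z
    return z, p


# Hypothetical per-speaker shifts for one vowel (placeholders, not paper data).
shift_3d = [0.05, 0.07, 0.04, 0.06, 0.08, 0.05]
shift_2d = [0.12, 0.10, 0.15, 0.09, 0.11, 0.13]

z, p = rank_sum_less(shift_3d, shift_2d)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

With only six speakers per group, an exact rank sum test may be preferable to the normal approximation; the approximation is used here purely to keep the sketch dependency-free.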
