Over time, reliable speech communication and recognition systems will become increasingly bimodal: both audio and visual information will be captured, transmitted or stored, and processed. The interaction between research areas using audio and visual information has opened the door to many bimodal applications.

Different ways of exploiting this interaction are explored in the work presented in this thesis, focusing on the extraction of MPEG-4 compliant Facial Animation Parameters (FAPs), the use of such parameters for robust audio-visual speech recognition, speech-driven facial animation, audio-visual person recognition, and automatic facial expression recognition. MPEG-4 is expected to become a dominant standard in a number of applications, and working within its framework therefore adds to the usefulness and applicability of this work.

A novel, automatic, and robust visual feature extraction approach is developed in this work that combines active contour and deformable template algorithms and requires no prior knowledge about the data, extensive computational training, or hand labeling.

The audio-visual continuous speech recognition system developed here significantly improves speech recognition performance over a wide range of acoustic noise levels and for different dimensionalities of visual features. The speech recognition experiments were performed on a relatively large-vocabulary audio-visual database, and the improvement in automatic speech recognition (ASR) performance obtainable by exploiting the visual speech information contained in outer- and inner-lip FAPs was determined.

The HMM-based speech-to-video synthesis system developed integrates acoustic HMMs (AHMMs) and visual HMMs (VHMMs), allowing independent modeling of the acoustic and visual signals. The acoustic state sequence is mapped into a visual state sequence using a correlation HMM (CHMM); a schematic sketch of this mapping is given below. The resulting visual state sequence is used to produce a sequence of visual observations (FAPs). The performance of the system was evaluated through several objective experiments, which showed that the proposed speech-to-video synthesis system significantly reduces time-alignment errors compared to the conventional temporal scaling method. The objective FAP comparison results confirmed the strong similarity between the synthesized FAPs and the original FAPs.

In addition, audio-visual person verification and automatic facial expression recognition systems are developed and described in this thesis.
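The AHMM-to-VHMM mapping summarized above can be illustrated schematically. The following is a minimal sketch of the general idea, not the system developed in the thesis: all dimensions, probability tables, and the per-frame mapping rule are illustrative assumptions, and a full CHMM would decode the visual state sequence jointly under visual transition constraints rather than mapping frame by frame.

```python
import numpy as np

# Hypothetical toy dimensions: 3 acoustic states, 3 visual states,
# 4-dimensional FAP vectors (a real system uses far more of each).
N_A, N_V, FAP_DIM = 3, 3, 4
EPS = 1e-12
rng = np.random.default_rng(0)

# --- Acoustic HMM (AHMM): decode an acoustic state sequence ---
log_trans_a = np.log(np.array([[0.8, 0.2, 0.0],
                               [0.0, 0.8, 0.2],
                               [0.1, 0.0, 0.9]]) + EPS)
log_init_a = np.log(np.array([1.0, 0.0, 0.0]) + EPS)

def viterbi(log_emit, log_init, log_trans):
    """Most likely state path given per-frame log emission scores."""
    T, N = log_emit.shape
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Stand-in acoustic emission scores for a 10-frame utterance.
log_emit_a = rng.normal(size=(10, N_A))
acoustic_states = viterbi(log_emit_a, log_init_a, log_trans_a)

# --- Correlation model: P(visual state | acoustic state) ---
# Assumed to be estimated from time-aligned audio-visual training
# data; the values here are placeholders.
p_v_given_a = np.array([[0.9, 0.1, 0.0],
                        [0.1, 0.8, 0.1],
                        [0.0, 0.2, 0.8]])
# Simplification: per-frame MAP visual state instead of joint decoding.
visual_states = p_v_given_a[acoustic_states].argmax(axis=1)

# --- Visual HMM (VHMM): emit one FAP vector per visual state ---
# Here each visual state simply emits its Gaussian mean FAP vector.
fap_means = rng.normal(size=(N_V, FAP_DIM))
fap_sequence = fap_means[visual_states]
print(fap_sequence.shape)  # (10, 4): one FAP vector per frame
```

The key design point this sketch reflects is the one stated in the abstract: the acoustic and visual signals are modeled independently (separate AHMM and VHMM), with the correlation model serving only as the bridge between the two state spaces.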