Persian Viseme Classification for Developing Visual Speech Training Application


Abstract

Viseme classification and analysis is among the most important prerequisites for multimedia research such as talking heads, lip reading, lip synchronization, and computer-assisted pronunciation training applications. Because viseme classification and analysis is language dependent, the visemes of each language are classified separately, according to the target application. To date, no such research has been carried out for Persian, which makes work on audio-visual speech recognition (AVSR) systems or lip synchronization for the language nearly impossible. In this paper, we propose a novel image-based method for grouping Persian visemes that takes the coarticulation effect into account. For each phoneme, the central frame is selected from several images representing different positions of the phoneme in various syllables. Having obtained the eigenlips of each phoneme, we project each viseme onto another viseme's eigenspace. The weight value resulting from the reconstruction is then used as the criterion for comparing viseme similarity. The experimental results indicate ideal precision and robustness of the proposed algorithm.
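The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the "weight value" is computed, so this sketch substitutes the mean reconstruction error as the dissimilarity criterion, uses synthetic vectors in place of real lip images, and introduces the hypothetical names `eigenlips`, `reconstruction_error`, and `cross_projection_scores`.

```python
import numpy as np


def eigenlips(frames, k):
    """Return (mean, basis) of a k-dimensional eigenspace.

    frames: array of shape (n_frames, n_pixels), one flattened lip image per row.
    """
    x = np.asarray(frames, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(frames, mean, basis):
    """Mean L2 error after projecting frames onto an eigenspace and back."""
    x = np.asarray(frames, dtype=float) - mean
    weights = x @ basis.T          # coordinates in the eigenspace
    recon = weights @ basis        # back-projection into pixel space
    return float(np.mean(np.linalg.norm(x - recon, axis=1)))


def cross_projection_scores(groups, k):
    """Project every phoneme's frames onto every other phoneme's eigenspace.

    Returns {(source, target): error}; a low error suggests that the two
    phonemes look alike and may belong to the same viseme group.
    """
    spaces = {name: eigenlips(f, k) for name, f in groups.items()}
    return {
        (src, tgt): reconstruction_error(frames, *spaces[tgt])
        for src, frames in groups.items()
        for tgt in spaces
    }


# Synthetic stand-in data (assumption): "b" and "p" share one lip-shape
# subspace, while "a" lies in a different one.
rng = np.random.default_rng(0)
n_pixels, n_frames = 64, 6
shape_closed = rng.normal(size=(2, n_pixels))
shape_open = rng.normal(size=(2, n_pixels))


def sample(shape_basis):
    coeffs = rng.normal(size=(n_frames, 2))
    noise = 0.01 * rng.normal(size=(n_frames, n_pixels))
    return coeffs @ shape_basis + noise


groups = {"b": sample(shape_closed), "p": sample(shape_closed), "a": sample(shape_open)}
scores = cross_projection_scores(groups, k=2)
```

With this synthetic data, `scores[("b", "p")]` comes out far smaller than `scores[("b", "a")]`, which is the kind of contrast a grouping step could threshold or cluster on. Real lip images would first need cropping, alignment, and flattening.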


Publication details

Source
Advances in multimedia information processing - PCM 2009 | 2009 | pp. 1080-1085 | 6 pages
Conference location: Bangkok (TH)
Authors
Azam Bastanfard; Mohammad Aghaahmadi; Alireza Abdi kelishami; Maryam Fazel; Maedeh Moghadam;
Author affiliations

Computer Engineering Faculty, Islamic Azad University of Karaj, Moazen Blvd., Karaj, Iran;

Computer Engineering Faculty, Islamic Azad University of Qazvin, Daneshgah St., Nokhbegan Blvd., Qazvin, Iran;

Computer Engineering Faculty, Islamic Azad University of Qazvin, Daneshgah St., Nokhbegan Blvd., Qazvin, Iran;

Computer Engineering Faculty, Islamic Azad University of Qazvin, Daneshgah St., Nokhbegan Blvd., Qazvin, Iran;

Computer Engineering Faculty, Islamic Azad University of Qazvin, Daneshgah St., Nokhbegan Blvd., Qazvin, Iran;

Original format: PDF
Text language: English
Chinese Library Classification: Computer networks
Keywords
multimedia systems; computer assisted pronunciation training; viseme classification; audio visual process; eigenspace;


