
Spatial and Temporal Visual Speech Feature for Chinese Phonemes



Abstract

This paper proposes a practical set of features for representing the visual speech of Chinese phonemes. The state, and hence visibility, of the teeth and tongue plays an important role in pronunciation, but discriminating them in images or video is difficult. The paper introduces the concept of inner appearance features based on structural analysis. Our experimental results provide preliminary evidence that describing the pixel distribution of the upper and lower inner mouth separately can improve the ability to discriminate useful facial features as well as individual phonemes. The Chinese phonemes defined in the SAPI Speech Interface generally correspond to one character or morpheme, and our dynamic feature is based on the traditional division of these syllabic phonemes into a consonant-like onset and a vowel- and/or nasal-like coda. Features are established by combining a series of frames and identifying the frame with the most salient change as the key frame, to provide an objective framework for phoneme onset recognition. Our work provides a basis for bimodal audiovisual Chinese speech recognition as well as unimodal visual speech reading, and is also targeted at audiovisual speaking-face/talking-head synthesis.
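The two ideas in the abstract, describing the upper and lower inner mouth separately and picking the frame with the most salient change as the key frame, can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify the descriptor or the change measure, so the intensity histograms, the L1 distance, the bin count, and all function names below are assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumption: each frame is a grayscale crop of the inner-mouth region,
# given as rows of pixel intensities in the range 0..255.
Region = Sequence[Sequence[int]]


def histogram(rows: Region, bins: int = 4) -> List[float]:
    """Normalized intensity histogram of the given pixel rows."""
    counts = [0] * bins
    total = 0
    for row in rows:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * bins


def frame_feature(region: Region, bins: int = 4) -> List[float]:
    """Describe the upper and lower halves of the region separately."""
    mid = len(region) // 2
    return histogram(region[:mid], bins) + histogram(region[mid:], bins)


def key_frame_index(frames: Sequence[Region], bins: int = 4) -> int:
    """Index of the frame whose feature differs most from the previous frame."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    feats = [frame_feature(f, bins) for f in frames]
    # L1 distance between consecutive frame features (assumed change measure).
    changes = [
        sum(abs(a - b) for a, b in zip(prev, cur))
        for prev, cur in zip(feats, feats[1:])
    ]
    return 1 + max(range(len(changes)), key=changes.__getitem__)


# Toy sequence: a dark mouth interior, then bright pixels (e.g. teeth)
# appearing in the upper half from the third frame onward.
dark = [[10] * 4 for _ in range(4)]
teeth = [[250] * 4 for _ in range(2)] + [[10] * 4 for _ in range(2)]
frames = [dark, dark, teeth, teeth]
print(key_frame_index(frames))
```

Keeping the two halves in separate histograms means a change confined to the upper inner mouth alters only the first half of the feature vector, which is the kind of separation the abstract argues for.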

Bibliographic details

  • Source
    Journal of information and computational science | 2012, Issue 14 | pp. 4177-4185 | 9 pages
  • Author affiliations

    Multimedia and Intelligent Software Technology, Beijing Municipal Key Laboratory, Beijing University of Technology, Beijing 100124, China;

    Multimedia and Intelligent Software Technology, Beijing Municipal Key Laboratory, Beijing University of Technology, Beijing 100124, China;

    Multimedia and Intelligent Software Technology, Beijing Municipal Key Laboratory, Beijing University of Technology, Beijing 100124, China, School of Computer Science, Engineering and Mathematics, Flinders University of South Australia, Adelaide, Australia;

    Multimedia and Intelligent Software Technology, Beijing Municipal Key Laboratory, Beijing University of Technology, Beijing 100124, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    inner lip appearance; dynamic feature; Chinese phoneme;

