Detection of Inconsistency Between Subject and Speaker Based on the Co-occurrence of Lip Motion and Voice Towards Speech Scene Extraction from News Videos

Abstract

We propose a method for detecting inconsistency between the subject and the speaker, with the aim of extracting speech scenes from news videos. Speech scenes in news videos contain a wealth of multimedia information and are valuable as archived material. One existing approach to extracting speech scenes uses the position and size of the face region. However, this approach alone is insufficient, because news videos also contain non-speech scenes in which the speaker is not the subject, such as narrated scenes. To solve this problem, we propose a method that discriminates between speech scenes and narrated scenes based on the co-occurrence of the subject's lip motion and the speaker's voice. The proposed method uses lip shape and degree of lip opening as visual features representing the subject's lip motion, and voice volume and phoneme as audio features representing the speaker's voice. It then discriminates between speech scenes and narrated scenes based on the correlations between these features. We report the results of experiments on videos captured under laboratory conditions and on actual broadcast news videos. The results showed the effectiveness of our method and the feasibility of our research goal.
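
The correlation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one visual feature (degree of lip opening) and one audio feature (voice volume), both already sampled per frame, and it uses the Pearson coefficient with a hypothetical threshold of 0.5 as the decision rule. The function names `pearson` and `classify_scene`, the threshold, and the toy values are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant signal gives no evidence of co-occurrence.
        return 0.0
    return cov / math.sqrt(vx * vy)


def classify_scene(lip_opening, voice_volume, threshold=0.5):
    """Label a shot 'speech' if lip opening and voice volume co-vary.

    The threshold is a hypothetical value, not one taken from the paper.
    Returns the label together with the correlation coefficient.
    """
    r = pearson(lip_opening, voice_volume)
    label = "speech" if r >= threshold else "narrated"
    return label, r


# Toy per-frame values, made up for illustration only.
volume = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
lips_talking = [0.0, 0.7, 1.0, 0.1, 0.6, 0.0]  # mouth opens when voice is loud
lips_idle = [0.3, 0.2, 0.3, 0.3, 0.2, 0.3]     # mouth barely moves

print(classify_scene(lips_talking, volume))
print(classify_scene(lips_idle, volume))
```

In this toy data the first shot has lip opening that rises and falls with the voice, so it is labelled as a speech scene, while the second shot, where the face on screen hardly moves its lips, is labelled as narrated. A fuller treatment along the lines of the abstract would combine several feature pairs (lip shape with phoneme, lip opening with volume) rather than a single correlation.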

Bibliographic details

Source
IEEE International Symposium on Multimedia | 2011 | 8 pages
Authors
Kumagai Shogo; Doman Keisuke; Takahashi Tomokazu; Deguchi Daisuke; Ide Ichiro; Murase Hiroshi
Original format: PDF
Chinese Library Classification: TP37-53
Keywords
audiovisual integration; correlation; lip motion; news videos; speech scene extraction

Similar literature

1. Research on Motion Attention Fusion Model-Based Video Target Detection and Extraction of Global Motion Scene [J]. Long Liu, Boyang Fan, Jing Zhao. Journal of Signal and Information Processing. 2013, No. 3
2. Multimodal speaker/speech recognition using lip motion, lip texture and audio [J]. Cetingul HE, Erzin E, Yemez Y. Signal processing. 2006, No. 12
3. Motion Direction Inconsistency-Based Fight Detection for Multiview Surveillance Videos [J]. Chuang Yao, Xiaoyan Su, Xuehua Wang. Wireless communications & mobile computing. 2021, No. a
4. Detection of Inconsistency Between Subject and Speaker Based on the Co-occurrence of Lip Motion and Voice Towards Speech Scene Extraction from News Videos [C]. Kumagai Shogo, Doman Keisuke, Takahashi Tomokazu. 2011 IEEE International Symposium on Multimedia. 2011
5. Video State Extraction for Decision-Making through Motion-Based Detection, Tracking, and Clustering [D]. Miao, Tianshun. 2015
6. The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech [O]. Kun-Ching Wang. 2014
7. Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition [O]. Kazumasa Mori, Seiichi Nakagawa. 2001
