Published in CAAI Transactions on Intelligent Systems (《智能系统学报》)

A Multi-modal Fusion Approach for Measuring Web Video Relatedness


Abstract

With the advances in internet and multimedia technologies, the number of web videos on social video platforms is growing explosively. High-precision retrieval, classification, and annotation over massive video repositories have therefore become pressing research problems, and measuring the relatedness between videos is a basic technique common to all of them. This paper investigates web video relatedness from a multi-modal fusion perspective and proposes a novel approach that measures relatedness from multi-source heterogeneous information: the videos' visual content; their title and tag text; and three social features produced by human-video interactions, namely a video's upload time, channel (category), and author. The resulting relatedness score serves as a ranking criterion and is applied to the task of large-scale video retrieval. Experimental results on YouTube videos show that the proposed fusion of text, visual, and users' social features performs best in comparison with three alternative approaches, i.e., those that compute web video relatedness from text features alone, from visual features alone, or from text and visual features jointly.
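The abstract describes a late-fusion scheme: a per-modality similarity is computed for text, visual, and social features, then combined into a single relatedness score used for ranking. The sketch below illustrates the general idea only; the specific similarity functions, feature encodings, and fusion weights (`w_text`, `w_vis`, `w_soc`) are illustrative assumptions, not the paper's actual formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length visual feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(a, b):
    """Jaccard overlap between two sets of title/tag words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def social_sim(v1, v2, tau_days=30.0):
    """Toy social similarity: channel and author agreement plus
    an exponentially decaying upload-time closeness term."""
    s = 0.5 if v1["channel"] == v2["channel"] else 0.0
    s += 0.3 if v1["author"] == v2["author"] else 0.0
    dt = abs(v1["upload_day"] - v2["upload_day"])
    s += 0.2 * math.exp(-dt / tau_days)
    return s

def relatedness(v1, v2, w_text=0.4, w_vis=0.4, w_soc=0.2):
    """Late fusion: weighted sum of per-modality similarities,
    usable directly as a retrieval ranking criterion."""
    return (w_text * jaccard(v1["tags"], v2["tags"])
            + w_vis * cosine(v1["visual"], v2["visual"])
            + w_soc * social_sim(v1, v2))
```

With weights summing to 1 and each modality score in [0, 1], the fused score also stays in [0, 1], so candidate videos can simply be sorted by `relatedness` against the query video.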
