Multimodal semantic comprehension has attracted increasing research interest in recent years, in tasks such as visual question answering and caption generation. However, due to data limitations, fine-grained semantic comprehension, which requires capturing the semantic details of multimodal contents, has not been well investigated. In this work, we introduce "YouMakeup", a large-scale multimodal instructional video dataset to support fine-grained semantic comprehension research in a specific domain. YouMakeup contains 2,800 videos from YouTube, spanning more than 420 hours in total. Each video is annotated with a sequence of natural language descriptions of instructional steps, grounded in temporal video ranges and spatial facial areas. The annotated steps in a video involve subtle differences in actions, products, and regions, which require fine-grained understanding and reasoning both temporally and spatially. To evaluate models' abilities for fine-grained comprehension, we further propose two groups of tasks, generation tasks and visual question answering tasks, covering different aspects. We also establish a baseline for step caption generation for future comparison.
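The annotation structure described above (per-video step descriptions grounded in time and facial region) can be sketched as a simple data model. This is an illustrative sketch only: the field names (`description`, `start_sec`, `end_sec`, `facial_areas`) are hypothetical and may differ from the actual YouMakeup annotation schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical schema -- field names are assumptions, not the dataset's actual format.
@dataclass
class StepAnnotation:
    description: str         # natural-language instructional step
    start_sec: float         # temporal grounding: step start time in the video
    end_sec: float           # temporal grounding: step end time
    facial_areas: List[str]  # spatial grounding, e.g. ["lips"] or ["eyes"]

@dataclass
class VideoAnnotation:
    video_id: str            # e.g. a YouTube video identifier
    steps: List[StepAnnotation]

    def annotated_seconds(self) -> float:
        """Total seconds covered by the annotated steps."""
        return sum(s.end_sec - s.start_sec for s in self.steps)

# Example: one video with two steps that differ subtly in
# action, product, and facial region.
video = VideoAnnotation(
    video_id="demo_video",
    steps=[
        StepAnnotation("Apply foundation evenly on the face", 12.0, 45.5, ["face"]),
        StepAnnotation("Draw eyeliner along the upper lash line", 60.0, 95.0, ["eyes"]),
    ],
)
print(video.annotated_seconds())  # -> 68.5
```

A model for the question answering tasks would then reason over both the temporal spans and the `facial_areas` labels to distinguish steps with similar wording.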