Multimedia Tools and Applications

Egocentric visual scene description based on human-object interaction and deep spatial relations among objects



Abstract

Visual scene interpretation has been a major area of research in recent years. Recognition of human-object interaction is a fundamental step towards understanding visual scenes. Videos can be described via a variety of human-object interaction scenarios: both human and object static (static-static), one static while the other is dynamic (static-dynamic), and both dynamic (dynamic-dynamic). This paper presents a unified framework for explaining these interactions between humans and a variety of objects, using deep learning as the pivot methodology. Human-object interaction is extracted through native machine learning techniques, while spatial relations are captured by training a convolutional neural network. We also address the recognition of human posture in detail to provide an egocentric visual description. After extracting visual features, sequential minimal optimization is employed to train the model. The extracted interaction, spatial relations, and posture information are fed into a natural language generation module, along with the label of the interacting object, to generate a scene description. The proposed framework is evaluated on two state-of-the-art datasets, MSCOCO and the MSR3D Daily Activity dataset, achieving accuracies of 78% and 91.16%, respectively.
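To illustrate the classification and description stages summarized in the abstract, the sketch below is a minimal, hypothetical example rather than the authors' implementation: feature dimensions, the label sets, and the template-based sentence generator are all assumptions made for illustration. It uses scikit-learn's SVC, whose underlying libsvm solver is an SMO-style optimizer, to stand in for the sequential-minimal-optimization training step, and combines the predicted interaction label with assumed spatial-relation, posture, and object labels.

```python
# Hypothetical sketch of the interaction-classification and description stages.
# Not the paper's code: features, labels, and the template generator are placeholders.
import numpy as np
from sklearn.svm import SVC  # libsvm's solver is an SMO-style algorithm

# Placeholder training data: rows are visual feature vectors (e.g. pooled CNN features),
# labels are the three interaction scenarios described in the paper.
X_train = np.random.rand(200, 128)            # assumed 128-D feature vectors
y_train = np.random.randint(0, 3, size=200)   # 0: static-static, 1: static-dynamic, 2: dynamic-dynamic

interaction_clf = SVC(kernel="rbf")           # trained via an SMO-type optimizer
interaction_clf.fit(X_train, y_train)

INTERACTION_NAMES = ["static-static", "static-dynamic", "dynamic-dynamic"]

def describe_scene(features, spatial_relation, posture, object_label):
    """Template-based description from predicted and given labels (illustrative only)."""
    interaction = INTERACTION_NAMES[int(interaction_clf.predict([features])[0])]
    return (f"The person is {posture}, interacting with the {object_label} "
            f"({interaction} interaction); the {object_label} is {spatial_relation} the person.")

# Example usage with a dummy feature vector and assumed labels.
print(describe_scene(np.random.rand(128), "in front of", "sitting", "laptop"))
```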
