Expert Systems with Applications

MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences



Abstract

Human gesture recognition has become a pillar of today's intelligent human-computer interfaces, as it typically provides more comfortable and ubiquitous interaction. Such expert systems have promising prospects in various applications, including smart homes, gaming, healthcare, and robotics. However, recognizing human gestures in videos is one of the most challenging topics in computer vision because of irrelevant environmental factors such as complex backgrounds, occlusion, and lighting conditions. With the recent development of deep learning, many researchers have addressed this problem by building single deep networks that learn spatiotemporal features from video data. However, performance remains unsatisfactory, because a single deep network cannot handle these challenges simultaneously; the extracted features therefore fail to capture both the relevant shape information and the detailed spatiotemporal variation of the gestures. One way to overcome these drawbacks is to fuse multiple features from different models learned on multiple vision cues. To this end, we present an effective multi-dimensional feature learning approach, termed MultiD-CNN, for human gesture recognition in RGB-D videos. The key to our design is to learn high-level gesture representations by taking advantage of Convolutional Residual Networks (ResNets) for training extremely deep models and Convolutional Long Short-Term Memory networks (ConvLSTM) for modeling time-series dependencies. More specifically, we first construct an architecture that simultaneously learns spatiotemporal features from RGB and depth sequences through 3D ResNets, which are then linked to a ConvLSTM that captures the temporal dependencies between them; we show that this combination fuses appearance and motion information more effectively.
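A ConvLSTM applies the same gating equations as a standard LSTM, with convolutions over feature maps replacing the matrix products. As an illustration only (the weights, names, and scalar inputs below are hypothetical, not from the paper), a minimal scalar LSTM cell in plain Python sketches the recurrence that would propagate temporal context across per-timestep 3D-ResNet features:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    # Gates: input (i), forget (f), output (o), candidate (g).
    # In a ConvLSTM each product below becomes a convolution over
    # feature maps; scalars keep the recurrence readable here.
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c

# Run the cell over a short, made-up feature sequence.
w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.7]:
    h, c = lstm_step(x, h, c, w)
```

Because the hidden state is passed through `tanh` and an output gate, it stays bounded in (-1, 1) regardless of sequence length, which is what makes the recurrence stable over long gesture sequences.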
Second, to alleviate distractions from the background and other variations, we propose a method that encodes the temporal information into a motion representation; a two-stream architecture based on 2D ResNets is then employed to extract deep features from this representation. Third, we investigate fusion strategies at different levels for blending the classification results, and we show that integrating multiple ways of encoding the spatial and temporal information leads to robust and stable spatiotemporal feature learning with better generalization. Finally, we evaluate the investigated architectures on four challenging datasets, demonstrating that our approach outperforms prior art in both accuracy and efficiency. The results also affirm the value of embedding the proposed approach in other intelligent-system application areas. (C) 2019 Elsevier Ltd. All rights reserved.
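One common way to blend classification results from several streams is score-level (late) fusion: average the per-class probabilities produced by each model. The sketch below is a generic illustration under that assumption — the stream names, class count, and logits are invented, and the paper's actual fusion strategies may differ:

```python
import math

def softmax(logits):
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def late_fuse(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities.

    stream_logits: one logit vector per model (e.g. an RGB
    spatiotemporal stream, a depth stream, a motion-representation
    stream -- names here are illustrative).
    """
    if weights is None:
        weights = [1.0 / len(stream_logits)] * len(stream_logits)
    probs = [softmax(l) for l in stream_logits]
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(n_classes)]

# Three hypothetical streams scoring four gesture classes.
streams = [
    [2.0, 0.5, 0.1, -1.0],   # RGB stream favors class 0
    [1.5, 1.8, 0.0, -0.5],   # depth stream is ambivalent
    [0.2, 2.5, 0.3, -0.2],   # motion stream favors class 1
]
fused = late_fuse(streams)
pred = max(range(len(fused)), key=fused.__getitem__)  # fused decision
```

Averaging after the softmax keeps every stream's vote on a comparable scale, so one stream with large raw logits cannot dominate the fused decision.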
