IEEE Transactions on Image Processing

Image Representations With Spatial Object-to-Object Relations for RGB-D Scene Recognition


Abstract

Scene recognition is challenging because of intra-class diversity and inter-class similarity. Previous works recognize scenes with either global representations or intermediate representations of objects. In contrast, we investigate more discriminative image representations based on object-to-object relations, built from <object, relation, object> triplets obtained with detection techniques. In particular, we propose two types of representations that describe objects and their relative relations in different forms: the co-occurring frequency of object-to-object relations (denoted COOR) and the sequential representation of object-to-object relations (denoted SOOR). COOR is an intermediate representation of the co-occurring frequency of objects and their relations, stored as a third-order tensor that can be fed to a scene classifier without further embedding. SOOR takes a more explicit and freer form that sequentially describes image contents with local captions; a sequence encoding model (e.g., a recurrent neural network (RNN)) encodes SOOR into features that are fed to the classifiers. To better capture spatial information, the proposed COOR and SOOR are adapted to RGB-D data, for which an RGB-D proposal fusion method is proposed for RGB-D object detection. With COOR and SOOR, we obtain state-of-the-art RGB-D scene recognition results on the SUN RGB-D and NYUD2 datasets.
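
To make the two representations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that detected triplets are already available as (subject, relation, object) string tuples; the vocabularies `OBJECTS` and `RELATIONS`, the helper names, and the use of raw counts as the "frequency" are all illustrative assumptions that the abstract does not specify. The COOR-style helper accumulates triplet counts into a third-order structure indexed by (object, relation, object), and the SOOR-style helper serializes the same triplets into a token sequence of the kind a sequence encoder such as an RNN could consume.

```python
from collections import Counter

# Hypothetical vocabularies; the actual object and relation sets are not given in the abstract.
OBJECTS = ["bed", "lamp", "table", "chair"]
RELATIONS = ["left_of", "on", "near"]


def coor_tensor(triplets, objects=OBJECTS, relations=RELATIONS):
    """Build a |O| x |R| x |O| nested list of triplet counts (a third-order tensor)."""
    obj_idx = {name: i for i, name in enumerate(objects)}
    rel_idx = {name: i for i, name in enumerate(relations)}
    tensor = [[[0] * len(objects) for _ in relations] for _ in objects]
    # Count how often each <object, relation, object> triplet occurs in the image.
    for (subj, rel, obj), n in Counter(triplets).items():
        tensor[obj_idx[subj]][rel_idx[rel]][obj_idx[obj]] = n
    return tensor


def flatten(tensor):
    """Flatten the tensor into one vector, e.g. as direct input to a classifier."""
    return [value for plane in tensor for row in plane for value in row]


def soor_sequence(triplets):
    """Serialize triplets in detection order into a flat token sequence."""
    tokens = []
    for subj, rel, obj in triplets:
        tokens.extend([subj, rel, obj])
    return tokens


# Made-up detections for a single image (assumption, for illustration only).
triplets = [
    ("lamp", "on", "table"),
    ("chair", "near", "table"),
    ("lamp", "on", "table"),
    ("bed", "left_of", "lamp"),
]

coor = coor_tensor(triplets)
soor = soor_sequence(triplets)
```

Here `flatten(coor)` yields a fixed-length vector regardless of how many triplets were detected, which is consistent with the claim that COOR needs no further embedding, whereas `soor` varies in length with the image content and therefore needs a sequence model to produce fixed-size features.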
