IEEE Transactions on Pattern Analysis and Machine Intelligence

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

Abstract

We propose an end-to-end learning framework for segmenting generic objects in both images and videos. Given a novel image or video, our approach produces a pixel-level mask for all "object-like" regions, even for object categories never seen during training. We formulate the task as a structured prediction problem of assigning an object/background label to each pixel, implemented using a deep fully convolutional network. When applied to a video, our model further incorporates a motion stream, and the network learns to combine both appearance and motion and attempts to extract all prominent objects whether they are moving or not. Beyond the core model, a second contribution of our approach is how it leverages varying strengths of training annotations. Pixel-level annotations are quite difficult to obtain, yet crucial for training a deep network approach for segmentation. Thus we propose ways to exploit weakly labeled data for learning dense foreground segmentation. For images, we show the value in mixing object category examples with image-level labels together with relatively few images with boundary-level annotations. For video, we show how to bootstrap weakly annotated videos together with the network trained for image segmentation. Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state of the art for fully automatic segmentation of generic (unseen) objects. In addition, we demonstrate how our approach benefits image retrieval and image retargeting, both of which flourish when given our high-quality foreground maps. Code, models, and videos are at: http://vision.cs.utexas.edu/projects/pixelobjectness/
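To make the core formulation concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: a fully convolutional network that assigns an object/background label to every pixel, plus a second motion stream (fed, e.g., with optical flow) whose features are fused with the appearance stream for video. This is not the authors' released model; the tiny backbones, channel widths, and fusion-by-concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stream(nn.Module):
    """Tiny fully convolutional encoder; stands in for the deep
    appearance/motion backbones used in the paper."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class TwoStreamPixelObjectness(nn.Module):
    def __init__(self):
        super().__init__()
        self.appearance = Stream(3)   # RGB frame
        self.motion = Stream(2)       # assumed 2-channel optical flow field
        # 1x1 conv maps fused features to object/background logits per pixel
        self.classifier = nn.Conv2d(64 + 64, 2, kernel_size=1)

    def forward(self, rgb, flow):
        # fuse appearance and motion features by channel concatenation
        fused = torch.cat([self.appearance(rgb), self.motion(flow)], dim=1)
        logits = self.classifier(fused)
        # upsample back to input resolution for a dense pixel-level mask
        return F.interpolate(logits, size=rgb.shape[-2:],
                             mode="bilinear", align_corners=False)

model = TwoStreamPixelObjectness()
rgb = torch.randn(1, 3, 128, 128)    # one video frame
flow = torch.randn(1, 2, 128, 128)   # its flow (assumed precomputed)
logits = model(rgb, flow)            # shape (1, 2, 128, 128)
mask = logits.argmax(dim=1)          # binary foreground/background mask

# training reduces to per-pixel cross-entropy against a ground-truth mask
target = torch.randint(0, 2, (1, 128, 128))
loss = F.cross_entropy(logits, target)
```

Concatenation followed by a 1x1 classifier is only one simple fusion choice; the point of the sketch is the structure of the prediction problem: two convolutional feature streams, a per-pixel two-way classification, and a loss computed densely over the image, which is what lets the network flag prominent objects whether or not they are moving.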
