IEEE/CVF Conference on Computer Vision and Pattern Recognition

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments



Abstract

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator - a large-scale reinforcement learning environment based on real imagery [11]. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings - the Room-to-Room (R2R) dataset.
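The "visually grounded sequence-to-sequence translation" framing above can be illustrated with a deliberately minimal sketch: an instruction (a token sequence) is decoded into a sequence of discrete navigation actions. Everything here is illustrative and not from the paper - the keyword table, function names, and action set are assumptions; a real R2R agent would instead use a learned encoder-decoder that also conditions each action on the current visual observation.

```python
# Toy sketch of instruction-following as sequence-to-sequence decoding.
# All names and the keyword->action table are hypothetical placeholders,
# not the method or API of the Matterport3D Simulator / R2R paper.

ACTIONS = ("forward", "left", "right", "stop")

KEYWORD_TO_ACTION = {
    "ahead": "forward",
    "straight": "forward",
    "left": "left",
    "right": "right",
    "stop": "stop",
}


def follow_instruction(instruction: str, max_steps: int = 10) -> list[str]:
    """Greedy 'decoder': emit one action per recognized instruction
    keyword, in order, stopping at 'stop' or after max_steps actions.
    A trained model would replace this lookup with attention over both
    the instruction encoding and visual features at each step."""
    actions = []
    for token in instruction.lower().split():
        action = KEYWORD_TO_ACTION.get(token)
        if action:
            actions.append(action)
        if action == "stop" or len(actions) >= max_steps:
            break
    if not actions or actions[-1] != "stop":
        actions.append("stop")  # episodes must terminate explicitly
    return actions


print(follow_instruction("go straight then turn left and stop"))
```

The point of the sketch is only the shape of the problem: variable-length language in, variable-length action sequence out, with an explicit stop action - the same interface a learned visually grounded agent exposes to the simulator.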
