Pattern Recognition Letters

Let the robot tell: Describe car image with natural language via LSTM


Abstract

Image-based car detection and classification has remained a research hub in self-driving for decades. However, natural language description of car images is still virgin territory, even though describing an image in sentences after a single glimpse is a simple task for a human. In this paper, we present an end-to-end trainable, spatial-temporal deep recurrent neural network, the LSTM (Long Short-Term Memory), to automatically convert car images into human-understandable natural language descriptions. Our model builds on state-of-the-art progress in computer vision and machine translation: we extract car region proposals with Region Convolutional Neural Networks (R-CNN) and embed them into fixed-size vectors. Each word in a sentence is also embedded into a real-valued vector of the same size as the image vectors through a local-global context-aware neural network. The LSTM, fed image-sentence pairs sequentially in the training stage, is trained to maximize the joint probability of the target word at each time step. In the test stage, the pre-trained LSTM receives a car image and predicts a natural language description word by word. Finally, we evaluate our model on cars' static/dynamic attribute description on both the 30,000-image CompCar dataset [21] and a 1000-video dataset collected in street scenarios by our self-driving car, using the quantitative BLEU score and a subjective human-rating evaluation metric. We also test our model's generalization ability, its transfer ability to the car property classification problem, and the impact of various image feature extractors on our model. Experiment results show the superiority and robustness of our model (refer to www.carlib.net/carimg2text.html for more experiment results). (C) 2017 Elsevier B.V. All rights reserved.
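As a rough illustration of the test-stage procedure the abstract describes (condition a pre-trained LSTM on an image embedding, then greedily predict the caption word by word, feeding each prediction back in), here is a minimal sketch. All weights are random toy placeholders and the vocabulary, dimensions, and function names are assumptions for illustration only, not the paper's model or data.

```python
# Minimal sketch of greedy word-by-word caption decoding with one LSTM cell.
# Toy random weights and vocabulary; NOT the paper's trained model.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<start>", "<end>", "a", "red", "sedan", "moving", "parked"]
D = 8    # shared embedding size for image regions and words (as the paper uses)
H = 16   # LSTM hidden size

# Stand-in "trained" parameters (gates stacked as input, forget, cell, output).
Wx = rng.normal(scale=0.1, size=(4 * H, D))
Wh = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
W_out = rng.normal(scale=0.1, size=(len(VOCAB), H))  # hidden -> vocab logits
E = rng.normal(scale=0.1, size=(len(VOCAB), D))      # word embedding table

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM time step: gate the cell state, emit a new hidden state."""
    z = Wx @ x + Wh @ h + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def describe(image_vec, max_len=10):
    """Feed the image embedding first, then decode one word per step."""
    h, c = np.zeros(H), np.zeros(H)
    h, c = lstm_step(image_vec, h, c)            # condition on the image
    x, words = E[VOCAB.index("<start>")], []
    for _ in range(max_len):
        h, c = lstm_step(x, h, c)
        word = VOCAB[int(np.argmax(W_out @ h))]  # greedy: most probable word
        if word == "<end>":
            break
        words.append(word)
        x = E[VOCAB.index(word)]                 # feed prediction back in
    return " ".join(words)

caption = describe(rng.normal(size=D))
print(caption)
```

At training time, the procedure the abstract states (maximizing the joint probability of the target word at each step) would correspond to summing per-step log-probabilities of the ground-truth words over the same unrolled cell.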

Record details

  • Source
    Pattern Recognition Letters | 2017, Issue 15 | pp. 75-82 | 8 pages
  • Authors

    Chen Long; He Yuhang; Fan Lei;

  • Author affiliations

    Sun Yat Sen Univ, Guangzhou 510006, Guangdong, Peoples R China;

    Sun Yat Sen Univ, Guangzhou 510006, Guangdong, Peoples R China;

    Sun Yat Sen Univ, Guangzhou 510006, Guangdong, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords
