Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Abstract

Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
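The pipeline the abstract describes — per-frame convolutional features collapsed into a single video representation, which conditions a recurrent language model that emits a sentence word by word — can be sketched as below. This is a minimal illustration with made-up dimensions and random, untrained weights (the published model uses CNN fc7 features and a far larger LSTM and vocabulary); all names such as `TinyLSTMDecoder` and `mean_pool` are this sketch's own, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only; the real model uses
# 4096-dim CNN features and a vocabulary of thousands of words.
FEAT_DIM, HID_DIM, VOCAB = 32, 16, 10
BOS, EOS = 0, 1  # begin/end-of-sentence token ids

def mean_pool(frame_feats):
    """Collapse per-frame CNN features into one video-level vector."""
    return frame_feats.mean(axis=0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMDecoder:
    """Single-layer LSTM that decodes a word sequence conditioned on a
    video feature vector. Weights are random, so this shows the data
    flow of the architecture, not a trained captioner."""

    def __init__(self):
        in_dim = FEAT_DIM + VOCAB  # input = [video feature; one-hot word]
        self.W = rng.normal(0, 0.1, (4 * HID_DIM, in_dim + HID_DIM))
        self.b = np.zeros(4 * HID_DIM)
        self.W_out = rng.normal(0, 0.1, (VOCAB, HID_DIM))

    def step(self, x, h, c):
        # Standard LSTM cell: input, forget, output gates and candidate.
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def decode(self, video_feat, max_len=8):
        h = c = np.zeros(HID_DIM)
        word, out = BOS, []
        for _ in range(max_len):
            one_hot = np.eye(VOCAB)[word]
            h, c = self.step(np.concatenate([video_feat, one_hot]), h, c)
            word = int(np.argmax(self.W_out @ h))  # greedy decoding
            if word == EOS:
                break
            out.append(word)
        return out

frames = rng.normal(size=(30, FEAT_DIM))  # 30 frames of CNN features
caption_ids = TinyLSTMDecoder().decode(mean_pool(frames))
print(caption_ids)  # a list of word ids; a vocabulary would map them to words
```

The transfer-learning idea in the abstract corresponds, in this framing, to initializing the convolutional feature extractor from image classification (1.2M+ labeled images) and the decoder from image captioning (100,000+ captioned images) before fine-tuning on the scarce described-video data.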
