Generating a concise natural language description of an image enables a number of applications, including fast keyword-based search of large image collections. Driven primarily by advances in deep learning, research on machine-based image caption generation has grown substantially in recent years. In this paper, we briefly review deep-learning-based image caption generation and give an overview of the datasets and metrics used to evaluate captioning algorithms. We conclude with a discussion of promising directions for future research.