Caption Generation

90 papers with code • 1 benchmark • 1 dataset

Caption generation is the task of producing a natural-language description of visual content, such as an image or a video clip.

Most implemented papers

Video captioning with recurrent networks based on frame- and video-level features and visual content classification

aalto-cbir/neuraltalkTheano 9 Dec 2015

In this paper, we describe the system for generating textual descriptions of short video clips using recurrent neural networks (RNNs), which we used while participating in the Large Scale Movie Description Challenge 2015 at ICCV 2015.
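
As a rough illustration of this kind of system, the sketch below shows an LSTM decoder conditioned on a pooled video feature vector. It assumes PyTorch; the feature, embedding, and vocabulary sizes are made up, and this is not the authors' architecture.

```python
# Minimal sketch (not the authors' code): an LSTM caption decoder conditioned
# on a pre-extracted, pooled video feature vector; sizes are illustrative.
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, embed_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # video feature -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)   # video feature -> initial cell state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, video_feat, captions):
        # video_feat: (B, feat_dim) pooled frame/video-level features
        # captions:   (B, T) token ids of the ground-truth caption (teacher forcing)
        h0 = self.init_h(video_feat).unsqueeze(0)       # (1, B, hidden_dim)
        c0 = self.init_c(video_feat).unsqueeze(0)
        emb = self.embed(captions)                      # (B, T, embed_dim)
        out, _ = self.lstm(emb, (h0, c0))
        return self.out(out)                            # (B, T, vocab_size) logits
```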

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

3outeille/DSD-training 15 Jul 2016

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.
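
The training flow itself is easy to sketch: train densely, prune the smallest-magnitude weights and retrain under that sparsity mask, then drop the mask and retrain densely. The snippet below is a minimal PyTorch-style sketch assuming a user-supplied `train_epoch` function; the sparsity level, epoch counts, and once-per-epoch re-masking are illustrative simplifications, not the paper's settings.

```python
# Sketch of the dense-sparse-dense schedule (assumed PyTorch; hyperparameters
# are illustrative, and the mask is re-applied only once per epoch for brevity).
import torch

def dsd_train(model, train_epoch, sparsity=0.3, dense_epochs=10, sparse_epochs=10):
    # 1) Dense phase: ordinary training.
    for _ in range(dense_epochs):
        train_epoch(model)

    # 2) Sparse phase: prune the smallest-magnitude weights, then retrain with them held at zero.
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:                                  # prune weight matrices, not biases
                k = max(1, int(p.numel() * sparsity))
                threshold = p.abs().flatten().kthvalue(k).values
                masks[name] = (p.abs() > threshold).float()
                p.mul_(masks[name])                          # initial prune
    for _ in range(sparse_epochs):
        train_epoch(model)
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])                      # coarse: re-apply mask per epoch

    # 3) Dense phase: drop the masks (pruned weights restart from zero) and retrain.
    for _ in range(dense_epochs):
        train_epoch(model)
```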

An Empirical Study of Language CNN for Image Captioning

showkeyjar/chinese_im2text.pytorch ICCV 2017

Language Models based on recurrent neural networks have dominated recent image caption generation tasks.

Twin Networks: Matching the Future for Sequence Generation

dmitriy-serdyuk/twin-net ICLR 2018

We propose a simple technique for encouraging generative RNNs to plan ahead.
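
The core idea is to run a second RNN over the sequence in reverse and penalize the forward network when its hidden states cannot be mapped onto the corresponding backward states, so the forward states are pushed to encode information about the future; the backward half is discarded at generation time. A minimal sketch, assuming PyTorch GRUs and simplifying the exact time-step alignment:

```python
# Rough sketch of the twin-network idea (not the authors' exact code; the
# alignment of forward and backward time steps is simplified).
import torch
import torch.nn as nn

class TwinRNN(nn.Module):
    def __init__(self, vocab=10000, emb=256, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.fwd = nn.GRU(emb, hid, batch_first=True)
        self.bwd = nn.GRU(emb, hid, batch_first=True)
        self.g = nn.Linear(hid, hid)            # maps forward states onto backward states
        self.out_f = nn.Linear(hid, vocab)
        self.out_b = nn.Linear(hid, vocab)

    def forward(self, tokens):
        e = self.embed(tokens)
        h_f, _ = self.fwd(e)                                # (B, T, hid)
        h_b_rev, _ = self.bwd(torch.flip(e, dims=[1]))      # run over the reversed sequence
        h_b = torch.flip(h_b_rev, dims=[1])                 # re-align to forward time order
        # Plan-ahead penalty; backward states are detached here so the term only
        # shapes the forward network (a simplifying choice for this sketch).
        match = ((self.g(h_f) - h_b.detach()) ** 2).mean()
        return self.out_f(h_f), self.out_b(h_b_rev), match
```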

CNN Fixations: An unraveling approach to visualize the discriminative image regions

val-iisc/cnn-fixations 22 Aug 2017

We demonstrate through a variety of applications that our approach is able to localize the discriminative image locations across different network architectures, diverse vision tasks and data modalities.

Tensor Product Generation Networks for Deep NLP Modeling

ggeorgea/TPRcaption NAACL 2018

We present a new approach to the design of deep networks for natural language processing (NLP), based on the general technique of Tensor Product Representations (TPRs) for encoding and processing symbol structures in distributed neural networks.
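
Independently of the generation network built on top of it, the TPR mechanism itself can be shown in a few lines: each filler (e.g. a word) is bound to a role (e.g. a position) by an outer product, the bindings are summed into one tensor, and a filler is recovered by unbinding with its role vector. A toy NumPy example with orthonormal roles, so unbinding is exact:

```python
# Worked toy example of a Tensor Product Representation (generic TPR binding
# and unbinding, not the paper's full caption-generation network).
import numpy as np

rng = np.random.default_rng(0)
roles = rng.standard_normal((3, 8))       # 3 role vectors (e.g. positions), dim 8
roles, _ = np.linalg.qr(roles.T)          # orthonormalize columns...
roles = roles.T                           # ...so the 3 role rows are orthonormal
fillers = rng.standard_normal((3, 16))    # 3 filler vectors (e.g. word embeddings), dim 16

# Bind: T = sum_i  f_i (outer) r_i  -> one (16, 8) tensor encodes the whole structure.
T = sum(np.outer(fillers[i], roles[i]) for i in range(3))

# Unbind filler 1 by multiplying with its role vector (exact because roles are orthonormal).
recovered = T @ roles[1]
print(np.allclose(recovered, fillers[1]))  # True
```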

Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning

huanzhang12/ImageCaptioningAttack ACL 2018

Our extensive experiments show that our algorithm can successfully craft visually-similar adversarial examples with randomly targeted captions or keywords, and the adversarial examples can be made highly transferable to other image captioning systems.
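
A simplified version of such an attack can be written as projected gradient ascent on the image: maximize the captioner's log-likelihood of a chosen target caption while keeping the perturbation inside a small L-infinity budget. The sketch below assumes a `captioner(image, target_ids)` callable returning per-token logits; it is a basic PGD-style variant, not the paper's exact optimization formulation.

```python
# Simplified targeted caption attack (PGD-style sketch; the captioner interface
# is assumed, and valid-pixel-range clamping is omitted for brevity).
import torch

def targeted_caption_attack(captioner, image, target_ids, eps=0.03, step=0.005, iters=100):
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        logits = captioner(image + delta, target_ids)            # (B, T, V) logits
        logp = torch.log_softmax(logits, dim=-1)
        # Log-likelihood of the target caption tokens under the perturbed image.
        ll = logp.gather(-1, target_ids.unsqueeze(-1)).sum()
        ll.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()                    # ascend on target likelihood
            delta.clamp_(-eps, eps)                              # keep perturbation small
            delta.grad.zero_()
    return (image + delta).detach()
```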

Exploring Models and Data for Remote Sensing Image Caption Generation

201528014227051/RSICD_optimal 21 Dec 2017

Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing image captioning.

Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

nyu-dl/dl4mt-nonauto EMNLP 2018

We propose a conditional non-autoregressive neural sequence model based on iterative refinement.
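
Decoding then amounts to predicting every target position in parallel and repeatedly feeding the draft back into the decoder until it stops changing or an iteration budget is reached. A minimal sketch, assuming a `decoder(src, prev_tokens)` interface that returns logits for all positions at once:

```python
# Sketch of non-autoregressive decoding by iterative refinement
# (decoder interface and mask-token initialization are assumptions).
import torch

def iterative_refine_decode(decoder, src, tgt_len, mask_id, iters=4):
    B = src.size(0)
    tokens = torch.full((B, tgt_len), mask_id, dtype=torch.long, device=src.device)
    for _ in range(iters):
        logits = decoder(src, tokens)            # (B, tgt_len, V): all positions at once
        new_tokens = logits.argmax(dim=-1)
        if torch.equal(new_tokens, tokens):      # stop early once the output is stable
            break
        tokens = new_tokens                      # feed the draft back in and refine
    return tokens
```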

End-to-End Dense Video Captioning with Parallel Decoding

ttengwang/pdvc ICCV 2021

Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
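
The parallel-decoding idea can be sketched as set prediction: a fixed set of learnable event queries attends to the video features, and each query slot emits a temporal segment and a caption in a single pass, instead of proposing and captioning events sequentially. The code below is a rough PyTorch illustration with made-up sizes and a deliberately crude caption head; it shows the general pattern, not PDVC's actual design.

```python
# Rough sketch of parallel dense-event decoding with learnable event queries
# (illustrative sizes and heads; not the paper's architecture).
import torch
import torch.nn as nn

class ParallelEventDecoder(nn.Module):
    def __init__(self, d_model=512, num_queries=10, vocab=10000, max_words=20):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))   # one slot per event
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.loc_head = nn.Linear(d_model, 2)                  # (center, length) in [0, 1]
        self.cap_head = nn.Linear(d_model, max_words * vocab)  # crude per-slot caption logits
        self.max_words, self.vocab = max_words, vocab

    def forward(self, video_feats):
        # video_feats: (B, T, d_model) frame features from a video encoder
        B = video_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        slots = self.decoder(q, video_feats)                   # (B, num_queries, d_model)
        segments = torch.sigmoid(self.loc_head(slots))         # temporal locations
        captions = self.cap_head(slots).view(B, -1, self.max_words, self.vocab)
        return segments, captions                              # all events decoded in parallel
```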