Video Captioning

162 papers with code • 11 benchmarks • 32 datasets

Video Captioning is the task of automatically captioning a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Benchmarks

These leaderboards are used to track progress in Video Captioning

Dataset	Best Model
MSR-VTT	mPLUG-2
MSVD	MaMMUT
YouCook2	VAST
VATEX	VALOR
ActivityNet Captions	VideoCoCa
Hindi MSR-VTT	SBD_Keyframe
TVC	VAST
MSVD-Indonesian	VNS-GRU (Cross-Lingual)
ChinaOpen-1k	GVT
Shot2Story20K	Ours
VidChapters-7M	Vid2Seq

Libraries

Use these libraries to find Video Captioning models and implementations

rakshithShetty/captionGAN

2 papers

Datasets

Subtasks

Audio-Visual Video Captioning

Video Boundary Captioning

Latest papers with no code

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

no code yet • 25 Dec 2023

Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.

Subject-Oriented Video Captioning

no code yet • 20 Dec 2023

To address this problem, we propose a new video captioning task, subject-oriented video captioning, which allows users to specify the describing target via a bounding box.

Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)

no code yet • 12 Dec 2023

Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain.

Video Summarization: Towards Entity-Aware Captions

no code yet • 1 Dec 2023

We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task.

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code yet • 28 Nov 2023

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Incorporating granularity bias as the margin into contrastive loss for video captioning

no code yet • 25 Nov 2023

To mitigate the impact of granularity bias on the model, we introduced a statistical-based bias extractor.

Nepali Video Captioning using CNN-RNN Architecture

no code yet • 5 Nov 2023

This article presents a study on Nepali video captioning using deep neural networks.

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

no code yet • 5 Nov 2023

Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.

Learning Interactive Real-World Simulators

no code yet • 9 Oct 2023

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

IcoCap: Improving Video Captioning by Compounding Images

no code yet • IEEE Transactions on Multimedia 2023

Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.

