Video Captioning

162 papers with code • 11 benchmarks • 32 datasets

Video Captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Latest papers with no code

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

no code yet • 25 Dec 2023

Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.

Subject-Oriented Video Captioning

no code yet • 20 Dec 2023

To address this problem, we propose a new video captioning task, subject-oriented video captioning, which allows users to specify the describing target via a bounding box.

Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)

no code yet • 12 Dec 2023

Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain.

Video Summarization: Towards Entity-Aware Captions

no code yet • 1 Dec 2023

We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task.

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code yet • 28 Nov 2023

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Incorporating granularity bias as the margin into contrastive loss for video captioning

no code yet • 25 Nov 2023

To mitigate the impact of granularity bias on the model, we introduced a statistical-based bias extractor.

Nepali Video Captioning using CNN-RNN Architecture

no code yet • 5 Nov 2023

This article presents a study on Nepali video captioning using deep neural networks.

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

no code yet • 5 Nov 2023

Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.

Learning Interactive Real-World Simulators

no code yet • 9 Oct 2023

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

IcoCap: Improving Video Captioning by Compounding Images

no code yet • IEEE Transactions on Multimedia 2023

Video captioning is more challenging than image captioning, primarily due to differences in content density.