Video Captioning
162 papers with code • 11 benchmarks • 32 datasets
Video Captioning is the task of automatically captioning a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.
Source: NITS-VC System for VATEX Video Captioning Challenge 2020
Latest papers with no code
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Each caption in the set is attached to a concept combination indicating the primary semantic content of the caption and facilitating element alignment in set prediction.
Subject-Oriented Video Captioning
To address this problem, we propose a new video captioning task, subject-oriented video captioning, which allows users to specify the describing target via a bounding box.
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)
Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain.
Video Summarization: Towards Entity-Aware Captions
We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task.
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.
Incorporating granularity bias as the margin into contrastive loss for video captioning
To mitigate the impact of granularity bias on the model, we introduce a statistics-based bias extractor.
Nepali Video Captioning using CNN-RNN Architecture
This article presents a study on Nepali video captioning using deep neural networks.
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.
Learning Interactive Real-World Simulators
Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.
IcoCap: Improving Video Captioning by Compounding Images
Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.