Dense Video Captioning

25 papers with code • 4 benchmarks • 7 datasets

Most natural videos contain numerous events. For example, a video of a “man playing a piano” might also contain “another man dancing” or “a crowd clapping”. The task of dense video captioning involves both detecting the events in a video, that is, localizing each one in time, and describing each of them in natural language.
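As a rough illustration of what "detecting and describing" can produce, the sketch below represents the piano example as a list of time-stamped captions and compares predicted segments with a reference segment using temporal intersection-over-union (IoU), a common way to score temporal localization. The `Event` class, the `temporal_iou` helper, and all timestamps and captions are hypothetical and chosen only for illustration; they are not taken from any paper listed on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One localized, captioned event (hypothetical structure for illustration)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    caption: str


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of the time intervals of two events."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


# Made-up output for the piano video; note that events may overlap in time.
predictions: List[Event] = [
    Event(0.0, 60.0, "a man plays a piano"),
    Event(10.0, 30.0, "another man dances"),
    Event(50.0, 60.0, "a crowd claps"),
]

# Made-up reference annotation for one event.
reference = Event(12.0, 32.0, "a man is dancing next to the piano")

# Pick the predicted event whose time span best matches the reference.
best = max(predictions, key=lambda p: temporal_iou(p, reference))
print(best.caption, round(temporal_iou(best, reference), 3))
```

In this sketch a prediction is matched to a reference by time overlap alone; judging caption quality would need a separate text metric, which is left out here.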

Latest papers with no code

Exploiting Auxiliary Caption for Video Grounding

no code yet • 15 Jan 2023

Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.

A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos

no code yet • 30 Sep 2022

We refer to this task as Procedure Segmentation and Summarization (PSS).

Recipe Generation from Unsegmented Cooking Videos

no code yet • 21 Sep 2022

However, unlike DVC, in recipe generation, recipe story awareness is crucial, and a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them.

SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions

no code yet • 24 Jul 2022

Further, we validate our approach against existing state-of-the-art algorithms for the Dense Video Captioning task on the ActivityNet Captions dataset.

PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning

no code yet • 6 Jul 2022

The task of Dense Video Captioning (DVC) aims to generate captions with timestamps for multiple events in one video.

End-to-end Dense Video Captioning as Sequence Generation

no code yet • COLING 2022

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

Semantic-Aware Pretraining for Dense Video Captioning

no code yet • 13 Apr 2022

This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021.

End-to-end Dense Video Captioning as Sequence Generation

no code yet • ACL ARR January 2022

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

no code yet • 19 Nov 2021

Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.

Sketch, Ground, and Refine: Top-Down Dense Video Captioning

no code yet • CVPR 2021

The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.