Video Description
26 papers with code • 0 benchmarks • 7 datasets
The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.
Source: Joint Event Detection and Description in Continuous Video Streams
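To make the dense-captioning setting concrete, here is a minimal Python sketch of the two-stage pipeline described above: temporally segment events, then caption each segment. The `propose_events` and `caption_clip` callables are hypothetical placeholders standing in for whatever models a given paper uses.

```python
# A minimal sketch of the dense video captioning interface: segment distinct
# events in time, then caption each one. The proposal and captioning models
# are hypothetical placeholders, not any specific paper's method.
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

@dataclass
class DescribedEvent:
    start_sec: float   # event start time in the video
    end_sec: float     # event end time
    caption: str       # natural language description of the event

def dense_video_captioning(
    frames: np.ndarray,                                          # (T, H, W, 3) video frames
    fps: float,
    propose_events: Callable[[np.ndarray], List[Tuple[int, int]]],  # -> [(start_frame, end_frame), ...]
    caption_clip: Callable[[np.ndarray], str],                   # clip frames -> sentence
) -> List[DescribedEvent]:
    """Two-stage pipeline: temporal event proposals, then per-event captioning."""
    events = []
    for start_f, end_f in propose_events(frames):
        sentence = caption_clip(frames[start_f:end_f])
        events.append(DescribedEvent(start_f / fps, end_f / fps, sentence))
    # Sort by start time so the captions read as a coherent story.
    return sorted(events, key=lambda e: e.start_sec)
```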
Latest papers
Delving Deeper into the Decoder for Video Captioning
Video captioning is a multi-modal task that aims to describe a video clip with a natural language sentence.
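As a point of reference for what such a decoder does at test time, below is a generic greedy decoding loop. The `decoder_step` interface and the BOS/EOS token ids are assumptions for illustration, not this paper's API.

```python
# Generic greedy decoding for a captioning decoder: repeatedly pick the most
# probable next word until the end-of-sentence token. `decoder_step` is a
# hypothetical interface: (video encoding, tokens so far, state) -> (logits, state).
import numpy as np

BOS, EOS = 1, 2  # assumed special token ids

def greedy_decode(video_encoding, decoder_step, max_len=20):
    tokens, state = [BOS], None
    for _ in range(max_len):
        logits, state = decoder_step(video_encoding, tokens, state)
        next_token = int(np.argmax(logits))   # most probable word id
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop BOS; map ids back to words with your vocabulary
```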
VizSeq: A Visual Analysis Toolkit for Text Generation Tasks
Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE.
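To make this metric-based evaluation concrete, here is a small sentence-level BLEU computation using NLTK; the captions are invented for the example.

```python
# Sentence-level BLEU between a generated caption and reference captions,
# using NLTK (pip install nltk). Toy captions, made up for illustration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a man is slicing vegetables in a kitchen".split(),
    "someone chops vegetables on a cutting board".split(),
]
hypothesis = "a man is chopping vegetables in the kitchen".split()

# Smoothing avoids zero scores when higher-order n-grams have no matches.
score = sentence_bleu(references, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```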
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.
Grounded Video Description
Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.
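For a sense of what grounding a noun phrase means in practice, here is an illustrative annotation record; the field names and layout are hypothetical and do not reflect the dataset's actual JSON schema.

```python
# An illustrative (hypothetical) grounded-caption record: each noun phrase in
# the caption is tied to a bounding box in a specific frame of the segment.
annotation = {
    "video_id": "v_example",
    "segment": [12.4, 18.9],                 # caption's start/end time (seconds)
    "caption": "a woman throws a ball to a dog",
    "groundings": [
        {"phrase": "a woman", "frame_sec": 13.0, "bbox": [45, 30, 210, 400]},
        {"phrase": "a dog",   "frame_sec": 13.0, "bbox": [320, 250, 460, 390]},
    ],  # bbox as [x1, y1, x2, y2] in pixels
}
```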
Adversarial Inference for Multi-Sentence Video Description
Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
We introduce a new dataset of dialogs about videos of human behaviors.
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.
Predicting Visual Features from Text for Image and Video Caption Retrieval
This paper strives to find, among a set of sentences, the one that best describes the content of a given image or video.
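A minimal sketch of this retrieval setup, assuming a `text_to_visual` encoder (a hypothetical stand-in for the paper's learned mapping) that projects sentences into the visual feature space:

```python
# Rank candidate sentences by cosine similarity between the video's feature
# vector and the visual features predicted from each sentence.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_best(video_feat, sentences, text_to_visual):
    # text_to_visual: sentence -> predicted visual feature vector
    scored = [(cosine(video_feat, text_to_visual(s)), s) for s in sentences]
    return max(scored)[1]  # sentence with the highest similarity wins
```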
Egocentric Video Description based on Temporally-Linked Sequences
We propose a novel methodology that exploits information from temporally neighboring events, which matches the sequential nature of egocentric video.
Memory-augmented Attention Modelling for Videos
We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts.
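For context, below is plain dot-product attention over per-frame features, the generic mechanism that memory-augmented variants build on; this is the standard building block, not the paper's specific model.

```python
# Dot-product attention over frame features: score each frame against the
# decoder state, softmax the scores, and return the weighted frame average.
import numpy as np

def attend(query, frame_feats):
    """query: (d,) decoder state; frame_feats: (T, d) per-frame features."""
    scores = frame_feats @ query                      # (T,) relevance per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over frames
    return weights @ frame_feats                      # (d,) attended context
```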