About

The goal of automatic Video Description is to tell a story about the events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, dense video captioning has more recently been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Datasets

Latest papers with code

Identity-Aware Multi-Sentence Video Description

ECCV 2020 jamespark3922/lsmdc-fillin

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

GENDER PREDICTION VIDEO DESCRIPTION

3
22 Aug 2020

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

ECCV 2020 L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020

With rising concerns about AI systems being given direct access to abundant sensitive information, researchers seek to develop more reliable AI that relies on implicit information sources.

VIDEO DESCRIPTION

3
18 Aug 2020

Delving Deeper into the Decoder for Video Captioning

16 Jan 2020 WingsBrokenAngel/delving-deeper-into-the-decoder-for-video-captioning

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence.

VIDEO CAPTIONING VIDEO DESCRIPTION

27
16 Jan 2020

VizSeq: A Visual Analysis Toolkit for Text Generation Tasks

IJCNLP 2019 facebookresearch/vizseq

Automatic evaluation of text generation tasks (e.g. machine translation, text summarization, image captioning and video description) usually relies heavily on task-specific metrics, such as BLEU and ROUGE.

IMAGE CAPTIONING MACHINE TRANSLATION TEXT GENERATION TEXT SUMMARIZATION VIDEO DESCRIPTION

313
12 Sep 2019
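The VizSeq entry above notes that automatic evaluation of description tasks leans heavily on n-gram metrics like BLEU. As background, a minimal single-reference BLEU computation can be sketched in plain Python; the caption sentences below are made-up examples, and real toolkits (including VizSeq's backends) implement smoothed, multi-reference variants:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Single-reference BLEU: geometric mean of clipped n-gram
    precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each hypothesis n-gram count at its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        precisions.append(clipped / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize hypotheses shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity * math.exp(log_mean)

# Hypothetical video captions for illustration.
reference = "a man is slicing a red tomato".split()
hypothesis = "a man is slicing a tomato".split()
print(round(bleu(reference, hypothesis), 3))  # → 0.673
```

Unsmoothed BLEU returns 0 whenever any n-gram order has no match, which is why sentence-level evaluation usually applies a smoothing function.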

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

ICCV 2019 eric-xw/Video-guided-Machine-Translation

We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.

MACHINE TRANSLATION VIDEO CAPTIONING VIDEO DESCRIPTION

37
06 Apr 2019

Grounded Video Description

CVPR 2019 facebookresearch/grounded-video-description

Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.

VIDEO DESCRIPTION

230
17 Dec 2018

Adversarial Inference for Multi-Sentence Video Description

CVPR 2019 jamespark3922/adv-inf

Among the main issues are the fluency and coherence of the generated descriptions, and their relevance to the video.

IMAGE CAPTIONING VIDEO DESCRIPTION

28
13 Dec 2018

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

1 Jun 2018 hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

VIDEO DESCRIPTION VISUAL DIALOG

41
01 Jun 2018

Predicting Visual Features from Text for Image and Video Caption Retrieval

5 Sep 2017 danieljf24/w2vv

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

VIDEO DESCRIPTION

60
05 Sep 2017