Video Description

26 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.
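Dense video captioning, as described above, outputs a set of events, each localized in time and paired with a sentence. The sketch below shows one possible representation under stated assumptions. Events are hypothetical `(start, end, caption)` records in seconds. `Event`, `temporal_iou`, and `to_story` are illustrative names, not part of the cited source. Temporal IoU is a measure commonly used to match predicted segments to ground-truth segments when evaluating dense captioning. Ordering captions by start time is one simple way to turn events into a single narrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One temporally localized event: a [start, end] span (seconds) and its caption."""
    start: float
    end: float
    caption: str


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of two time spans; 0.0 when they do not overlap."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def to_story(events: List[Event]) -> str:
    """Order events by start time (then end time) and join their captions into one paragraph."""
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    return " ".join(e.caption for e in ordered)


events = [
    Event(12.0, 20.5, "The man then jumps into the pool."),
    Event(0.0, 12.0, "A man walks toward a swimming pool."),
]
print(to_story(events))
```

A real system would also need to predict the event boundaries themselves. This sketch only covers how already-detected events might be compared and assembled.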

Source: Joint Event Detection and Description in Continuous Video Streams

Benchmarks


These leaderboards are used to track progress in Video Description.

No evaluation results yet.

Datasets

Latest papers


A Mid-level Video Representation based on Binary Descriptors: A Case Study for Pornography Detection

jackaduma/nude-detect • 12 May 2016

Although skin-based approaches provide good results, they generally suffer from a high false-positive rate, since not all images with large areas of exposed skin are pornographic; examples include people wearing swimsuits and sports-related images.


Video Description using Bidirectional Recurrent Neural Networks

lvapeab/ABiViRNet • 12 Apr 2016

Although traditionally used in machine translation, the encoder-decoder framework has recently been applied to generating video and image descriptions.


TGIF: A New Dataset and Benchmark on Animated GIF Description

raingo/TGIF-Release • CVPR 2016

The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips.

109 GitHub stars

10 Apr 2016


Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

TejInaco/multimodalML • EMNLP 2016

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

06 Apr 2016
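One generic way to bring a language model trained only on text into an LSTM captioner is late fusion. At each decoding step, the next-word distributions of the two models are mixed with a weight `alpha`. The sketch below is a hypothetical illustration of that idea and not the paper's exact method. The function name `late_fusion`, the dictionary-based distributions, and the default `alpha` are assumptions.

```python
from typing import Dict


def late_fusion(captioner_probs: Dict[str, float],
                lm_probs: Dict[str, float],
                alpha: float = 0.7) -> Dict[str, float]:
    """Mix two next-word distributions: alpha * captioner + (1 - alpha) * language model.

    Words missing from one distribution are treated as having probability 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    vocab = set(captioner_probs) | set(lm_probs)
    return {w: alpha * captioner_probs.get(w, 0.0) + (1.0 - alpha) * lm_probs.get(w, 0.0)
            for w in vocab}


# Visual evidence is ambiguous here; the text-trained model breaks the tie.
fused = late_fusion({"dog": 0.5, "cat": 0.5}, {"dog": 0.9, "cat": 0.1}, alpha=0.5)
next_word = max(fused, key=fused.get)  # greedy choice of the next word
print(next_word)
```

Because both inputs are probability distributions and the weights sum to 1, the mixture is also a valid distribution. That lets it drop into greedy or beam-search decoding unchanged.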


Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research

jssprz/video_captioning_datasets • 3 Mar 2015

DVS (Descriptive Video Service) is an audio narration, intended for the visually impaired, that describes the visual elements and actions in a movie.


Describing Videos by Exploiting Temporal Structure

yaoli/arctic-capgen-vid • ICCV 2015

We propose an approach that takes into account both the local and global temporal structure of videos to produce descriptions.

260 GitHub stars

27 Feb 2015
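One standard way to let a caption decoder use the global temporal structure of a video is soft temporal attention. At each word, the decoder weights every frame's features by their relevance to its current state and reads a weighted average. The sketch below is a minimal illustration under assumptions. Frame features and the decoder state are plain Python lists. The relevance score is a simple dot product, swapped in for brevity, and the paper's own scoring function may differ. `temporal_attention` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(decoder_state: Vector,
                       frame_features: List[Vector]) -> Tuple[List[float], Vector]:
    """Return per-frame attention weights and the attended context vector.

    Relevance of each frame is the dot product with the decoder state; the
    context is the attention-weighted sum of the frame features.
    """
    scores = [sum(h * v for h, v in zip(decoder_state, frame)) for frame in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    context = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy frame feature vectors
weights, context = temporal_attention([2.0, 0.0], frames)
print(weights, context)
```

Recomputing the weights at every decoding step lets different words focus on different parts of the video. The weighted sum keeps the context vector the same size regardless of how many frames the clip has.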

