Video Description

26 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently, dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams
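
Concretely, dense video captioning decomposes into two stages: propose temporal segments likely to contain events, then caption each segment. Below is a minimal Python sketch of that two-stage pipeline; `EventProposer` and `ClipCaptioner` are hypothetical stand-ins for concrete models (e.g. a temporal proposal network and a sequence-to-sequence captioner) and are not taken from the cited paper.

```python
# Minimal sketch of a dense video captioning pipeline: first propose
# temporal segments that may contain events, then caption each segment.
# EventProposer and ClipCaptioner are hypothetical interfaces, not the
# cited paper's architecture.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Event:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    caption: str = ""   # natural-language description, filled in later


class EventProposer(Protocol):
    def __call__(self, video_path: str) -> List[Event]: ...


class ClipCaptioner(Protocol):
    def __call__(self, video_path: str, start: float, end: float) -> str: ...


def dense_caption(video_path: str,
                  propose_events: EventProposer,
                  caption_clip: ClipCaptioner) -> List[Event]:
    """Segment distinct events in time, then describe each in order."""
    events = propose_events(video_path)
    for ev in sorted(events, key=lambda e: e.start):
        ev.caption = caption_clip(video_path, ev.start, ev.end)
    return events
```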

Latest papers with no code

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code yet • 3 Jan 2020

To solve this issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) that learns interactions not only at the last layer but at all intermediate layers simultaneously.
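
The core primitive in such quaternion-based interaction layers is the Hamilton product. Below is a minimal NumPy sketch of it applied to quaternion-structured feature vectors; the way features are split into four equal chunks here is an illustrative assumption, not necessarily the QBN paper's exact formulation.

```python
import numpy as np


def hamilton_product(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamilton product of two quaternion-structured feature vectors.

    Each vector of dimension 4n is read as n quaternions: its four
    equal chunks hold the real part and the i, j, k imaginary parts.
    This feature layout is illustrative, not the paper's exact layer.
    """
    a1, b1, c1, d1 = np.split(p, 4)
    a2, b2, c2, d2 = np.split(q, 4)
    return np.concatenate([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ])
```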

Prediction and Description of Near-Future Activities in Video

no code yet • 2 Aug 2019

Most existing work on human activity analysis focuses on recognition, or early recognition, of activity labels from complete or partial observations.

End-to-End Video Captioning

no code yet • 4 Apr 2019

The decoder is then optimised on such static features to generate the video's description.
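
As a rough illustration of decoding from static features, here is a minimal PyTorch sketch of an LSTM decoder whose initial state is conditioned on a fixed video feature vector; the dimensions and greedy decoding scheme are assumptions for illustration, not the cited paper's architecture.

```python
import torch
import torch.nn as nn


class CaptionDecoder(nn.Module):
    """Minimal sketch: an LSTM decoder conditioned on a fixed ("static")
    video feature vector. Sizes and greedy decoding are illustrative
    assumptions, not the cited paper's design."""

    def __init__(self, feat_dim: int, vocab_size: int, hidden: int = 512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden)  # feature -> initial state
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, feat, bos: int, eos: int, max_len: int = 20):
        # feat: (1, feat_dim) static video feature
        h = torch.tanh(self.init_h(feat))
        c = torch.zeros_like(h)
        token, words = torch.tensor([bos]), []
        for _ in range(max_len):
            h, c = self.lstm(self.embed(token), (h, c))
            token = self.out(h).argmax(dim=-1)  # greedy: pick the top word
            if token.item() == eos:
                break
            words.append(token.item())
        return words
```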

A Dataset for Telling the Stories of Social Media Videos

no code yet • EMNLP 2018

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories.

Incorporating Background Knowledge into Video Description Generation

no code yet • EMNLP 2018

We develop an approach that uses video meta-data to retrieve topically related news documents for a video and extracts the events and named entities from these documents.
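
The general recipe can be sketched as: query a news index with the video's meta-data, then run named entity recognition over the retrieved documents. In the sketch below, `search_news` is a hypothetical retrieval backend, and the NER step uses spaCy's standard pipeline; none of this is the paper's exact system.

```python
# Rough sketch: use video meta-data (title, tags) as a query to fetch
# related news articles, then extract named entities from them.
# search_news is a hypothetical stand-in for any retrieval backend;
# the NER step uses spaCy and its "en_core_web_sm" model.
from typing import Dict, List

import spacy

nlp = spacy.load("en_core_web_sm")


def search_news(query: str, k: int = 5) -> List[str]:
    """Hypothetical retrieval backend: text of the top-k matching docs."""
    raise NotImplementedError("plug in a search index or news API here")


def background_entities(video_meta: Dict[str, str]) -> Dict[str, List[str]]:
    """Retrieve topically related documents for a video and collect the
    named entities mentioned in them, grouped by entity type."""
    query = " ".join(filter(None, (video_meta.get("title"),
                                   video_meta.get("tags"))))
    entities: Dict[str, List[str]] = {}
    for doc_text in search_news(query):
        for ent in nlp(doc_text).ents:
            entities.setdefault(ent.label_, []).append(ent.text)
    return entities
```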

Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions

no code yet • 27 Aug 2018

We validate the effectiveness of our ASST on two large-scale datasets.

Bridge Video and Text with Cascade Syntactic Structure

no code yet • COLING 2018

We present a video captioning approach (LSTM-CSS) that encodes features by progressively completing a cascade syntactic structure.

Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data

no code yet • WS 2018

In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data.

Video Description: A Survey of Methods, Datasets and Evaluation Metrics

no code yet • 1 Jun 2018

Video description is the automatic generation of natural language sentences that describe the contents of a given video.

Interpretable Video Captioning via Trajectory Structured Localization

no code yet • CVPR 2018

Automatically describing open-domain videos with natural language is attracting increasing interest in the field of artificial intelligence.