About

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Latest papers without code

Efficient data-driven encoding of scene motion using Eccentricity

3 Mar 2021

This paper presents a novel approach of representing dynamic visual scenes with static maps generated from video/image streams.

ACTIVITY RECOGNITION OBJECT TRACKING VIDEO DESCRIPTION

The Role of the Input in Natural Language Video Description

9 Feb 2021

Natural Language Video Description (NLVD) has recently received strong interest in the Computer Vision, Natural Language Processing (NLP), Multimedia, and Autonomous Robotics communities.

DATA AUGMENTATION VIDEO DESCRIPTION

Understanding Health Misinformation Transmission: An Interpretable Deep Learning Approach to Manage Infodemics

21 Dec 2020

Understanding how health misinformation is transmitted is an urgent goal for researchers, social media platforms, health sectors, and policymakers seeking to mitigate its ramifications.

MISINFORMATION VIDEO DESCRIPTION

MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish

13 Dec 2020

We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphology rich and agglutinative languages.

MULTIMODAL MACHINE TRANSLATION VIDEO CAPTIONING VIDEO DESCRIPTION

A Comprehensive Review on Recent Methods and Challenges of Video Description

30 Nov 2020

In this work, we report a comprehensive survey covering the phases of video description approaches, datasets for video description, evaluation metrics, open competitions that motivate research on video description, open challenges in this field, and future research directions.

MACHINE TRANSLATION VIDEO DESCRIPTION

Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

27 Jul 2020

Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive.

ACTIVE LEARNING VIDEO CAPTIONING VIDEO DESCRIPTION

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

3 Jan 2020

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

QUESTION ANSWERING VIDEO DESCRIPTION VISUAL QUESTION ANSWERING

Prediction and Description of Near-Future Activities in Video

2 Aug 2019

Similarly, existing video captioning approaches focus on the observed events in videos.

VIDEO CAPTIONING VIDEO DESCRIPTION

End-to-End Video Captioning

4 Apr 2019

The decoder is then optimised on such static features to generate the video's description.

ACTION RECOGNITION MACHINE TRANSLATION TEXT GENERATION VIDEO CAPTIONING VIDEO DESCRIPTION

A Dataset for Telling the Stories of Social Media Videos

EMNLP 2018

Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories.

VIDEO CAPTIONING VIDEO DESCRIPTION