Video Description
26 papers with code • 0 benchmarks • 7 datasets
The goal of automatic Video Description is to tell a story about events happening in a video. While early video description methods produced captions for short clips that had been manually segmented to contain a single event of interest, dense video captioning has more recently been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.
Source: Joint Event Detection and Description in Continuous Video Streams
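To make the two-stage formulation concrete, here is a minimal Python sketch of a segment-then-describe pipeline. The `proposer` and `captioner` callables are hypothetical stand-ins for trained models and are not taken from the cited paper:

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float        # segment start, in seconds
    end: float          # segment end, in seconds
    sentence: str = ""  # natural-language description

def dense_caption(video_path: str, proposer, captioner) -> list[Event]:
    """Two-stage dense video captioning: (1) temporally segment the
    video into candidate events, (2) describe each segment in natural
    language. `proposer` and `captioner` are stand-ins for any trained
    models exposing the interfaces used below.
    """
    events = [Event(start=s, end=e) for s, e in proposer(video_path)]
    for ev in events:
        # Caption each proposed segment independently; models aiming for
        # coherent multi-sentence stories would also condition on context.
        ev.sentence = captioner(video_path, ev.start, ev.end)
    return events
```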
Latest papers with no code
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model
The rapid advancement of artificial intelligence has led to significant improvements in automated decision-making.
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Next, we finetune a retrieval model on a small subset in which the best caption of each video is manually selected, and then apply the model to the whole dataset to select the best caption as the annotation.
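A minimal sketch of that selection step, assuming a dual-encoder (CLIP-style) retrieval model that embeds the video and each candidate caption into a shared space; the embedding model itself and the names below are illustrative, not Panda-70M's actual API:

```python
import numpy as np

def select_best_caption(video_emb: np.ndarray,
                        caption_embs: np.ndarray,
                        captions: list[str]) -> str:
    """Pick the candidate caption whose embedding is most similar to
    the video embedding.

    `video_emb` (d,) and `caption_embs` (n, d) are assumed to come from
    a finetuned dual-encoder retrieval model; producing them is not
    shown here.
    """
    # Cosine similarity between the video and each candidate caption.
    v = video_emb / np.linalg.norm(video_emb)
    c = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    scores = c @ v
    return captions[int(np.argmax(scores))]
```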
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Towards designing this ability into algorithms, we present a large-scale analysis of an in-house dataset collected by the Reuters News Agency, the Reuters Video-Language News (ReutersViLNews) dataset, which focuses on high-level video-language understanding with an emphasis on long-form news.
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Building on the proposed ActionHub dataset, we further introduce a novel Cross-modality and Cross-action Modeling (CoCo) framework for zero-shot action recognition (ZSAR), which consists of a Dual Cross-modality Alignment module and a Cross-action Invariance Mining module.
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023)
Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain.
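For reference, a compact PyTorch sketch of a generic attention-based encoder-decoder captioner of the kind the title describes; the dimensions, layer choices, and teacher forcing below are illustrative assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AttnCaptioner(nn.Module):
    """Minimal attention-based encoder-decoder for video captioning:
    a GRU encodes per-frame CNN features, and a GRU decoder attends
    over the encoder states while emitting caption tokens.
    """
    def __init__(self, feat_dim=2048, hid=512, vocab=8000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid, batch_first=True)
        self.embed = nn.Embedding(vocab, hid)
        self.attn = nn.MultiheadAttention(hid, num_heads=1, batch_first=True)
        self.decoder = nn.GRUCell(hid * 2, hid)
        self.out = nn.Linear(hid, vocab)

    def forward(self, frames, tokens):
        # frames: (B, T, feat_dim) precomputed CNN frame features
        # tokens: (B, L) ground-truth caption tokens (teacher forcing)
        enc, h = self.encoder(frames)        # enc: (B, T, hid)
        h = h.squeeze(0)                     # (B, hid) decoder state
        logits = []
        for t in range(tokens.size(1)):
            q = h.unsqueeze(1)               # query = current state
            ctx, _ = self.attn(q, enc, enc)  # attend over frame features
            inp = torch.cat([self.embed(tokens[:, t]), ctx.squeeze(1)], -1)
            h = self.decoder(inp, h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)    # (B, L, vocab)
```

At inference time the same decoder loop would feed back its own argmax (or beam-search) predictions instead of ground-truth tokens.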
Multi Sentence Description of Complex Manipulation Action Videos
Automatic video description requires the generation of natural language statements about the actions, events, and objects in the video.
CLearViD: Curriculum Learning for Video Description
We introduce CLearViD, a transformer-based model for video description generation that leverages curriculum learning during training.
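Curriculum learning itself is a general training strategy: present easier samples first and gradually admit harder ones. The sketch below uses a per-sample difficulty score (e.g. caption length) and a linear pool-growth schedule; both choices are assumptions for illustration, not CLearViD's actual curriculum:

```python
from torch.utils.data import DataLoader, Subset

def curriculum_loader(dataset, difficulty, epoch, total_epochs, batch_size=32):
    """Easy-to-hard curriculum: train on the easiest samples first and
    linearly grow the pool each epoch until the full dataset is used.

    `difficulty` holds one score per sample; the proxy and the linear
    schedule are illustrative assumptions.
    """
    # Indices sorted from easiest to hardest.
    order = sorted(range(len(dataset)), key=lambda i: difficulty[i])
    # Fraction of the data admitted at this epoch: 20% up to 100%.
    frac = min(1.0, 0.2 + 0.8 * epoch / max(1, total_epochs - 1))
    pool = order[: max(batch_size, int(frac * len(order)))]
    return DataLoader(Subset(dataset, pool), batch_size=batch_size, shuffle=True)
```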
Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis
Sentiment analysis of large-scale YouTube video metadata can be used to analyze public opinion on political figures representing different political parties.
Edit As You Wish: Video Description Editing with Multi-grained Commands
In this paper, we propose a novel Video Description Editing (VDEdit) task to automatically revise an existing video description guided by flexible user requests.
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can, for instance, help visually impaired people understand the scenes of a YouTube video.
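One plausible reading of fractional positional encoding is a standard sinusoidal encoding evaluated at real-valued positions derived from each token's timestamp, so that audio and visual frames sampled at different rates share one time axis. The sketch below follows that reading; the normalization constant and function name are assumptions, not necessarily the paper's formulation:

```python
import numpy as np

def fractional_positional_encoding(timestamps, duration, d_model=512, scale=100.0):
    """Sinusoidal positional encoding evaluated at fractional positions.

    Each token (video frame or audio frame) gets the real-valued
    position timestamp / duration * scale, so streams sampled at
    different rates land on a shared time axis. `scale` and the
    normalization are illustrative; `d_model` must be even.
    """
    pos = np.asarray(timestamps, dtype=np.float64) / duration * scale  # (T,)
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / d_model))                       # (d/2,)
    ang = pos[:, None] * freqs[None, :]                                # (T, d/2)
    pe = np.empty((len(pos), d_model))
    pe[:, 0::2] = np.sin(ang)
    pe[:, 1::2] = np.cos(ang)
    return pe
```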