Video Description

26 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Benchmarks

These leaderboards are used to track progress in Video Description

No evaluation results yet.

Datasets

Latest papers

TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning

quangminhdinh/trafficvlm • 14 Apr 2024

Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems.

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models

cmooncs/semeval-2024_multimodal_ecpe • 5 Mar 2024

However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system.

FunQA: Towards Surprising Video Comprehension

jingkang50/funqa • 26 Jun 2023

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian

willyfh/msvd-indonesian • 20 Jun 2023

Since the availability of the pretraining resources with Indonesian sentences is relatively limited, the applicability of those approaches to our dataset is still questionable.

Fine-grained Audible Video Description

opennlplab/favdbench • CVPR 2023

We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD).

27 Mar 2023

Thinking Hallucination for Video Captioning

nasib-ullah/THVC • 28 Sep 2022

In video captioning, there are two kinds of hallucination: object and action hallucination.

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

cannylab/vdtk • 12 May 2022

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

Learn to Understand Negation in Video Retrieval

ruc-aimc-lab/nt2vr • 30 Apr 2022

We propose a learning based method for training a negation-aware video retrieval model.

Identity-Aware Multi-Sentence Video Description

jamespark3922/lsmdc-fillin • ECCV 2020

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

22 Aug 2020

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020 • ECCV 2020

With the arising concerns for the AI systems provided with direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources.

18 Aug 2020
