Video Description

26 papers with code • 0 benchmarks • 7 datasets

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning

quangminhdinh/trafficvlm 14 Apr 2024

Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems.

10

JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models

cmooncs/semeval-2024_multimodal_ecpe 5 Mar 2024

However, the complexities of these diverse modalities pose challenges for developing an efficient multimodal emotion cause analysis (ECA) system.

2

FunQA: Towards Surprising Video Comprehension

jingkang50/funqa 26 Jun 2023

Surprising videos, such as funny clips, creative performances, or visual illusions, attract significant attention.

89

MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian

willyfh/msvd-indonesian 20 Jun 2023

Since the availability of the pretraining resources with Indonesian sentences is relatively limited, the applicability of those approaches to our dataset is still questionable.

3

Fine-grained Audible Video Description

opennlplab/favdbench CVPR 2023

We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD).

69
27 Mar 2023

Thinking Hallucination for Video Captioning

nasib-ullah/THVC 28 Sep 2022

In video captioning, there are two kinds of hallucination: object and action hallucination.

11

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

cannylab/vdtk 12 May 2022

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world.

10

Learn to Understand Negation in Video Retrieval

ruc-aimc-lab/nt2vr 30 Apr 2022

We propose a learning based method for training a negation-aware video retrieval model.

4

Identity-Aware Multi-Sentence Video Description

jamespark3922/lsmdc-fillin ECCV 2020

This auxiliary task allows us to propose a two-stage approach to Identity-Aware Video Description.

13
22 Aug 2020

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

L-YeZhu/Video-Description-via-Dialog-Agents-ECCV2020 ECCV 2020

With the arising concerns for the AI systems provided with direct access to abundant sensitive information, researchers seek to develop more reliable AI with implicit information sources.

5
18 Aug 2020