Natural Language Visual Grounding

16 papers with code • 0 benchmarks • 6 datasets

Most implemented papers

Panoptic Narrative Grounding

bcv-uniandes/png ICCV 2021, 10 Sep 2021

This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem.

CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

mees/calvin 6 Dec 2021

We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting that there is significant room for developing innovative agents that learn to relate human language to their world models with this benchmark.

TubeDETR: Spatio-Temporal Video Grounding with Transformers

antoyang/TubeDETR CVPR 2022

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.

Belief Revision based Caption Re-ranker with Visual Semantic Information

ahmedssabir/belief-revision-score COLING 2022

In this work, we focus on improving the captions generated by image-caption generation systems.

Localizing Moments in Long Video Via Multimodal Guidance

waybarrios/guidance-based-video-grounding ICCV 2023

In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.