Natural Language Visual Grounding
16 papers with code • 0 benchmarks • 6 datasets
Most implemented papers
Panoptic Narrative Grounding
This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem.
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting there is significant room for innovative agents that use this benchmark to learn to relate human language to their world models.
TubeDETR: Spatio-Temporal Video Grounding with Transformers
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.
Belief Revision based Caption Re-ranker with Visual Semantic Information
In this work, we focus on improving the captions generated by image-caption generation systems.
Localizing Moments in Long Video Via Multimodal Guidance
In this paper, we propose a method that improves natural language grounding in long videos by identifying and pruning non-describable windows.