Natural Language Visual Grounding
16 papers with code • 0 benchmarks • 6 datasets
Most implemented papers
Panoptic Narrative Grounding
This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem.
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
We show that a baseline model based on multi-context imitation learning performs poorly on CALVIN, suggesting there is significant room for innovative agents that use this benchmark to learn to relate human language to their world models.
TubeDETR: Spatio-Temporal Video Grounding with Transformers
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.
Belief Revision based Caption Re-ranker with Visual Semantic Information
In this work, we focus on improving the captions generated by image-caption generation systems.
Localizing Moments in Long Video Via Multimodal Guidance
In this paper, we propose a method that improves natural language grounding in long videos by identifying and pruning non-describable windows.