GroOT (Grounded Multiple Object Tracking)

One of the recent trends in vision problems is to use natural language captions to describe the objects of interest. This approach can overcome some limitations of traditional methods that rely on bounding boxes or category annotations. This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track, which allows users to track objects in videos by typing natural language descriptions. We present a new dataset for this Grounded Multiple Object Tracking task, called GroOT, containing videos with various types of objects and 256K words of textual captions describing their appearance and actions in detail. To cover a diverse range of scenes, GroOT was created from the official videos and bounding box annotations of the MOT17, TAO, and MOT20 datasets.
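No official data loader is listed yet, so the sketch below only illustrates one plausible way to represent and query caption-grounded track annotations of this kind in Python. All field names (video, track_id, caption, boxes) and the substring-matching selection are assumptions made for illustration; they are not the official GroOT schema or the paper's tracking method.

    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass
    class GroundedTrack:
        # Assumed fields for illustration, not the official GroOT schema.
        video: str      # source sequence, e.g. a MOT17/TAO/MOT20 video id
        track_id: int   # identity of the tracked object
        caption: str    # free-form description of appearance and action
        boxes: dict[int, tuple[float, float, float, float]]  # frame -> (x, y, w, h)

    def select_tracks(tracks: list[GroundedTrack], query: str) -> list[GroundedTrack]:
        """Toy Type-to-Track retrieval: return tracks whose caption contains
        the typed query. Real systems use learned vision-language grounding."""
        q = query.lower()
        return [t for t in tracks if q in t.caption.lower()]

    if __name__ == "__main__":
        tracks = [
            GroundedTrack("MOT17-02", 1, "a man in a red jacket walking left",
                          {0: (100.0, 200.0, 40.0, 90.0)}),
            GroundedTrack("MOT17-02", 2, "a woman carrying a blue umbrella",
                          {0: (300.0, 180.0, 35.0, 85.0)}),
        ]
        for t in select_tracks(tracks, "red jacket"):
            print(t.video, t.track_id, t.caption)

In a real Type-to-Track system the caption matching above would be replaced by a learned grounding model that scores detections against the typed description frame by frame.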

License


  • CC BY-NC-SA 3.0
