Visual Object Tracking
150 papers with code • 21 benchmarks • 26 datasets
Visual Object Tracking is an important research topic in computer vision, image understanding and pattern recognition. Given the initial state (centre location and scale) of a target in the first frame of a video sequence, the aim of Visual Object Tracking is to automatically obtain the states of the object in the subsequent video frames.
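The task definition above can be illustrated with a minimal sketch: take the target's box in the first frame, then locate it again in every later frame. This is an illustrative toy and not a method from any paper listed here. It assumes grayscale frames stored as nested lists and a box given as (top, left, height, width) rather than centre and scale, and it keeps the scale fixed. The helper names `crop`, `ssd`, `track`, and `make_frame` are hypothetical, and the matching rule (sum of squared differences against the first-frame template) is a simple stand-in for a learned tracker.

```python
from typing import List, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # assumed layout: (top, left, height, width)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by box out of the frame."""
    top, left, h, w = box
    return [row[left:left + w] for row in frame[top:top + h]]


def ssd(a: Frame, b: Frame) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def track(frames: List[Frame], init_box: Box, search_radius: int = 2) -> List[Box]:
    """Return one box per frame, starting from the given first-frame box."""
    template = crop(frames[0], init_box)  # appearance model, never updated
    top, left, h, w = init_box
    boxes = [init_box]
    for frame in frames[1:]:
        rows, cols = len(frame), len(frame[0])
        best = None
        # Search a small window around the previous position.
        for dt in range(-search_radius, search_radius + 1):
            for dl in range(-search_radius, search_radius + 1):
                t, l = top + dt, left + dl
                if t < 0 or l < 0 or t + h > rows or l + w > cols:
                    continue  # candidate box falls outside the frame
                cost = ssd(template, crop(frame, (t, l, h, w)))
                if best is None or cost < best[0]:
                    best = (cost, t, l)
        _, top, left = best
        boxes.append((top, left, h, w))
    return boxes


def make_frame(top: int, left: int, size: int = 8) -> Frame:
    """Synthetic frame: a bright 2x2 square on a dark background."""
    frame = [[0.0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 1.0
    return frame


# The square moves one pixel to the right in each frame.
frames = [make_frame(3, 1 + i) for i in range(4)]
boxes = track(frames, (3, 1, 2, 2))
print(boxes)
```

Because the template is fixed and the box size never changes, this sketch would fail under appearance or scale changes, which is the gap that the learned trackers listed below aim to close.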
Latest papers
LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks
To achieve high accuracy on both clean and adversarial data, we propose building a spatial-temporal continuous representation using the semantic text guidance of the object of interest.
OmniVid: A Generative Framework for Universal Video Understanding
The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.
Elysium: Exploring Object-level Perception in Videos via MLLM
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied.
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
VastTrack: Vast Category Visual Object Tracking
The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction.
Correlation-Embedded Transformer Tracking: A Single-Branch Framework
Thus, we reformulate the two-branch Siamese tracking as a conceptually simple, fully transformer-based Single-Branch Tracking pipeline, dubbed SBT.
Explicit Visual Prompts for Visual Object Tracking
Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner.
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames.