Visual Tracking
168 papers with code • 9 benchmarks • 26 datasets
Visual Tracking is an essential and actively researched problem in the field of computer vision with various real-world applications such as robotic services, smart surveillance systems, autonomous driving, and human-computer interaction. It refers to the automatic estimation of the trajectory of an arbitrary target object, usually specified by a bounding box in the first frame, as it moves around in subsequent video frames.
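Concretely, tracking toolkits typically expose this setup as an init/update protocol: the tracker receives the first frame together with the target's bounding box, then predicts one box for every subsequent frame. A minimal sketch of that protocol (the class and function names are hypothetical, not from any specific library):

```python
class ConstantBoxTracker:
    """Trivial baseline that always repeats the initial box.
    A real tracker would re-estimate the box from frame content."""

    def init(self, frame, box):
        # box given as (x, y, w, h) in the first frame
        self.box = box

    def update(self, frame):
        # a real tracker estimates the new target location here
        return self.box


def run_tracker(tracker, frames, init_box):
    """Produce one box per frame: the given box for frame 0,
    the tracker's estimate for every later frame."""
    tracker.init(frames[0], init_box)
    return [init_box] + [tracker.update(f) for f in frames[1:]]
```

Even this do-nothing baseline produces a complete trajectory, which is what benchmark evaluation scripts consume.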
Source: Learning Reinforced Attentional Representation for End-to-End Visual Tracking
Libraries
Use these libraries to find Visual Tracking models and implementations.
Latest papers
Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
Current event-based and frame-event-based trackers are evaluated on short-term tracking datasets; however, real-world scenarios involve long-term tracking, and the performance of existing tracking algorithms in these settings remains unclear.
VastTrack: Vast Category Visual Object Tracking
The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Single object tracking aims to locate the target object in a video sequence according to the state specified by different modal references, including the initial bounding box (BBOX), natural language (NL), or both (NL+BBOX).
Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking
To demonstrate the effectiveness of PRTreID, it is integrated with a state-of-the-art tracking method, using a part-based post-processing module to handle long-term tracking.
Explicit Visual Prompts for Visual Object Tracking
Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner.
Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset
Visual tracking often faces challenges such as invalid targets and decreased performance in low-light conditions when relying solely on RGB image sequences.
ZoomTrack: Target-aware Non-uniform Resizing for Efficient Visual Tracking
To this end, we non-uniformly resize the cropped image to a smaller input size, keeping the resolution higher in areas where the target is more likely to appear and lower elsewhere.
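One way to realize such target-aware non-uniform resizing (a hypothetical sketch under assumed details, not ZoomTrack's actual implementation; all function names here are illustrative) is to assign higher sampling density to the likely target span along each axis and invert the cumulative density to obtain source coordinates:

```python
import numpy as np


def nonuniform_axis(src_len, tgt_lo, tgt_hi, out_len, zoom=2.0):
    """Sampling positions along one axis: the target span [tgt_lo, tgt_hi)
    is sampled `zoom` times more densely than the rest."""
    density = np.ones(src_len)
    density[tgt_lo:tgt_hi] = zoom
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]                      # normalized cumulative density
    # invert the CDF at out_len evenly spaced quantiles -> source indices
    quantiles = (np.arange(out_len) + 0.5) / out_len
    return np.searchsorted(cdf, quantiles)


def zoom_resize(img, box, out_hw, zoom=2.0):
    """Resize img (H, W[, C]) so the (x, y, w, h) box keeps higher resolution."""
    x, y, w, h = box
    rows = nonuniform_axis(img.shape[0], y, y + h, out_hw[0], zoom)
    cols = nonuniform_axis(img.shape[1], x, x + w, out_hw[1], zoom)
    return img[np.ix_(rows, cols)]
```

The nearest-neighbor gather keeps the sketch short; a practical version would interpolate between source pixels rather than index them directly.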
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Tracking using bio-inspired event cameras has attracted increasing attention in recent years.
LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking
As an example, our fastest variant, LiteTrack-B4, achieves 65.2% AO on the GOT-10k benchmark, surpassing all preceding efficient trackers while running at over 100 FPS with ONNX on the Jetson Orin NX edge device.
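The AO (average overlap) figure quoted above is, under the GOT-10k protocol, the mean per-frame IoU between predicted and ground-truth boxes. A minimal sketch of the computation:

```python
def _iou(a, b):
    # boxes given as (x, y, w, h)
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def average_overlap(pred, gt):
    """AO: mean per-frame IoU of predicted vs. ground-truth boxes."""
    return sum(_iou(p, g) for p, g in zip(pred, gt)) / len(gt)
```

A perfect tracker scores 1.0; a tracker that loses the target entirely on half the frames, but is perfect on the rest, scores 0.5.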