ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

28 Dec 2023 · Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrack, ARTrackV2 extends the concept by introducing a unified generative framework that "reads out" the object's trajectory and "retells" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and the hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while delivering a marked efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6× faster than ARTrack. The code will be released.
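The abstract describes "reading out" a trajectory autoregressively, guided by previous estimates, but does not give the tokenization details. The sketch below illustrates only the general idea of treating box coordinates as discrete tokens so that earlier frames' estimates can be fed back as a prompt. Everything concrete here is a hypothetical assumption, not ARTrackV2's published design: the bin count `NUM_BINS`, coordinates normalized to [0, 1], the (x1, y1, x2, y2) box format, the history length, and the helper names `quantize`, `dequantize`, `box_to_tokens`, and `build_trajectory_prompt`.

```python
from typing import List, Sequence

# Hypothetical vocabulary size; the abstract does not state a bin count.
NUM_BINS = 400


def quantize(coord: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer token id (assumed scheme)."""
    coord = min(max(coord, 0.0), 1.0)  # clamp to the valid range
    return min(int(coord * num_bins), num_bins - 1)  # 1.0 falls into the last bin


def dequantize(token: int, num_bins: int = NUM_BINS) -> float:
    """Map a token id back to the center of its bin in [0, 1]."""
    return (token + 0.5) / num_bins


def box_to_tokens(box: Sequence[float]) -> List[int]:
    """Tokenize a normalized (x1, y1, x2, y2) box into four coordinate tokens."""
    return [quantize(c) for c in box]


def build_trajectory_prompt(history: Sequence[Sequence[float]], length: int = 3) -> List[int]:
    """Flatten the last `length` estimated boxes into one token sequence (hypothetical prompt format)."""
    recent = history[-length:]
    return [tok for box in recent for tok in box_to_tokens(box)]


if __name__ == "__main__":
    history = [(0.40, 0.40, 0.60, 0.60), (0.42, 0.41, 0.62, 0.61), (0.45, 0.43, 0.65, 0.63)]
    prompt = build_trajectory_prompt(history)
    print(len(prompt), prompt[:4])
```

Quantizing bounds the round-trip error to half a bin width, which is the trade-off a bin count controls. Note that, per the abstract, ARTrackV2 avoids intra-frame autoregression; this sketch only shows the inter-frame prompt side and does not model how a frame's coordinates or appearance tokens are generated.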


Results from the Paper


Task                    Dataset       Model        Metric                Value (%)  Global Rank
Visual Object Tracking  GOT-10k       ARTrackV2-L  Average Overlap       79.5       #1
Visual Object Tracking  GOT-10k       ARTrackV2-L  Success Rate 0.5      87.8       #1
Visual Object Tracking  GOT-10k       ARTrackV2-L  Success Rate 0.75     79.6       #1
Visual Object Tracking  LaSOT         ARTrackV2-L  AUC                   73.6       #2
Visual Object Tracking  LaSOT         ARTrackV2-L  Normalized Precision  82.8       #1
Visual Object Tracking  LaSOT         ARTrackV2-L  Precision             81.1       #1
Visual Object Tracking  LaSOT-ext     ARTrackV2-L  AUC                   53.4       #3
Visual Object Tracking  LaSOT-ext     ARTrackV2-L  Normalized Precision  63.7       #2
Visual Object Tracking  LaSOT-ext     ARTrackV2-L  Precision             60.2       #2
Visual Object Tracking  NeedForSpeed  ARTrackV2-L  AUC                   68.4       #1
Visual Object Tracking  TNL2K         ARTrackV2-L  AUC                   61.6       #2
Visual Object Tracking  TrackingNet   ARTrackV2-L  Precision             86.2       #2
Visual Object Tracking  TrackingNet   ARTrackV2-L  Normalized Precision  90.4       #1
Visual Object Tracking  TrackingNet   ARTrackV2-L  Accuracy (AUC)        86.1       #1
Visual Object Tracking  UAV123        ARTrackV2-L  AUC                   71.7       #2
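The overlap-based metrics in the table (Average Overlap, Success Rate at 0.5 and 0.75, and success AUC) are all derived from per-frame IoU between predicted and ground-truth boxes; the table reports them as percentages. The sketch below shows one common way to compute them and is not any benchmark's official toolkit. Assumptions: boxes are (x, y, w, h) in pixels, success counts IoU strictly above the threshold, AUC averages success rates over 21 thresholds from 0 to 1, and all frames are pooled into one list. Official evaluators differ in details such as per-sequence averaging and handling of frames where the target is absent. Precision and normalized precision (center-distance metrics) are not covered here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious: Sequence[float]) -> float:
    """AO: mean IoU over all (pooled) frames."""
    return sum(ious) / len(ious)


def success_rate(ious: Sequence[float], threshold: float) -> float:
    """SR_t: fraction of frames whose IoU exceeds the threshold."""
    return sum(1 for v in ious if v > threshold) / len(ious)


def success_auc(ious: Sequence[float], num_thresholds: int = 21) -> float:
    """Area under the success plot, approximated as the mean SR over evenly spaced thresholds."""
    thresholds: List[float] = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    return sum(success_rate(ious, t) for t in thresholds) / num_thresholds


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    preds = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0), (20.0, 20.0, 5.0, 5.0)]
    ious = [iou(p, gt) for p in preds]
    print(100 * average_overlap(ious), 100 * success_rate(ious, 0.5), 100 * success_auc(ious))
```

Multiplying by 100, as in the usage lines, gives values on the same scale as the "Value (%)" column.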
