ODTrack: Online Dense Temporal Token Learning for Visual Tracking

3 Jan 2024 · Yaozong Zheng, Bineng Zhong, Qihua Liang, Zhiyi Mo, Shengping Zhang, Xianxian Li

Online contextual reasoning and association across consecutive video frames are critical for perceiving instances in visual tracking. However, most current top-performing trackers still rely on sparse temporal relationships between reference and search frames in an offline mode. Consequently, they can only interact independently within each image pair and establish limited temporal correlations. To alleviate this problem, we propose a simple, flexible, and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames through online token propagation. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discriminative features (localization information) of a target into a token sequence to achieve frame-to-frame association. This solution brings two benefits: 1) the purified token sequences can serve as prompts for inference in the next video frame, so that past information guides future inference; 2) the iterative propagation of token sequences avoids complex online update strategies, yielding more efficient model representation and computation. ODTrack achieves new SOTA performance on seven benchmarks while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
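The online token propagation idea described in the abstract can be illustrated with a toy sketch. This is not the ODTrack architecture: the single-head dot-product attention, the patch-feature representation, and the names `attend` and `track_online` are all assumptions made for illustration only. It shows just the control flow the abstract describes, where a token summarizing the target in one frame is reused as the prompt for the next frame, with no separate online update step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(token, patches):
    """Toy single-head attention (an assumption, not the paper's model).

    The token acts as the query over the patch features of one frame.
    Returns the attention weights and a new token that is the
    attention-weighted summary of the patch features.
    """
    scale = math.sqrt(len(token))
    scores = [
        sum(t * p for t, p in zip(token, patch)) / scale for patch in patches
    ]
    weights = softmax(scores)
    new_token = [
        sum(w * patch[d] for w, patch in zip(weights, patches))
        for d in range(len(token))
    ]
    return weights, new_token


def track_online(frames, init_token):
    """Propagate one token through the frames, one frame at a time.

    frames: list of frames, each a list of patch feature vectors.
    Returns the index of the most-attended patch per frame and the final token.
    """
    token = list(init_token)
    locations = []
    for patches in frames:
        # The token from the previous frame prompts inference on this frame.
        weights, token = attend(token, patches)
        locations.append(max(range(len(weights)), key=weights.__getitem__))
    return locations, token


if __name__ == "__main__":
    # Hypothetical 2-D features: the target looks like [4, 0] and moves
    # from patch 0 to patch 2 over three frames.
    frames = [
        [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]],
        [[0.0, 4.0], [4.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 4.0], [4.0, 0.0]],
    ]
    locations, final_token = track_online(frames, init_token=[1.0, 0.0])
    print(locations)
```

In this sketch the only state carried between frames is the token itself, which is the property the abstract credits for avoiding complex online update strategies. A real tracker would replace `attend` with a learned model and predict a bounding box rather than a patch index.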

Results from the Paper


Task                                       Dataset      Model      Metric Name      Metric Value  Global Rank
Visual Object Tracking                     GOT-10k      ODTrack-B  Average Overlap  77.0          #4
Visual Object Tracking                     GOT-10k      ODTrack-L  Average Overlap  78.2          #3
Visual Object Tracking                     LaSOT        ODTrack-B  AUC              73.2          #4
Visual Object Tracking                     LaSOT        ODTrack-L  AUC              74.0          #1
Visual Object Tracking                     LaSOT-ext    ODTrack-L  AUC              53.9          #2
Visual Object Tracking                     LaSOT-ext    ODTrack-B  AUC              52.4          #6
Visual Object Tracking                     OTB-2015     ODTrack-B  AUC              0.723         #2
Visual Object Tracking                     OTB-2015     ODTrack-L  AUC              0.724         #1
Visual Object Tracking                     TNL2K        ODTrack-B  AUC              60.9          #3
Visual Object Tracking                     TNL2K        ODTrack-L  AUC              61.7          #1
Visual Object Tracking                     TrackingNet  ODTrack-L  Accuracy         86.1          #1
Visual Object Tracking                     TrackingNet  ODTrack-B  Accuracy         85.1          #7
Semi-Supervised Video Object Segmentation  VOT2020      ODTrack-B  EAO              0.581         #7
Semi-Supervised Video Object Segmentation  VOT2020      ODTrack-L  EAO              0.605         #3
