Efficient RGB-T Tracking via Cross-Modality Distillation

Most current RGB-T trackers adopt a two-stream structure to extract unimodal RGB and thermal features and rely on complex fusion strategies to fuse the multi-modal features. Both require a huge number of parameters, which hinders their real-life application. A compact RGB-T tracker, on the other hand, may be computationally efficient but suffers non-negligible performance degradation because its feature representation ability is weakened. To remedy this, we present a cross-modality distillation framework that bridges the performance gap between a compact tracker and a powerful tracker. Specifically, a specific-common feature distillation module transfers both modality-common and modality-specific information from a deeper two-stream network to a shallower single-stream network. In addition, a multi-path selection distillation module instructs a simple fusion module to learn more accurate multi-modal information from a well-designed fusion mechanism by using multiple paths. Extensive experiments on three RGB-T benchmarks validate the effectiveness of our method, which achieves state-of-the-art performance while consuming far fewer computational resources.
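The abstract does not give the loss used by the specific-common feature distillation module, so the following is only a minimal sketch of the general idea of feature distillation from a two-stream teacher to a single-stream student, not the paper's formulation. All names (`feature_distillation_loss`, the weights, the use of mean squared error, and averaging the student's two features to form a "common" feature) are assumptions made for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_distillation_loss(
    student_rgb: Vector,
    student_thermal: Vector,
    teacher_rgb: Vector,
    teacher_thermal: Vector,
    teacher_common: Vector,
    specific_weight: float = 1.0,  # hypothetical weighting, not from the paper
    common_weight: float = 1.0,    # hypothetical weighting, not from the paper
) -> float:
    """Illustrative feature-mimicking loss (assumed form, not the paper's).

    Modality-specific term: each student feature mimics the matching
    teacher stream. Modality-common term: the average of the student's
    two features mimics a shared teacher feature.
    """
    specific = 0.5 * (
        mse(student_rgb, teacher_rgb) + mse(student_thermal, teacher_thermal)
    )
    student_common = [(r + t) / 2 for r, t in zip(student_rgb, student_thermal)]
    common = mse(student_common, teacher_common)
    return specific_weight * specific + common_weight * common


# Usage: a student that reproduces the teacher features exactly incurs no loss.
loss = feature_distillation_loss(
    student_rgb=[1.0, 2.0],
    student_thermal=[3.0, 4.0],
    teacher_rgb=[1.0, 2.0],
    teacher_thermal=[3.0, 4.0],
    teacher_common=[2.0, 3.0],
)
```

In a real tracker the vectors would be feature maps produced by the networks and the loss would be minimized alongside the tracking loss; the plain lists here only keep the sketch self-contained.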


Results from the Paper


Task            Dataset   Model  Metric Name  Metric Value  Global Rank
RGB-T Tracking  GTOT      CMD    Precision    89.2          #5
RGB-T Tracking  GTOT      CMD    Success      73.4          #4
RGB-T Tracking  LasHeR    CMD    Precision    59.0          #11
RGB-T Tracking  LasHeR    CMD    Success      46.6          #11
RGB-T Tracking  RGBT234   CMD    Precision    82.4          #14
RGB-T Tracking  RGBT234   CMD    Success      58.4          #15
