MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

ICCV 2023  ·  Ruopeng Gao, Limin Wang

As a video task, Multi-Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit object features between adjacent frames and lack the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method makes the same object's track embedding more stable and distinguishable by injecting long-term memory through a customized memory-attention layer, which significantly improves the model's target-association ability. Experimental results on DanceTrack show that MeMOTR surpasses the state-of-the-art method by 7.9% and 13.0% on the HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods in association performance on MOT17 and generalizes well to BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.
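The long-term memory injection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the exponential-moving-average update rate `lam`, the single-head attention, and the residual injection are all assumptions made for the sketch.

```python
import numpy as np

def update_memory(memory, track_embed, lam=0.01):
    """Assumed long-term memory update: exponential moving average
    of each tracked object's embedding across frames."""
    return (1.0 - lam) * memory + lam * track_embed

def memory_attention(track_embed, memory, W_q, W_k, W_v):
    """Hypothetical single-head memory-attention layer: the current
    track embeddings query the long-term memory, and the attended
    memory is injected back through a residual connection."""
    q = track_embed @ W_q          # queries from current-frame embeddings
    k = memory @ W_k               # keys from long-term memory
    v = memory @ W_v               # values from long-term memory
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (num_tracks, num_tracks)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return track_embed + weights @ v  # residual memory injection

# Toy usage: 3 tracked objects with 8-dim embeddings.
rng = np.random.default_rng(0)
d = 8
memory = rng.normal(size=(3, d))   # one memory slot per tracked object
embed = rng.normal(size=(3, d))    # current-frame track embeddings
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
memory = update_memory(memory, embed)
out = memory_attention(embed, memory, *W)
print(out.shape)
```

The small update rate keeps the memory slow-moving, which is what stabilizes a track's embedding against per-frame appearance noise.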


Results from the Paper


| Task | Dataset | Model | HOTA | DetA | AssA | MOTA | IDF1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-Object Tracking | DanceTrack | MeMOTR | 68.5 (#6) | 80.5 (#13) | 58.4 (#6) | 89.9 (#15) | 71.2 (#7) |
| Multi-Object Tracking | DanceTrack | MeMOTR (Deformable DETR) | 63.4 (#11) | 77.0 (#18) | 52.3 (#10) | 85.4 (#21) | 65.5 (#10) |
| Multi-Object Tracking | SportsMOT | MeMOTR | 70.0 (#6) | 83.1 (#5) | 59.1 (#6) | 91.5 (#9) | 71.4 (#8) |
| Multiple Object Tracking | SportsMOT | MeMOTR | 70.0 (#7) | 83.1 (#7) | 59.1 (#7) | 91.5 (#9) | 71.4 (#8) |
| Multiple Object Tracking | SportsMOT | MeMOTR (Deformable-DETR) | 68.8 (#8) | 82.0 (#9) | 57.8 (#8) | 90.2 (#11) | 69.9 (#10) |
| Multi-Object Tracking | SportsMOT | MeMOTR (Deformable-DETR) | 68.8 (#8) | 82.0 (#8) | 57.8 (#7) | 90.2 (#11) | 69.9 (#10) |

(#n) denotes the model's global rank on the corresponding benchmark leaderboard at the time of writing.
