Temporal 3D Shape Modeling for Video-Based Cloth-Changing Person Re-Identification

Video-based Cloth-Changing Person Re-ID (VCCRe-ID) refers to a real-world Re-ID problem where texture information such as appearance or clothing becomes unreliable in the long term, limiting the applicability of traditional Re-ID methods. VCCRe-ID has not been well studied, primarily due to (1) limited public datasets and (2) the challenge of extracting identity-related, clothes-invariant cues from videos. The few existing works focus heavily on gait-based features, which degrade severely under viewpoint changes and occlusions. In this work, we propose "Temporal 3D ShapE Modeling for VCCRe-ID" (SEMI), a lightweight end-to-end framework that addresses these issues by learning human 3D shape representations. SEMI comprises a Temporal 3D Shape Modeling branch, which extracts discriminative frame-wise 3D shape features using a temporal encoder and an identity-aware 3D regressor, followed by a novel Attention-based Shape Aggregation (ASA) module that aggregates the frame-wise shape features into a fine-grained video-wise shape embedding. ASA leverages an attention mechanism to amplify the contribution of the most informative frames while reducing redundancy during aggregation. Experiments on two VCCRe-ID datasets demonstrate that our proposed framework outperforms state-of-the-art methods by 10.7% in rank-1 accuracy and 7.4% in mAP under the cloth-changing setting.
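
Below is a minimal PyTorch sketch of the kind of attention-weighted aggregation the ASA module describes: an attention score is computed per frame, normalized across the clip, and used to pool frame-wise shape features into a single video-wise embedding. The module name, layer sizes, and scoring MLP are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class AttentionShapeAggregation(nn.Module):
    """Hypothetical sketch of attention-based aggregation of frame-wise
    shape features into one video-wise embedding (not the paper's code)."""

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Small MLP that scores how informative each frame is.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, D) frame-wise 3D shape features.
        scores = self.scorer(frame_feats)                 # (B, T, 1)
        weights = torch.softmax(scores, dim=1)            # normalize over frames
        # Weighted sum amplifies informative frames and down-weights redundant ones.
        video_feat = (weights * frame_feats).sum(dim=1)   # (B, D)
        return video_feat


if __name__ == "__main__":
    asa = AttentionShapeAggregation(feat_dim=512)
    clips = torch.randn(2, 8, 512)  # 2 clips, 8 frames, 512-dim shape features
    print(asa(clips).shape)         # torch.Size([2, 512])
```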
