no code implementations • 23 May 2024 • Ting Liu, Xuyang Liu, Liangtao Shi, Zunnan Xu, Siteng Huang, Yi Xin, Quanjun Yin
Sparse-Tuning efficiently fine-tunes the pre-trained ViT by sparsely preserving the informative tokens and merging redundant ones, enabling the ViT to focus on the foreground while reducing computational costs on background regions in the images.
no code implementations • 15 Mar 2024 • Jinxia Xie, Bineng Zhong, Zhiyi Mo, Shengping Zhang, Liangtao Shi, Shuxiang Song, Rongrong Ji
First, we introduce a set of learnable, autoregressive queries that capture instantaneous changes in target appearance in a sliding-window fashion.
1 code implementation • 6 Jan 2024 • Liangtao Shi, Bineng Zhong, Qihua Liang, Ning Li, Shengping Zhang, Xianxian Li
Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates.