1 code implementation • 19 Mar 2024 • Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell
Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.
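A minimal sketch of the multi-scale idea (S$^2$) described above, under assumptions: `toy_backbone` is a hypothetical stand-in for a frozen vision model, and nearest-neighbor upsampling plus tiling replaces the paper's actual pipeline.

```python
import numpy as np

def toy_backbone(img):
    """Hypothetical stand-in for a pre-trained vision model:
    global-average-pool an HxWxC image to a C-dim feature vector."""
    return img.mean(axis=(0, 1))

def s2_features(img, scales=(1, 2)):
    """S^2-style multi-scale features (sketch): at each scale, upsample the
    image, tile it into base-resolution crops, run the SAME backbone on every
    crop, average the crop features, then concatenate across scales."""
    h, w, _ = img.shape
    feats = []
    for s in scales:
        big = np.kron(img, np.ones((s, s, 1)))  # naive nearest-neighbor upsample
        crops = [big[i * h:(i + 1) * h, j * w:(j + 1) * w]
                 for i in range(s) for j in range(s)]
        feats.append(np.mean([toy_backbone(c) for c in crops], axis=0))
    return np.concatenate(feats)  # dimension = C * len(scales)
```

The point of the sketch is that model capacity is held fixed while the feature dimension grows with the number of scales.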
no code implementations • 29 Feb 2024 • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.
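The framing above can be sketched as sequence construction (this shows only the data layout, not the paper's model): discretized observation and action tokens are interleaved into one stream, and training pairs are formed by shifting the sequence by one position, exactly as in language modeling.

```python
import numpy as np

def make_next_token_pairs(observations, actions):
    """Cast control as next-token prediction (data-layout sketch):
    interleave observation and action tokens as ... o_t, a_t, o_{t+1}, a_{t+1} ...
    and return (input, target) sequences shifted by one token."""
    seq = []
    for o, a in zip(observations, actions):
        seq.extend([o, a])
    seq = np.array(seq)
    return seq[:-1], seq[1:]  # predict each token from its prefix

# toy token ids: observations 10..12, actions 20..22
inputs, targets = make_next_token_pairs([10, 11, 12], [20, 21, 22])
```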
1 code implementation • 25 Jan 2024 • Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg
In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE).
no code implementations • 4 Dec 2023 • Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell
Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA).
no code implementations • 29 Sep 2023 • Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world.
no code implementations • 16 Jun 2023 • Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik
We present a self-supervised sensorimotor pre-training approach for robotics.
1 code implementation • 24 May 2023 • Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features.
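The feedback loop described above can be sketched as a two-pass computation, with loud caveats: `frozen_backbone` is a hypothetical linear stand-in for the real pre-trained model, and the "task-relevant" selection is simplified to a projection onto a learned subspace.

```python
import numpy as np

def frozen_backbone(tokens, feedback=None):
    """Hypothetical frozen model (identity map here). An optional feedback
    signal is added to the input tokens, standing in for TOAST's top-down
    pathway into the attention layers."""
    W = np.eye(tokens.shape[1])  # frozen weights
    x = tokens if feedback is None else tokens + feedback
    return x @ W

def toast_forward(tokens, task_directions):
    """Two-pass steering sketch: run the frozen model bottom-up, keep only
    the output components in a task-relevant subspace, and feed them back
    for a second, steered pass."""
    out = frozen_backbone(tokens)                      # bottom-up pass
    P = task_directions @ task_directions.T            # projector onto task subspace
    feedback = out @ P                                 # task-relevant part of output
    return frozen_backbone(tokens, feedback=feedback)  # top-down steered pass
```

Only `task_directions` would be trained; the backbone stays frozen, matching the transfer-learning setting described above.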
1 code implementation • CVPR 2023 • Baifeng Shi, Trevor Darrell, Xin Wang
In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
1 code implementation • 23 Apr 2022 • Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.
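The two ingredients named above, recurrency and sparsity, can be illustrated with a standard ISTA sparse-coding loop (a generic sketch of the formulation, not the paper's implementation): the input is reconstructed from a dictionary through a recurrent update with a sparsity-inducing soft threshold, and the resulting sparse codes play the role of an attention signal.

```python
import numpy as np

def soft_threshold(z, lam):
    """Sparsity-inducing shrinkage operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def recurrent_sparse_attention(x, D, lam=0.1, step=0.1, iters=50):
    """Attention via recurrent sparse reconstruction (ISTA sketch):
    iteratively reconstruct x from dictionary D; the sparse codes z
    indicate which dictionary atoms receive attention."""
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        residual = D @ z - x          # reconstruction error (recurrent feedback)
        z = soft_threshold(z - step * (D.T @ residual), step * lam)
    return z
```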
no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.
no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
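A rough sketch of the mechanism described above, with a loud caveat: the paper measures local self-information via patch similarity, while this toy version uses a simple local-variance proxy (flat, repetitive texture scores low) before applying the Dropout-like masking.

```python
import numpy as np

def local_self_info(img, radius=1):
    """Proxy for per-pixel self-information: local variance stands in for
    how unpredictable a pixel is from its neighborhood (simplified; the
    paper uses a patch-similarity-based estimate)."""
    h, w = img.shape
    info = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = img[max(i - radius, 0):i + radius + 1,
                        max(j - radius, 0):j + radius + 1]
            info[i, j] = patch.var()  # flat texture -> low "information"
    return info

def info_dropout(img, temperature=0.1, rng=None):
    """Dropout-like decorrelation from texture (sketch): pixels in
    low-information regions are zeroed with higher probability, biasing
    the surviving signal toward shape cues such as edges."""
    rng = rng or np.random.default_rng(0)
    keep_prob = 1.0 - np.exp(-local_self_info(img) / temperature)
    return img * (rng.random(img.shape) < keep_prob)
```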
1 code implementation • ECCV 2020 • Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu
Weakly-supervised action localization requires training a model to localize action segments in a video given only the video-level action label.
Ranked #9 on Weakly Supervised Action Localization on THUMOS’14
1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
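An illustrative sketch of the separation effect described above (not the paper's generative model): per-frame attention is learned by gradient ascent on the video-level score, here taken to be the attention-weighted average of per-frame class logits, so attention drifts toward action frames and away from non-action frames.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fit_attention(frame_scores, steps=200, lr=0.5):
    """Maximize the video-level score w.r.t. per-frame attention (sketch).
    frame_scores: (T,) per-frame logits for the labeled action class."""
    logits = np.zeros_like(frame_scores)
    for _ in range(steps):
        a = softmax(logits)
        video_score = a @ frame_scores
        grad = a * (frame_scores - video_score)  # gradient of a @ s w.r.t. logits
        logits += lr * grad                      # ascent step
    return softmax(logits)

# hypothetical logits: frames 0 and 2 are action-like, 1 and 3 are background
att = fit_attention(np.array([3.0, -2.0, 2.5, -1.0]))
```

After optimization the attention mass concentrates on high-scoring (action) frames, separating them from the background frames.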