1 code implementation • 19 Mar 2024 • Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell
Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.
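A minimal sketch of the multi-scale idea (S$^2$) described above, under assumptions: `toy_backbone` is a hypothetical stand-in for a frozen vision model, and nearest-neighbor upsampling plus tiling replaces the paper's actual pipeline.

```python
import numpy as np

def toy_backbone(img):
    """Hypothetical stand-in for a pre-trained vision model:
    global-average-pool an HxWxC image to a C-dim feature vector."""
    return img.mean(axis=(0, 1))

def s2_features(img, scales=(1, 2)):
    """S^2-style multi-scale features (sketch): at each scale, upsample the
    image, tile it into base-resolution crops, run the SAME backbone on every
    crop, average the crop features, then concatenate across scales."""
    h, w, _ = img.shape
    feats = []
    for s in scales:
        big = np.kron(img, np.ones((s, s, 1)))  # naive nearest-neighbor upsample
        crops = [big[i * h:(i + 1) * h, j * w:(j + 1) * w]
                 for i in range(s) for j in range(s)]
        feats.append(np.mean([toy_backbone(c) for c in crops], axis=0))
    return np.concatenate(feats)  # dimension = C * len(scales)
```

The point of the sketch is that model capacity is held fixed while the feature dimension grows with the number of scales.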
no code implementations • 29 Feb 2024 • Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, Jitendra Malik
We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language.
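The framing above can be sketched as sequence construction (this shows only the data layout, not the paper's model): discretized observation and action tokens are interleaved into one stream, and training pairs are formed by shifting the sequence by one position, exactly as in language modeling.

```python
import numpy as np

def make_next_token_pairs(observations, actions):
    """Cast control as next-token prediction (data-layout sketch):
    interleave observation and action tokens as ... o_t, a_t, o_{t+1}, a_{t+1} ...
    and return (input, target) sequences shifted by one token."""
    seq = []
    for o, a in zip(observations, actions):
        seq.extend([o, a])
    seq = np.array(seq)
    return seq[:-1], seq[1:]  # predict each token from its prefix

# toy token ids: observations 10..12, actions 20..22
inputs, targets = make_next_token_pairs([10, 11, 12], [20, 21, 22])
```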
1 code implementation • 25 Jan 2024 • Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg
In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE).
no code implementations • 4 Dec 2023 • Jiaxin Ge, Sanjay Subramanian, Baifeng Shi, Roei Herzig, Trevor Darrell
Visual Programming (VP) has emerged as a powerful framework for Visual Question Answering (VQA).
no code implementations • 29 Sep 2023 • Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
We show that LLMs are able to understand complex spatiotemporal dynamics from text alone and generate layouts that align closely with both the prompts and the object motion patterns typically observed in the real world.
no code implementations • 16 Jun 2023 • Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik
We present a self-supervised sensorimotor pre-training approach for robotics.
1 code implementation • 24 May 2023 • Baifeng Shi, Siyu Gai, Trevor Darrell, Xin Wang
We introduce Top-Down Attention Steering (TOAST), a novel transfer learning algorithm that keeps the pre-trained backbone frozen, selects task-relevant features in the output, and feeds those features back to the model to steer the attention to the task-specific features.
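The feedback loop described above can be sketched as a two-pass computation, with loud caveats: `frozen_backbone` is a hypothetical linear stand-in for the real pre-trained model, and the "task-relevant" selection is simplified to a projection onto a learned subspace.

```python
import numpy as np

def frozen_backbone(tokens, feedback=None):
    """Hypothetical frozen model (identity map here). An optional feedback
    signal is added to the input tokens, standing in for TOAST's top-down
    pathway into the attention layers."""
    W = np.eye(tokens.shape[1])  # frozen weights
    x = tokens if feedback is None else tokens + feedback
    return x @ W

def toast_forward(tokens, task_directions):
    """Two-pass steering sketch: run the frozen model bottom-up, keep only
    the output components in a task-relevant subspace, and feed them back
    for a second, steered pass."""
    out = frozen_backbone(tokens)                      # bottom-up pass
    P = task_directions @ task_directions.T            # projector onto task subspace
    feedback = out @ P                                 # task-relevant part of output
    return frozen_backbone(tokens, feedback=feedback)  # top-down steered pass
```

Only `task_directions` would be trained; the backbone stays frozen, matching the transfer-learning setting described above.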
1 code implementation • CVPR 2023 • Baifeng Shi, Trevor Darrell, Xin Wang
In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision.
1 code implementation • 23 Apr 2022 • Baifeng Shi, Yale Song, Neel Joshi, Trevor Darrell, Xin Wang
We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity.
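The two ingredients named above, recurrency and sparsity, can be illustrated with a standard ISTA sparse-coding loop (a generic sketch of the formulation, not the paper's implementation): the input is reconstructed from a dictionary through a recurrent update with a sparsity-inducing soft threshold, and the resulting sparse codes play the role of an attention signal.

```python
import numpy as np

def soft_threshold(z, lam):
    """Sparsity-inducing shrinkage operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def recurrent_sparse_attention(x, D, lam=0.1, step=0.1, iters=50):
    """Attention via recurrent sparse reconstruction (ISTA sketch):
    iteratively reconstruct x from dictionary D; the sparse codes z
    indicate which dictionary atoms receive attention."""
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        residual = D @ z - x          # reconstruction error (recurrent feedback)
        z = soft_threshold(z - step * (D.T @ residual), step * lam)
    return z
```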
no code implementations • ICCV 2021 • Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.
no code implementations • NeurIPS 2020 • Baifeng Shi, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu
By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang
Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.
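A rough sketch of the mechanism described above, with a loud caveat: the paper measures local self-information via patch similarity, while this toy version uses a simple local-variance proxy (flat, repetitive texture scores low) before applying the Dropout-like masking.

```python
import numpy as np

def local_self_info(img, radius=1):
    """Proxy for per-pixel self-information: local variance stands in for
    how unpredictable a pixel is from its neighborhood (simplified; the
    paper uses a patch-similarity-based estimate)."""
    h, w = img.shape
    info = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = img[max(i - radius, 0):i + radius + 1,
                        max(j - radius, 0):j + radius + 1]
            info[i, j] = patch.var()  # flat texture -> low "information"
    return info

def info_dropout(img, temperature=0.1, rng=None):
    """Dropout-like decorrelation from texture (sketch): pixels in
    low-information regions are zeroed with higher probability, biasing
    the surviving signal toward shape cues such as edges."""
    rng = rng or np.random.default_rng(0)
    keep_prob = 1.0 - np.exp(-local_self_info(img) / temperature)
    return img * (rng.random(img.shape) < keep_prob)
```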
1 code implementation • ECCV 2020 • Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, Huijuan Xu
Weakly-supervised action localization requires training a model to localize action segments in a video given only the video-level action label.
Ranked #9 on Weakly Supervised Action Localization on THUMOS’14
1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
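An illustrative sketch of the separation effect described above (not the paper's generative model): per-frame attention is learned by gradient ascent on the video-level score, here taken to be the attention-weighted average of per-frame class logits, so attention drifts toward action frames and away from non-action frames.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fit_attention(frame_scores, steps=200, lr=0.5):
    """Maximize the video-level score w.r.t. per-frame attention (sketch).
    frame_scores: (T,) per-frame logits for the labeled action class."""
    logits = np.zeros_like(frame_scores)
    for _ in range(steps):
        a = softmax(logits)
        video_score = a @ frame_scores
        grad = a * (frame_scores - video_score)  # gradient of a @ s w.r.t. logits
        logits += lr * grad                      # ascent step
    return softmax(logits)

# hypothetical logits: frames 0 and 2 are action-like, 1 and 3 are background
att = fit_attention(np.array([3.0, -2.0, 2.5, -1.0]))
```

After optimization the attention mass concentrates on high-scoring (action) frames, separating them from the background frames.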