1 code implementation • 3 Apr 2024 • Druv Pai, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma
We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: the layers transform the data distribution to and from a structured representation.
1 code implementation • 22 Nov 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma
This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.
1 code implementation • 22 Oct 2023 • Zhenghan Fang, Sam Buchanan, Jeremias Sulam
Proximal operators are ubiquitous in inverse problems, commonly appearing as part of algorithmic strategies to regularize problems that are otherwise ill-posed.
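For concreteness, here is a minimal sketch of the role a proximal operator plays in regularizing an ill-posed inverse problem: proximal gradient descent (ISTA) on a sparse recovery objective, where the proximal operator of the L1 norm is soft thresholding. The measurement operator A, regularization weight lam, step size, and iteration count below are illustrative placeholders, not the method of the paper above.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t*||.||_1, i.e. argmin_x 0.5*||x - v||^2 + t*||x||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, y, lam=0.1, step=None, n_iters=200):
    """ISTA: alternate a gradient step on the data-fit term 0.5*||A x - y||^2 with the prox of lam*||x||_1."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, with L the Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = prox_l1(x - step * A.T @ (A @ x - y), step * lam)
    return x

# Toy example: recover a sparse signal from underdetermined (ill-posed) measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 5, replace=False)] = 1.0
y = A @ x_true
x_hat = proximal_gradient(A, y)
```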
1 code implementation • 30 Aug 2023 • Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma
Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.
no code implementations • ICCV 2023 • Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma
Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data.
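As a rough illustration of what a factored feature volume is, the sketch below stores a 3D feature grid as per-axis factors (a CP-style factorization) instead of a dense voxel grid. The resolution, rank, and channel counts are made-up placeholders, and this is not the model studied in the paper above.

```python
import numpy as np

class FactoredFeatureVolume:
    """A (res, res, res, channels) feature volume stored as three per-axis factors plus a mixing matrix,
    so storage is O(3 * res * rank) instead of O(res**3 * channels)."""

    def __init__(self, res=64, channels=16, rank=8, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.fx = rng.standard_normal((res, rank)) * 0.1
        self.fy = rng.standard_normal((res, rank)) * 0.1
        self.fz = rng.standard_normal((res, rank)) * 0.1
        self.mix = rng.standard_normal((rank, channels)) * 0.1

    def query(self, i, j, k):
        """Feature at voxel (i, j, k): elementwise product of the axis factors, mixed to channels."""
        return (self.fx[i] * self.fy[j] * self.fz[k]) @ self.mix

vol = FactoredFeatureVolume()
feat = vol.query(10, 20, 30)   # a (channels,) feature vector
```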
1 code implementation • NeurIPS 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma
In particular, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step that compresses the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.
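To make this "compress, then sparsify" reading concrete, the sketch below strings together one attention-like compression step and one ISTA-like sparsification step into a single white-box layer. The per-head subspace bases U_heads, dictionary D, step size eta, and threshold lam are illustrative assumptions; the actual CRATE operators derived in the paper differ in detail.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compression_step(Z, U_heads, eta=0.1):
    """A gradient-descent-like step that pulls tokens toward a union of subspaces:
    each head projects tokens onto its subspace and attends over the projected similarities
    (a stand-in for multi-head subspace self-attention)."""
    update = np.zeros_like(Z)
    for U in U_heads:                                  # U: (d, p) basis for one head's subspace
        P = Z @ U                                      # (n_tokens, p) projected tokens
        A = softmax(P @ P.T / np.sqrt(P.shape[1]))     # token-token affinities within the subspace
        update += (A @ P) @ U.T                        # move tokens toward their neighbors' subspace component
    return Z + eta * update

def sparsify_step(Z, D, eta=0.1, lam=0.05):
    """One ISTA-like step toward a sparse code of the tokens against dictionary D:
    a proximal-gradient iteration on ||Z - X D||_F^2 + lam * ||X||_1, initialized at X = Z."""
    X = Z - eta * (Z @ D - Z) @ D.T                    # gradient step on the reconstruction term
    return np.sign(X) * np.maximum(np.abs(X) - eta * lam, 0.0)   # soft thresholding (prox of L1)

# One illustrative white-box "layer": compress, then sparsify.
rng = np.random.default_rng(0)
n, d, p, heads = 16, 32, 8, 4
Z = rng.standard_normal((n, d))
U_heads = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(heads)]
D = np.linalg.qr(rng.standard_normal((d, d)))[0]
Z = sparsify_step(compression_step(Z, U_heads), D)
```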
1 code implementation • 9 Mar 2022 • Sam Buchanan, Jingkai Yan, Ellie Haber, John Wright
Achieving invariance to nuisance transformations is a fundamental challenge in the construction of robust and reliable vision systems.
no code implementations • NeurIPS 2021 • Tingran Wang, Sam Buchanan, Dar Gilboa, John Wright
Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems.
no code implementations • ICLR 2021 • Sam Buchanan, Dar Gilboa, John Wright
Our analysis demonstrates concrete benefits of depth and width in the context of a practically motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly initialized network and its gradients.