1 code implementation • 3 Apr 2024 • Druv Pai, Ziyang Wu, Sam Buchanan, Yaodong Yu, Yi Ma
We do this by exploiting a fundamental connection between diffusion, compression, and (masked) completion, deriving a deep transformer-like masked autoencoder architecture, called CRATE-MAE, in which the role of each layer is mathematically fully interpretable: each layer transforms the data distribution to and from a structured representation.
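As a toy illustration of the masked-completion setup, the sketch below splits an image batch into patch tokens and masks a random subset for a model to complete; the patch size and mask ratio are illustrative assumptions, not CRATE-MAE's configuration.

```python
import torch

def random_patch_mask(images, patch=16, mask_ratio=0.75):
    """Return visible patch tokens, a boolean keep-mask, and the masked targets."""
    B, C, H, W = images.shape
    # (B, N, C*patch*patch) patch tokens, with N = (H//patch) * (W//patch)
    tokens = (images
              .unfold(2, patch, patch).unfold(3, patch, patch)  # B,C,h,w,p,p
              .permute(0, 2, 3, 1, 4, 5)
              .reshape(B, -1, C * patch * patch))
    N = tokens.shape[1]
    n_keep = int(N * (1 - mask_ratio))
    # Independently permute patch indices per image; keep the first n_keep.
    idx = torch.rand(B, N).argsort(dim=1)
    keep = torch.zeros(B, N, dtype=torch.bool)
    keep.scatter_(1, idx[:, :n_keep], True)
    visible = tokens[keep].reshape(B, n_keep, -1)        # encoder input
    targets = tokens[~keep].reshape(B, N - n_keep, -1)   # patches the decoder must complete
    return visible, keep, targets
```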
1 code implementation • 22 Nov 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma
This leads to a family of white-box transformer-like deep network architectures, named CRATE, which are mathematically fully interpretable.
1 code implementation • 30 Aug 2023 • Yaodong Yu, Tianzhe Chu, Shengbang Tong, Ziyang Wu, Druv Pai, Sam Buchanan, Yi Ma
Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.
1 code implementation • NeurIPS 2023 • Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin D. Haeffele, Yi Ma
In particular, we show that the standard transformer block can be derived from alternating optimization on complementary parts of this objective: the multi-head self-attention operator can be viewed as a gradient descent step that compresses the token sets by minimizing their lossy coding rate, and the subsequent multi-layer perceptron can be viewed as attempting to sparsify the representation of the tokens.
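The alternation can be made concrete in a few lines: the sketch below takes one gradient-descent step on the lossy coding rate of a token matrix, then one ISTA step toward a sparse code. The dimensions, step sizes, eps, and random dictionary D are illustrative assumptions, not the paper's exact operators.

```python
import torch

def lossy_coding_rate(Z, eps=0.5):
    # R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z Z^T), for tokens Z of shape (d, n).
    d, n = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps ** 2)) * Z @ Z.T)

def compress_step(Z, lr=0.1):
    # One gradient-descent step that lowers the coding rate of the token set
    # (the compression role attributed to multi-head self-attention).
    Z = Z.detach().requires_grad_(True)
    lossy_coding_rate(Z).backward()
    return (Z - lr * Z.grad).detach()

def sparsify_step(X, D, A, lam=0.1, step=0.1):
    # One ISTA (proximal gradient) step on min_A 0.5*||X - D A||_F^2 + lam*||A||_1
    # (the sparsification role attributed to the MLP).
    G = A - step * D.T @ (D @ A - X)
    return torch.sign(G) * torch.clamp(G.abs() - step * lam, min=0.0)

d, n, k = 16, 64, 32
Z = torch.randn(d, n)
D = torch.randn(d, k) / d ** 0.5
Z = compress_step(Z)                          # attention-like compression step
A = sparsify_step(Z, D, torch.zeros(k, n))    # MLP-like sparsification step
```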
1 code implementation • 2 May 2023 • Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma
This work proposes an algorithm for explicitly constructing a pair of neural networks that linearize and reconstruct an embedded submanifold from finite samples of that manifold.
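For intuition only, the snippet below shows the linearize/reconstruct contract on a manifold where both maps are available in closed form (the unit circle in R^2); the paper's contribution is an algorithm that constructs such a pair of networks from samples of an unknown manifold, which this toy does not attempt.

```python
import numpy as np

def flatten(x):          # "linearize": circle -> flat 1-D coordinate
    return np.arctan2(x[:, 1], x[:, 0])

def reconstruct(theta):  # inverse map: flat coordinate -> point on the manifold
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

theta = np.random.uniform(-np.pi, np.pi, size=200)
samples = reconstruct(theta)   # finite samples of the embedded submanifold
err = np.abs(reconstruct(flatten(samples)) - samples).max()
print(f"round-trip reconstruction error: {err:.2e}")  # ~0 up to float precision
```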
no code implementations • 18 Feb 2023 • Xili Dai, Ke Chen, Shengbang Tong, Jingyuan Zhang, Xingjian Gao, Mingyang Li, Druv Pai, Yuexiang Zhai, Xiaojun Yuan, Heung-Yeung Shum, Lionel M. Ni, Yi Ma
Our method is arguably the first to demonstrate that a concatenation of multiple convolutional sparse coding/decoding layers leads to an interpretable and effective autoencoder for modeling the distribution of large-scale natural image datasets.
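A single such layer can be sketched as follows, assuming an ISTA encoder paired with a transposed-convolution decoder sharing one dictionary; the filter count, odd kernel size, sparsity weight, and iteration budget are illustrative.

```python
import torch
import torch.nn.functional as F

def csc_encode(x, W, lam=0.1, step=0.05, n_iter=30):
    """x: (B, C, H, W) signal; W: (K, C, k, k) dictionary, k odd; returns sparse code z."""
    pad = W.shape[-1] // 2
    B, C, H, Wd = x.shape
    z = torch.zeros(B, W.shape[0], H, Wd)
    # ISTA on min_z 0.5*||x - conv_transpose(z, W)||^2 + lam*||z||_1
    for _ in range(n_iter):
        resid = F.conv_transpose2d(z, W, padding=pad) - x   # decode and compare
        g = z - step * F.conv2d(resid, W, padding=pad)      # gradient step via the adjoint
        z = torch.sign(g) * torch.clamp(g.abs() - step * lam, min=0.0)  # soft-threshold
    return z

def csc_decode(z, W):
    return F.conv_transpose2d(z, W, padding=W.shape[-1] // 2)
```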
1 code implementation • 18 Jun 2022 • Druv Pai, Michael Psenka, Chih-Yuan Chiu, Manxi Wu, Edgar Dobriban, Yi Ma
We consider the problem of learning discriminative representations for data in a high-dimensional space with distribution supported on or around multiple low-dimensional linear subspaces.
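The setting is easy to instantiate: the sketch below samples points from two random low-dimensional subspaces of R^50 and evaluates a rate-reduction quantity of the kind used in this line of work, DeltaR = R(Z) - sum_j (n_j/n) R(Z_j); the dimensions and quantization parameter eps are illustrative.

```python
import torch

def R(Z, eps=0.5):
    # Lossy coding rate of Z (shape (d, n)): 1/2 * logdet(I + d/(n*eps^2) * Z Z^T).
    d, n = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps ** 2)) * Z @ Z.T)

d, r, n = 50, 2, 200
# Two random r-dimensional subspaces of R^d, with n points sampled on each.
Z_parts = [torch.linalg.qr(torch.randn(d, r)).Q @ torch.randn(r, n) for _ in range(2)]
Z = torch.cat(Z_parts, dim=1)
delta_R = R(Z) - sum((Zj.shape[1] / Z.shape[1]) * R(Zj) for Zj in Z_parts)
print(f"rate reduction: {delta_R:.3f}")  # larger when the subspaces are more incoherent
```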
no code implementations • 29 May 2022 • Chinmay Maheshwari, Manxi Wu, Druv Pai, Shankar Sastry
We propose a multi-agent reinforcement learning dynamics and analyze its convergence in infinite-horizon discounted Markov potential games.
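As a stateless toy analogue (not the paper's two-timescale Markov-game algorithm), the sketch below runs independent, payoff-based learning in an identical-interest matrix game, a simple instance of a potential game, and checks whether joint play settles at a pure Nash equilibrium.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((3, 3))      # shared payoff: the potential function itself
q1, q2 = np.zeros(3), np.zeros(3)    # each agent's private action-value estimates

def softmax(q, temp=0.1):
    e = np.exp((q - q.max()) / temp)
    return e / e.sum()

for t in range(1, 5001):
    a1 = rng.choice(3, p=softmax(q1))          # agents randomize independently
    a2 = rng.choice(3, p=softmax(q2))
    r = U[a1, a2]                              # both observe only their own reward
    q1[a1] += (r - q1[a1]) / t ** 0.6          # decentralized, decreasing-step updates
    q2[a2] += (r - q2[a2]) / t ** 0.6

a1s, a2s = q1.argmax(), q2.argmax()
is_nash = (U[a1s, a2s] >= U[:, a2s].max() - 1e-12
           and U[a1s, a2s] >= U[a1s, :].max() - 1e-12)
print(f"converged joint action ({a1s}, {a2s}); pure Nash of U: {is_nash}")
```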