no code implementations • 16 May 2023 • Asher Trockman, J. Zico Kolter
It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point.
no code implementations • 7 Oct 2022 • Asher Trockman, Devin Willmott, J. Zico Kolter
In this work, we first observe that such learned filters have highly structured covariance matrices. Moreover, we find that covariances calculated from small networks can be used to effectively initialize a variety of larger networks with different depths, widths, patch sizes, and kernel sizes, suggesting that the covariance structure is largely model-independent.
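A minimal sketch of this idea, with hypothetical shapes and randomly generated stand-in "learned" filters (the real method operates on filters from trained networks): estimate the empirical covariance of a small network's flattened filters, then draw a larger network's initial filters from a zero-mean Gaussian with that covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for filters learned by a small trained network:
# 64 filters of size 3x3, flattened to 9-dimensional vectors.
# (In practice these would come from an actual trained model.)
learned_filters = rng.normal(size=(64, 9)) * np.linspace(1.0, 0.1, 9)

# Estimate the empirical covariance of the learned filters.
cov = np.cov(learned_filters, rowvar=False)  # shape (9, 9)

def covariance_init(n_filters, cov, rng):
    """Sample new filters from a zero-mean Gaussian with the given covariance."""
    return rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_filters)

# Initialize a wider network's filters (256 instead of 64) from the same
# covariance, reshaping each 9-vector back to a 3x3 kernel as needed.
new_filters = covariance_init(256, cov, rng)
print(new_filters.shape)  # (256, 9)
```

The key point the abstract makes is that `cov` transfers: the same matrix can initialize networks of different sizes, since only the number of sampled filters changes.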
11 code implementations • 24 Jan 2022 • Asher Trockman, J. Zico Kolter
Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants at similar parameter counts and dataset sizes, in addition to outperforming classical vision models such as the ResNet.
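A rough numpy sketch of a single ConvMixer-style block, to illustrate the structure the abstract alludes to: a depthwise convolution (mixing spatial locations per channel) with a residual connection, followed by a pointwise (1x1) convolution (mixing channels). All shapes are illustrative, and BatchNorm is omitted for brevity; this is not the authors' implementation.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(x, w):
    """Per-channel 2D convolution with 'same' zero padding.
    x: (C, H, W); w: (C, k, k)."""
    C, H, W = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * w[c])
    return out

def convmixer_block(x, w_depth, w_point):
    # Spatial mixing: depthwise conv + GELU, with a residual connection.
    x = x + gelu(depthwise_conv(x, w_depth))
    # Channel mixing: pointwise (1x1) conv + GELU.
    x = gelu(np.tensordot(w_point, x, axes=([1], [0])))
    return x

rng = np.random.default_rng(0)
C, H, W, k = 8, 6, 6, 3
x = rng.normal(size=(C, H, W))
y = convmixer_block(x,
                    rng.normal(size=(C, k, k)) * 0.1,
                    rng.normal(size=(C, C)) * 0.1)
print(y.shape)  # (8, 6, 6)
```

The block is isotropic: input and output share the same (C, H, W) shape, so the full model is just a patch embedding followed by a stack of these blocks.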
Ranked #96 on Image Classification on CIFAR-10
1 code implementation • ICLR 2021 • Asher Trockman, J. Zico Kolter
Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants.
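One standard way to enforce orthogonality, sketched here in numpy for the fully connected case: the Cayley transform maps any square matrix to an orthogonal one via its skew-symmetric part, so training the unconstrained matrix keeps the effective weight orthogonal. This is only an illustrative sketch of the general technique, not the paper's convolutional construction.

```python
import numpy as np

def cayley(W):
    """Map an arbitrary square matrix W to an orthogonal matrix.

    A = W - W^T is skew-symmetric, and the Cayley transform
    Q = (I - A)(I + A)^{-1} of a skew-symmetric A is orthogonal
    (I + A is invertible since A's eigenvalues are purely imaginary).
    """
    A = W - W.T
    I = np.eye(W.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))   # unconstrained parameters
Q = cayley(W)                 # effective orthogonal weight

# Q^T Q = I, so Q preserves norms: activations stay stable, gradient
# norms are preserved, and the layer has Lipschitz constant 1.
print(np.allclose(Q.T @ Q, np.eye(5)))  # True
x = rng.normal(size=5)
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
```

Because the map is differentiable, gradients flow through `cayley` to the unconstrained `W` during training, avoiding any explicit projection step.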