no code implementations • 16 May 2023 • Asher Trockman, J. Zico Kolter
It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point.
no code implementations • 7 Oct 2022 • Asher Trockman, Devin Willmott, J. Zico Kolter
In this work, we first observe that such learned filters have highly structured covariance matrices. Moreover, we find that covariances calculated from small networks can be used to effectively initialize a variety of larger networks with different depths, widths, patch sizes, and kernel sizes, suggesting that the covariance structure is largely model-independent.
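A minimal sketch of this idea, with hypothetical shapes and randomly generated stand-in "learned" filters (the real method operates on filters from trained networks): estimate the empirical covariance of a small network's flattened filters, then draw a larger network's initial filters from a zero-mean Gaussian with that covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for filters learned by a small trained network:
# 64 filters of size 3x3, flattened to 9-dimensional vectors.
# (In practice these would come from an actual trained model.)
learned_filters = rng.normal(size=(64, 9)) * np.linspace(1.0, 0.1, 9)

# Estimate the empirical covariance of the learned filters.
cov = np.cov(learned_filters, rowvar=False)  # shape (9, 9)

def covariance_init(n_filters, cov, rng):
    """Sample new filters from a zero-mean Gaussian with the given covariance."""
    return rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_filters)

# Initialize a wider network's filters (256 instead of 64) from the same
# covariance, reshaping each 9-vector back to a 3x3 kernel as needed.
new_filters = covariance_init(256, cov, rng)
print(new_filters.shape)  # (256, 9)
```

The key point the abstract makes is that `cov` transfers: the same matrix can initialize networks of different sizes, since only the number of sampled filters changes.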
11 code implementations • 24 Jan 2022 • Asher Trockman, J. Zico Kolter
Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants at similar parameter counts and dataset sizes, in addition to outperforming classical vision models such as the ResNet.
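A rough numpy sketch of a single ConvMixer-style block, to illustrate the structure the abstract alludes to: a depthwise convolution (mixing spatial locations per channel) with a residual connection, followed by a pointwise (1x1) convolution (mixing channels). All shapes are illustrative, and BatchNorm is omitted for brevity; this is not the authors' implementation.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(x, w):
    """Per-channel 2D convolution with 'same' zero padding.
    x: (C, H, W); w: (C, k, k)."""
    C, H, W = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * w[c])
    return out

def convmixer_block(x, w_depth, w_point):
    # Spatial mixing: depthwise conv + GELU, with a residual connection.
    x = x + gelu(depthwise_conv(x, w_depth))
    # Channel mixing: pointwise (1x1) conv + GELU.
    x = gelu(np.tensordot(w_point, x, axes=([1], [0])))
    return x

rng = np.random.default_rng(0)
C, H, W, k = 8, 6, 6, 3
x = rng.normal(size=(C, H, W))
y = convmixer_block(x,
                    rng.normal(size=(C, k, k)) * 0.1,
                    rng.normal(size=(C, C)) * 0.1)
print(y.shape)  # (8, 6, 6)
```

The block is isotropic: input and output share the same (C, H, W) shape, so the full model is just a patch embedding followed by a stack of these blocks.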
Ranked #96 on Image Classification on CIFAR-10
1 code implementation • ICLR 2021 • Asher Trockman, J. Zico Kolter
Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipschitz constants.
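One standard way to enforce orthogonality, sketched here in numpy for the fully connected case: the Cayley transform maps any square matrix to an orthogonal one via its skew-symmetric part, so training the unconstrained matrix keeps the effective weight orthogonal. This is only an illustrative sketch of the general technique, not the paper's convolutional construction.

```python
import numpy as np

def cayley(W):
    """Map an arbitrary square matrix W to an orthogonal matrix.

    A = W - W^T is skew-symmetric, and the Cayley transform
    Q = (I - A)(I + A)^{-1} of a skew-symmetric A is orthogonal
    (I + A is invertible since A's eigenvalues are purely imaginary).
    """
    A = W - W.T
    I = np.eye(W.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))   # unconstrained parameters
Q = cayley(W)                 # effective orthogonal weight

# Q^T Q = I, so Q preserves norms: activations stay stable, gradient
# norms are preserved, and the layer has Lipschitz constant 1.
print(np.allclose(Q.T @ Q, np.eye(5)))  # True
x = rng.normal(size=5)
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
```

Because the map is differentiable, gradients flow through `cayley` to the unconstrained `W` during training, avoiding any explicit projection step.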