no code implementations • 15 May 2023 • Ibrahim Batuhan Akkaya, Senthilkumar S. Kathiresan, Elahe Arani, Bahram Zonooz
Vision transformers (ViTs) achieve remarkable performance on large datasets, but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly due to a lack of local inductive bias in the architecture.