no code implementations • CVPR 2021 • Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner
Neural networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand.
1 code implementation • 5 Mar 2021 • Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
Attention-based architectures have become ubiquitous in machine learning, yet our understanding of the reasons for their effectiveness remains limited.
1 code implementation • ICLR 2021 • David W. Romero, Jean-Baptiste Cordonnier
We provide a general self-attention formulation to impose group equivariance to arbitrary symmetry groups.
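A minimal sketch of the general idea, not the paper's exact formulation: take positional self-attention over a 2-D grid and average the positional logits over the C4 group of 90-degree rotations, which makes the layer equivariant to rotating the input grid. All names here (`rotate90`, `w_pos`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rotate90(rel, k):
    """Rotate relative grid offsets (dx, dy) by k * 90 degrees."""
    x, y = rel[..., 0], rel[..., 1]
    for _ in range(k % 4):
        x, y = -y, x
    return np.stack([x, y], axis=-1)

def c4_positional_attention(feats, coords, w_pos):
    """feats: (N, d) token features; coords: (N, 2) integer grid positions;
    w_pos: (2,) positional score weights (illustrative)."""
    content = feats @ feats.T                      # content-based logits
    rel = coords[:, None, :] - coords[None, :, :]  # (N, N, 2) relative offsets
    # Averaging the positional logits over the four rotations makes the
    # positional term invariant under the group, hence the layer equivariant.
    pos = sum(rotate90(rel, k) @ w_pos for k in range(4)) / 4.0
    attn = softmax(content + pos, axis=-1)
    return attn @ feats                            # (N, d) outputs
```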
2 code implementations • 29 Jun 2020 • Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
We also show that it is possible to re-parametrize a pre-trained multi-head attention layer into our collaborative attention layer.
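As a rough sketch of what a collaborative layer computes (shared query/key projections reweighted by one mixing vector per head), with shapes and names chosen for illustration rather than taken from the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def collaborative_scores(x, w_q, w_k, mix):
    """x: (N, d) tokens; w_q, w_k: (d, d_shared) projections shared by all
    heads; mix: (heads, d_shared) per-head mixing vectors."""
    q = x @ w_q                                  # (N, d_shared)
    k = x @ w_k                                  # (N, d_shared)
    # Head h reweights the shared key/query dimensions with mix[h].
    logits = np.einsum("nd,hd,md->hnm", q, mix, k)
    return softmax(logits / np.sqrt(q.shape[-1]), axis=-1)
```

In this view, re-parametrizing a pre-trained layer amounts to factorizing the stacked per-head query/key products into such a shared form, so the collaborative layer can approximately reproduce the original attention scores with a smaller shared key/query dimension.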
2 code implementations • 28 Dec 2019 • Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi
Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.
Tasks: Cross-Lingual Document Classification, Cross-Lingual Word Embeddings, +8
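For context, a minimal sketch of the mapping-based baseline referred to above, not this paper's method: learn an orthogonal map over a bilingual seed lexicon with the Procrustes solution and project one language's embeddings into the other's space. Variable names are illustrative.

```python
import numpy as np

def procrustes_map(X, Y):
    """X: (n, d) source-language vectors, Y: (n, d) target-language vectors
    for the same word pairs. Returns the orthogonal W minimizing ||XW - Y||_F."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

# Usage (illustrative): project every source-language embedding into the
# shared (target) space.
# W = procrustes_map(lexicon_src, lexicon_tgt)
# src_in_shared_space = all_src_embeddings @ W
```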
1 code implementation • ICLR 2020 • Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice.
Ranked #151 on Image Classification on CIFAR-10
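An illustrative sketch of the equivalence, not the reference code: if each attention head places all of its mass on one fixed relative shift of a K x K neighbourhood, summing the per-head value projections reproduces a convolution. The paper shows that softmax attention with quadratic relative positional encodings can be made to concentrate on exactly such shifts; hard one-hot attention is used below for clarity.

```python
import numpy as np

def attention_as_convolution(image, kernel):
    """image: (H, W, C_in); kernel: (K, K, C_in, C_out); zero padding.
    Computes a cross-correlation, as convolutional layers do in practice."""
    H, W, C_in = image.shape
    K = kernel.shape[0]
    pad = K // 2
    padded = np.zeros((H + 2 * pad, W + 2 * pad, C_in))
    padded[pad:H + pad, pad:W + pad] = image
    out = np.zeros((H, W, kernel.shape[-1]))
    # Each offset (di, dj) plays the role of one attention head attending to
    # a single relative position; its value projection is kernel[di, dj].
    for di in range(K):
        for dj in range(K):
            shifted = padded[di:di + H, dj:dj + W]   # the head's attended values
            out += shifted @ kernel[di, dj]          # per-head output projection
    return out
```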
1 code implementation • 18 Mar 2019 • Jean-Baptiste Cordonnier, Andreas Loukas
We consider the problem of path inference: given a path prefix, i.e., a partially observed sequence of nodes in a graph, we want to predict which nodes are in the missing suffix.
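A toy illustration of the task, not the graph-neural-network model from the paper: restrict candidates to neighbours of the last observed node and rank them against a summary of the prefix. The scoring rule and all names are made up for the example.

```python
import numpy as np

def score_next_nodes(prefix, adjacency, embeddings):
    """prefix: list of visited node ids; adjacency: dict node -> neighbours;
    embeddings: (num_nodes, d) array. Returns candidates sorted by score."""
    context = embeddings[prefix].mean(axis=0)   # summarise the observed prefix
    candidates = adjacency[prefix[-1]]          # the suffix must continue the path
    scores = embeddings[candidates] @ context   # simple dot-product scoring
    order = np.argsort(-scores)
    return [(candidates[i], float(scores[i])) for i in order]
```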
1 code implementation • NeurIPS 2018 • Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
Huge-scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e., algorithms that leverage the compute power of many devices for training.
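A common way to cut communication in this distributed setting is to sparsify the gradient each worker sends and keep the dropped coordinates in a local memory. A simplified single-worker sketch, with the top-k rule, learning-rate handling and names as illustrative assumptions:

```python
import numpy as np

def topk_with_memory(grad, memory, k, lr):
    """grad, memory: 1-D arrays (flattened parameters).
    Returns the sparse update actually transmitted and the updated memory."""
    corrected = lr * grad + memory              # add back previously dropped mass
    idx = np.argsort(np.abs(corrected))[-k:]    # keep the k largest entries
    update = np.zeros_like(corrected)
    update[idx] = corrected[idx]
    new_memory = corrected - update             # remember the residual locally
    return update, new_memory
```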