1 code implementation • 4 Feb 2024 • Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi
The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding.
1 code implementation • 27 Nov 2023 • Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut
Large language models (LLMs) can potentially democratize access to medical knowledge.
Ranked #1 on MedMCQA (Dev Set, Acc-%) for Multiple Choice Question Answering (MCQA)
no code implementations • 23 Oct 2023 • Simin Fan, Matteo Pagliardini, Martin Jaggi
Moreover, when generalizing to out-of-domain (OOD) target tasks that are unseen in the pretraining corpus, DoGE can effectively identify inter-domain dependencies and consistently achieves better test perplexity on the target domain.
no code implementations • 16 Oct 2023 • Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
The race to continually develop ever larger and deeper foundational models is underway.
1 code implementation • 1 Jun 2023 • Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret
While many works have proposed schemes to sparsify the attention patterns and reduce the computational overhead of self-attention, these are often limited by implementation concerns and end up imposing a simple and static structure on the attention matrix.
1 code implementation • 27 Oct 2022 • Tatjana Chavdarova, Matteo Pagliardini, Tong Yang, Michael I. Jordan
We prove its convergence and show that the gap function of the last iterate of this inexact-ACVI method decreases at a rate of $\mathcal{O}(\frac{1}{\sqrt{K}})$ when the operator is $L$-Lipschitz and monotone, provided that the errors decrease at appropriate rates.
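For context, the gap function referenced above is the standard optimality measure for monotone variational inequalities (this is the textbook definition, not a construction specific to this paper): for an operator $F$ over a constraint set $\mathcal{C}$,

```latex
\[
  \mathcal{G}(x) \;=\; \sup_{y \in \mathcal{C}} \, \langle F(x),\; x - y \rangle ,
\]
so the stated last-iterate result reads
\[
  \mathcal{G}(x^K) \;=\; \mathcal{O}\!\left(\tfrac{1}{\sqrt{K}}\right),
\]
assuming $F$ is $L$-Lipschitz and monotone and the inexactness errors decay at the appropriate rates.
```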
no code implementations • 11 Feb 2022 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova
We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets.
1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy
This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) leveraging only a small subset of predictive features.
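The spurious-feature failure mode described above can be illustrated with a toy dataset (an illustrative sketch only, not the paper's setup): a "simple" feature that tracks the label almost perfectly in training but is pure noise at test time, versus a noisier but genuinely predictive core feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)                       # binary labels

core = y + 0.5 * rng.normal(size=n)             # predictive in train AND test
spurious_train = y + 0.05 * rng.normal(size=n)  # near-perfect, but only in train
spurious_test = rng.normal(size=n)              # same feature is noise at test time

# A classifier relying on the simpler spurious feature looks excellent in training...
train_acc = np.mean((spurious_train > 0.5) == y)
# ...but collapses to chance at test time, unlike one using the core feature.
test_acc_spurious = np.mean((spurious_test > 0.5) == y)
test_acc_core = np.mean((core > 0.5) == y)
```

A simplicity-biased learner picks the spurious feature because it separates the training data more cleanly, which is exactly the transferability failure the abstract describes.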
1 code implementation • 9 Dec 2021 • Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich
Secondly, we show on a 2D toy example that both BNNs and MCDropout do not give high uncertainty estimates on OOD samples.
no code implementations • 29 Sep 2021 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova
Deep learning models' sensitivity to small input perturbations raises security concerns and limits their use in applications where reliability is critical.
1 code implementation • ICLR 2021 • Tatjana Chavdarova, Matteo Pagliardini, Sebastian U. Stich, Francois Fleuret, Martin Jaggi
Generative Adversarial Networks are notoriously challenging to train.
1 code implementation • NAACL 2019 • Prakhar Gupta, Matteo Pagliardini, Martin Jaggi
Pre-trained word vectors are ubiquitous in Natural Language Processing applications.
5 code implementations • NAACL 2018 • Matteo Pagliardini, Prakhar Gupta, Martin Jaggi
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well.
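The Sent2Vec paper answers this by composing a sentence embedding as an average of learned word and n-gram embeddings. A minimal unigram-only sketch with made-up toy vectors (the real model learns these vectors and additionally averages n-gram embeddings):

```python
import numpy as np

# Toy word embeddings, invented for illustration; a Sent2Vec-style model
# learns such vectors so that averaging them yields useful sentence embeddings.
emb = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.3, 0.1]),
    "sat": np.array([0.2, 0.8, 0.4]),
}

def sentence_embedding(tokens, emb, dim=3):
    """Average the embeddings of known tokens (unigram-only sketch)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

v = sentence_embedding(["the", "cat", "sat"], emb)
```

The averaging composition keeps inference cheap: embedding a sentence costs one lookup per token plus a mean, with no encoder forward pass.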