Search Results for author: Matteo Pagliardini

Found 13 papers, 9 papers with code

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

1 code implementation • 4 Feb 2024 • Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi

The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding.
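The title names the core mechanism, depth weighted averaging (DWA): after each block, the representation is replaced by a weighted average over all earlier block outputs, including the initial embedding. A minimal NumPy sketch of that idea follows; the block itself is a stand-in, and the uniform weights are purely illustrative (DenseFormer learns these weights per depth):

```python
import numpy as np

rng = np.random.default_rng(0)

def transformer_block(x, W):
    # Stand-in for a full transformer block (attention + MLP);
    # a simple nonlinear map keeps the sketch self-contained.
    return np.tanh(x @ W)

def denseformer_forward(x0, num_blocks=4):
    """Depth weighted averaging (DWA): after every block, replace the
    representation by a weighted average over ALL earlier representations,
    including the embedding x0. Real DWA weights are learned; uniform
    weights are used here purely for illustration."""
    d = x0.shape[-1]
    history = [x0]
    x = x0
    for _ in range(num_blocks):
        W = rng.standard_normal((d, d)) / np.sqrt(d)
        x = transformer_block(x, W)
        history.append(x)
        # One weight per stored representation; learned in DenseFormer.
        alpha = np.full(len(history), 1.0 / len(history))
        x = sum(a * h for a, h in zip(alpha, history))
    return x
```

With identity-style weights (all mass on the newest output) this reduces to a plain transformer stack, which is how the paper initialises DWA.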

DoGE: Domain Reweighting with Generalization Estimation

no code implementations • 23 Oct 2023 • Simin Fan, Matteo Pagliardini, Martin Jaggi

Moreover, when generalizing to out-of-domain target tasks unseen in the pretraining corpus (OOD domains), DoGE effectively identifies inter-domain dependencies and consistently achieves better test perplexity on the target domain.

Domain Generalization • Language Modelling

CoTFormer: More Tokens With Attention Make Up For Less Depth

no code implementations • 16 Oct 2023 • Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

The race to continually develop ever larger and deeper foundational models is underway.

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

1 code implementation • 1 Jun 2023 • Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret

While many works have proposed schemes to sparsify attention patterns and reduce the computational overhead of self-attention, these are often limited by implementation concerns and end up imposing a simple, static structure on the attention matrix.
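As a dense reference for the underlying idea (not the paper's fused sparse flash-attention kernels), causal attention restricted to an arbitrary per-pair sparsity pattern can be sketched as below. The `keep` mask is an illustrative stand-in for whatever dynamic pattern is chosen, and each query is assumed to keep at least its own position:

```python
import numpy as np

def sparse_causal_attention(q, k, v, keep):
    """Causal attention restricted to a dynamic sparsity pattern.
    keep[i, j] = True means query i may attend to key j; the causal
    constraint j <= i is applied on top. Dense reference implementation
    of the idea only; assumes keep[i, i] is True for every i."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = causal & keep
    # Disallowed pairs get -inf so they receive zero softmax weight.
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = np.where(mask, w, 0.0)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `keep` all True this is ordinary causal attention; a fast kernel would skip the masked-out blocks instead of computing and discarding them.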


A Primal-dual Approach for Solving Variational Inequalities with General-form Constraints

1 code implementation • 27 Oct 2022 • Tatjana Chavdarova, Matteo Pagliardini, Tong Yang, Michael I. Jordan

We prove its convergence and show that the gap function of the last iterate of this inexact-ACVI method decreases at a rate of $\mathcal{O}(\frac{1}{\sqrt{K}})$ when the operator is $L$-Lipschitz and monotone, provided that the errors decrease at appropriate rates.

Improving Generalization via Uncertainty Driven Perturbations

no code implementations • 11 Feb 2022 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova

We show that UDP is guaranteed to achieve the maximum margin decision boundary on linear models and that it notably increases it on challenging simulated datasets.

Agree to Disagree: Diversity through Disagreement for Better Transferability

1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy

This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) leveraging only a small subset of predictive features.

Out of Distribution (OOD) Detection

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation • 9 Dec 2021 • Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Second, we show on a 2D toy example that neither BNNs nor MCDropout gives high uncertainty estimates on OOD samples.

Improved Generalization-Robustness Trade-off via Uncertainty Targeted Attacks

no code implementations • 29 Sep 2021 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova

Deep learning models' sensitivity to small input perturbations raises security concerns and limits their use in applications where reliability is critical.

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

5 code implementations • NAACL 2018 • Matteo Pagliardini, Prakhar Gupta, Martin Jaggi

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well.
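The compositional idea in the title can be illustrated in a few lines: a sentence vector is the average of the vectors of its words and word n-grams. The embedding table below is hypothetical; in Sent2Vec these vectors are trained with an unsupervised objective rather than given:

```python
import numpy as np

def ngrams(tokens, n):
    # Contiguous n-grams of a token list, joined into single keys.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_embedding(sentence, emb, dim, max_n=2):
    """Sent2Vec-style compositional embedding: average the vectors of all
    unigrams up to max_n-grams in the sentence. `emb` is an illustrative
    dict from text unit to vector; unknown units are skipped."""
    tokens = sentence.lower().split()
    units = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    vecs = [emb[u] for u in units if u in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

Because inference is just an average, computing an embedding is as cheap as a table lookup per n-gram, which is what makes the method practical at scale.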

Sentence • Sentence Embeddings +1
