Search Results for author: Matteo Pagliardini

Found 13 papers, 9 papers with code

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

1 code implementation • 4 Feb 2024 • Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi

The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding.
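The title names the core mechanism, depth weighted averaging (DWA): after each block, the representation is replaced by a weighted average over all earlier block outputs, including the initial embedding. A minimal NumPy sketch of that idea follows; the block itself is a stand-in, and the uniform weights are purely illustrative (DenseFormer learns these weights per depth):

```python
import numpy as np

rng = np.random.default_rng(0)

def transformer_block(x, W):
    # Stand-in for a full transformer block (attention + MLP);
    # a simple nonlinear map keeps the sketch self-contained.
    return np.tanh(x @ W)

def denseformer_forward(x0, num_blocks=4):
    """Depth weighted averaging (DWA): after every block, replace the
    representation by a weighted average over ALL earlier representations,
    including the embedding x0. Real DWA weights are learned; uniform
    weights are used here purely for illustration."""
    d = x0.shape[-1]
    history = [x0]
    x = x0
    for _ in range(num_blocks):
        W = rng.standard_normal((d, d)) / np.sqrt(d)
        x = transformer_block(x, W)
        history.append(x)
        # One weight per stored representation; learned in DenseFormer.
        alpha = np.full(len(history), 1.0 / len(history))
        x = sum(a * h for a, h in zip(alpha, history))
    return x
```

With identity-style weights (all mass on the newest output) this reduces to a plain transformer stack, which is how the paper initialises DWA.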

DoGE: Domain Reweighting with Generalization Estimation

no code implementations • 23 Oct 2023 • Simin Fan, Matteo Pagliardini, Martin Jaggi

Moreover, when generalizing to out-of-domain target tasks unseen in the pretraining corpus (OOD domains), DoGE effectively identifies inter-domain dependencies and consistently achieves better test perplexity on the target domain.

Domain Generalization • Language Modelling

CoTFormer: More Tokens With Attention Make Up For Less Depth

no code implementations • 16 Oct 2023 • Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

The race to continually develop ever larger and deeper foundational models is underway.

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention

1 code implementation • 1 Jun 2023 • Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret

While many works have proposed schemes to sparsify attention patterns and reduce the computational overhead of self-attention, these are often limited by implementation concerns and end up imposing a simple, static structure on the attention matrix.
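As a dense reference for the underlying idea (not the paper's fused sparse flash-attention kernels), causal attention restricted to an arbitrary per-pair sparsity pattern can be sketched as below. The `keep` mask is an illustrative stand-in for whatever dynamic pattern is chosen, and each query is assumed to keep at least its own position:

```python
import numpy as np

def sparse_causal_attention(q, k, v, keep):
    """Causal attention restricted to a dynamic sparsity pattern.
    keep[i, j] = True means query i may attend to key j; the causal
    constraint j <= i is applied on top. Dense reference implementation
    of the idea only; assumes keep[i, i] is True for every i."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = causal & keep
    # Disallowed pairs get -inf so they receive zero softmax weight.
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = np.where(mask, w, 0.0)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `keep` all True this is ordinary causal attention; a fast kernel would skip the masked-out blocks instead of computing and discarding them.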


A Primal-dual Approach for Solving Variational Inequalities with General-form Constraints

1 code implementation • 27 Oct 2022 • Tatjana Chavdarova, Matteo Pagliardini, Tong Yang, Michael I. Jordan

We prove its convergence and show that the gap function of the last iterate of this inexact-ACVI method decreases at a rate of $\mathcal{O}(\frac{1}{\sqrt{K}})$ when the operator is $L$-Lipschitz and monotone, provided that the errors decrease at appropriate rates.

Improving Generalization via Uncertainty Driven Perturbations

no code implementations • 11 Feb 2022 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Michael I. Jordan, Tatjana Chavdarova

We show that UDP is guaranteed to achieve the maximum margin decision boundary on linear models and that it notably increases it on challenging simulated datasets.

Agree to Disagree: Diversity through Disagreement for Better Transferability

1 code implementation • 9 Feb 2022 • Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth Karimireddy

This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) leveraging only a small subset of predictive features.

Out of Distribution (OOD) Detection

The Peril of Popular Deep Learning Uncertainty Estimation Methods

1 code implementation • 9 Dec 2021 • Yehao Liu, Matteo Pagliardini, Tatjana Chavdarova, Sebastian U. Stich

Second, we show on a 2D toy example that neither BNNs nor MCDropout gives high uncertainty estimates on OOD samples.

Improved Generalization-Robustness Trade-off via Uncertainty Targeted Attacks

no code implementations • 29 Sep 2021 • Matteo Pagliardini, Gilberto Manunza, Martin Jaggi, Tatjana Chavdarova

Deep learning models' sensitivity to small input perturbations raises security concerns and limits their use in applications where reliability is critical.

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

5 code implementations • NAACL 2018 • Matteo Pagliardini, Prakhar Gupta, Martin Jaggi

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question of whether similar methods could be derived to improve embeddings (i.e., semantic representations) of word sequences as well.
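The compositional idea in the title can be illustrated in a few lines: a sentence vector is the average of the vectors of its words and word n-grams. The embedding table below is hypothetical; in Sent2Vec these vectors are trained with an unsupervised objective rather than given:

```python
import numpy as np

def ngrams(tokens, n):
    # Contiguous n-grams of a token list, joined into single keys.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_embedding(sentence, emb, dim, max_n=2):
    """Sent2Vec-style compositional embedding: average the vectors of all
    unigrams up to max_n-grams in the sentence. `emb` is an illustrative
    dict from text unit to vector; unknown units are skipped."""
    tokens = sentence.lower().split()
    units = [g for n in range(1, max_n + 1) for g in ngrams(tokens, n)]
    vecs = [emb[u] for u in units if u in emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)
```

Because inference is just an average, computing an embedding is as cheap as a table lookup per n-gram, which is what makes the method practical at scale.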

Sentence • Sentence Embeddings +1
