no code implementations • EMNLP 2021 • Andrea Schioppa, David Vilar, Artem Sokolov, Katja Filippova
Fine-grained control of machine translation (MT) outputs along multiple attributes is critical for many modern MT applications and is a requirement for gaining users’ trust.
no code implementations • 6 Feb 2024 • Andrea Schioppa
Two important scenarios are training data attribution (tracing a model's behavior back to the training data), where one needs to store a gradient for each training example, and the study of the spectrum of the Hessian (to analyze the training dynamics), where one needs to store multiple Hessian-vector products.
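The memory pressure comes from never wanting to materialize the full d×d Hessian. As a minimal illustration (not the paper's method; the objective and names below are invented for the example), a Hessian-vector product can be obtained from two gradient evaluations via central differences:

```python
import numpy as np

# Toy objective f(x) = 0.5 * x^T A x, so grad f(x) = A x and the exact
# Hessian-vector product is A @ v. A is an illustrative stand-in.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T  # symmetric

def grad(x):
    return A @ x

def hvp(grad_fn, x, v, eps=1e-5):
    """Hessian-vector product via central differences of the gradient.
    Only two gradients are evaluated; the d x d Hessian is never stored."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

x = rng.normal(size=5)
v = rng.normal(size=5)
approx = hvp(grad, x, v)
exact = A @ v
print(np.max(np.abs(approx - exact)))  # tiny numerical error
```

Even so, storing one such product per probe vector, or one gradient per training example, multiplies memory by the number of vectors kept, which is the bottleneck the abstract refers to.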
no code implementations • 19 May 2023 • Andrea Schioppa, Xavier Garcia, Orhan Firat
The recent rapid progress in pre-training Large Language Models has relied on using self-supervised language modeling objectives like next token prediction or span corruption.
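For concreteness, the next-token-prediction objective scores each position's logits against the token one step ahead. A minimal numpy sketch (toy vocabulary and logits, not any particular model's API):

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Average cross-entropy of predicting token t+1 from position t.
    logits: (T, V) model outputs; tokens: (T,) integer token ids."""
    pred_logits = logits[:-1]   # positions 0..T-2 predict ...
    targets = tokens[1:]        # ... tokens 1..T-1 (the one-step shift)
    # Numerically stabilized log-softmax.
    z = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Toy sequence over a 4-token vocabulary; all values illustrative.
tokens = np.array([0, 1, 2, 3, 2])
logits = np.zeros((5, 4))       # a maximally uncertain (uniform) model
loss = next_token_loss(logits, tokens)
print(loss)  # uniform predictions give exactly log(4)
```

Span corruption differs only in what is predicted: contiguous input spans are masked and the model reconstructs them, but the cross-entropy machinery is the same.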
2 code implementations • 6 Dec 2021 • Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov
We address efficient calculation of influence functions for tracking predictions back to the training data.
1 code implementation • 17 Sep 2019 • Andrea Schioppa
We report on an open-source implementation of distributed function minimization on top of Apache Spark using gradient and quasi-Newton methods.
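The quasi-Newton workhorse in this setting is typically L-BFGS, which approximates H⁻¹g from a short history of iterate/gradient differences via the two-loop recursion. A single-machine numpy sketch of that recursion (the Spark distribution layer is not reproduced here; the toy quadratic and step-size choice are illustrative):

```python
import numpy as np

def lbfgs_direction(g, pairs):
    """L-BFGS two-loop recursion: approximate H^{-1} g from recent
    (s, y) = (x_{k+1}-x_k, g_{k+1}-g_k) pairs, without forming H."""
    q = g.copy()
    alphas = []
    for s, yv in reversed(pairs):
        a = (s @ q) / (yv @ s)
        q = q - a * yv
        alphas.append(a)
    if pairs:
        s, yv = pairs[-1]
        q = q * (s @ yv) / (yv @ yv)      # standard initial scaling
    for (s, yv), a in zip(pairs, reversed(alphas)):
        b = (yv @ q) / (yv @ s)
        q = q + (a - b) * s
    return q

# Toy quadratic with eigenvalues <= 1, so a unit step length is safe.
H = np.diag([1.0, 0.5, 0.2])
b = np.array([1.0, -2.0, 0.5])
grad = lambda x: H @ x - b

x, pairs = np.zeros(3), []
g = grad(x)
for _ in range(50):
    if np.linalg.norm(g) < 1e-10:
        break
    d = lbfgs_direction(g, pairs[-10:])
    x_new = x - d
    g_new = grad(x_new)
    pairs.append((x_new - x, g_new - g))
    x, g = x_new, g_new
print(np.linalg.norm(grad(x)))  # near zero at the minimizer
```

In a distributed setting only the gradient computation (a sum over data partitions) needs the cluster; the two-loop recursion itself touches just a handful of vectors on the driver.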
1 code implementation • 4 Aug 2019 • Andrea Schioppa
We compare several approaches to learn an Optimal Map, represented as a neural network, between probability distributions.
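In one dimension the optimal map for squared cost has a closed form, T = F_target⁻¹ ∘ F_source, and its empirical version simply pairs sorted samples. The paper's setting is neural maps in higher dimension; the sketch below is only the 1-D reference solution such methods should recover:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
source = rng.uniform(size=n)     # samples from the source distribution
target = rng.normal(size=n)      # samples from the target distribution

# Empirical 1-D optimal map: send the i-th source order statistic to the
# i-th target order statistic (quantile matching, T = F_tgt^{-1} o F_src).
xs, ys = np.sort(source), np.sort(target)
T = lambda x: np.interp(x, xs, ys)   # piecewise-linear, monotone

moved = T(source)
# With equal sample sizes, the transported sample equals the target
# sample as a multiset, since interp is evaluated exactly at its knots.
print(np.allclose(np.sort(moved), ys))
```

Monotonicity is the hallmark of 1-D optimality; in higher dimensions the analogous property (being a gradient of a convex function, by Brenier's theorem) is what neural parameterizations try to enforce or approximate.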
no code implementations • 22 Oct 2018 • Andrea Schioppa
We study convergence properties of Stochastic Gradient Descent (SGD) for convex objectives without assumptions on smoothness or strict convexity.
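A minimal illustration of that regime (a generic subgradient-method sketch, not the paper's analysis): f(x) = |x| is convex but neither smooth nor strictly convex, yet noisy subgradient steps with eta_t ∝ 1/√t still drive a weighted average iterate toward the minimizer:

```python
import numpy as np

# Noisy subgradient descent on the nonsmooth convex f(x) = |x|.
rng = np.random.default_rng(3)
x, T = 5.0, 20000
avg, weight = 0.0, 0.0
for t in range(1, T + 1):
    g = np.sign(x) + 0.1 * rng.normal()  # stochastic subgradient
    eta = 0.5 / np.sqrt(t)               # no smoothness constant needed
    x -= eta * g
    avg += eta * x                       # step-size-weighted averaging
    weight += eta
x_bar = avg / weight
print(abs(x_bar))  # close to the minimizer x* = 0
```

The averaging matters: the last iterate keeps oscillating at the scale of the step size, while the weighted average smooths those oscillations out, which is the standard device in convergence proofs without smoothness.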