Search Results for author: Francesco Orabona

Found 55 papers, 11 papers with code

Better-than-KL PAC-Bayes Bounds

no code implementations 14 Feb 2024 Ilja Kuzborskij, Kwang-Sung Jun, Yulian Wu, Kyoungseok Jang, Francesco Orabona

In this paper, we consider the problem of proving concentration inequalities to estimate the mean of a sequence of random elements.

Inductive Bias

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

1 code implementation 3 Oct 2023 Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand

We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth.

Normalized Gradients for All

no code implementations 10 Aug 2023 Francesco Orabona

In this short note, I show how to adapt to Hölder smoothness using normalized gradients in a black-box way.
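
As a point of reference, a minimal sketch of gradient descent with normalized gradients is below; this is only the basic update such results build on, not the adaptive algorithm of the note, and the objective and step size in the example are illustrative assumptions.

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, eta=0.1, steps=100):
    """Gradient descent that uses only the direction of the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:              # already at a stationary point
            break
        x = x - eta * g / norm       # step length is eta, independent of |g|
    return x

# Illustrative usage: minimize the quadratic f(x) = ||x||^2.
x_final = normalized_gradient_descent(lambda x: 2.0 * x, x0=[5.0, -3.0])
```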

Implicit Interpretation of Importance Weight Aware Updates

no code implementations 22 Jul 2023 Keyi Chen, Francesco Orabona

Due to its speed and simplicity, subgradient descent is one of the most widely used optimization algorithms in convex machine learning.
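
For context, a plain subgradient step on the hinge loss is sketched below, with a naive importance weight h that simply scales the subgradient; this is a minimal sketch of the baseline update, not the importance-weight-aware update the paper analyzes.

```python
import numpy as np

def hinge_subgradient_step(w, x, y, lr=0.1, h=1.0):
    """One subgradient-descent step on the hinge loss max(0, 1 - y*<w, x>).

    h is an importance weight; here it naively rescales the subgradient.
    """
    if y * np.dot(w, x) < 1.0:       # loss active: a subgradient is -y*x
        return w + lr * h * y * x
    return w                          # loss inactive: subgradient is zero
```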

Generalized Implicit Follow-The-Regularized-Leader

no code implementations 31 May 2023 Keyi Chen, Francesco Orabona

We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework.
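
For reference, standard FTRL with linearized losses and a quadratic regularizer admits a one-line closed form; the sketch below covers only this classical special case, not the generalized implicit updates proposed in the paper.

```python
import numpy as np

def ftrl_linearized(grads, lam=1.0):
    """FTRL with linearized losses and regularizer (lam/2)*||w||^2:

        w_t = argmin_w  sum_{s<t} <g_s, w> + (lam/2)*||w||^2
            = -(1/lam) * sum_{s<t} g_s
    """
    g_sum = np.zeros_like(np.asarray(grads[0], dtype=float))
    plays = []
    for g in grads:
        plays.append(-g_sum / lam)   # play w_t before observing g_t
        g_sum = g_sum + np.asarray(g, dtype=float)
    return plays
```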

Tighter PAC-Bayes Bounds Through Coin-Betting

no code implementations 12 Feb 2023 Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona

We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly $S$-dependent parameter.

Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

no code implementations 7 Feb 2023 Ashok Cutkosky, Harsh Mehta, Francesco Orabona

Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.

Robustness to Unbounded Smoothness of Generalized SignSGD

no code implementations 23 Aug 2022 Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang

We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating the others.

Implicit Parameter-free Online Learning with Truncated Linear Models

no code implementations 19 Mar 2022 Keyi Chen, Ashok Cutkosky, Francesco Orabona

Parameter-free algorithms are online learning algorithms that do not require setting learning rates.

Stochastic Optimization

Understanding AdamW through Proximal Methods and Scale-Freeness

1 code implementation 31 Jan 2022 Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
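
The distinction is easy to see in a stripped-down sketch (no bias correction, illustrative hyperparameters): Adam-$\ell_2$ pushes the weight-decay term through the adaptive preconditioner, while AdamW applies it in a decoupled, proximal-style step.

```python
import numpy as np

def adam_l2_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, wd=1e-2, eps=1e-8):
    """Adam with L2 regularization: the decay wd*w is added to the gradient,
    so it gets rescaled by the adaptive factor 1/sqrt(v)."""
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * m / (np.sqrt(v) + eps), m, v

def adamw_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, wd=1e-2, eps=1e-8):
    """AdamW: decoupled weight decay, applied outside the preconditioner."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * (m / (np.sqrt(v) + eps) + wd * w), m, v
```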

Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio

1 code implementation 27 Oct 2021 Francesco Orabona, Kwang-Sung Jun

A classic problem in statistics is the estimation of the expectation of random variables from samples.

Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers

1 code implementation NeurIPS 2021 Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy

Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.

Online Learning with Optimism and Delay

1 code implementation 13 Jun 2021 Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.

Benchmarking, Weather Forecasting

On the Initialization for Convex-Concave Min-max Problems

no code implementations 27 Feb 2021 Mingrui Liu, Francesco Orabona

This means that the convergence speed does not improve even if the algorithm starts from the optimal solution, and hence it is oblivious to the initialization.

A closer look at temporal variability in dynamic online learning

no code implementations 15 Feb 2021 Nicolò Campolongo, Francesco Orabona

Our proposed algorithm is adaptive not only to the temporal variability of the loss functions, but also to the path length of the sequence of comparators when an upper bound is known.

On the Last Iterate Convergence of Momentum Methods

no code implementations 13 Feb 2021 Xiaoyu Li, Mingrui Liu, Francesco Orabona

In this paper, we focus on the convergence rate of the last iterate of SGDM.

Stochastic Optimization
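
For concreteness, the SGD-with-momentum (heavy-ball) iteration whose last iterate is studied in the paper above can be sketched as follows; the constant step size and momentum parameter are illustrative assumptions.

```python
import numpy as np

def sgdm(stoch_grad, x0, lr=0.01, beta=0.9, steps=1000):
    """Stochastic gradient descent with heavy-ball momentum.

    Returns the last iterate x_T, the quantity whose convergence is analyzed.
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = stoch_grad(x)        # stochastic gradient oracle (assumed helper)
        m = beta * m + g         # momentum buffer
        x = x - lr * m           # descent step along the buffer
    return x
```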

Parameter-free Stochastic Optimization of Variationally Coherent Functions

no code implementations 30 Jan 2021 Francesco Orabona, Dávid Pál

We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$.

Stochastic Optimization

A High Probability Analysis of Adaptive SGD with Momentum

no code implementations 28 Jul 2020 Xiaoyu Li, Francesco Orabona

We use it to prove for the first time the convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.

Vocal Bursts Intensity Prediction

Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

no code implementations 12 Jun 2020 Keyi Chen, John Langford, Francesco Orabona

Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.

Stochastic Optimization

Temporal Variability in Implicit Online Learning

no code implementations NeurIPS 2020 Nicolò Campolongo, Francesco Orabona

We prove a novel static regret bound that depends on the temporal variability of the sequence of loss functions, a quantity which is often encountered when considering dynamic competitors.

A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

2 code implementations 12 Feb 2020 Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona

Moreover, we show the surprising property that these two strategies are \emph{adaptive} to the noise level in the stochastic gradients of PL functions.

Stochastic Optimization
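
The two step-size schedules studied in the paper above can be written down directly; the sketch below uses one common parameterization, and the exact constants in the paper's analysis may differ.

```python
import math

def exponential_stepsize(eta0, alpha, t):
    """Exponentially decaying step size: eta_t = eta0 * alpha**t with 0 < alpha < 1."""
    return eta0 * alpha ** t

def cosine_stepsize(eta0, t, T):
    """Cosine-annealed step size over T iterations: eta_t = eta0 * (1 + cos(pi*t/T)) / 2."""
    return eta0 * (1 + math.cos(math.pi * t / T)) / 2
```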

A Modern Introduction to Online Learning

1 code implementation 31 Dec 2019 Francesco Orabona

I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.

Multi-Armed Bandits

Parameter-Free Locally Differentially Private Stochastic Subgradient Descent

no code implementations 21 Nov 2019 Kwang-Sung Jun, Francesco Orabona

We consider the problem of minimizing a convex risk with stochastic subgradients while guaranteeing $\epsilon$-local differential privacy ($\epsilon$-LDP).

Stochastic Optimization

Momentum-Based Variance Reduction in Non-Convex SGD

2 code implementations NeurIPS 2019 Ashok Cutkosky, Francesco Orabona

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
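
A simplified sketch of a recursive, momentum-based variance-reduced gradient estimator of this kind is below, with a fixed momentum parameter a rather than an adaptive schedule; stoch_grad(x, xi) and sample() are assumed helpers that evaluate a stochastic gradient for a given sample and draw a fresh sample.

```python
import numpy as np

def variance_reduced_sgd(stoch_grad, sample, x0, lr=0.01, a=0.1, steps=1000):
    """SGD driven by the recursive estimator

        d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t)),

    which reuses the same sample xi_t at both x_t and x_{t-1} and needs no
    large-batch checkpoints.
    """
    x = np.asarray(x0, dtype=float)
    x_prev, d = None, None
    for _ in range(steps):
        xi = sample()
        g = stoch_grad(x, xi)
        d = g if d is None else g + (1 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d    # move along the corrected direction
    return x
```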

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

no code implementations 21 May 2018 Xiaoyu Li, Francesco Orabona

In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes.
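
For reference, SGD with a global (norm-based) AdaGrad step size looks as follows; this is a minimal sketch of the kind of adaptive stepsize in question, not the exact generalized version analyzed in the paper.

```python
import numpy as np

def adagrad_norm_sgd(stoch_grad, x0, alpha=1.0, beta=1.0, steps=1000):
    """SGD with the global AdaGrad step size

        eta_t = alpha / sqrt(beta + sum_{s<=t} ||g_s||^2),

    which shrinks automatically as gradient information accumulates.
    """
    x = np.asarray(x0, dtype=float)
    acc = float(beta)
    for _ in range(steps):
        g = stoch_grad(x)                 # stochastic gradient oracle (assumed helper)
        acc += float(np.dot(g, g))        # accumulate squared gradient norms
        x = x - alpha / np.sqrt(acc) * g  # no hand-tuned decay schedule needed
    return x
```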

Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

no code implementations 17 Feb 2018 Ashok Cutkosky, Francesco Orabona

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.

Online Learning for Changing Environments using Coin Betting

no code implementations 6 Nov 2017 Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.

Metric Learning

Training Deep Networks without Learning Rates Through Coin Betting

6 code implementations NeurIPS 2017 Francesco Orabona, Tatiana Tommasi

Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario.

Stochastic Optimization

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

no code implementations 25 Feb 2017 Alina Beygelzimer, Francesco Orabona, Chicheng Zhang

The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor.

Improved Strongly Adaptive Online Learning using Coin Betting

no code implementations 14 Oct 2016 Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright

This paper describes a new parameter-free online learning algorithm for changing environments.

Metric Learning

Coin Betting and Parameter-Free Online Learning

1 code implementation NeurIPS 2016 Francesco Orabona, Dávid Pál

We present a new intuitive framework to design parameter-free algorithms for \emph{both} online linear optimization over Hilbert spaces and for learning with expert advice, based on reductions to betting on outcomes of adversarial coins.
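
A minimal one-dimensional sketch of the reduction is below, using a Krichevsky-Trofimov (KT) bettor and assuming the gradients lie in $[-1, 1]$; the paper develops the framework in full generality for Hilbert spaces and for expert advice.

```python
def coin_betting_1d(gradients, epsilon=1.0):
    """Parameter-free 1-D online linear optimization via coin betting.

    The coin outcome at round t is c_t = -g_t. Wealth starts at epsilon,
    and the bet w_t is a KT fraction of the current wealth.
    """
    wealth, coin_sum = epsilon, 0.0
    plays = []
    for t, g in enumerate(gradients, start=1):
        w = (coin_sum / t) * wealth   # KT betting fraction times current wealth
        plays.append(w)
        c = -g                        # negative gradient plays the role of the coin
        wealth += c * w               # update wealth with this round's outcome
        coin_sum += c
    return plays
```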

High Dimensional Inference with Random Maximum A-Posteriori Perturbations

no code implementations 10 Feb 2016 Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi Jaakkola

This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution.

Vocal Bursts Intensity Prediction
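
As background for the paper above, the classical full-dimensional perturb-max (Gumbel-max) sampler over a finite configuration set is sketched below; the paper's contribution concerns the harder low-dimensional perturbation case, which this sketch does not implement.

```python
import numpy as np

def perturb_max_sample(theta, rng=None):
    """Classical perturb-max sampling over a finite set.

    theta: array of potentials, one per configuration. Adding an i.i.d.
    Gumbel perturbation to every configuration and taking the argmax
    returns an exact sample from p(x) proportional to exp(theta[x]).
    """
    rng = np.random.default_rng() if rng is None else rng
    gumbel = rng.gumbel(size=len(theta))   # one perturbation per configuration
    return int(np.argmax(np.asarray(theta) + gumbel))
```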

Solving Ridge Regression using Sketched Preconditioned SVRG

no code implementations 7 Feb 2016 Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.

regression

Scale-Free Online Learning

no code implementations 8 Jan 2016 Francesco Orabona, Dávid Pál

We design and analyze algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.

Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice

no code implementations 6 Nov 2015 Francesco Orabona, David Pal

We prove non-asymptotic lower bounds on the expectation of the maximum of $d$ independent Gaussian variables and the expectation of the maximum of $d$ independent symmetric random walks.

The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

no code implementations 20 Aug 2015 Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi

Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments.

General Classification

Scale-Free Algorithms for Online Linear Optimization

no code implementations 19 Feb 2015 Francesco Orabona, David Pal

We design algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.

A Simple Expression for Mill's Ratio of the Student's $t$-Distribution

no code implementations 5 Feb 2015 Francesco Orabona

I show a simple expression for the Mill's ratio of the Student's $t$-distribution.

Fast Rates by Transferring from Auxiliary Hypotheses

no code implementations 4 Dec 2014 Ilja Kuzborskij, Francesco Orabona

In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks.

Scalable Greedy Algorithms for Transfer Learning

no code implementations 6 Aug 2014 Ilja Kuzborskij, Francesco Orabona, Barbara Caputo

In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield a good performance on a target task.

feature selection, Transfer Learning

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

no code implementations NeurIPS 2014 Francesco Orabona

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability.

Learning Theory, Model Selection

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

no code implementations 3 Mar 2014 H. Brendan McMahan, Francesco Orabona

When $T$ is known, we derive an algorithm with an optimal regret bound (up to constant factors).

Regression-tree Tuning in a Streaming Setting

no code implementations NeurIPS 2013 Samory Kpotufe, Francesco Orabona

We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time.

regression

Dimension-Free Exponentiated Gradient

no code implementations NeurIPS 2013 Francesco Orabona

We present a new online learning algorithm that extends the exponentiated gradient to infinite dimensional spaces.
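
For context, the classical finite-dimensional exponentiated gradient update on the probability simplex is sketched below; the paper's algorithm extends this family beyond the finite-dimensional setting.

```python
import numpy as np

def exponentiated_gradient(grads, d, eta=0.1):
    """Exponentiated gradient on the d-dimensional probability simplex:

        w_{t+1, i}  is proportional to  w_{t, i} * exp(-eta * g_{t, i}).
    """
    w = np.full(d, 1.0 / d)          # start from the uniform distribution
    iterates = [w.copy()]
    for g in grads:
        w = w * np.exp(-eta * np.asarray(g, dtype=float))
        w = w / w.sum()              # renormalize back onto the simplex
        iterates.append(w.copy())
    return iterates
```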

On Measure Concentration of Random Maximum A-Posteriori Perturbations

no code implementations 15 Oct 2013 Francesco Orabona, Tamir Hazan, Anand D. Sarwate, Tommi Jaakkola

Applying the general result to MAP perturbations can yield a more efficient algorithm to approximate sampling from the Gibbs distribution.

From N to N+1: Multiclass Transfer Incremental Learning

no code implementations CVPR 2013 Ilja Kuzborskij, Francesco Orabona, Barbara Caputo

Since the seminal work of Thrun [17], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience and with the number of tasks.

Incremental Learning, Object Categorization, +1

A Generalized Online Mirror Descent with Applications to Classification and Regression

no code implementations 10 Apr 2013 Francesco Orabona, Koby Crammer, Nicolò Cesa-Bianchi

A unifying perspective on the design and the analysis of online algorithms is provided by online mirror descent, a general prediction strategy from which most first-order algorithms can be obtained as special cases.

General Classification, regression
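
As a concrete reference point for the paper above, the Euclidean instance of online mirror descent (which reduces to online subgradient descent) is sketched below; the general framework replaces the squared Euclidean distance with other Bregman divergences.

```python
import numpy as np

def online_mirror_descent_euclidean(grads, x0, eta=0.1):
    """Online mirror descent with the mirror map psi(x) = ||x||^2 / 2:

        x_{t+1} = argmin_x  <g_t, x> + (1/(2*eta)) * ||x - x_t||^2
                = x_t - eta * g_t,

    i.e., plain online (sub)gradient descent, the simplest member of the family.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for g in grads:
        x = x - eta * np.asarray(g, dtype=float)
        iterates.append(x.copy())
    return iterates
```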

Learning from Candidate Labeling Sets

no code implementations NeurIPS 2010 Jie Luo, Francesco Orabona

In this paper, we propose a semi-supervised framework to model this kind of problem.

New Adaptive Algorithms for Online Classification

no code implementations NeurIPS 2010 Francesco Orabona, Koby Crammer

We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting.

Classification, General Classification
