Search Results for author: Francesco Orabona

Found 55 papers, 11 papers with code

Better-than-KL PAC-Bayes Bounds

no code implementations 14 Feb 2024 Ilja Kuzborskij, Kwang-Sung Jun, Yulian Wu, Kyoungseok Jang, Francesco Orabona

In this paper, we consider the problem of proving concentration inequalities to estimate the mean of a sequence of random elements.

Inductive Bias

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

1 code implementation 3 Oct 2023 Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand

We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth.

Normalized Gradients for All

no code implementations 10 Aug 2023 Francesco Orabona

In this short note, I show how to adapt to Hölder smoothness using normalized gradients in a black-box way.
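
As a point of reference, a minimal sketch of gradient descent with normalized gradients is below; this is only the basic update such results build on, not the adaptive algorithm of the note, and the objective and step size in the example are illustrative assumptions.

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, eta=0.1, steps=100):
    """Gradient descent that uses only the direction of the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm == 0.0:              # already at a stationary point
            break
        x = x - eta * g / norm       # step length is eta, independent of |g|
    return x

# Illustrative usage: minimize the quadratic f(x) = ||x||^2.
x_final = normalized_gradient_descent(lambda x: 2.0 * x, x0=[5.0, -3.0])
```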

Implicit Interpretation of Importance Weight Aware Updates

no code implementations 22 Jul 2023 Keyi Chen, Francesco Orabona

Due to its speed and simplicity, subgradient descent is one of the most widely used optimization algorithms in convex machine learning.
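
For context, a plain subgradient step on the hinge loss is sketched below, with a naive importance weight h that simply scales the subgradient; this is a minimal sketch of the baseline update, not the importance-weight-aware update the paper analyzes.

```python
import numpy as np

def hinge_subgradient_step(w, x, y, lr=0.1, h=1.0):
    """One subgradient-descent step on the hinge loss max(0, 1 - y*<w, x>).

    h is an importance weight; here it naively rescales the subgradient.
    """
    if y * np.dot(w, x) < 1.0:       # loss active: a subgradient is -y*x
        return w + lr * h * y * x
    return w                          # loss inactive: subgradient is zero
```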

Generalized Implicit Follow-The-Regularized-Leader

no code implementations 31 May 2023 Keyi Chen, Francesco Orabona

We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework.
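
For reference, standard FTRL with linearized losses and a quadratic regularizer admits a one-line closed form; the sketch below covers only this classical special case, not the generalized implicit updates proposed in the paper.

```python
import numpy as np

def ftrl_linearized(grads, lam=1.0):
    """FTRL with linearized losses and regularizer (lam/2)*||w||^2:

        w_t = argmin_w  sum_{s<t} <g_s, w> + (lam/2)*||w||^2
            = -(1/lam) * sum_{s<t} g_s
    """
    g_sum = np.zeros_like(np.asarray(grads[0], dtype=float))
    plays = []
    for g in grads:
        plays.append(-g_sum / lam)   # play w_t before observing g_t
        g_sum = g_sum + np.asarray(g, dtype=float)
    return plays
```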

Tighter PAC-Bayes Bounds Through Coin-Betting

no code implementations 12 Feb 2023 Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona

We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly $S$-dependent parameter.

Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion

no code implementations 7 Feb 2023 Ashok Cutkosky, Harsh Mehta, Francesco Orabona

Our primary technique is a reduction from non-smooth non-convex optimization to online learning, after which our results follow from standard regret bounds in online learning.

Robustness to Unbounded Smoothness of Generalized SignSGD

no code implementations 23 Aug 2022 Michael Crawshaw, Mingrui Liu, Francesco Orabona, Wei Zhang, Zhenxun Zhuang

We also compare these algorithms with popular optimizers on a set of deep learning tasks, observing that we can match the performance of Adam while beating the others.

Implicit Parameter-free Online Learning with Truncated Linear Models

no code implementations 19 Mar 2022 Keyi Chen, Ashok Cutkosky, Francesco Orabona

Parameter-free algorithms are online learning algorithms that do not require setting learning rates.

Stochastic Optimization

Understanding AdamW through Proximal Methods and Scale-Freeness

1 code implementation 31 Jan 2022 Zhenxun Zhuang, Mingrui Liu, Ashok Cutkosky, Francesco Orabona

First, we show how to re-interpret AdamW as an approximation of a proximal gradient method, which takes advantage of the closed-form proximal mapping of the regularizer instead of only utilizing its gradient information as in Adam-$\ell_2$.
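
The distinction is easy to see in a stripped-down sketch (no bias correction, illustrative hyperparameters): Adam-$\ell_2$ pushes the weight-decay term through the adaptive preconditioner, while AdamW applies it in a decoupled, proximal-style step.

```python
import numpy as np

def adam_l2_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, wd=1e-2, eps=1e-8):
    """Adam with L2 regularization: the decay wd*w is added to the gradient,
    so it gets rescaled by the adaptive factor 1/sqrt(v)."""
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * m / (np.sqrt(v) + eps), m, v

def adamw_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, wd=1e-2, eps=1e-8):
    """AdamW: decoupled weight decay, applied outside the preconditioner."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    return w - lr * (m / (np.sqrt(v) + eps) + wd * w), m, v
```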

Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio

1 code implementation 27 Oct 2021 Francesco Orabona, Kwang-Sung Jun

A classic problem in statistics is the estimation of the expectation of random variables from samples.

Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers

1 code implementation NeurIPS 2021 Jeffrey Negrea, Blair Bilodeau, Nicolò Campolongo, Francesco Orabona, Daniel M. Roy

Quantile (and, more generally, KL) regret bounds, such as those achieved by NormalHedge (Chaudhuri, Freund, and Hsu 2009) and its variants, relax the goal of competing against the best individual expert to only competing against a majority of experts on adversarial data.

Online Learning with Optimism and Delay

1 code implementation 13 Jun 2021 Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey

Inspired by the demands of real-time climate and weather forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback.

Benchmarking, Weather Forecasting

On the Initialization for Convex-Concave Min-max Problems

no code implementations 27 Feb 2021 Mingrui Liu, Francesco Orabona

This means that the convergence speed does not improve even if the algorithm starts from the optimal solution, and hence it is oblivious to the initialization.

A closer look at temporal variability in dynamic online learning

no code implementations 15 Feb 2021 Nicolò Campolongo, Francesco Orabona

Our proposed algorithm is adaptive not only to the temporal variability of the loss functions, but also to the path length of the sequence of comparators when an upper bound is known.

On the Last Iterate Convergence of Momentum Methods

no code implementations 13 Feb 2021 Xiaoyu Li, Mingrui Liu, Francesco Orabona

In this paper, we focus on the convergence rate of the last iterate of SGDM.

Stochastic Optimization
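
For concreteness, the SGD-with-momentum (heavy-ball) iteration whose last iterate is studied in the paper above can be sketched as follows; the constant step size and momentum parameter are illustrative assumptions.

```python
import numpy as np

def sgdm(stoch_grad, x0, lr=0.01, beta=0.9, steps=1000):
    """Stochastic gradient descent with heavy-ball momentum.

    Returns the last iterate x_T, the quantity whose convergence is analyzed.
    """
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = stoch_grad(x)        # stochastic gradient oracle (assumed helper)
        m = beta * m + g         # momentum buffer
        x = x - lr * m           # descent step along the buffer
    return x
```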

Parameter-free Stochastic Optimization of Variationally Coherent Functions

no code implementations 30 Jan 2021 Francesco Orabona, Dávid Pál

We design and analyze an algorithm for first-order stochastic optimization of a large class of functions on $\mathbb{R}^d$.

Stochastic Optimization

A High Probability Analysis of Adaptive SGD with Momentum

no code implementations 28 Jul 2020 Xiaoyu Li, Francesco Orabona

We use it to prove for the first time the convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum.

Vocal Bursts Intensity Prediction

Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

no code implementations 12 Jun 2020 Keyi Chen, John Langford, Francesco Orabona

Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.

Stochastic Optimization

Temporal Variability in Implicit Online Learning

no code implementations NeurIPS 2020 Nicolò Campolongo, Francesco Orabona

We prove a novel static regret bound that depends on the temporal variability of the sequence of loss functions, a quantity which is often encountered when considering dynamic competitors.

A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

2 code implementations 12 Feb 2020 Xiaoyu Li, Zhenxun Zhuang, Francesco Orabona

Moreover, we show the surprising property that these two strategies are \emph{adaptive} to the noise level in the stochastic gradients of PL functions.

Stochastic Optimization
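
The two step-size schedules studied in the paper above can be written down directly; the sketch below uses one common parameterization, and the exact constants in the paper's analysis may differ.

```python
import math

def exponential_stepsize(eta0, alpha, t):
    """Exponentially decaying step size: eta_t = eta0 * alpha**t with 0 < alpha < 1."""
    return eta0 * alpha ** t

def cosine_stepsize(eta0, t, T):
    """Cosine-annealed step size over T iterations: eta_t = eta0 * (1 + cos(pi*t/T)) / 2."""
    return eta0 * (1 + math.cos(math.pi * t / T)) / 2
```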

A Modern Introduction to Online Learning

1 code implementation 31 Dec 2019 Francesco Orabona

I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.

Multi-Armed Bandits

Parameter-Free Locally Differentially Private Stochastic Subgradient Descent

no code implementations 21 Nov 2019 Kwang-Sung Jun, Francesco Orabona

We consider the problem of minimizing a convex risk with stochastic subgradients while guaranteeing $\epsilon$-local differential privacy ($\epsilon$-LDP).

Stochastic Optimization

Momentum-Based Variance Reduction in Non-Convex SGD

2 code implementations NeurIPS 2019 Ashok Cutkosky, Francesco Orabona

Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points.
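
A simplified sketch of a recursive, momentum-based variance-reduced gradient estimator of this kind is below, with a fixed momentum parameter a rather than an adaptive schedule; stoch_grad(x, xi) and sample() are assumed helpers that evaluate a stochastic gradient for a given sample and draw a fresh sample.

```python
import numpy as np

def variance_reduced_sgd(stoch_grad, sample, x0, lr=0.01, a=0.1, steps=1000):
    """SGD driven by the recursive estimator

        d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t)),

    which reuses the same sample xi_t at both x_t and x_{t-1} and needs no
    large-batch checkpoints.
    """
    x = np.asarray(x0, dtype=float)
    x_prev, d = None, None
    for _ in range(steps):
        xi = sample()
        g = stoch_grad(x, xi)
        d = g if d is None else g + (1 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d    # move along the corrected direction
    return x
```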

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

no code implementations 21 May 2018 Xiaoyu Li, Francesco Orabona

In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes.
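
For reference, SGD with a global (norm-based) AdaGrad step size looks as follows; this is a minimal sketch of the kind of adaptive stepsize in question, not the exact generalized version analyzed in the paper.

```python
import numpy as np

def adagrad_norm_sgd(stoch_grad, x0, alpha=1.0, beta=1.0, steps=1000):
    """SGD with the global AdaGrad step size

        eta_t = alpha / sqrt(beta + sum_{s<=t} ||g_s||^2),

    which shrinks automatically as gradient information accumulates.
    """
    x = np.asarray(x0, dtype=float)
    acc = float(beta)
    for _ in range(steps):
        g = stoch_grad(x)                 # stochastic gradient oracle (assumed helper)
        acc += float(np.dot(g, g))        # accumulate squared gradient norms
        x = x - alpha / np.sqrt(acc) * g  # no hand-tuned decay schedule needed
    return x
```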

Black-Box Reductions for Parameter-free Online Learning in Banach Spaces

no code implementations 17 Feb 2018 Ashok Cutkosky, Francesco Orabona

We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.

Online Learning for Changing Environments using Coin Betting

no code implementations 6 Nov 2017 Kwang-Sung Jun, Francesco Orabona, Stephen Wright, Rebecca Willett

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments.

Metric Learning

Training Deep Networks without Learning Rates Through Coin Betting

6 code implementations NeurIPS 2017 Francesco Orabona, Tatiana Tommasi

Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario.

Stochastic Optimization

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

no code implementations 25 Feb 2017 Alina Beygelzimer, Francesco Orabona, Chicheng Zhang

The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor.

Improved Strongly Adaptive Online Learning using Coin Betting

no code implementations 14 Oct 2016 Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright

This paper describes a new parameter-free online learning algorithm for changing environments.

Metric Learning

Coin Betting and Parameter-Free Online Learning

1 code implementation NeurIPS 2016 Francesco Orabona, Dávid Pál

We present a new intuitive framework to design parameter-free algorithms for \emph{both} online linear optimization over Hilbert spaces and for learning with expert advice, based on reductions to betting on outcomes of adversarial coins.
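
A minimal one-dimensional sketch of the reduction is below, using a Krichevsky-Trofimov (KT) bettor and assuming the gradients lie in $[-1, 1]$; the paper develops the framework in full generality for Hilbert spaces and for expert advice.

```python
def coin_betting_1d(gradients, epsilon=1.0):
    """Parameter-free 1-D online linear optimization via coin betting.

    The coin outcome at round t is c_t = -g_t. Wealth starts at epsilon,
    and the bet w_t is a KT fraction of the current wealth.
    """
    wealth, coin_sum = epsilon, 0.0
    plays = []
    for t, g in enumerate(gradients, start=1):
        w = (coin_sum / t) * wealth   # KT betting fraction times current wealth
        plays.append(w)
        c = -g                        # negative gradient plays the role of the coin
        wealth += c * w               # update wealth with this round's outcome
        coin_sum += c
    return plays
```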

High Dimensional Inference with Random Maximum A-Posteriori Perturbations

no code implementations 10 Feb 2016 Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi Jaakkola

This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution.

Vocal Bursts Intensity Prediction
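
As background for the paper above, the classical full-dimensional perturb-max (Gumbel-max) sampler over a finite configuration set is sketched below; the paper's contribution concerns the harder low-dimensional perturbation case, which this sketch does not implement.

```python
import numpy as np

def perturb_max_sample(theta, rng=None):
    """Classical perturb-max sampling over a finite set.

    theta: array of potentials, one per configuration. Adding an i.i.d.
    Gumbel perturbation to every configuration and taking the argmax
    returns an exact sample from p(x) proportional to exp(theta[x]).
    """
    rng = np.random.default_rng() if rng is None else rng
    gumbel = rng.gumbel(size=len(theta))   # one perturbation per configuration
    return int(np.argmax(np.asarray(theta) + gumbel))
```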

Solving Ridge Regression using Sketched Preconditioned SVRG

no code implementations 7 Feb 2016 Alon Gonen, Francesco Orabona, Shai Shalev-Shwartz

We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods.

regression

Scale-Free Online Learning

no code implementations 8 Jan 2016 Francesco Orabona, Dávid Pál

We design and analyze algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.

Optimal Non-Asymptotic Lower Bound on the Minimax Regret of Learning with Expert Advice

no code implementations 6 Nov 2015 Francesco Orabona, David Pal

We prove non-asymptotic lower bounds on the expectation of the maximum of $d$ independent Gaussian variables and the expectation of the maximum of $d$ independent symmetric random walks.

The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

no code implementations 20 Aug 2015 Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi

Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments.

General Classification

Scale-Free Algorithms for Online Linear Optimization

no code implementations 19 Feb 2015 Francesco Orabona, David Pal

We design algorithms for online linear optimization that have optimal regret and at the same time do not need to know any upper or lower bounds on the norm of the loss vectors.

A Simple Expression for Mill's Ratio of the Student's $t$-Distribution

no code implementations 5 Feb 2015 Francesco Orabona

I show a simple expression for the Mill's ratio of the Student's $t$-distribution.

Fast Rates by Transferring from Auxiliary Hypotheses

no code implementations 4 Dec 2014 Ilja Kuzborskij, Francesco Orabona

In this work we consider the learning setting where, in addition to the training set, the learner receives a collection of auxiliary hypotheses originating from other tasks.

Scalable Greedy Algorithms for Transfer Learning

no code implementations 6 Aug 2014 Ilja Kuzborskij, Francesco Orabona, Barbara Caputo

In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield a good performance on a target task.

feature selection, Transfer Learning

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

no code implementations NeurIPS 2014 Francesco Orabona

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability.

Learning Theory, Model Selection

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

no code implementations 3 Mar 2014 H. Brendan McMahan, Francesco Orabona

When $T$ is known, we derive an algorithm with an optimal regret bound (up to constant factors).

Regression-tree Tuning in a Streaming Setting

no code implementations NeurIPS 2013 Samory Kpotufe, Francesco Orabona

We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time.

regression

Dimension-Free Exponentiated Gradient

no code implementations NeurIPS 2013 Francesco Orabona

We present a new online learning algorithm that extends the exponentiated gradient to infinite dimensional spaces.
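
For context, the classical finite-dimensional exponentiated gradient update on the probability simplex is sketched below; the paper's algorithm extends this family beyond the finite-dimensional setting.

```python
import numpy as np

def exponentiated_gradient(grads, d, eta=0.1):
    """Exponentiated gradient on the d-dimensional probability simplex:

        w_{t+1, i}  is proportional to  w_{t, i} * exp(-eta * g_{t, i}).
    """
    w = np.full(d, 1.0 / d)          # start from the uniform distribution
    iterates = [w.copy()]
    for g in grads:
        w = w * np.exp(-eta * np.asarray(g, dtype=float))
        w = w / w.sum()              # renormalize back onto the simplex
        iterates.append(w.copy())
    return iterates
```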

On Measure Concentration of Random Maximum A-Posteriori Perturbations

no code implementations 15 Oct 2013 Francesco Orabona, Tamir Hazan, Anand D. Sarwate, Tommi Jaakkola

Applying the general result to MAP perturbations can yield a more efficient algorithm to approximate sampling from the Gibbs distribution.

From N to N+1: Multiclass Transfer Incremental Learning

no code implementations CVPR 2013 Ilja Kuzborskij, Francesco Orabona, Barbara Caputo

Since the seminal work of Thrun [17], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience and with the number of tasks.

Incremental Learning, Object Categorization, +1

A Generalized Online Mirror Descent with Applications to Classification and Regression

no code implementations 10 Apr 2013 Francesco Orabona, Koby Crammer, Nicolò Cesa-Bianchi

A unifying perspective on the design and the analysis of online algorithms is provided by online mirror descent, a general prediction strategy from which most first-order algorithms can be obtained as special cases.

General Classification, regression
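
As a concrete reference point for the paper above, the Euclidean instance of online mirror descent (which reduces to online subgradient descent) is sketched below; the general framework replaces the squared Euclidean distance with other Bregman divergences.

```python
import numpy as np

def online_mirror_descent_euclidean(grads, x0, eta=0.1):
    """Online mirror descent with the mirror map psi(x) = ||x||^2 / 2:

        x_{t+1} = argmin_x  <g_t, x> + (1/(2*eta)) * ||x - x_t||^2
                = x_t - eta * g_t,

    i.e., plain online (sub)gradient descent, the simplest member of the family.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for g in grads:
        x = x - eta * np.asarray(g, dtype=float)
        iterates.append(x.copy())
    return iterates
```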

Learning from Candidate Labeling Sets

no code implementations NeurIPS 2010 Jie Luo, Francesco Orabona

In this paper, we propose a semi-supervised framework to model this kind of problem.

New Adaptive Algorithms for Online Classification

no code implementations NeurIPS 2010 Francesco Orabona, Koby Crammer

We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting.

Classification, General Classification
