no code implementations • 19 Mar 2024 • Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
1 code implementation • 29 Sep 2022 • Ahmed Touati, Jérémy Rapin, Yann Ollivier
A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase.
no code implementations • 30 May 2022 • Benjamin Scellier, Siddhartha Mishra, Yoshua Bengio, Yann Ollivier
This work establishes that a physical system can perform statistical learning without gradient computations, via an Agnostic Equilibrium Propagation (Aeqprop) procedure that combines energy minimization, homeostatic control, and nudging towards the correct response.
no code implementations • 16 Jun 2021 • Léonard Blier, Yann Ollivier
We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.
2 code implementations • NeurIPS 2021 • Ahmed Touati, Yann Ollivier
In the test phase, a reward representation is estimated either from observations or an explicit reward description (e.g., a target state).
no code implementations • 18 Jan 2021 • Léonard Blier, Corentin Tallec, Yann Ollivier
In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed.
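The sparse-reward claim can be checked on a toy example. The sketch below is a minimal tabular TD(0) loop on a hypothetical 5-state chain; the chain, step size, discount, and the name `td0_chain` are illustrative assumptions, not taken from the paper. With zero-initialized values and no observed reward, every TD error is exactly zero, so the estimates never move.

```python
import random

def td0_chain(n_states=5, reward_at_end=0.0, episodes=50,
              alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) on a hypothetical left-to-right chain.

    The only possible reward is received on entering the last (terminal) state.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states  # zero-initialized value estimates
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Assumed dynamics: move right with probability 0.5, else stay.
            s_next = s + 1 if rng.random() < 0.5 else s
            r = reward_at_end if s_next == terminal else 0.0
            # The terminal state has value 0 by convention.
            bootstrap = 0.0 if s_next == terminal else values[s_next]
            td_error = r + gamma * bootstrap - values[s]
            values[s] += alpha * td_error
            s = s_next
    return values

no_reward = td0_chain(reward_at_end=0.0)   # reward never observed
with_reward = td0_chain(reward_at_end=1.0) # reward observed at the end
```

In the first call the value table stays at its initial values however many episodes are run; in the second, the state just before the terminal one acquires a positive estimate once the reward has been seen.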
no code implementations • 12 May 2020 • Pierre-Yves Massé, Yann Ollivier
This is more data-agnostic and creates differences with respect to standard SGD theory, especially for the range of possible learning rates.
no code implementations • 1 Feb 2020 • Pierre Wolinski, Guillaume Charpiat, Yann Ollivier
We fully characterize the regularizers that can arise according to this procedure, and provide a systematic way to compute the prior corresponding to a given penalty.
no code implementations • 29 Aug 2019 • Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou
Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set.
no code implementations • ICLR 2019 • Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz
Over the past four years, neural networks have been proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions.
1 code implementation • 5 Feb 2019 • Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier
In settings where this bias is unacceptable (where the system must optimize for longer horizons at higher discounts), the target of the value function approximator may increase in variance, leading to difficulties in learning.
1 code implementation • 28 Jan 2019 • Corentin Tallec, Léonard Blier, Yann Ollivier
Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018).
no code implementations • 3 Jan 2019 • Yann Ollivier
In principle this makes it possible to treat the underlying trajectory as the parameter of a statistical model of the observations.
1 code implementation • 2 Oct 2018 • Léonard Blier, Pierre Wolinski, Yann Ollivier
Hyperparameter tuning is a bothersome step in the training of deep learning models.
no code implementations • 27 Sep 2018 • Léonard Blier, Pierre Wolinski, Yann Ollivier
Hyperparameter tuning is a bothersome step in the training of deep learning models.
no code implementations • ICML 2018 • Thomas Lucas, Corentin Tallec, Jakob Verbeek, Yann Ollivier
We propose to feed the discriminator with mixed batches of true and fake samples, and train it to predict the ratio of true samples in the batch.
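As a rough illustration of the mixed-batch idea, the sketch below builds one batch from real and fake samples and returns the fraction of real samples as the regression target for the discriminator. The function name `make_mixed_batch` and the way the mixing proportion is drawn (uniformly at random per batch) are assumptions made for this example; the text above does not specify them, and the discriminator itself is not shown.

```python
import numpy as np

def make_mixed_batch(real, fake, batch_size, rng):
    """Return (batch, ratio) for one mixed batch.

    batch: array of shape (batch_size, dim) mixing rows of `real` and `fake`.
    ratio: fraction of rows in the batch that come from `real`.
    """
    # Assumption: draw a fresh mixing proportion for every batch.
    p_real = rng.random()
    is_real = rng.random(batch_size) < p_real
    # Sample candidate rows from both sources, then select per row.
    real_rows = real[rng.integers(0, len(real), size=batch_size)]
    fake_rows = fake[rng.integers(0, len(fake), size=batch_size)]
    batch = np.where(is_real[:, None], real_rows, fake_rows)
    ratio = float(is_real.mean())
    return batch, ratio

rng = np.random.default_rng(0)
real = np.ones((100, 3))   # placeholder "true" samples
fake = np.zeros((100, 3))  # placeholder generated samples
batch, ratio = make_mixed_batch(real, fake, batch_size=8, rng=rng)
```

With these placeholder arrays, the mean of any column of `batch` equals `ratio`, which makes it easy to confirm that the target matches the batch composition.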
no code implementations • 2 May 2018 • Yann Ollivier
In this case, approximate TD is exactly a gradient descent of the Dirichlet norm, the norm of the difference of gradients between the true and approximate value functions.
1 code implementation • ICLR 2018 • Corentin Tallec, Yann Ollivier
Successful recurrent models such as long short-term memories (LSTMs) and gated recurrent units (GRUs) use ad hoc gating mechanisms.
no code implementations • NeurIPS 2018 • Léonard Blier, Yann Ollivier
This might explain the relatively poor practical performance of variational methods in deep learning.
1 code implementation • ICLR 2019 • Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz
Over the past few years, neural networks were proven vulnerable to adversarial images: targeted but imperceptible image perturbations lead to drastically different predictions.
no code implementations • 22 Dec 2017 • Yann Ollivier
We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation.
1 code implementation • 4 Dec 2017 • Gaétan Marceau-Caron, Yann Ollivier
The resulting natural Langevin dynamics combines the advantages of Amari's natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks.
no code implementations • ICLR 2018 • Corentin Tallec, Yann Ollivier
Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while relieving the need for a complete backtrack through the whole data sequence at every step.
no code implementations • 1 Mar 2017 • Yann Ollivier
case, we prove that the joint Kalman filter over states and parameters is a natural gradient on top of real-time recurrent learning (RTRL), a classical algorithm to train recurrent models.
1 code implementation • ICLR 2018 • Corentin Tallec, Yann Ollivier
The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models.
no code implementations • 25 Feb 2016 • Gaétan Marceau-Caron, Yann Ollivier
We provide the first experimental results on non-synthetic datasets for the quasi-diagonal Riemannian gradient descents for neural networks introduced in [Ollivier, 2015].
no code implementations • 8 Nov 2015 • Pierre-Yves Massé, Yann Ollivier
The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications.
no code implementations • 28 Jul 2015 • Yann Ollivier, Corentin Tallec, Guillaume Charpiat
The evolution of this search direction is partly stochastic and is constructed in such a way as to provide, at every time, an unbiased random estimate of the gradient of the loss function with respect to the parameters.
no code implementations • 30 Mar 2014 • Yann Ollivier
We discuss the similarities and differences between training an auto-encoder to minimize the reconstruction error, and training the same auto-encoder to compress the data via a generative model.
no code implementations • 3 Jun 2013 • Yann Ollivier
Recurrent neural networks are powerful models for sequential data, able to represent complex dependencies in the sequence that simpler models such as hidden Markov models cannot handle.
no code implementations • 4 Mar 2013 • Yann Ollivier
We describe four algorithms for neural network training, each adapted to different scalability constraints.