no code implementations • 8 Mar 2024 • Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
We propose training fitted Q-iteration with log-loss (FQI-LOG) for batch reinforcement learning (RL).
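As a rough illustration of the idea, here is a tabular sketch of fitted Q-iteration trained with log-loss rather than squared loss; it assumes rewards are scaled so that every Bellman target lies in [0, 1], and all hyperparameters are illustrative, not taken from the paper:

```python
import numpy as np

def fqi_log(dataset, n_states, n_actions, gamma=0.9, sweeps=50, lr=0.5):
    """Tabular sketch of fitted Q-iteration with log-loss (FQI-LOG).
    Assumes rewards are scaled so every Bellman target lies in [0, 1];
    the table stores logits, and each sweep takes gradient steps on the
    binary cross-entropy between sigmoid(logit) and the bootstrapped target."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    logits = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        q = sigmoid(logits)  # targets are frozen at the start of each sweep
        for s, a, r, s_next in dataset:
            y = r + gamma * q[s_next].max()  # bootstrapped target in [0, 1]
            p = sigmoid(logits[s, a])        # current prediction
            logits[s, a] -= lr * (p - y)     # gradient of log-loss is p - y
    return sigmoid(logits)
```

With squared loss, the inner step would instead regress `q[s, a]` directly onto `y`; swapping in the cross-entropy objective is the change the abstract refers to as FQI-LOG.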
no code implementations • 13 Nov 2023 • David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári
We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method in which exploration is driven by training on randomly perturbed rewards.
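A minimal sketch of the PHE idea for the plain linear-bandit special case, assuming ridge regression and Gaussian perturbations; `reward_fn`, `noise`, and `lam` are illustrative stand-ins, not the paper's choices:

```python
import numpy as np

def phe_linear(arms, reward_fn, T=1000, noise=0.5, lam=1.0, seed=0):
    """Sketch of perturbed history exploration (PHE) for a linear bandit.
    Each round, refit ridge regression to the history with fresh Gaussian
    noise added to every past reward, then act greedily under that fit."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    X, y = [], []
    theta = np.zeros(d)
    for _ in range(T):
        if X:
            Xm, ym = np.asarray(X), np.asarray(y)
            y_pert = ym + noise * rng.standard_normal(len(ym))  # perturb history
            theta = np.linalg.solve(Xm.T @ Xm + lam * np.eye(d), Xm.T @ y_pert)
        x = arms[int(np.argmax(arms @ theta))]  # greedy w.r.t. perturbed fit
        X.append(x)
        y.append(reward_fn(x))
    return theta
```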
no code implementations • NeurIPS 2023 • Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle.
no code implementations • 29 Sep 2021 • Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White
Soft-greedy operators, namely $\varepsilon$-greedy and softmax, remain a common choice to induce a basic level of exploration for action-value methods in reinforcement learning.
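For reference, minimal NumPy implementations of the two soft-greedy operators named above, acting on an array of action values; the `eps` and `temperature` defaults are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q, eps=0.1):
    """With probability eps act uniformly at random, otherwise greedily."""
    if rng.random() < eps:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))

def softmax_action(q, temperature=1.0):
    """Sample an action with probability proportional to exp(q / temperature)."""
    z = q / temperature
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(q), p=p))
```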
no code implementations • 5 Aug 2021 • Matthew T. Regehr, Alex Ayoub
Watkins' and Dayan's Q-learning is a model-free reinforcement learning algorithm that iteratively refines an estimate of the optimal action-value function of an MDP by stochastically "visiting" many state-action pairs [Watkins and Dayan, 1992].
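The update being described is the standard tabular Q-learning step; a minimal sketch (the step size and discount values are illustrative):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step on a tabular array Q of shape (n_states, n_actions):
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q
```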
1 code implementation • 15 Jun 2021 • Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.
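For context, a sketch of the randomized least-squares value-iteration step that inspires the proposed method: Bellman targets are perturbed with Gaussian noise before solving a ridge regression, so the fitted value function is itself randomized, which drives exploration; `sigma` and `lam` here are illustrative, not the paper's:

```python
import numpy as np

def rlsvi_step(Phi, rewards, next_values, sigma=1.0, lam=1.0, seed=None):
    """One backward step of randomized least-squares value iteration (RLSVI):
    perturb the Bellman targets with Gaussian noise, then solve a ridge
    regression on features Phi, yielding a randomized value estimate."""
    rng = np.random.default_rng(seed)
    targets = rewards + next_values + sigma * rng.standard_normal(len(rewards))
    d = Phi.shape[1]
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ targets)
    return theta  # estimated value at (s, a) is Phi(s, a) @ theta
```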
no code implementations • ICML 2020 • Alex Ayoub, Zeyu Jia, Csaba Szepesvári, Mengdi Wang, Lin F. Yang
We propose a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are 'consistent' with the data collected so far is constructed.
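A schematic sketch of the optimistic model-selection step described above, with every argument a hypothetical stand-in: keep the models whose empirical loss is within a radius of the best fit, then plan with the one promising the largest value:

```python
def optimistic_model(models, empirical_loss, radius, optimal_value):
    """Pick the value-optimistic model from the set 'consistent' with the data:
    models within `radius` of the best empirical fit are retained, and
    planning proceeds with the one whose optimal value is largest."""
    best_fit = min(empirical_loss(m) for m in models)
    consistent = [m for m in models if empirical_loss(m) <= best_fit + radius]
    return max(consistent, key=optimal_value)
```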