Search Results for author: Yasin Abbasi-Yadkori

Found 30 papers, 3 papers with code

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

no code implementations • 17 Jan 2022 • Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

We propose a method that achieves, in $K$-armed bandit problems, a near-optimal $\widetilde O(\sqrt{K N(S+1)})$ dynamic regret, where $N$ is the time horizon of the problem and $S$ is the number of times the identity of the optimal arm changes, without prior knowledge of $S$.

Efficient Local Planning with Linear Function Approximation

no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.

Feature and Parameter Selection in Stochastic Linear Bandits

no code implementations • 9 Jun 2021 • Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.

Feature Selection, Model Selection

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

no code implementations • 25 Feb 2021 • Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation.

Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

The Elliptical Potential Lemma Revisited

no code implementations • 20 Oct 2020 • Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.

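For reference, a commonly cited form of the lemma (the statement below is a standard version from the linear bandit literature; the exact constants used in the note above may differ): for $x_1, \dots, x_n \in \mathbb{R}^d$ with $\|x_t\|_2 \le L$, $\lambda > 0$, and $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top$,

```latex
\sum_{t=1}^{n} \min\!\left(1, \|x_t\|_{V_{t-1}^{-1}}^{2}\right)
  \;\le\; 2 \log \frac{\det V_n}{\det V_0}
  \;\le\; 2 d \log\!\left(1 + \frac{n L^2}{\lambda d}\right).
```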

Regret Balancing for Bandit and RL Model Selection

no code implementations • 9 Jun 2020 • Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion.

Model Selection

Sample Efficient Graph-Based Optimization with Noisy Observations

1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton

We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.

Re-Ranking
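The idea of optimizing a hill-climbing friendly function on a graph under noisy observations can be illustrated with a minimal greedy baseline (this sketch is not the authors' algorithm; the line graph, Gaussian noise model, and per-node sample counts are illustrative assumptions):

```python
import random

def noisy_hill_climb(neighbors, noisy_eval, start, n_samples=50, max_iters=100, seed=0):
    """Greedy hill climbing under noise: estimate the value of the current
    node and each neighbor by averaging repeated noisy evaluations, move to
    the best neighbor, and stop at an (estimated) local maximum."""
    rng = random.Random(seed)

    def estimate(v):
        return sum(noisy_eval(v, rng) for _ in range(n_samples)) / n_samples

    current = start
    for _ in range(max_iters):
        best, best_val = current, estimate(current)
        for u in neighbors(current):
            val = estimate(u)
            if val > best_val:
                best, best_val = u, val
        if best == current:
            return current
        current = best
    return current

# Example: integer line graph, f(v) = -(v - 7)^2 observed with Gaussian noise.
top = noisy_hill_climb(
    neighbors=lambda v: [v - 1, v + 1],
    noisy_eval=lambda v, rng: -(v - 7) ** 2 + rng.gauss(0.0, 1.0),
    start=0,
)
```

Averaging many noisy evaluations before each comparison is the simplest way to make greedy ascent reliable; the paper's contribution concerns how few such samples suffice for this class of functions.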

Model Selection in Contextual Stochastic Bandit Problems

no code implementations NeurIPS 2020 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.

Model Selection, Multi-Armed Bandits

Adaptive Approximate Policy Iteration

1 code implementation • 8 Feb 2020 • Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.

Exploration-Enhanced POLITEX

no code implementations • 27 Aug 2019 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

POLITEX has sublinear regret guarantees in uniformly mixing MDPs when the value-estimation error can be controlled, a condition that holds if all policies sufficiently explore the environment.

Thompson Sampling with Approximate Inference

no code implementations NeurIPS 2019 My Phan, Yasin Abbasi-Yadkori, Justin Domke

We study the effects of approximate inference on the performance of Thompson sampling in $k$-armed bandit problems.

Decision Making, Thompson Sampling
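As background for this setting, Thompson sampling with exact posterior inference in a Bernoulli bandit looks like the following (a minimal sketch with conjugate Beta updates; the paper's focus is on what changes when this exact posterior is replaced by an approximation):

```python
import random

def thompson_bernoulli(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with exact posterior updates.
    Each round: draw one sample from every arm's Beta posterior and
    play the arm whose sample is largest."""
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1.0] * k   # Beta(1, 1) uniform priors
    beta = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
        alpha[a] += r
        beta[a] += 1.0 - r
        counts[a] += 1
    return counts

counts = thompson_bernoulli([0.2, 0.5, 0.8], horizon=3000)
```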

Bootstrapping Upper Confidence Bound

no code implementations NeurIPS 2019 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial-information feedback.

Decision Making, Multi-Armed Bandits
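For context, the standard UCB1 rule for Bernoulli bandits can be sketched as follows (a textbook version; the bootstrapped variant studied in the paper itself is not shown):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1: pull each arm once, then repeatedly play the arm
    with the highest empirical mean plus confidence bonus."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(a):
        # Bernoulli reward with mean arm_means[a]
        return 1.0 if rng.random() < arm_means[a] else 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization: try every arm once
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

The bonus term $\sqrt{2\log t / n_i}$ shrinks as an arm is pulled more often, so exploration concentrates on arms whose means are still uncertain.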

Large-Scale Markov Decision Problems via the Linear Programming Dual

no code implementations • 6 Jan 2019 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek

Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.

Sharp convergence rates for Langevin dynamics in the nonconvex setting

no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
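The sampling scheme analyzed in this line of work can be illustrated with the unadjusted Langevin algorithm in one dimension (a minimal sketch; the step size and iteration counts are illustrative assumptions, and the example target is Gaussian rather than the paper's nonconvex setting):

```python
import math
import random

def ula(grad_u, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm:
        x_{t+1} = x_t - step * grad U(x_t) + sqrt(2 * step) * N(0, 1).
    For small step sizes the iterates sample approximately from
    p*(x) proportional to exp(-U(x))."""
    rng = random.Random(seed)
    x = x0
    xs = []
    for _ in range(n_steps):
        x = x - step * grad_u(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

# Example: U(x) = x^2 / 2, so p* is the standard normal and grad U(x) = x.
samples = ula(lambda x: x, x0=0.0, step=0.01, n_steps=20000)
```

After a burn-in period the empirical mean and variance of the chain should be close to those of the target distribution.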

Offline Evaluation of Ranking Policies with Click Models

no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

Model-Free Linear Quadratic Control via Reduction to Expert Prediction

no code implementations • 17 Apr 2018 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics.

Continuous Control, Reinforcement Learning (RL)

Optimizing over a Restricted Policy Class in Markov Decision Processes

no code implementations • 26 Feb 2018 • Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis

However, under a condition that is akin to the occupancy measures of the base policies having large overlap, we show that there exists an efficient algorithm that finds a policy that is almost as good as the best convex combination of the base policies.

Policy Gradient Methods

Posterior Sampling for Large Scale Reinforcement Learning

no code implementations • 21 Nov 2017 • Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

Reinforcement Learning (RL)

Conservative Contextual Linear Bandits

no code implementations NeurIPS 2017 Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

We prove an upper-bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper-bound for the regret of the standard linear UCB algorithm that grows with the time horizon and 2) a constant (does not grow with the time horizon) term that accounts for the loss of being conservative in order to satisfy the safety constraint.

Decision Making, Marketing

Hit-and-Run for Sampling and Planning in Non-Convex Spaces

no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
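The basic Hit-and-Run step can be sketched as follows (a minimal 2-D version using a membership oracle; the unit-disc example is convex and purely illustrative, whereas the paper targets non-convex spaces, and the bisection-based chord search is an assumption of this sketch):

```python
import math
import random

def hit_and_run(inside, x0, n_steps, radius=10.0, seed=0):
    """One Hit-and-Run chain on a 2-D set given by a membership oracle.
    Each step: draw a uniformly random direction, find the chord through
    the current point by bisection against the oracle, and jump to a
    uniformly random point on that chord."""
    rng = random.Random(seed)
    x, y = x0

    def extent(dx, dy, sign):
        # Largest t >= 0 (up to `radius`) with the point
        # (x + sign*t*dx, y + sign*t*dy) still inside, found by bisection.
        lo, hi = 0.0, radius
        for _ in range(40):
            mid = (lo + hi) / 2.0
            if inside(x + sign * mid * dx, y + sign * mid * dy):
                lo = mid
            else:
                hi = mid
        return lo

    samples = []
    for _ in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        t = rng.uniform(-extent(dx, dy, -1.0), extent(dx, dy, 1.0))
        x, y = x + t * dx, y + t * dy
        samples.append((x, y))
    return samples

# Approximately uniform samples from the unit disc.
pts = hit_and_run(lambda a, b: a * a + b * b <= 1.0, (0.0, 0.0), 5000)
```

Because each move stays on a chord whose endpoints are verified by the oracle, every sample remains inside the set; the interest of the paper is in what this chain does on non-convex sets.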

Online learning in MDPs with side information

no code implementations • 26 Jun 2014 • Yasin Abbasi-Yadkori, Gergely Neu

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.

Recommendation Systems

Linear Programming for Large-Scale Markov Decision Problems

no code implementations • 27 Feb 2014 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.

Improved Algorithms for Linear Stochastic Bandits

no code implementations NeurIPS 2011 Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.
