Search Results for author: Yasin Abbasi-Yadkori

Found 30 papers, 3 papers with code

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

no code implementations • 17 Jan 2022 • Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

We propose a method that achieves, in $K$-armed bandit problems, a near-optimal $\widetilde O(\sqrt{K N(S+1)})$ dynamic regret, where $N$ is the time horizon of the problem and $S$ is the number of times the identity of the optimal arm changes, without prior knowledge of $S$.

Efficient Local Planning with Linear Function Approximation

no code implementations • 12 Aug 2021 • Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

Under the assumption that the Q-functions of all policies are linear in known features of the state-action pairs, we show that our algorithms have polynomial query and computational costs in the dimension of the features, the effective planning horizon, and the targeted sub-optimality, while these costs are independent of the size of the state space.

Feature and Parameter Selection in Stochastic Linear Bandits

no code implementations • 9 Jun 2021 • Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$.

Feature Selection, Model Selection

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

no code implementations • 25 Feb 2021 • Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation.

Optimization Issues in KL-Constrained Approximate Policy Iteration

no code implementations • 11 Feb 2021 • Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

We compare the use of KL divergence as a constraint vs. as a regularizer, and point out several optimization issues with the widely-used constrained approach.

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations • 3 Feb 2021 • Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

The Elliptical Potential Lemma Revisited

no code implementations • 20 Oct 2020 • Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.

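For reference, a commonly cited form of the lemma (the statement below is a standard version from the linear bandit literature; the exact constants used in the note above may differ): for $x_1, \dots, x_n \in \mathbb{R}^d$ with $\|x_t\|_2 \le L$, $\lambda > 0$, and $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top$,

```latex
\sum_{t=1}^{n} \min\!\left(1, \|x_t\|_{V_{t-1}^{-1}}^{2}\right)
  \;\le\; 2 \log \frac{\det V_n}{\det V_0}
  \;\le\; 2 d \log\!\left(1 + \frac{n L^2}{\lambda d}\right).
```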

Regret Balancing for Bandit and RL Model Selection

no code implementations • 9 Jun 2020 • Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion.

Model Selection

Sample Efficient Graph-Based Optimization with Noisy Observations

1 code implementation • 4 Jun 2020 • Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton

We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations.

Re-Ranking
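The idea of optimizing a hill-climbing friendly function on a graph under noisy observations can be illustrated with a minimal greedy baseline (this sketch is not the authors' algorithm; the line graph, Gaussian noise model, and per-node sample counts are illustrative assumptions):

```python
import random

def noisy_hill_climb(neighbors, noisy_eval, start, n_samples=50, max_iters=100, seed=0):
    """Greedy hill climbing under noise: estimate the value of the current
    node and each neighbor by averaging repeated noisy evaluations, move to
    the best neighbor, and stop at an (estimated) local maximum."""
    rng = random.Random(seed)

    def estimate(v):
        return sum(noisy_eval(v, rng) for _ in range(n_samples)) / n_samples

    current = start
    for _ in range(max_iters):
        best, best_val = current, estimate(current)
        for u in neighbors(current):
            val = estimate(u)
            if val > best_val:
                best, best_val = u, val
        if best == current:
            return current
        current = best
    return current

# Example: integer line graph, f(v) = -(v - 7)^2 observed with Gaussian noise.
top = noisy_hill_climb(
    neighbors=lambda v: [v - 1, v + 1],
    noisy_eval=lambda v, rng: -(v - 7) ** 2 + rng.gauss(0.0, 1.0),
    start=0,
)
```

Averaging many noisy evaluations before each comparison is the simplest way to make greedy ascent reliable; the paper's contribution concerns how few such samples suffice for this class of functions.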

Model Selection in Contextual Stochastic Bandit Problems

no code implementations NeurIPS 2020 Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee.

Model Selection, Multi-Armed Bandits

Adaptive Approximate Policy Iteration

1 code implementation • 8 Feb 2020 • Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.

Exploration-Enhanced POLITEX

no code implementations • 27 Aug 2019 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

POLITEX has sublinear regret guarantees in uniformly mixing MDPs when the value-estimation error can be controlled, a condition that holds if all policies sufficiently explore the environment.

Thompson Sampling with Approximate Inference

no code implementations NeurIPS 2019 My Phan, Yasin Abbasi-Yadkori, Justin Domke

We study the effects of approximate inference on the performance of Thompson sampling in $k$-armed bandit problems.

Decision Making, Thompson Sampling
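As background for this setting, Thompson sampling with exact posterior inference in a Bernoulli bandit looks like the following (a minimal sketch with conjugate Beta updates; the paper's focus is on what changes when this exact posterior is replaced by an approximation):

```python
import random

def thompson_bernoulli(arm_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling with exact posterior updates.
    Each round: draw one sample from every arm's Beta posterior and
    play the arm whose sample is largest."""
    rng = random.Random(seed)
    k = len(arm_means)
    alpha = [1.0] * k   # Beta(1, 1) uniform priors
    beta = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
        alpha[a] += r
        beta[a] += 1.0 - r
        counts[a] += 1
    return counts

counts = thompson_bernoulli([0.2, 0.5, 0.8], horizon=3000)
```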

Bootstrapping Upper Confidence Bound

no code implementations NeurIPS 2019 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial-information feedback.

Decision Making, Multi-Armed Bandits
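For context, the standard UCB1 rule for Bernoulli bandits can be sketched as follows (a textbook version; the bootstrapped variant studied in the paper itself is not shown):

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1: pull each arm once, then repeatedly play the arm
    with the highest empirical mean plus confidence bonus."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(a):
        # Bernoulli reward with mean arm_means[a]
        return 1.0 if rng.random() < arm_means[a] else 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initialization: try every arm once
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

The bonus term $\sqrt{2\log t / n_i}$ shrinks as an arm is pulled more often, so exploration concentrates on arms whose means are still uncertain.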

Large-Scale Markov Decision Problems via the Linear Programming Dual

no code implementations • 6 Jan 2019 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Xi Chen, Alan Malek

Moreover, we propose an efficient algorithm, scaling with the size of the subspace but not the state space, that is able to find a policy with low excess loss relative to the best policy in this class.

Sharp convergence rates for Langevin dynamics in the nonconvex setting

no code implementations • 4 May 2018 • Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball.
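The sampling scheme analyzed in this line of work can be illustrated with the unadjusted Langevin algorithm in one dimension (a minimal sketch; the step size and iteration counts are illustrative assumptions, and the example target is Gaussian rather than the paper's nonconvex setting):

```python
import math
import random

def ula(grad_u, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm:
        x_{t+1} = x_t - step * grad U(x_t) + sqrt(2 * step) * N(0, 1).
    For small step sizes the iterates sample approximately from
    p*(x) proportional to exp(-U(x))."""
    rng = random.Random(seed)
    x = x0
    xs = []
    for _ in range(n_steps):
        x = x - step * grad_u(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

# Example: U(x) = x^2 / 2, so p* is the standard normal and grad U(x) = x.
samples = ula(lambda x: x, x0=0.0, step=0.01, n_steps=20000)
```

After a burn-in period the empirical mean and variance of the chain should be close to those of the target distribution.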

Offline Evaluation of Ranking Policies with Click Models

no code implementations • 27 Apr 2018 • Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds.

Recommendation Systems

Model-Free Linear Quadratic Control via Reduction to Expert Prediction

no code implementations • 17 Apr 2018 • Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics.

Continuous Control, Reinforcement Learning (RL)

Optimizing over a Restricted Policy Class in Markov Decision Processes

no code implementations • 26 Feb 2018 • Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis

However, under a condition that is akin to the occupancy measures of the base policies having large overlap, we show that there exists an efficient algorithm that finds a policy that is almost as good as the best convex combination of the base policies.

Policy Gradient Methods

Posterior Sampling for Large Scale Reinforcement Learning

no code implementations • 21 Nov 2017 • Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Our algorithm termed deterministic schedule PSRL (DS-PSRL) is efficient in terms of time, sample, and space complexity.

Reinforcement Learning (RL)

Conservative Contextual Linear Bandits

no code implementations NeurIPS 2017 Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

We prove an upper-bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper-bound for the regret of the standard linear UCB algorithm that grows with the time horizon and 2) a constant (does not grow with the time horizon) term that accounts for the loss of being conservative in order to satisfy the safety constraint.

Decision Making, Marketing

Hit-and-Run for Sampling and Planning in Non-Convex Spaces

no code implementations • 19 Oct 2016 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek

We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces.
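The basic Hit-and-Run step can be sketched as follows (a minimal 2-D version using a membership oracle; the unit-disc example is convex and purely illustrative, whereas the paper targets non-convex spaces, and the bisection-based chord search is an assumption of this sketch):

```python
import math
import random

def hit_and_run(inside, x0, n_steps, radius=10.0, seed=0):
    """One Hit-and-Run chain on a 2-D set given by a membership oracle.
    Each step: draw a uniformly random direction, find the chord through
    the current point by bisection against the oracle, and jump to a
    uniformly random point on that chord."""
    rng = random.Random(seed)
    x, y = x0

    def extent(dx, dy, sign):
        # Largest t >= 0 (up to `radius`) with the point
        # (x + sign*t*dx, y + sign*t*dy) still inside, found by bisection.
        lo, hi = 0.0, radius
        for _ in range(40):
            mid = (lo + hi) / 2.0
            if inside(x + sign * mid * dx, y + sign * mid * dy):
                lo = mid
            else:
                hi = mid
        return lo

    samples = []
    for _ in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        dx, dy = math.cos(theta), math.sin(theta)
        t = rng.uniform(-extent(dx, dy, -1.0), extent(dx, dy, 1.0))
        x, y = x + t * dx, y + t * dy
        samples.append((x, y))
    return samples

# Approximately uniform samples from the unit disc.
pts = hit_and_run(lambda a, b: a * a + b * b <= 1.0, (0.0, 0.0), 5000)
```

Because each move stays on a chord whose endpoints are verified by the oracle, every sample remains inside the set; the interest of the paper is in what this chain does on non-convex sets.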

Online learning in MDPs with side information

no code implementations • 26 Jun 2014 • Yasin Abbasi-Yadkori, Gergely Neu

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available.

Recommendation Systems

Linear Programming for Large-Scale Markov Decision Problems

no code implementations • 27 Feb 2014 • Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost.

Improved Algorithms for Linear Stochastic Bandits

no code implementations NeurIPS 2011 Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári

We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem.
