Multi-Armed Bandits

42 papers with code · Miscellaneous

Multi-armed bandits refer to tasks in which a fixed amount of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
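The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy policy on a Bernoulli bandit. This is a generic sketch, not taken from any of the papers below; the arm probabilities and parameter values are arbitrary choices for illustration.

```python
import random

def epsilon_greedy_bandit(true_probs, n_rounds=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy policy on a Bernoulli bandit.

    With probability `epsilon` we explore (pull a uniformly random arm);
    otherwise we exploit the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # empirical mean reward per arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))                        # explore
        else:
            arm = max(range(len(true_probs)), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the empirical mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

After enough rounds the policy concentrates its pulls on the arm with the highest true success probability, while the epsilon fraction of random pulls keeps estimating the other arms.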

(Image credit: Microsoft Research)

Benchmarks

Greatest papers with code

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

ICLR 2018 tensorflow/models

At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.

DECISION MAKING · MULTI-ARMED BANDITS

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

4 Feb 2014 VowpalWabbit/vowpal_wabbit

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
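The protocol described here — observe a context, choose one of K actions, see the reward of the chosen action only — can be sketched with a toy epsilon-greedy learner. This is a hypothetical illustration of the interaction loop, not the algorithm from the paper; the reward table and parameters are invented.

```python
import random

def contextual_epsilon_greedy(n_rounds=20_000, epsilon=0.1, seed=1):
    """Toy contextual bandit: 2 contexts x 3 arms, bandit feedback only.

    The reward probabilities (hypothetical) differ per context, so the
    best arm depends on the observed context; the learner sees only the
    reward of the arm it actually chose.
    """
    rng = random.Random(seed)
    # true_probs[context][arm] -- unknown to the learner
    true_probs = [[0.8, 0.4, 0.1],
                  [0.1, 0.4, 0.8]]
    K = 3
    counts = [[0] * K for _ in range(2)]
    values = [[0.0] * K for _ in range(2)]
    for _ in range(n_rounds):
        x = rng.randrange(2)                                  # observe context
        if rng.random() < epsilon:
            arm = rng.randrange(K)                            # explore
        else:
            arm = max(range(K), key=lambda a: values[x][a])   # exploit
        reward = 1 if rng.random() < true_probs[x][arm] else 0  # bandit feedback
        counts[x][arm] += 1
        values[x][arm] += (reward - values[x][arm]) / counts[x][arm]
    # the greedy policy learned for each context
    return [max(range(K), key=lambda a: values[x][a]) for x in range(2)]

policy = contextual_epsilon_greedy()
```

With enough rounds the learned policy picks a different arm depending on the context, which is exactly what distinguishes contextual from context-free bandits.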

MULTI-ARMED BANDITS

Adapting multi-armed bandits policies to contextual bandits scenarios

11 Nov 2018 david-cortes/contextualbandits

This work explores adaptations of successful multi-armed bandits policies to the online contextual bandits scenario with binary rewards using binary classification algorithms such as logistic regression as black-box oracles.

MULTI-ARMED BANDITS

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

2 Apr 2020 ymy4323460/HATCH

In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.

MULTI-ARMED BANDITS

Model Selection for Contextual Bandits

NeurIPS 2019 akshaykr/oracle_cb

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3} d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

MODEL SELECTION · MULTI-ARMED BANDITS

Semiparametric Contextual Bandits

ICML 2018 akshaykr/oracle_cb

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for an action is modeled as a linear function of known action features confounded by a non-linear action-independent term.

MULTI-ARMED BANDITS

Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL

10 May 2020 doerlbh/mentalRL

Artificial behavioral agents are often evaluated based on the consistency of their behavior and their performance when taking sequential actions in an environment to maximize some notion of cumulative reward.

DECISION MAKING · MULTI-ARMED BANDITS

Practical Calculation of Gittins Indices for Multi-armed Bandits

11 Sep 2019 jedwards24/gittins

Gittins indices provide an optimal solution to the classical multi-armed bandit problem.

MULTI-ARMED BANDITS

Bayesian Optimisation over Multiple Continuous and Categorical Inputs

20 Jun 2019 rubinxin/CoCaBO_code

Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges.

BAYESIAN OPTIMISATION · MULTI-ARMED BANDITS