Multi-Armed Bandits

196 papers with code • 1 benchmark • 2 datasets

Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated between competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
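
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of the classic UCB1 strategy on simulated Bernoulli arms; the arm success probabilities and horizon below are made-up values for illustration only.

```python
import math
import random

def ucb1(arm_probs, horizon=10_000):
    """Minimal UCB1 on simulated Bernoulli arms (illustrative sketch)."""
    n_arms = len(arm_probs)
    counts = [0] * n_arms          # pulls per arm
    rewards = [0.0] * n_arms       # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # pull each arm once to initialize
        else:
            # exploit the empirical mean, explore via the confidence bonus
            arm = max(
                range(n_arms),
                key=lambda a: rewards[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward

    return counts, rewards

counts, rewards = ucb1([0.2, 0.5, 0.7])
print(counts)  # most pulls should concentrate on the 0.7 arm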

Most implemented papers

Quantile Bandits for Best Arms Identification

Mengyanz/QSAR 22 Oct 2020

We consider a variant of the best arm identification task in stochastic multi-armed bandits.
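
For readers unfamiliar with the setting, here is a hedged sketch of a standard fixed-budget best-arm identification routine (successive halving on empirical means). The paper's variant instead ranks arms by a quantile of the reward distribution, which this toy code does not implement; the arm probabilities and budget are assumptions for the example.

```python
import random

def successive_halving(pull, n_arms, budget=10_000):
    """Fixed-budget best-arm identification by successive halving (toy sketch)."""
    arms = list(range(n_arms))
    rounds = max(1, (n_arms - 1).bit_length())    # roughly log2(n_arms) elimination rounds
    per_round = budget // rounds
    means = {a: 0.0 for a in arms}

    while len(arms) > 1:
        pulls = max(1, per_round // len(arms))
        for a in arms:
            means[a] = sum(pull(a) for _ in range(pulls)) / pulls
        arms.sort(key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]     # keep the better half
    return arms[0]

# hypothetical Bernoulli arms
probs = [0.3, 0.5, 0.45, 0.7, 0.6]
best = successive_halving(lambda a: 1.0 if random.random() < probs[a] else 0.0, len(probs))
print("identified best arm:", best)
```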

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

alihanhyk/invconban 13 Jul 2021

Understanding a decision-maker's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare.

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

st-tech/zr-obp 3 Feb 2022

We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions.

Truncated LinUCB for Stochastic Linear Bandits

simonzhou86/tr_linucb 23 Feb 2022

This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts.
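
As context for the linear-reward setting described in this abstract, below is a hedged sketch of the standard (untruncated) disjoint-parameter LinUCB update; the dimensions, regularizer, and exploration weight are illustrative, and the paper's truncation step is not shown.

```python
import numpy as np

class LinUCB:
    """Plain disjoint-parameter LinUCB (illustrative sketch, no truncation)."""

    def __init__(self, n_arms, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]   # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]       # per-arm reward sums

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                   # ridge estimate of arm parameters
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)              # optimism in the face of uncertainty
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# toy usage with random contexts and hypothetical true arm parameters
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 5))
bandit = LinUCB(n_arms=3, dim=5)
for _ in range(1000):
    x = rng.normal(size=5)
    a = bandit.select(x)
    bandit.update(a, x, true_theta[a] @ x + 0.1 * rng.normal())
```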

Kernel Conditional Moment Constraints for Confounding Robust Inference

kstoneriv3/confounding-robust-inference-old 26 Feb 2023

It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence.

Doubly Robust Policy Evaluation and Learning

leoguelman/BLBF 23 Mar 2011

The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.
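
To make the off-policy evaluation problem concrete, here is a hedged sketch of a doubly robust value estimate for logged bandit data, combining a reward model with an inverse-propensity correction; the array shapes and the fitted reward model are assumptions for the example.

```python
import numpy as np

def doubly_robust_value(target_probs, logged_probs, logged_actions,
                        logged_rewards, reward_model):
    """Doubly robust estimate of a new policy's value from logged bandit data (sketch).

    target_probs:   (n, K) action probabilities under the policy being evaluated
    logged_probs:   (n,)   propensity of the logged action under the logging policy
    logged_actions: (n,)   integer actions actually taken
    logged_rewards: (n,)   rewards actually observed
    reward_model:   (n, K) predicted reward for every context/action pair
    """
    n = len(logged_rewards)
    # model-based term: expected reward of the target policy under the reward model
    direct = np.sum(target_probs * reward_model, axis=1)
    # importance-weighted correction using only the logged actions
    idx = np.arange(n)
    weight = target_probs[idx, logged_actions] / logged_probs
    correction = weight * (logged_rewards - reward_model[idx, logged_actions])
    return float(np.mean(direct + correction))
```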

Thompson Sampling for Contextual Bandits with Linear Payoffs

yanyangbaobeiIsEmma/Reinforcement-Learning-Contextual-Bandits 15 Sep 2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
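
For the non-contextual case, Thompson Sampling admits a very short implementation; the sketch below uses Beta posteriors over Bernoulli arms, whereas the linear-payoff contextual version analyzed in the paper instead samples from a Gaussian posterior over arm parameters. The arm probabilities and horizon are illustrative.

```python
import random

def thompson_bernoulli(arm_probs, horizon=10_000):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch)."""
    n_arms = len(arm_probs)
    alpha = [1] * n_arms   # successes + 1
    beta = [1] * n_arms    # failures + 1

    total = 0.0
    for _ in range(horizon):
        # sample a plausible mean for each arm from its posterior, play the best
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if random.random() < arm_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += reward
    return total

print(thompson_bernoulli([0.2, 0.5, 0.7]))  # should approach roughly 0.7 * horizon
```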

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

VowpalWabbit/vowpal_wabbit 4 Feb 2014

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
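
The abstract describes the contextual bandit protocol itself; below is a hedged sketch of that interaction loop with a simple epsilon-greedy learner standing in for the paper's oracle-based algorithm (this is not the Vowpal Wabbit API, and the environment callables are assumptions for the example).

```python
import random

def contextual_bandit_loop(contexts, reward_fn, n_actions, horizon=1000, epsilon=0.1):
    """Contextual bandit protocol: observe a context, pick one of K actions,
    observe the reward for that action only. Epsilon-greedy over per-(context, action)
    averages stands in for a real policy class here (toy sketch)."""
    q = {}        # (context, action) -> running mean reward
    counts = {}   # (context, action) -> number of observations
    total = 0.0

    for t in range(horizon):
        context = contexts[t % len(contexts)]
        if random.random() < epsilon:
            action = random.randrange(n_actions)                    # explore
        else:
            action = max(range(n_actions),
                         key=lambda a: q.get((context, a), 0.0))    # exploit
        reward = reward_fn(context, action)                         # bandit feedback only
        key = (context, action)
        counts[key] = counts.get(key, 0) + 1
        q[key] = q.get(key, 0.0) + (reward - q.get(key, 0.0)) / counts[key]
        total += reward
    return total
```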

Regulating Greed Over Time in Multi-Armed Bandits

5tefan0/Regulating-Greed-Over-Time 21 May 2015

In the corrected methods, exploitation (greed) is regulated over time, so that more exploitation occurs during higher reward periods, and more exploration occurs in periods of low reward.
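
As a rough illustration of that idea, the sketch below modulates the exploration rate with a known periodic reward signal so the learner exploits more during high-reward periods; the sinusoidal schedule, period, and epsilon range are made-up stand-ins, not the paper's exact regulated-greed methods.

```python
import math
import random

def regulated_greedy(arm_probs, horizon=10_000, period=500,
                     eps_low=0.02, eps_high=0.3):
    """Epsilon-greedy whose exploration rate follows a known reward seasonality:
    exploit more (small epsilon) when rewards are high, explore more when low.
    Toy stand-in for the regulated-greed idea (illustrative sketch)."""
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    for t in range(horizon):
        # assumed seasonal multiplier on all rewards (1 = peak, 0 = trough)
        season = 0.5 * (1 + math.sin(2 * math.pi * t / period))
        epsilon = eps_high - (eps_high - eps_low) * season   # greedier at the peak
        if random.random() < epsilon:
            arm = random.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: means[a])
        reward = season * (1.0 if random.random() < arm_probs[arm] else 0.0)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return means
```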