Multi-Armed Bandits

195 papers with code • 1 benchmark • 2 datasets

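Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain. Typically, these problems involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoff is still uncertain against repeatedly choosing the arm that currently looks best.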
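As a rough illustration of the exploration/exploitation trade-off, here is a minimal epsilon-greedy sketch in Python. It is not taken from any paper or library listed on this page; the function name `epsilon_greedy`, the Bernoulli reward model, and the parameters `true_means`, `steps`, `epsilon`, and `seed` are all assumptions made for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on hypothetical Bernoulli arms.

    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pull counts:", cnt)
    print("estimated means:", [round(e, 3) for e in est])
    print("total reward:", total)
```

A larger `epsilon` spends more pulls on exploration, while a smaller one commits sooner to the arm that currently looks best. Methods such as UCB or Thompson sampling handle the same trade-off differently and are not shown here.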

( Image credit: Microsoft Research )

Benchmarks


These leaderboards are used to track progress in Multi-Armed Bandits

Dataset	Best Model
Mushroom	Linear FullPosterior-MR

Libraries

Use these libraries to find Multi-Armed Bandits models and implementations

facebookresearch/Horizon

2 papers

3,521

facebookresearch/ReAgent

2 papers

3,521

st-tech/zr-obp

2 papers

612


Latest papers


Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

nirjhar-das/glbandit_limited_adaptivity • 10 Apr 2024

For our batch learning algorithm B-GLinCB, with $\Omega\left( \log{\log T} \right)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$.


Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

vdblm/ExPerior • 10 Apr 2024

We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information.


Best Arm Identification for Prompt Learning under a Limited Budget

shengroup/triple • 15 Feb 2024

Based on this connection, a general framework TRIPLE (besT aRm Identification for Prompt LEarning) is proposed to harness the power of BAI-FB in prompt learning systematically.


Fairness of Exposure in Online Restless Multi-armed Bandits

rchiso/mf-rmab • 9 Feb 2024

However, they do not consider the distribution of pulls among the arms.


A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

biyonka/bcor • 7 Feb 2024

Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in public health intervention programs.


Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

aiueola/webconf2024-slate-ope-via-abstraction • 3 Feb 2024

The PseudoInverse (PI) estimator has been introduced to mitigate the variance issue by assuming linearity in the reward function, but this can result in significant bias, as this assumption is hard to verify from observed data and is often substantially violated.


Falcon: Fair Active Learning using Multi-armed Bandits

khtae8250/falcon • 23 Jan 2024

Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.


On Quantum Natural Policy Gradients

andre-sequeira10/gqnpg • 16 Jan 2024

This research delves into the role of the quantum Fisher Information Matrix (FIM) in enhancing the performance of Parameterized Quantum Circuit (PQC)-based reinforcement learning agents.


Let's Get It Started: Fostering the Discoverability of New Releases on Deezer

deezer/new-releases-ecir2024 • 5 Jan 2024

This paper presents our recent initiatives to foster the discoverability of new releases on the music streaming service Deezer.


In-Context Reinforcement Learning for Variable Action Spaces

corl-team/headless-ad • 20 Dec 2023

Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context.

