Multi-Armed Bandits

195 papers with code • 1 benchmark • 2 datasets

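Multi-armed bandits refer to a class of problems in which a fixed, limited set of resources must be allocated among competing choices (arms) so as to maximize expected gain, when the payoff of each choice is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: the agent must balance trying under-sampled arms against choosing the arm that currently looks best.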
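The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch. It assumes Bernoulli rewards; the arm probabilities in `true_means`, the function name, and the default `epsilon` are hypothetical choices made for illustration and are not taken from any paper listed here.

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    true_means: hypothetical success probability of each arm (hidden from the agent).
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return counts, total_reward

counts, total_reward = epsilon_greedy([0.2, 0.5, 0.7])
print(counts, total_reward)
```

A larger `epsilon` spends more pulls on exploration and learns the arm means faster, at the cost of pulling known-worse arms more often.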

(Image credit: Microsoft Research)

Benchmarks


These leaderboards track progress in Multi-Armed Bandits.

Dataset	Best Model
Mushroom	Linear FullPosterior-MR

Libraries

Use these libraries to find Multi-Armed Bandits models and implementations.

facebookresearch/Horizon • 2 papers • 3,521 stars

facebookresearch/ReAgent • 2 papers • 3,521 stars

st-tech/zr-obp • 2 papers • 614 stars

Datasets

Latest papers with no code


Feel-Good Thompson Sampling for Contextual Dueling Bandits

no code yet • 9 Apr 2024

In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.


On the Importance of Uncertainty in Decision-Making with Large Language Models

no code yet • 3 Apr 2024

We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy.
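For readers unfamiliar with how a Thompson Sampling policy uses uncertainty, here is a minimal sketch of the standard Beta-Bernoulli version. It is not the LLM-based method from the paper above: the uniform Beta(1, 1) prior, the Bernoulli reward model, and the arm probabilities in `true_means` are all assumptions made for illustration.

```python
import random

def thompson_sampling(true_means, steps=5000, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_means: hypothetical success probability of each arm (hidden from the agent).
    Returns (pulls, successes, failures), where successes/failures are the
    Beta posterior parameters (alpha, beta) of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # alpha of a Beta(1, 1) prior
    failures = [1] * n_arms   # beta of a Beta(1, 1) prior
    pulls = [0] * n_arms

    for _ in range(steps):
        # Draw one plausible mean per arm from its posterior; wide (uncertain)
        # posteriors occasionally produce high draws, which drives exploration.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls, successes, failures

pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.7])
print(pulls)
```

As an arm accumulates observations its posterior narrows, so the policy shifts from exploring to exploiting without a separate exploration parameter.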


Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy

no code yet • 2 Apr 2024

We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown.
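As background, the textbook doubly-robust estimator for the one-step (contextual bandit) setting can be sketched as follows. This is not the DRUnknown estimator from the paper above: it assumes the logging probabilities are known and recorded in the log, and the names `doubly_robust_value`, `target_policy`, and `q_hat` are hypothetical.

```python
def doubly_robust_value(logs, target_policy, q_hat, n_actions):
    """Textbook one-step doubly-robust off-policy value estimate.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability the logging policy gave to `action`.
    target_policy(context, a): probability the evaluated policy picks action a.
    q_hat(context, a): a (possibly inaccurate) model of the expected reward.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Direct-method term: model-based value of the target policy in this context.
        dm = sum(target_policy(context, a) * q_hat(context, a) for a in range(n_actions))
        # Importance-weighted correction of the model's error on the logged action.
        weight = target_policy(context, action) / logging_prob
        total += dm + weight * (reward - q_hat(context, action))
    return total / len(logs)

# Toy log: a uniform logging policy over two actions; action 1 paid 1.0, action 0 paid 0.0.
logs = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5)]
always_one = lambda context, a: 1.0 if a == 1 else 0.0   # target policy: always pick action 1
constant_model = lambda context, a: 0.5                  # deliberately crude reward model
value = doubly_robust_value(logs, always_one, constant_model, n_actions=2)
print(value)
```

The estimate stays consistent if either the reward model or the logging probabilities are accurate, which is the sense in which it is doubly robust.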


Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem

no code yet • 1 Apr 2024

We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem.


A Correction of Pseudo Log-Likelihood Method

no code yet • 26 Mar 2024

Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields including contextual bandits, influence maximization of social networks, and causal bandits.


Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making

no code yet • 22 Mar 2024

This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making.


Transfer in Sequential Multi-armed Bandits via Reward Samples

no code yet • 19 Mar 2024

We consider a sequential stochastic multi-armed bandit problem where the agent interacts with a bandit over multiple episodes.


Phasic Diversity Optimization for Population-Based Reinforcement Learning

no code yet • 17 Mar 2024

Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm.


ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

no code yet • 11 Mar 2024

Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS).


Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

no code yet • 8 Mar 2024

However, the availability and time of these health workers are limited resources.

