Multi-Armed Bandits

195 papers with code • 1 benchmark • 2 datasets

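Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: pulling the arm that currently looks best versus trying other arms to learn more about their payoffs.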
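
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a Bernoulli bandit. It is an illustration only, not taken from any paper listed on this page; the function name `epsilon_greedy`, the arm means, and the values of `steps`, `epsilon`, and `seed` are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the agent).
    Returns per-arm pull counts, per-arm estimated means, and total reward.
    """
    rng = random.Random(seed)  # local RNG so the run is reproducible
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of pulls per arm
    estimates = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
print("total reward:", total_reward)
```

With a small `epsilon`, most pulls go to the arm with the highest estimated mean, while the occasional random pull keeps the estimates for the other arms from going stale. Setting `epsilon` to 0 gives a purely greedy agent that can lock onto a suboptimal arm early.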

Latest papers with no code

Feel-Good Thompson Sampling for Contextual Dueling Bandits

no code yet • 9 Apr 2024

In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.

On the Importance of Uncertainty in Decision-Making with Large Language Models

no code yet • 3 Apr 2024

We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy.

Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy

no code yet • 2 Apr 2024

We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown.

Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem

no code yet • 1 Apr 2024

We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem.

A Correction of Pseudo Log-Likelihood Method

no code yet • 26 Mar 2024

Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields including contextual bandits, influence maximization of social networks, and causal bandits.

Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making

no code yet • 22 Mar 2024

This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making.

Transfer in Sequential Multi-armed Bandits via Reward Samples

no code yet • 19 Mar 2024

We consider a sequential stochastic multi-armed bandit problem where the agent interacts with the bandit over multiple episodes.

Phasic Diversity Optimization for Population-Based Reinforcement Learning

no code yet • 17 Mar 2024

Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm.

ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

no code yet • 11 Mar 2024

Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS).

Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

no code yet • 8 Mar 2024

However, the availability and time of these health workers are limited resources.