Multi-Armed Bandits
195 papers with code • 1 benchmark • 2 datasets
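The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoff is still uncertain against pulling the arm that currently looks best.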
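As an illustration of the exploration/exploitation trade-off, here is a minimal sketch of an epsilon-greedy strategy on a simulated Bernoulli bandit. It is not taken from any of the papers listed on this page; the function name `run_epsilon_greedy`, the arm probabilities, and the default parameters are assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (hypothetical helper).

    true_means: success probability of each arm (assumed, unknown to the learner).
    Returns (estimated means, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0

        counts[arm] += 1
        # Incremental update of the sample mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("pull counts:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With `epsilon=0` the learner never explores and can lock onto the first arm it tries; larger values spend more pulls on arms that may be suboptimal. Thompson sampling, which several of the papers below build on, is a common alternative that explores by sampling from a posterior over each arm's reward.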
Latest papers with no code
Feel-Good Thompson Sampling for Contextual Dueling Bandits
In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
On the Importance of Uncertainty in Decision-Making with Large Language Models
We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy.
Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy
We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown.
Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem
We give nearly-tight upper and lower bounds for the improving multi-armed bandits problem.
A Correction of Pseudo Log-Likelihood Method
Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields including contextual bandits, influence maximization of social networks, and causal bandits.
Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making
This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making.
Transfer in Sequential Multi-armed Bandits via Reward Samples
We consider a sequential stochastic multi-armed bandit problem where the agent interacts with a bandit over multiple episodes.
Phasic Diversity Optimization for Population-Based Reinforcement Learning
Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm.
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS).
Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning
However, the availability and time of these health workers are limited resources.