Multi-Armed Bandits
195 papers with code • 1 benchmark • 2 datasets
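Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoffs are still uncertain against repeatedly choosing the arm that currently looks best.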
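As a rough illustration of the exploration/exploitation trade-off described above, here is a minimal sketch of an epsilon-greedy strategy on Bernoulli arms. Epsilon-greedy is only one of many bandit strategies and is not taken from any paper listed on this page; the function name `epsilon_greedy`, the arm means, and the parameter values are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative sketch only).

    true_means: hypothetical success probabilities, hidden from the learner.
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


# Assumed example: three arms with success probabilities 0.2, 0.5 and 0.8.
counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 3) for e in estimates])
```

With a small `epsilon`, most pulls should end up on the arm with the highest true mean, while the occasional random pull keeps the estimates of the other arms from going stale. Larger values of `epsilon` explore more and typically give up some reward in exchange.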
Latest papers
Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits
For our batch learning algorithm B-GLinCB, with $\Omega\left( \log{\log T} \right)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$.
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity
We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information.
Best Arm Identification for Prompt Learning under a Limited Budget
Based on this connection, a general framework TRIPLE (besT aRm Identification for Prompt LEarning) is proposed to harness the power of BAI-FB in prompt learning systematically.
Fairness of Exposure in Online Restless Multi-armed Bandits
However, existing approaches do not consider the distribution of pulls among the arms.
A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health
Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in public health intervention programs.
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
The PseudoInverse (PI) estimator has been introduced to mitigate the variance issue by assuming linearity in the reward function, but this can result in significant bias, as this assumption is hard to verify from observed data and is often substantially violated.
Falcon: Fair Active Learning using Multi-armed Bandits
Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.
On Quantum Natural Policy Gradients
This research delves into the role of the quantum Fisher Information Matrix (FIM) in enhancing the performance of Parameterized Quantum Circuit (PQC)-based reinforcement learning agents.
Let's Get It Started: Fostering the Discoverability of New Releases on Deezer
This paper presents our recent initiatives to foster the discoverability of new releases on the music streaming service Deezer.
In-Context Reinforcement Learning for Variable Action Spaces
Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context.