Multi-Armed Bandits

195 papers with code • 1 benchmark • 2 datasets

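Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated among competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying options it knows little about against exploiting the option that currently looks best.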

(Image credit: Microsoft Research)
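To make the exploration/exploitation trade-off concrete, here is a minimal sketch of the classic UCB1 rule, assuming rewards in [0, 1]. The `pull` callback and the Bernoulli arm means are hypothetical placeholders and are not taken from any paper or library listed on this page.

```python
import math
import random
from typing import Callable, List, Tuple


def ucb1(n_arms: int, horizon: int, pull: Callable[[int], float]) -> Tuple[List[int], List[float]]:
    """Run UCB1 for `horizon` steps; `pull(arm)` returns a reward in [0, 1]."""
    counts = [0] * n_arms   # how many times each arm was played
    sums = [0.0] * n_arms   # total reward collected per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            plays_so_far = t - 1
            # Exploitation term (empirical mean) plus an exploration bonus
            # that shrinks as an arm is played more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.7]  # hypothetical Bernoulli arm means
    counts, _ = ucb1(len(means), 5000, lambda a: float(rng.random() < means[a]))
    print("plays per arm:", counts)
```

Arms with few plays get a large bonus and are explored; as evidence accumulates, play concentrates on the arm with the highest empirical mean.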


Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits

nirjhar-das/glbandit_limited_adaptivity 10 Apr 2024

For our batch learning algorithm B-GLinCB, with $\Omega(\log \log T)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$.

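Why can a number of batches growing like $\log \log T$ be enough for $\sqrt{T}$-type regret? The sketch below computes the "minimax grid" from the batched multi-armed bandit literature in the simpler non-contextual setting, where, up to logarithmic factors, $M$ batches allow regret of order $T^{1/(2 - 2^{1-M})}$. This grid is an illustrative assumption and is not B-GLinCB's own batch schedule.

```python
import math
from typing import List


def minimax_grid(T: int, M: int) -> List[int]:
    """End points t_1 <= ... <= t_M = T of M batches, using t_i = a**(2 - 2**(1 - i))
    with a chosen so that the last batch ends exactly at T."""
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - M)))
    grid = [min(T, math.floor(a ** (2.0 - 2.0 ** (1 - i)))) for i in range(1, M + 1)]
    grid[-1] = T  # guard against floating-point rounding on the final end point
    return grid


def regret_exponent(M: int) -> float:
    """Exponent e in the T**e regret rate (log factors ignored) achievable with M batches."""
    return 1.0 / (2.0 - 2.0 ** (1 - M))


if __name__ == "__main__":
    T = 10**6
    print("log2(log2(T)) =", round(math.log2(math.log2(T)), 2))
    for M in range(1, 7):
        print(M, round(regret_exponent(M), 3), minimax_grid(T, M))
```

The exponent falls from 1 toward 1/2 doubly exponentially fast in $M$: with $M \approx \log_2 \log_2 T$ the extra factor $T^{e - 1/2}$ is bounded by a constant, which is the intuition behind needing only $\Omega(\log \log T)$ batches.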

Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

vdblm/ExPerior 10 Apr 2024

We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information.


Best Arm Identification for Prompt Learning under a Limited Budget

shengroup/triple 15 Feb 2024

Building on the connection between prompt learning and fixed-budget best arm identification (BAI-FB), the authors propose TRIPLE (besT aRm Identification for Prompt LEarning), a general framework that systematically harnesses BAI-FB for prompt learning.

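Fixed-budget best arm identification spends a fixed number of evaluations and then commits to a single arm. The sketch below shows successive halving, one standard BAI-FB algorithm, with each arm standing for a candidate prompt. The `pull` callback, the success rates, and the even budget split are illustrative assumptions, not TRIPLE's implementation.

```python
import math
import random
from typing import Callable, Dict


def successive_halving(n_arms: int, budget: int, pull: Callable[[int], float]) -> int:
    """Return the index of the arm judged best after spending roughly `budget` pulls.

    The budget is split evenly over ceil(log2(n_arms)) rounds; in each round every
    surviving arm is pulled equally often and the better-scoring half survives.
    """
    survivors = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # At least one pull per arm, so very small budgets can be exceeded slightly.
        pulls_each = max(1, budget // (len(survivors) * rounds))
        means: Dict[int, float] = {
            arm: sum(pull(arm) for _ in range(pulls_each)) / pulls_each for arm in survivors
        }
        survivors.sort(key=lambda arm: means[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical success rates of four candidate prompts on a validation task.
    success_rate = [0.2, 0.5, 0.9, 0.4]
    best = successive_halving(len(success_rate), budget=400,
                              pull=lambda i: float(rng.random() < success_rate[i]))
    print("selected prompt:", best)
```

Eliminating half of the candidates per round concentrates the later, larger share of evaluations on the few prompts that are still competitive.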

Fairness of Exposure in Online Restless Multi-armed Bandits

rchiso/mf-rmab 9 Feb 2024

However, existing RMAB solutions do not consider how pulls are distributed among the arms.


A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

biyonka/bcor 7 Feb 2024

Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in public health intervention programs.

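The restless bandit model used for this kind of resource allocation can be sketched as follows, assuming two-state arms (state 1 = "engaged", state 0 = "disengaged"), separate transition matrices for intervening (active) and not intervening (passive), and a budget of `k` interventions per round. "Restless" means every arm keeps evolving even when it is not pulled. The myopic policy here is a simple illustrative heuristic, not the paper's Bayesian method, and all names are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # 2x2: P[s][s2] = probability of moving from s to s2


def simulate_rmab(p_passive: List[Matrix], p_active: List[Matrix], k: int,
                  horizon: int, seed: int = 0) -> int:
    """Simulate a two-state RMAB under a myopic policy; return the total reward
    (number of arms in state 1, summed over all rounds)."""
    rng = random.Random(seed)
    n = len(p_passive)
    state = [1] * n  # every arm starts in the "engaged" state
    total = 0
    for _ in range(horizon):
        # Myopic heuristic: intervene on the k arms whose chance of being in state 1
        # next round increases the most when they are pulled.
        gain = [p_active[i][state[i]][1] - p_passive[i][state[i]][1] for i in range(n)]
        pulled = set(sorted(range(n), key=lambda i: gain[i], reverse=True)[:k])
        for i in range(n):
            P = p_active[i] if i in pulled else p_passive[i]
            # Restless: the arm transitions whether or not it was pulled.
            state[i] = 1 if rng.random() < P[state[i]][1] else 0
        total += sum(state)
    return total


if __name__ == "__main__":
    # Hypothetical dynamics: without intervention, engaged beneficiaries tend to drop out.
    passive = [[[0.9, 0.1], [0.4, 0.6]]] * 5
    active = [[[0.5, 0.5], [0.1, 0.9]]] * 5
    print("total engagement:", simulate_rmab(passive, active, k=2, horizon=100))
```

Because only `k` arms can be pulled per round while all arms drift, the core difficulty is deciding which arms benefit most from the limited interventions.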

Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

aiueola/webconf2024-slate-ope-via-abstraction 3 Feb 2024

The PseudoInverse (PI) estimator was introduced to mitigate the high variance of Inverse Propensity Scoring (IPS) over large slate action spaces by assuming that the reward function is linear, but this can introduce significant bias: the linearity assumption is hard to verify from observed data and is often substantially violated.

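The variance/bias trade-off described here can be illustrated with a small sketch, assuming a context-free setting in which both the logging policy `mu` and the target policy `pi` choose the item in each of the `L` slots independently. Under that factorization, the IPS weight is the product of the per-slot probability ratios, while the PI weight reduces to the sum of those ratios minus `L - 1`; PI is unbiased when the slate reward is additive across slots, which is the linearity assumption discussed above. All function and variable names are illustrative.

```python
import numpy as np


def slate_ope(slates, rewards, pi, mu):
    """Estimate the target policy's expected slate reward from logged data.

    slates:  (n, L) int array; slates[j, l] is the item shown in slot l of log j
    rewards: (n,) observed slate rewards
    pi, mu:  (L, K) per-slot item probabilities of the target / logging policy
    Returns (ips_estimate, pi_estimate).
    """
    slates = np.asarray(slates)
    rewards = np.asarray(rewards, dtype=float)
    pi, mu = np.asarray(pi, dtype=float), np.asarray(mu, dtype=float)
    L = slates.shape[1]
    slots = np.arange(L)
    ratios = pi[slots, slates] / mu[slots, slates]  # (n, L) per-slot importance ratios
    ips = np.mean(rewards * ratios.prod(axis=1))  # product of ratios: variance grows with L
    pseudo_inverse = np.mean(rewards * (ratios.sum(axis=1) - L + 1))  # relies on additivity
    return float(ips), float(pseudo_inverse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, K, n = 3, 4, 20000
    mu = np.full((L, K), 1.0 / K)            # uniform logging policy
    pi = rng.dirichlet(np.ones(K), size=L)   # hypothetical target policy
    gain = rng.uniform(size=(L, K))          # additive per-slot reward
    slates = np.stack([rng.choice(K, size=n, p=mu[l]) for l in range(L)], axis=1)
    rewards = gain[np.arange(L), slates].sum(axis=1) + rng.normal(0.0, 0.1, size=n)
    true_value = float((pi * gain).sum())
    print("true:", true_value, "IPS/PI:", slate_ope(slates, rewards, pi, mu))
```

With a reward that is not additive across slots (for example, one that depends on item interactions), the PI estimate can drift away from the true value while IPS stays unbiased but noisier.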

Falcon: Fair Active Learning using Multi-armed Bandits

khtae8250/falcon 23 Jan 2024

Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness.


On Quantum Natural Policy Gradients

andre-sequeira10/gqnpg 16 Jan 2024

This work studies how the quantum Fisher Information Matrix (FIM) can improve the performance of reinforcement learning agents based on Parameterized Quantum Circuits (PQCs).


Let's Get It Started: Fostering the Discoverability of New Releases on Deezer

deezer/new-releases-ecir2024 5 Jan 2024

This paper presents our recent initiatives to foster the discoverability of new releases on the music streaming service Deezer.


In-Context Reinforcement Learning for Variable Action Spaces

corl-team/headless-ad 20 Dec 2023

Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context.
