no code implementations • 30 Oct 2023 • Zirui Yan, Arpan Mukherjee, Burak Varici, Ali Tajer
Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations.
no code implementations • 20 Oct 2023 • P. N. Karthik, Vincent Y. F. Tan, Arpan Mukherjee, Ali Tajer
It is shown that under every policy, the state-action visitation proportions satisfy a specific approximate flow conservation constraint and that these proportions match the optimal proportions dictated by the lower bound under any asymptotically optimal policy.
no code implementations • 10 Jan 2023 • Arpan Mukherjee, Ali Tajer
Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity).
no code implementations • 10 Aug 2022 • Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
Additionally, each process $i\in\{1, \dots, K\}$ has a private parameter $\alpha_i$.
no code implementations • 22 Jul 2022 • Arpan Mukherjee, Ali Tajer
Based on this test statistic, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests for arm selection and is amenable to tractable analysis for the exponential family of bandits.
no code implementations • NeurIPS 2021 • Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable.