1 code implementation • 10 Apr 2024 • Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha
For our batch learning algorithm B-GLinCB, with $\Omega(\log \log T)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$.
no code implementations • 16 Feb 2024 • Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
Experimental evaluations on a human preference dataset validate APO's efficacy as a sample-efficient and practical solution to data collection for RLHF, facilitating alignment of LLMs with human preferences in a cost-effective and scalable manner.
no code implementations • 14 May 2023 • Nirjhar Das, Arpan Chattopadhyay
In this work, we propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems.
no code implementations • 27 Jun 2022 • Mustafa Chasmai, Nirjhar Das, Aman Bhardwaj, Rahul Garg
We argue that, for most applications, validation accuracy on unseen subjects and unseen camera angles matters most.