no code implementations • 5 Feb 2024 • Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari
We show reductions from the two dominant forms of human feedback in RLHF, cardinal and dueling feedback, to PORRL.
no code implementations • 26 May 2023 • Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia
Motivated by this, we study supervised denoising and noisy-input regression under distribution shift.
no code implementations • 29 Nov 2022 • Chinmaya Kausik, Yangyi Lu, Kevin Tan, Maggie Makar, Yixin Wang, Ambuj Tewari
Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning.
1 code implementation • 17 Nov 2022 • Chinmaya Kausik, Kevin Tan, Ambuj Tewari
We present an algorithm for learning mixtures of Markov chains and Markov decision processes (MDPs) from short unlabeled trajectories.