no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.
no code implementations • 10 Apr 2024 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results show that these architectures offer efficient parameterizations and outperform popular methods in gradient field learning tasks.
no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva
In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.
1 code implementation • 27 Aug 2023 • Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis
Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive.
no code implementations • 25 Jan 2023 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura
While much effort has been devoted to deriving and analyzing effective convex formulations of signal processing problems, the gradients of convex functions also have critical applications ranging from gradient-based optimization to optimal transport.
1 code implementation • 15 May 2022 • Peikai Li, Ipek Ilayda Onur, Scott Dodelson, Shreyas Chaudhari
Next-generation cosmic microwave background (CMB) surveys are expected to provide valuable information about the primordial universe by creating maps of the mass along the line of sight.
no code implementations • 18 Feb 2021 • Shreyas Chaudhari, Harideep Nair, José M. F. Moura, John Paul Shen
Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection and bio-wearables.
1 code implementation • ICLR 2021 • Benjamin Eysenbach, Swapnil Asawa, Shreyas Chaudhari, Sergey Levine, Ruslan Salakhutdinov
Building on a probabilistic view of RL, we formally show that we can achieve this goal by modifying the reward function to compensate for the difference in dynamics.
2 code implementations • 6 Nov 2019 • Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan
We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.
no code implementations • 18 Oct 2018 • Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan
We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.