Search Results for author: Shreyas Chaudhari

Found 10 papers, 4 papers with code

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.

Language Modelling reinforcement-learning

Gradient Networks

no code implementations • 10 Apr 2024 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura

Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results show that these architectures offer efficient parameterizations and outperform popular methods in gradient field learning tasks.

From Past to Future: Rethinking Eligibility Traces

no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Distributional Off-Policy Evaluation for Slate Recommendations

1 code implementation • 27 Aug 2023 • Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis

Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive.

Fairness Off-policy evaluation

Learning Gradients of Convex Functions with Monotone Gradient Networks

no code implementations • 25 Jan 2023 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura

While much effort has been devoted to deriving and analyzing effective convex formulations of signal processing problems, the gradients of convex functions also have critical applications ranging from gradient-based optimization to optimal transport.

High-Resolution CMB Lensing Reconstruction with Deep Learning

1 code implementation • 15 May 2022 • Peikai Li, Ipek Ilayda Onur, Scott Dodelson, Shreyas Chaudhari

Next-generation cosmic microwave background (CMB) surveys are expected to provide valuable information about the primordial universe by creating maps of the mass along the line of sight.

Generative Adversarial Network Vocal Bursts Intensity Prediction

Unsupervised Clustering of Time Series Signals using Neuromorphic Energy-Efficient Temporal Neural Networks

no code implementations • 18 Feb 2021 • Shreyas Chaudhari, Harideep Nair, José M. F. Moura, John Paul Shen

Unsupervised time series clustering is a challenging problem with diverse industrial applications such as anomaly detection, bio-wearables, etc.

Anomaly Detection Clustering +2

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

1 code implementation • ICLR 2021 • Benjamin Eysenbach, Swapnil Asawa, Shreyas Chaudhari, Sergey Levine, Ruslan Salakhutdinov

Building off of a probabilistic view of RL, we formally show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.

Continuous Control Domain Adaptation +2

Multi-Armed Bandits with Correlated Arms

2 code implementations • 6 Nov 2019 • Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan

We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.

Multi-Armed Bandits

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

no code implementations • 18 Oct 2018 • Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan

We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.

Thompson Sampling
