Search Results for author: Shreyas Chaudhari

Found 10 papers, 4 papers with code

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

no code implementations • 12 Apr 2024 • Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations.

Language Modelling, reinforcement-learning

Gradient Networks

no code implementations • 10 Apr 2024 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura

Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results show that these architectures offer efficient parameterizations and outperform popular methods in gradient field learning tasks.

From Past to Future: Rethinking Eligibility Traces

no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Distributional Off-Policy Evaluation for Slate Recommendations

1 code implementation • 27 Aug 2023 • Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis

Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive.

Fairness, Off-policy evaluation

Learning Gradients of Convex Functions with Monotone Gradient Networks

no code implementations • 25 Jan 2023 • Shreyas Chaudhari, Srinivasa Pranav, José M. F. Moura

While much effort has been devoted to deriving and analyzing effective convex formulations of signal processing problems, the gradients of convex functions also have critical applications ranging from gradient-based optimization to optimal transport.

High-Resolution CMB Lensing Reconstruction with Deep Learning

1 code implementation • 15 May 2022 • Peikai Li, Ipek Ilayda Onur, Scott Dodelson, Shreyas Chaudhari

Next-generation cosmic microwave background (CMB) surveys are expected to provide valuable information about the primordial universe by creating maps of the mass along the line of sight.

Generative Adversarial Network, Vocal Bursts Intensity Prediction

Unsupervised Clustering of Time Series Signals using Neuromorphic Energy-Efficient Temporal Neural Networks

no code implementations • 18 Feb 2021 • Shreyas Chaudhari, Harideep Nair, José M. F. Moura, John Paul Shen

Unsupervised time series clustering is a challenging problem with diverse industrial applications, such as anomaly detection and bio-wearables.

Anomaly Detection, Clustering, +2

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

1 code implementation • ICLR 2021 • Benjamin Eysenbach, Swapnil Asawa, Shreyas Chaudhari, Sergey Levine, Ruslan Salakhutdinov

Building off of a probabilistic view of RL, we formally show that we can achieve this goal by compensating for the difference in dynamics by modifying the reward function.

Continuous Control, Domain Adaptation, +2

Multi-Armed Bandits with Correlated Arms

2 code implementations • 6 Nov 2019 • Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan

We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated.

Multi-Armed Bandits

A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting

no code implementations • 18 Oct 2018 • Samarth Gupta, Shreyas Chaudhari, Subhojyoti Mukherjee, Gauri Joshi, Osman Yağan

We consider a finite-armed structured bandit problem in which mean rewards of different arms are known functions of a common hidden parameter $\theta^*$.

Thompson Sampling
