Search Results for author: Dhawal Gupta

Found 5 papers, 2 papers with code

From Past to Future: Rethinking Eligibility Traces

no code implementations • 20 Dec 2023 • Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation.

Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

1 code implementation • 16 Sep 2023 • Simeng Sun, Dhawal Gupta, Mohit Iyyer

During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources.

Language Modelling, Large Language Model

Coagent Networks: Generalized and Scaled

no code implementations • 16 May 2023 • James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

However, the coagent framework is not just an alternative to BDL; the two approaches can be blended: BDL can be combined with coagent learning rules to create architectures with the advantages of both approaches.

Reinforcement Learning (RL)

Structural Credit Assignment in Neural Networks using Reinforcement Learning

no code implementations • NeurIPS 2021 • Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James Kostas, Philip S. Thomas, Martha White

In this work, we revisit this approach and investigate if we can leverage other reinforcement learning approaches to improve learning.

Reinforcement Learning (RL)

Gradient Temporal-Difference Learning with Regularized Corrections

1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.

Q-Learning
