Search Results for author: Dhruv Malik

Found 8 papers, 0 papers with code

Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY

no code implementations • 9 Jun 2023 • Eric Luxenberg, Dhruv Malik, Yuanzhi Li, Aarti Singh, Stephen Boyd

We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set.
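
As a concrete illustration of this setting, the following is a minimal CVXPY sketch (not the paper's implementation) of one robust ERM instance: least-absolute-deviations regression where each feature vector may vary in an ℓ∞ ball of radius ρ. For that loss the inner worst-case maximization has the closed form |xᵢᵀθ − yᵢ| + ρ‖θ‖₁, so the robust problem stays convex. The dimensions, data, and radius are all illustrative.

```python
# Minimal sketch (not the paper's code): robust ERM for least-absolute-deviations
# regression when each feature vector x_i can vary in an l_inf ball of radius rho.
# The per-point worst case is |x_i^T theta - y_i| + rho * ||theta||_1, which keeps
# the problem convex and directly expressible in CVXPY.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, rho = 100, 5, 0.1                     # samples, features, uncertainty radius (illustrative)
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

theta = cp.Variable(d)
residuals = X @ theta - y
# mean of the per-point worst-case losses
worst_case = cp.sum(cp.abs(residuals)) / n + rho * cp.norm1(theta)
problem = cp.Problem(cp.Minimize(worst_case))
problem.solve()
print("robust estimate:", theta.value)
```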

Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality

no code implementations • 4 May 2023 • Dhruv Malik, Conor Igoe, Yuanzhi Li, Aarti Singh

Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times that action was recently played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity.

Recommendation Systems
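
To make the setting above concrete, here is a toy environment sketch (illustrative only, not the paper's algorithm or loss model) in which an arm's loss depends only on how many of the learner's last $m$ actions were that arm, i.e. a tally over a sliding memory window.

```python
# Toy sketch of a tallying bandit: the loss of playing arm `a` at time t depends
# only on how many of the previous m plays were arm `a` (a crude model of
# satiation with bounded memory). The loss functions below are made up.
from collections import deque
import numpy as np

class TallyingBandit:
    def __init__(self, loss_fns, m):
        self.loss_fns = loss_fns        # loss_fns[a](k) = loss of arm a when it appeared k times in the last m rounds
        self.history = deque(maxlen=m)  # the previous m actions

    def play(self, arm):
        tally = sum(1 for a in self.history if a == arm)  # occurrences in the prior m timesteps
        loss = self.loss_fns[arm](tally)
        self.history.append(arm)
        return loss

# Example: two arms whose losses grow with recent exposure at different rates.
rng = np.random.default_rng(1)
env = TallyingBandit(loss_fns=[lambda k: 0.2 + 0.1 * k, lambda k: 0.4 + 0.05 * k], m=5)
total = sum(env.play(rng.integers(2)) for _ in range(20))  # uniformly random play, for illustration
print("cumulative loss of random play:", total)
```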

Complete Policy Regret Bounds for Tallying Bandits

no code implementations • 24 Apr 2022 • Dhruv Malik, Yuanzhi Li, Aarti Singh

Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary.
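
For reference, the standard definition of policy regret against an adaptive adversary (generic notation, not specific to the tallying setting studied in the paper): the loss at time $t$ may depend on the learner's entire action history, and the comparator is the best single action replayed for all $T$ rounds,

$$ \mathrm{PolicyRegret}(T) \;=\; \sum_{t=1}^{T} \ell_t(a_1, \dots, a_t) \;-\; \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a, \dots, a). $$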

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

no code implementations • 15 Jun 2021 • Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, Yuanzhi Li

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces.

Atari Games, Reinforcement Learning (RL), +1

When Is Generalizable Reinforcement Learning Tractable?

no code implementations • NeurIPS 2021 • Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar

Agents trained by reinforcement learning (RL) often fail to generalize beyond the environment they were trained in, even when presented with new scenarios that seem similar to the training environment.

Reinforcement Learning (RL), +1

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

no code implementations • 20 Dec 2018 • Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback.
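
The general flavor of such derivative-free methods can be sketched as follows (a toy example with illustrative dynamics, constants, and step sizes, not the paper's exact scheme): the cost of a static feedback gain $K$ is estimated from finite-horizon rollouts, and $K$ is updated with a two-point random-search gradient estimate.

```python
# Toy sketch of derivative-free policy search for a discrete-time LQR problem.
# The cost of the linear policy u = -K x is estimated from rollouts only (no
# gradients of the dynamics), and K is updated with a two-point estimator.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # simple double-integrator-style dynamics (illustrative)
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)

def rollout_cost(K, horizon=50):
    """Finite-horizon LQR cost of the policy u = -K x from a fixed initial state."""
    x, cost = np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

K = np.zeros((1, 2))
step, smoothing = 1e-3, 0.05
for _ in range(500):
    U = rng.standard_normal(K.shape)                     # random perturbation direction
    c_plus, c_minus = rollout_cost(K + smoothing * U), rollout_cost(K - smoothing * U)
    grad_est = (c_plus - c_minus) / (2 * smoothing) * U  # two-point gradient estimate
    K = K - step * grad_est

print("learned gain K:", K, " cost:", rollout_cost(K))
```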

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

no code implementations • ICML 2018 • Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan

We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human.

Reinforcement Learning (RL)
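
For context, the sketch below shows the standard tabular Bellman backup that value-iteration-style MDP/POMDP solvers repeat until convergence; the paper's contribution is a generalized, more efficient update tailored to cooperative IRL, which is not reproduced here.

```python
# Standard tabular value iteration (context only, not the paper's generalized
# CIRL update): repeatedly apply the Bellman backup
#   Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
# for a small MDP with transition tensor P and reward table R.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    n_states = P.shape[0]
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)       # Bellman backup for every (s, a) pair
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Tiny two-state, two-action example with made-up numbers.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)
print("V*:", V, "greedy policy:", policy)
```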

Pragmatic-Pedagogic Value Alignment

no code implementations • 20 Jul 2017 • Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan

In robotics, value alignment is key to the design of collaborative robots that can integrate into human workflows, successfully inferring and adapting to their users' objectives as they go.

Decision Making
