no code implementations • 17 Jun 2022 • Ramki Gummadi, Saurabh Kumar, Junfeng Wen, Dale Schuurmans
Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g., value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing expected return.
no code implementations • ICLR 2022 • Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar A Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans
To better understand the utility of deep models in RL, we present an analysis of recursive value estimation using overparameterized linear representations that provides useful, transferable findings.
no code implementations • 13 Jun 2021 • Junfeng Wen, Saurabh Kumar, Ramki Gummadi, Dale Schuurmans
Actor-critic (AC) methods are ubiquitous in reinforcement learning.
no code implementations • NeurIPS 2019 • Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans
We investigate batch policy optimization for cost-sensitive classification and contextual bandits, two related tasks that obviate exploration but require generalizing from observed rewards to action selections in unseen contexts.
no code implementations • 5 Apr 2018 • Aditya Grover, Ramki Gummadi, Miguel Lazaro-Gredilla, Dale Schuurmans, Stefano Ermon
Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates.