no code implementations • 12 Mar 2024 • Adam Villaflor, Brian Yang, Huangyuan Su, Katerina Fragkiadaki, John Dolan, Jeff Schneider
Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining.
no code implementations • 21 Jul 2022 • Adam Villaflor, Zhe Huang, Swapnil Pande, John Dolan, Jeff Schneider
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem.
no code implementations • 26 Apr 2022 • Ian Char, Viraj Mehta, Adam Villaflor, John M. Dolan, Jeff Schneider
Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data.
no code implementations • 22 Mar 2021 • Christoph Killing, Adam Villaflor, John M. Dolan
We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL).
no code implementations • 1 Jan 2021 • Adam Villaflor, John Dolan, Jeff Schneider
Then, we can optionally enter a second stage where we fine-tune the policy using our novel Model-Based Behavior-Regularized Policy Optimization (MB2PO) algorithm.
1 code implementation • 16 Oct 2018 • Gregory Kahn, Adam Villaflor, Pieter Abbeel, Sergey Levine
We show that a simulated robotic car and a real-world RC car can gather data and train fully autonomously without any human-provided labels beyond those needed to train the detectors, and can then accomplish a variety of different tasks at test time.
2 code implementations • 29 Sep 2017 • Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, Sergey Levine
To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based approaches.
no code implementations • 3 Feb 2017 • Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine
However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot.