2 code implementations • NeurIPS 2020 • Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker
In this work, we introduce a novel class of algorithms that only needs to solve the MDP underlying the demonstrated behavior once to recover the expert policy.
no code implementations • 20 Mar 2020 • Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker
We analyze the advantages of Constrained Q-learning in the tabular case and compare Constrained DQN to reward shaping and Lagrangian methods for high-level decision making in autonomous driving, considering constraints for safety, keeping right, and comfort.
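The tabular case mentioned above can be illustrated with a minimal sketch: the key idea of Constrained Q-learning is that the bootstrap maximum in the TD target is taken only over actions that satisfy the constraints in the successor state, rather than shaping the reward. The helper `safe_actions` and the exact update form are assumptions for illustration, not the paper's implementation.

```python
def constrained_q_update(Q, s, a, r, s_next, safe_actions, alpha=0.1, gamma=0.99):
    """One tabular Constrained Q-learning update (sketch).

    The bootstrap max is restricted to actions that satisfy the
    constraints in s_next. `safe_actions(state)` is a hypothetical
    helper returning indices of constraint-satisfying actions.
    """
    allowed = safe_actions(s_next)
    target = r + gamma * max(Q[s_next][b] for b in allowed)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

If the highest-valued action in `s_next` violates a constraint (e.g. an unsafe lane change), it is simply excluded from the max, so constraint satisfaction is baked into the value estimates instead of being traded off via a penalty term.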
no code implementations • 30 Sep 2019 • Maria Huegle, Gabriel Kalweit, Moritz Werling, Joschka Boedecker
The common pipeline in autonomous driving systems is highly modular and includes a perception component which extracts lists of surrounding objects and passes these lists to a high-level decision component.
no code implementations • 30 Sep 2019 • Gabriel Kalweit, Maria Huegle, Joschka Boedecker
We prove that the combination of these short- and long-term predictions is a representation of the full return, leading to the Composite Q-learning algorithm.
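The decomposition behind that claim can be checked directly: a discounted return splits exactly into a truncated n-step (short-term) part plus a discounted long-term tail. The sketch below verifies this identity on a fixed reward sequence; it illustrates the decomposition only, not the learned truncated and shifted Q-functions of the Composite Q-learning algorithm itself.

```python
def full_return(rewards, gamma=0.99):
    """Discounted return over the whole reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def composite_return(rewards, n, gamma=0.99):
    """Split the return into a truncated n-step part and a discounted
    long-term tail; their sum recovers the full return exactly."""
    short = sum(gamma**t * r for t, r in enumerate(rewards[:n]))
    tail = gamma**n * full_return(rewards[n:], gamma)
    return short + tail
```

Because the identity holds for any split point n, the short-term predictions can be learned from model-based rollouts while the tail is learned model-free, and their combination still represents the full return.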
no code implementations • 25 Sep 2019 • Gabriel Kalweit, Maria Huegle, Joschka Boedecker
In the past few years, off-policy reinforcement learning methods have shown promising results when applied to robot control.
no code implementations • 25 Jul 2019 • Maria Huegle, Gabriel Kalweit, Branka Mirchevska, Moritz Werling, Joschka Boedecker
In many real-world decision making problems, reaching an optimal decision requires taking into account a variable number of objects around the agent.
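One way to handle a variable number of surrounding objects is a permutation-invariant set encoding in the style of Deep Sets: each object is embedded by a shared map, the embeddings are sum-pooled, and the pooled vector is projected to a fixed-size state. The plain weight matrices below stand in for learned networks; they are assumptions for illustration.

```python
import numpy as np

def deep_set_encoding(objects, W_phi, W_rho):
    """Permutation-invariant encoding of a variable-length object list
    (Deep Sets style): shared per-object embedding, sum pooling, then a
    final projection to a fixed-size vector."""
    phi = np.tanh(objects @ W_phi)   # (num_objects, hidden), shared weights
    pooled = phi.sum(axis=0)         # order- and count-independent pooling
    return np.tanh(pooled @ W_rho)   # fixed-size state encoding

# Invariance check: shuffling the object list leaves the encoding unchanged.
rng = np.random.default_rng(0)
objs = rng.normal(size=(5, 4))       # 5 surrounding objects, 4 features each
W_phi = rng.normal(size=(4, 8))
W_rho = rng.normal(size=(8, 8))
z1 = deep_set_encoding(objs, W_phi, W_rho)
z2 = deep_set_encoding(objs[::-1], W_phi, W_rho)
assert np.allclose(z1, z2)
```

The same function accepts 3 objects or 30 without any padding, which is what makes such encoders attractive as input layers for deep RL agents surrounded by a changing number of vehicles.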