Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics

12 May 2020 · Antonin Raffin · Freek Stulp

Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots...
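
The exploration scheme behind the results below is easy to state: instead of adding independent Gaussian noise to the action at every step, gSDE draws an exploration matrix theta_eps from a Gaussian and keeps it fixed for n steps, so the action noise theta_eps^T z(s_t) is a smooth function of the policy features z(s_t) rather than a fresh random jolt per step. A minimal NumPy sketch of this sampling scheme follows; the dimensions, feature extractor, noise scale, and resampling interval are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

feature_dim, action_dim = 8, 2
# Per-weight noise scale sigma (learned jointly with the policy in gSDE;
# fixed here to keep the sketch self-contained).
sigma = np.full((feature_dim, action_dim), 0.3)

def resample_exploration_matrix():
    # Draw theta_eps ~ N(0, sigma^2) once, then reuse it for n steps
    # instead of drawing fresh action noise at every step.
    return rng.normal(loc=0.0, scale=sigma)

def gsde_action(mu, z, theta_eps):
    # a_t = mu(s_t) + theta_eps^T z(s_t): the perturbation is a
    # deterministic function of the state until theta_eps is resampled,
    # which avoids the jerky per-step jitter of unstructured exploration.
    return mu + theta_eps.T @ z

theta_eps = resample_exploration_matrix()
for t in range(64):
    if t % 16 == 0:  # resample every n = 16 steps (placeholder interval)
        theta_eps = resample_exploration_matrix()
    z = np.tanh(rng.normal(size=feature_dim))  # stand-in for policy features z(s_t)
    mu = np.zeros(action_dim)                  # stand-in for the mean action mu(s_t)
    action = gsde_action(mu, z, theta_eps)
```

For comparison, unstructured step-based exploration would sample mu + rng.normal(scale=0.3, size=action_dim) independently at each step. gSDE also ships in the first author's Stable-Baselines3 library; assuming its current API, it is enabled per algorithm with the use_sde flag:

```python
from stable_baselines3 import SAC

# use_sde=True switches on gSDE; sde_sample_freq controls how often the
# exploration matrix is resampled (-1 resamples once per rollout).
model = SAC("MlpPolicy", "Pendulum-v1", use_sde=True, sde_sample_freq=4)
model.learn(total_timesteps=20_000)
```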

Results from the Paper


TASK                DATASET               MODEL        METRIC  VALUE  GLOBAL RANK
Continuous Control  PyBullet Ant          TD3 + gSDE   Return  3267   #1
Continuous Control  PyBullet Ant          SAC + gSDE   Return  3106   #2
Continuous Control  PyBullet Ant          TD3          Return  2865   #3
Continuous Control  PyBullet Ant          SAC          Return  2859   #4
Continuous Control  PyBullet Ant          PPO + gSDE   Return  2587   #5
Continuous Control  PyBullet Ant          A2C + gSDE   Return  2560   #6
Continuous Control  PyBullet Ant          PPO          Return  2160   #7
Continuous Control  PyBullet Ant          A2C          Return  1967   #8
Continuous Control  PyBullet HalfCheetah  SAC + gSDE   Return  2945   #1
Continuous Control  PyBullet HalfCheetah  SAC          Return  2883   #2
Continuous Control  PyBullet HalfCheetah  PPO + gSDE   Return  2760   #3
Continuous Control  PyBullet HalfCheetah  TD3          Return  2687   #4
Continuous Control  PyBullet HalfCheetah  TD3 + gSDE   Return  2578   #5
Continuous Control  PyBullet HalfCheetah  PPO          Return  2254   #6
Continuous Control  PyBullet HalfCheetah  A2C + gSDE   Return  2028   #7
Continuous Control  PyBullet HalfCheetah  A2C          Return  1652   #8
Continuous Control  PyBullet Hopper       SAC + gSDE   Return  2515   #1
Continuous Control  PyBullet Hopper       PPO + gSDE   Return  2508   #2
Continuous Control  PyBullet Hopper       SAC          Return  2477   #3
Continuous Control  PyBullet Hopper       TD3          Return  2470   #4
Continuous Control  PyBullet Hopper       TD3 + gSDE   Return  2353   #5
Continuous Control  PyBullet Hopper       PPO          Return  1622   #6
Continuous Control  PyBullet Hopper       A2C          Return  1559   #7
Continuous Control  PyBullet Hopper       A2C + gSDE   Return  1448   #8
Continuous Control  PyBullet Walker2D     SAC + gSDE   Return  2270   #1
Continuous Control  PyBullet Walker2D     SAC          Return  2215   #2
Continuous Control  PyBullet Walker2D     TD3          Return  2106   #3
Continuous Control  PyBullet Walker2D     TD3 + gSDE   Return  1989   #4
Continuous Control  PyBullet Walker2D     PPO + gSDE   Return  1776   #5
Continuous Control  PyBullet Walker2D     PPO          Return  1238   #6
Continuous Control  PyBullet Walker2D     A2C + gSDE   Return  694    #7
Continuous Control  PyBullet Walker2D     A2C          Return  443    #8

Methods used in the Paper


METHOD                     TYPE
A2C                        Policy Gradient Methods
Target Policy Smoothing    Regularization
Clipped Double Q-learning  Off-Policy TD Control
TD3                        Policy Gradient Methods
Experience Replay          Replay Memory
Dense Connections          Feedforward Networks
ReLU                       Activation Functions
Adam                       Stochastic Optimization
Soft Actor Critic          Policy Gradient Methods
Entropy Regularization     Regularization
PPO                        Policy Gradient Methods