Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

2 Jul 2018 Xiangxiang Chu

As the most successful variant of, and improvement on, Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO) has been widely applied across various domains thanks to several advantages: efficient data utilization, easy implementation, and good parallelism. This paper proposes a first-order gradient reinforcement learning algorithm called Policy Optimization with Penalized Point Probability Distance (POP3D), whose penalty term, the point probability distance, is a lower bound on the square of the total variation divergence, as another powerful variant...
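The lower-bound claim in the abstract can be checked numerically for discrete action distributions. The sketch below is illustrative only. Because the abstract is truncated here, it assumes that the point probability distance is the squared difference between the old and new policy's probability of the single sampled action, and that the surrogate takes the usual penalty form (importance ratio times advantage, minus a coefficient `beta` times the penalty). All function names and the value of `beta` are hypothetical and are not taken from the paper.

```python
import random


def total_variation(p, q):
    """Total variation divergence between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def point_probability_distance(p_old, p_new, action):
    """Assumed definition: squared probability gap for the sampled action only."""
    return (p_old[action] - p_new[action]) ** 2


def penalized_surrogate(p_old, p_new, action, advantage, beta):
    """Assumed per-sample objective: ratio * advantage - beta * penalty."""
    ratio = p_new[action] / p_old[action]
    penalty = point_probability_distance(p_old, p_new, action)
    return ratio * advantage - beta * penalty


def random_distribution(n, rng):
    """Draw a random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-6 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def check_lower_bound(trials=1000, n_actions=4, seed=0):
    """Return True if the penalty never exceeds the squared TV divergence."""
    rng = random.Random(seed)
    for _ in range(trials):
        p_old = random_distribution(n_actions, rng)
        p_new = random_distribution(n_actions, rng)
        tv_squared = total_variation(p_old, p_new) ** 2
        for action in range(n_actions):
            gap = point_probability_distance(p_old, p_new, action)
            if gap > tv_squared + 1e-12:  # small tolerance for float rounding
                return False
    return True


if __name__ == "__main__":
    print("lower bound holds on all samples:", check_lower_bound())
```

Under these assumptions the bound follows because the probability differences sum to zero, so the absolute gap of any single action cannot exceed the total variation divergence. The penalty also needs only the probability of the sampled action, not a sum over the whole action space.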

Results from the Paper


TASK          DATASET                         MODEL  METRIC NAME  METRIC VALUE  GLOBAL RANK
Atari Games   Atari 2600 Alien                POP3D  Score        1510.8        # 26
Atari Games   Atari 2600 Amidar               POP3D  Score        729.15        # 24
Atari Games   Atari 2600 Assault              POP3D  Score        5400.13       # 21
Atari Games   Atari 2600 Asterix              POP3D  Score        4310.67       # 35
Atari Games   Atari 2600 Asteroids            POP3D  Score        2488.1        # 19
Atari Games   Atari 2600 Atlantis             POP3D  Score        2193605.67    # 2
Atari Games   Atari 2600 Bank Heist           POP3D  Score        1212.23       # 12
Atari Games   Atari 2600 Battle Zone          POP3D  Score        15466.67      # 36
Atari Games   Atari 2600 Beam Rider           POP3D  Score        4549          # 35
Atari Games   Atari 2600 Bowling              POP3D  Score        38.99         # 30
Atari Games   Atari 2600 Boxing               POP3D  Score        97.23         # 12
Atari Games   Atari 2600 Breakout             POP3D  Score        458.41        # 15
Atari Games   Atari 2600 Centipede            POP3D  Score        3315.44       # 36
Atari Games   Atari 2600 Chopper Command      POP3D  Score        6308.33       # 21
Atari Games   Atari 2600 Crazy Climber        POP3D  Score        120247.33     # 24
Atari Games   Atari 2600 Demon Attack         POP3D  Score        61147.33      # 23
Atari Games   Atari 2600 Double Dunk          POP3D  Score        -7.89         # 28
Atari Games   Atari 2600 Enduro               POP3D  Score        459.85        # 29
Atari Games   Atari 2600 Fishing Derby        POP3D  Score        28.99         # 16
Atari Games   Atari 2600 Freeway              POP3D  Score        21.21         # 29
Atari Games   Atari 2600 Frostbite            POP3D  Score        316.87        # 38
Atari Games   Atari 2600 Gopher               POP3D  Score        6207          # 31
Atari Games   Atari 2600 Gravitar             POP3D  Score        557.17        # 19
Atari Games   Atari 2600 Ice Hockey           POP3D  Score        -4.12         # 29
Atari Games   Atari 2600 James Bond           POP3D  Score        358.54        # 32
Atari Games   Atari 2600 Kangaroo             POP3D  Score        3891.67       # 24
Atari Games   Atari 2600 Krull                POP3D  Score        7715.68       # 26
Atari Games   Atari 2600 Kung-Fu Master       POP3D  Score        33728         # 22
Atari Games   Atari 2600 Montezuma's Revenge  POP3D  Score        0             # 35
Atari Games   Atari 2600 Ms. Pacman           POP3D  Score        1683.87       # 30
Atari Games   Atari 2600 Name This Game       POP3D  Score        6065.63       # 31
Atari Games   Atari 2600 Pitfall!             POP3D  Score        0             # 4
Atari Games   Atari 2600 Pong                 POP3D  Score        20.5          # 6
Atari Games   Atari 2600 Private Eye          POP3D  Score        79.67         # 39
Atari Games   Atari 2600 Q*Bert               POP3D  Score        15396.67      # 18
Atari Games   Atari 2600 River Raid           POP3D  Score        8052.23       # 26
Atari Games   Atari 2600 Road Runner          POP3D  Score        44679.67      # 22
Atari Games   Atari 2600 Robotank             POP3D  Score        4.6           # 34
Atari Games   Atari 2600 Seaquest             POP3D  Score        1807.47       # 33
Atari Games   Atari 2600 Space Invaders       POP3D  Score        1216.15       # 34
Atari Games   Atari 2600 Star Gunner          POP3D  Score        48984         # 28
Atari Games   Atari 2600 Tennis               POP3D  Score        -8.32         # 24
Atari Games   Atari 2600 Time Pilot           POP3D  Score        3770.33       # 35
Atari Games   Atari 2600 Tutankham            POP3D  Score        241.21        # 13
Atari Games   Atari 2600 Up and Down          POP3D  Score        242701.51     # 8
Atari Games   Atari 2600 Venture              POP3D  Score        36.33         # 30
Atari Games   Atari 2600 Video Pinball        POP3D  Score        37780.7       # 32
Atari Games   Atari 2600 Wizard of Wor        POP3D  Score        4704          # 28
Atari Games   Atari 2600 Zaxxon               POP3D  Score        9472          # 24
MuJoCo Games  HalfCheetah                     POP3D  Mean         3184.54       # 1
MuJoCo Games  Hopper                          POP3D  Mean         1452.09       # 1
MuJoCo Games  InvertedDoublePendulum          POP3D  Mean         4907.64       # 1
MuJoCo Games  InvertedPendulum                POP3D  Mean         741.94        # 1
MuJoCo Games  Reacher                         POP3D  Mean         -4.29         # 1
MuJoCo Games  Swimmer                         POP3D  Mean         111.08        # 1
MuJoCo Games  Walker2d                        POP3D  Mean         3966.01       # 1

Methods used in the Paper


METHOD                  TYPE
Entropy Regularization  Regularization
PPO                     Policy Gradient Methods