In this paper, we implement three state-of-the-art continuous reinforcement
learning algorithms, Deep Deterministic Policy Gradient (DDPG), Proximal Policy
Optimization (PPO) and Policy Gradient (PG), for portfolio management. All of them
are widely used in game playing and robot control...
Moreover, PPO has
appealing theoretical properties that may prove useful in portfolio
management. We report the performance of all three algorithms under different
settings, including different learning rates, objective functions and feature
combinations, in order to provide insights into parameter tuning, feature
selection and data preparation. We also conduct extensive experiments on the
Chinese stock market and
show that PG is better suited to financial markets than DDPG and PPO, although
both of the latter are more advanced. In addition, we propose a so-called
Adversarial Training method and show that it can greatly improve training
efficiency and significantly increase the average daily return and Sharpe ratio
in backtests. With this modification, our experimental results show that our
agent based on Policy Gradient can outperform UCRP.
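
To make the evaluation quantities named above concrete, the following is a minimal sketch of how the average daily return and Sharpe ratio of a UCRP (uniform constant rebalanced portfolio) baseline could be computed. It is an illustration under stated assumptions, not the exact procedure used in our experiments: the function names, the input layout (one list of price relatives per day), the zero risk-free rate and the absence of annualization and transaction costs are all assumptions made for this example.

```python
from statistics import mean, stdev


def ucrp_daily_returns(price_relatives):
    """Daily simple returns of an equally weighted, daily rebalanced portfolio.

    price_relatives[t][i] is close_t / close_(t-1) for asset i.
    This input layout is a hypothetical choice for the example.
    """
    returns = []
    for day in price_relatives:
        # With equal weights restored every day, the portfolio grows by the
        # mean price relative of the assets on that day.
        returns.append(mean(day) - 1.0)
    return returns


def sharpe_ratio(daily_returns, risk_free=0.0):
    """Non-annualized daily Sharpe ratio (assumed risk-free rate of zero)."""
    excess = [r - risk_free for r in daily_returns]
    spread = stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / spread


# Toy data: three trading days, two assets (made-up numbers).
relatives = [[1.02, 1.00], [1.01, 1.03], [0.99, 0.97]]
baseline_returns = ucrp_daily_returns(relatives)
average_daily_return = mean(baseline_returns)
baseline_sharpe = sharpe_ratio(baseline_returns)
```

An agent's backtest can be scored with the same two functions by passing its own sequence of daily returns, so that the comparison against the UCRP baseline uses identical definitions.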