Search Results for author: Tianbing Xu

Found 7 papers, 1 paper with code

Learning to Explore via Meta-Policy Gradient

no code implementations ICML 2018 Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng

The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy.

Continuous Control, Q-Learning, +2

Learning to Explore with Meta-Policy Gradient

no code implementations 13 Mar 2018 Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng

The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy.

Q-Learning, Reinforcement Learning (RL)

Variational Inference for Policy Gradient

no code implementations 21 Feb 2018 Tianbing Xu

Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derived a method to generate samples from the posterior variational parameter distribution by explicitly minimizing the KL divergence to match the target distribution in an amortized fashion.

reinforcement-learning, Reinforcement Learning (RL), +1

Stochastic Variance Reduction for Policy Gradient Estimation

no code implementations 17 Oct 2017 Tianbing Xu, Qiang Liu, Jian Peng

Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems.

Continuous Control, Policy Gradient Methods, +2

Online Classification Using a Voted RDA Method

no code implementations 17 Oct 2013 Tianbing Xu, Jianfeng Gao, Lin Xiao, Amelia Regan

We propose a voted dual averaging method for online classification problems with explicit regularization.

Classification, General Classification

Thompson Sampling in Dynamic Systems for Contextual Bandit Problems

no code implementations 17 Oct 2013 Tianbing Xu, Yaming Yu, John Turner, Amelia Regan

For contextual bandit problems, Thompson Sampling is adopted based on the underlying posterior distributions of the parameters.

Thompson Sampling
