no code implementations • 5 Jul 2022 • Tomoya Yamanokuchi, Yuhwan Kwon, Yoshihisa Tsurumine, Eiji Uchibe, Jun Morimoto, Takamitsu Matsubara
However, such works are limited to one-shot transfer: real-world data must be collected once to perform the sim-to-real transfer, which still demands significant human effort to transfer models learned in simulation to new real-world domains.
no code implementations • 21 Jun 2022 • Eiji Uchibe
We derive structured discriminators so that learning of both the policy and the model is efficient.
no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara
The maximum Tsallis entropy (MTE) framework in reinforcement learning has recently gained popularity by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy.
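The flexibility comes from the entropic index q: Tsallis entropy interpolates between the Shannon case (q → 1) and sparser regularizers (e.g. q = 2). A minimal sketch of the standard definition, not code from the paper:

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).
    In the limit q -> 1 it recovers the Shannon entropy."""
    if abs(q - 1.0) < 1e-12:
        # Shannon limit: -sum p log p (0 log 0 taken as 0).
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)
```

For a uniform two-action policy, q = 1 gives ln 2 ≈ 0.693 (Shannon), while q = 2 gives 0.5, illustrating how the index reshapes the regularizer.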
no code implementations • 16 May 2022 • Lingwei Zhu, Zheng Chen, Eiji Uchibe, Takamitsu Matsubara
The recently successful Munchausen Reinforcement Learning (M-RL) framework features implicit Kullback-Leibler (KL) regularization by augmenting the reward function with the logarithm of the current stochastic policy.
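The augmentation itself is a one-line change to the reward. A minimal sketch of the Munchausen bonus as usually described (scaling factor `alpha`, temperature `tau`, and lower clipping of the log-policy term are assumptions about typical hyperparameters, not values from this paper):

```python
def munchausen_reward(reward, log_pi, alpha=0.9, tau=0.03, clip_low=-1.0):
    """Munchausen-style reward augmentation: add the scaled log-probability
    of the taken action, alpha * tau * log pi(a|s), to the environment
    reward. The log term (always <= 0) is clipped from below for stability."""
    return reward + alpha * tau * max(log_pi, clip_low)
```

Since log π(a|s) ≤ 0, the bonus is a penalty that implicitly pulls successive policies toward each other, which is where the implicit KL regularization comes from.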
no code implementations • 17 Aug 2020 • Eiji Uchibe, Kenji Doya
A forward RL step minimizes the reverse KL estimated by the inverse RL step.
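The reverse KL here is the divergence taken with the learner's policy in the first argument. A minimal sketch of that quantity for discrete action distributions (the names `pi` and `pi_expert` are illustrative, not from the paper):

```python
import math

def reverse_kl(pi, pi_expert):
    """Reverse KL divergence KL(pi || pi_expert) between two discrete
    action distributions: sum_a pi(a) * log(pi(a) / pi_expert(a)),
    with 0 * log(0/q) taken as 0."""
    return sum(p * math.log(p / q) for p, q in zip(pi, pi_expert) if p > 0.0)
```

In the actual algorithm the expert log-density ratio is not known in closed form; the inverse RL step estimates it (via the learned discriminators), and the forward RL step then descends this estimated divergence.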
no code implementations • 25 Jul 2018 • Stefan Elfwing, Eiji Uchibe, Kenji Doya
In this study, by adapting features of the EE-RBM approach to feed-forward neural networks, we propose the UnBounded output network (UBnet), which is characterized by three features: (1) unbounded output units; (2) the target value for correct classification is set to a value much greater than one; and (3) the models are trained with a modified mean-squared error objective.
no code implementations • 30 Oct 2017 • Tadashi Kozuno, Eiji Uchibe, Kenji Doya
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks. A better approximate dynamic programming algorithm is therefore expected to further extend the applicability of reinforcement learning.
no code implementations • 24 Feb 2017 • Stefan Elfwing, Eiji Uchibe, Kenji Doya
In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters.
no code implementations • 10 Feb 2017 • Stefan Elfwing, Eiji Uchibe, Kenji Doya
First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU).
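Both activations are simple closed-form functions of the logistic sigmoid: SiLU is x·σ(x), and dSiLU is its derivative, σ(x)·(1 + x·(1 − σ(x))). A minimal stdlib sketch of these definitions:

```python
import math

def sigmoid(x):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    """Sigmoid-weighted linear unit: silu(x) = x * sigmoid(x)."""
    return x * sigmoid(x)

def dsilu(x):
    """Derivative of SiLU: sigmoid(x) * (1 + x * (1 - sigmoid(x))).
    Proposed as an activation function in its own right."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

SiLU behaves like a smooth, non-monotonic variant of ReLU (it dips slightly below zero before rising), while dSiLU is a bounded, bump-shaped unit; dsilu(0) = 0.5.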
no code implementations • NeurIPS 2009 • Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya
In this paper, we describe a generalized Natural Gradient (gNG) obtained by linearly interpolating the two FIMs, and propose an efficient implementation of gNG learning, generalized Natural Actor-Critic (gNAC), based on the theory of estimating functions.
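The core construction is a convex combination of the two Fisher information matrices, with the natural gradient obtained by preconditioning the vanilla gradient with its inverse. A minimal sketch for the 2×2 case (the function name and the interpolation weight `alpha` are illustrative; the paper's efficient gNAC implementation avoids this explicit inversion):

```python
def generalized_natural_gradient(f1, f2, grad, alpha):
    """Compute F(alpha)^{-1} @ grad for 2x2 matrices, where
    F(alpha) = (1 - alpha) * F1 + alpha * F2 linearly interpolates
    two Fisher information matrices (FIMs)."""
    # Interpolated FIM.
    a = [[(1.0 - alpha) * f1[i][j] + alpha * f2[i][j] for j in range(2)]
         for i in range(2)]
    # Closed-form 2x2 inverse, then apply it to the vanilla gradient.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    return [inv[0][0] * grad[0] + inv[0][1] * grad[1],
            inv[1][0] * grad[0] + inv[1][1] * grad[1]]
```

Setting alpha to 0 or 1 recovers a natural gradient with respect to either FIM alone; intermediate values trade off the two metrics.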