no code implementations • 15 Feb 2024 • Taisuke Kobayashi
As a result, numerical simulations confirm that the proposed stabilization tricks make experience replay (ER) applicable to an advantage actor-critic, an on-policy algorithm.
no code implementations • 24 Aug 2023 • Taisuke Kobayashi
Although this problem can be avoided through careful reward design, reviewing the exception handling at episode termination is essential for the practical use of temporal-difference (TD) learning.
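For reference, the exception handling in question concerns how the bootstrap term is treated at episode termination; a minimal sketch of the standard convention, with illustrative names:

```python
# Minimal sketch of the standard exception handling at termination in TD
# learning: the terminal flag `done` zeroes the bootstrap term so that no
# value is propagated from beyond the end of the episode.

def td_target(reward, next_value, done, gamma=0.99):
    """One-step TD target: r + gamma * V(s'), unless the episode terminated."""
    return reward + gamma * (1.0 - float(done)) * next_value

# Example: at termination the target collapses to the immediate reward.
print(td_target(1.0, 5.0, done=False))  # 1.0 + 0.99 * 5.0 = 5.95
print(td_target(1.0, 5.0, done=True))   # 1.0
```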
no code implementations • 8 Mar 2023 • Taisuke Kobayashi
However, in current implementations the priority of maximizing the policy entropy is tuned automatically, and the tuning rule can be interpreted as an equality constraint that binds the policy entropy to its specified lower bound.
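The automatic tuning referred to is commonly implemented as in soft actor-critic, where a temperature alpha is adjusted by gradient descent until the policy entropy meets a fixed target; a minimal sketch of that common rule (names and values are illustrative, and this is the baseline, not the paper's proposal):

```python
import torch

# Common automatic entropy-temperature tuning (as in soft actor-critic).
# The rule drives the policy entropy toward a fixed target, which is why it
# behaves like an equality constraint on the entropy.
log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -4.0  # often set to -|A| for a |A|-dimensional action space

def update_alpha(log_prob):
    """log_prob: log pi(a|s) for actions sampled from the current policy."""
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    optimizer.zero_grad()
    alpha_loss.backward()
    optimizer.step()
    return log_alpha.exp().item()
```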
no code implementations • 21 Dec 2022 • Taisuke Kobayashi
This paper introduces a novel method that adds intrinsic bonuses to the task-oriented reward function in order to facilitate efficient exploration in reinforcement learning.
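The general scheme of such reward shaping can be sketched as follows; the count-based bonus used here is a simple illustrative stand-in, not the bonus proposed in the paper:

```python
from collections import defaultdict

# Generic intrinsic-bonus scheme: the agent optimizes the task reward plus a
# weighted exploration bonus. The count-based bonus below is only a simple
# illustrative choice; states are assumed hashable (e.g., discretized).
visit_counts = defaultdict(int)
beta = 0.1  # weight of the intrinsic bonus

def shaped_reward(state, task_reward):
    visit_counts[state] += 1
    intrinsic_bonus = 1.0 / (visit_counts[state] ** 0.5)  # decays with visits
    return task_reward + beta * intrinsic_bonus
```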
no code implementations • 8 Aug 2022 • Taisuke Kobayashi, Ryoma Watanuki
We experimentally verified the benefits of the proposed sparsification: it can easily find the necessary and sufficient six dimensions for a reaching task with a mobile manipulator, which requires a six-dimensional state space.
no code implementations • 18 Mar 2022 • Taisuke Kobayashi
However, the density ratio is asymmetric about its center, and the possible error scale from that center, which should be close to the threshold, depends on how the baseline policy is given.
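The asymmetry can be checked numerically: doubling and halving a probability give density ratios at unequal distances from the center 1, even though their distances coincide in log space (a sketch, not the paper's analysis):

```python
import math

# The density ratio r = pi(a|s) / pi_baseline(a|s) is asymmetric about 1:
# ratios 2.0 and 0.5 sit at distances 1.0 and 0.5 from the center, while
# their log-space distances are both 0.693.
for r in (2.0, 0.5):
    print(r, abs(r - 1.0), abs(math.log(r)))
```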
no code implementations • 25 Feb 2022 • Taisuke Kobayashi
Recently, T-soft update has been proposed as a noise-robust update rule for the target network and has contributed to improving DRL performance.
no code implementations • 15 Feb 2022 • Taisuke Kobayashi
RL is known for the instability of the learning process and the sensitivity of the acquired policy to noise.
1 code implementation • 18 Jan 2022 • Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Takamitsu Matsubara
In this paper, we propose AdaTerm, a novel approach that incorporates the Student's t-distribution to derive not only the first-order moment but also all the associated statistics.
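The idea of deriving moments from the Student's t-distribution can be illustrated, in heavily simplified form, as follows; this is a sketch of the robustness principle only, with illustrative names, not AdaTerm's actual derivation:

```python
# Heavily simplified illustration of Student's-t-based moment estimation:
# a gradient that is improbable under the current moment estimates gets a
# small weight, so outliers barely move the running statistics.
def update_moments(g, m, v, beta=0.9, nu=5.0):
    dsq = (g - m) ** 2 / (v + 1e-8)            # squared standardized deviation
    w = min(1.0, (nu + 1.0) / (nu + dsq))      # t-likelihood weight, small for outliers
    eff = beta + (1.0 - beta) * (1.0 - w)      # outliers push eff toward 1 (no change)
    m = eff * m + (1.0 - eff) * g              # robust first moment
    v = eff * v + (1.0 - eff) * (g - m) ** 2   # robust second moment
    return m, v
```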
no code implementations • 29 Nov 2021 • Taisuke Kobayashi, Takahito Enomoto
On the other hand, the concept of personal mobility is also gaining popularity, and its autonomous driving, specialized for individual drivers, is anticipated as a new step.
no code implementations • 3 Sep 2021 • Maciej Pietrowski, Andrzej Gajda, Takuto Yamamoto, Taisuke Kobayashi, Lana Sinapayen, Eiji Watanabe
GPU-specific computational processing is more indeterminate than that of CPUs, and hardware-derived uncertainties, which are often considered obstacles that need to be eliminated, might in some cases be successfully incorporated into the training of deep neural networks.
no code implementations • 2 Aug 2021 • Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.
no code implementations • 23 Jun 2021 • Taisuke Kobayashi, Akiyoshi Kitaoka, Manabu Kosaka, Kenta Tanaka, Eiji Watanabe
In our previous study, we successfully reproduced the illusory motion of the rotating snakes illusion using deep neural networks incorporating predictive coding theory.
no code implementations • 18 Jun 2021 • Taisuke Kobayashi, Eiji Watanabe
Rotating Snakes is a visual illusion in which a stationary design is perceived to move dramatically.
no code implementations • 3 Jun 2021 • Taisuke Kobayashi
This paper proposes a new reinforcement learning method with hyperbolic discounting.
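For reference, hyperbolic discounting weights a reward t steps ahead by 1/(1 + kt) rather than the usual exponential gamma^t; a minimal comparison, with illustrative values of gamma and k:

```python
# Exponential vs. hyperbolic discounting of a reward t steps in the future.
# gamma and k are illustrative; hyperbolic discounting decays far more
# slowly at long horizons.
gamma, k = 0.99, 0.01

def exponential(t):
    return gamma ** t

def hyperbolic(t):
    return 1.0 / (1.0 + k * t)

for t in (1, 10, 100, 1000):
    print(t, round(exponential(t), 4), round(hyperbolic(t), 4))
# at t=1000: gamma^t ~ 4e-5, while 1/(1 + kt) ~ 0.09
```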
no code implementations • 27 May 2021 • Taisuke Kobayashi
This paper presents a new interpretation of the traditional optimization method in reinforcement learning (RL) as an optimization problem using reverse Kullback-Leibler (KL) divergence, and derives a new optimization method that uses forward KL divergence instead.
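For discrete distributions the two objectives differ only in the order of arguments, yet they behave very differently (reverse KL is mode-seeking, forward KL is mass-covering); a minimal sketch with illustrative distributions:

```python
import numpy as np

# Forward vs. reverse KL divergence between a target p and a model q.
def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

p = np.array([0.5, 0.4, 0.1])    # illustrative target distribution
q = np.array([0.8, 0.15, 0.05])  # illustrative model (e.g., current policy)

print("reverse KL(q||p):", kl(q, p))  # penalizes q placing mass where p is small
print("forward KL(p||q):", kl(p, q))  # penalizes q missing mass that p has
```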
no code implementations • 1 Apr 2021 • Taisuke Kobayashi, Kenta Yoshizawa
To alleviate this drawback of FB controllers, feedback error learning integrates one of them with a feedforward (FF) controller.
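A minimal sketch of the feedback-error-learning scheme on a toy plant, where the FB controller's output serves as the training error for the FF controller (the plant, gains, and one-parameter FF model are all illustrative):

```python
# Feedback error learning on a toy first-order plant: the FF model is
# trained online using the FB output as its error signal, so the FB
# contribution shrinks as the FF model improves.
kp, lr = 2.0, 0.05
w = 0.0  # parameter of a trivial linear FF model: u_ff = w * reference

x = 0.0
for step in range(200):
    reference = 1.0
    u_ff = w * reference
    u_fb = kp * (reference - x)   # feedback controller
    u = u_ff + u_fb               # combined command
    x += 0.1 * (-x + u)           # toy first-order plant, dt = 0.1
    w += lr * u_fb * reference    # FB output acts as the FF error signal

print(w, x)  # w converges so that u_fb -> 0 at steady state
```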
no code implementations • 20 Nov 2020 • Koki Kobayashi, Masaki Ogura, Taisuke Kobayashi, Kenji Sugimoto
In this paper, we propose a deep unfolding-based framework for the output feedback control of systems with input saturation.
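Deep unfolding unrolls a fixed number of iterations of an iterative algorithm into network layers whose per-iteration parameters are then learned; a generic sketch using unrolled gradient descent with learnable step sizes, illustrating the framework rather than the paper's controller design:

```python
import torch

# Generic deep-unfolding sketch: K gradient-descent iterations on a
# quadratic objective become "layers", and the per-layer step sizes are
# trainable parameters.
class UnfoldedGD(torch.nn.Module):
    def __init__(self, K=10):
        super().__init__()
        self.steps = torch.nn.Parameter(0.1 * torch.ones(K))

    def forward(self, A, b, x):
        for eta in self.steps:
            # gradient step for 0.5 * x^T A x - b^T x (A symmetric PSD)
            x = x - eta * (A @ x - b)
        return x
```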
no code implementations • 7 Oct 2020 • Taisuke Kobayashi
PPO clips the density ratio between the latest and baseline policies at a threshold, while its minimization target is unclear.
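The clipping in question is PPO's surrogate objective; a minimal sketch, with illustrative names:

```python
import torch

# PPO's clipped surrogate objective: the density ratio between the latest
# and baseline (old) policies is clipped at 1 +/- epsilon before being
# multiplied by the advantage.
def ppo_clip_loss(log_prob, old_log_prob, advantage, epsilon=0.2):
    ratio = torch.exp(log_prob - old_log_prob)           # density ratio
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```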
no code implementations • 25 Aug 2020 • Taisuke Kobayashi, Wendyam Eric Lionel Ilboudo
The problem with the conventional update rule is that all parameters are copied smoothly from the main network at the same speed, even when some of them are updating in the wrong direction.
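The conventional rule referred to is the exponential soft (Polyak) update, which blends every parameter toward the main network at one shared rate; a minimal sketch in PyTorch:

```python
import torch

# Conventional soft (Polyak) target-network update: every parameter is
# blended toward the main network at the same rate tau, regardless of
# whether an individual update direction is trustworthy.
@torch.no_grad()
def soft_update(target_net, main_net, tau=0.005):
    for t_param, m_param in zip(target_net.parameters(), main_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * m_param)
```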
no code implementations • 23 Aug 2020 • Taisuke Kobayashi
The eligibility traces method is well known as an online learning technique for improving sample efficiency in traditional reinforcement learning with linear regressors, rather than in DRL.
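A minimal sketch of that classical setting, TD(lambda) with accumulating traces and a linear value function (all names are illustrative):

```python
import numpy as np

# TD(lambda) with accumulating eligibility traces and a linear value
# function V(s) = w . phi(s), the classical setting for this technique.
def td_lambda_step(w, e, phi_s, phi_next, reward, done,
                   alpha=0.1, gamma=0.99, lam=0.9):
    v_s = w @ phi_s
    v_next = 0.0 if done else w @ phi_next
    delta = reward + gamma * v_next - v_s   # TD error
    e = gamma * lam * e + phi_s             # decay and accumulate the trace
    w = w + alpha * delta * e               # credit recently visited features too
    return w, e
```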
no code implementations • 31 Jul 2020 • Taisuke Kobayashi
In real-world data, noise and outliers cannot be excluded from the datasets used for learning robot skills.
no code implementations • 4 Mar 2020 • Taisuke Kobayashi
In the proposed method, a standard variational autoencoder (VAE) is employed to statistically extract the latent space hidden in the sampled data, and this latent space makes the robots controllable within feasible computational time and cost.
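For reference, the standard VAE machinery mentioned here encodes observations into a Gaussian over a low-dimensional latent space via the reparameterization trick; a minimal sketch with illustrative layer sizes, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Minimal standard VAE: the encoder maps observations to a Gaussian over a
# low-dimensional latent space, sampled with the reparameterization trick.
class VAE(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU())
        self.mu = nn.Linear(32, latent_dim)
        self.log_var = nn.Linear(32, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, obs_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        return self.decoder(z), mu, log_var
```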
3 code implementations • 29 Feb 2020 • Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain.