no code implementations • NeurIPS 2023 • Sungho Choi, Seungyul Han, Woojun Kim, Jongseong Chae, Whiyoung Jung, Youngchul Sung
In this paper, we consider domain-adaptive imitation learning with visual observation, where an agent in a target domain learns to perform a task by observing expert demonstrations in a source domain.
1 code implementation • 22 Aug 2023 • Yonghyeon Jo, Sunwoo Lee, Junghyuk Yeom, Seungyul Han
Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks.
1 code implementation • 19 Jun 2022 • Jongseong Chae, Seungyul Han, Whiyoung Jung, Myungsik Cho, Sungho Choi, Youngchul Sung
In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed.
1 code implementation • NeurIPS 2021 • Seungyul Han, Youngchul Sung
In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome a limitation of the soft actor-critic (SAC) algorithm, which implements maximum entropy RL in model-free, sample-based learning.
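For context, the maximum entropy RL objective that SAC implements augments the reward with the entropy of the policy. A minimal sketch of the one-step entropy-augmented objective for a discrete action distribution (function names and the temperature value `alpha` are illustrative, not taken from the paper):

```python
import math

def entropy(probs):
    """Shannon entropy H(pi(.|s)) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_objective(reward, probs, alpha=0.2):
    """One-step maximum-entropy objective: r + alpha * H(pi(.|s)).

    alpha is the entropy temperature trading off reward and exploration.
    """
    return reward + alpha * entropy(probs)
```

A uniform policy maximizes the entropy bonus, while a deterministic policy receives none; the proposed max-min framework instead addresses cases where purely maximizing this entropy term limits exploration.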
no code implementations • 14 Dec 2020 • Sohee Bae, Seungyul Han, Youngchul Sung
A condition on the reward function of reinforcement learning (RL) that ensures queue stability is derived.
no code implementations • 2 Jun 2020 • Sungho Choi, Seungyul Han, Woojun Kim, Youngchul Sung
In this paper, we consider cross-domain imitation learning (CDIL), in which an agent in a target domain learns a policy that performs well in that domain by observing expert demonstrations in a source domain, without access to any reward function.
1 code implementation • 2 Jun 2020 • Seungyul Han, Youngchul Sung
In this paper, sample-aware policy entropy regularization is proposed to enhance conventional policy entropy regularization and achieve better exploration.
1 code implementation • 7 May 2019 • Seungyul Han, Youngchul Sung
In importance sampling (IS)-based reinforcement learning algorithms such as Proximal Policy Optimization (PPO), IS weights are typically clipped to avoid large variance in learning.
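The clipping mentioned here is the standard PPO clipped surrogate objective (this sketch shows the conventional mechanism the paper builds on, not the paper's proposed modification; the function name and `eps` default are illustrative):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss for a single sample (to be minimized).

    ratio: IS weight pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage A(s, a)
    eps: clipping range, so the ratio is confined to [1 - eps, 1 + eps]
    """
    unclipped = ratio * advantage
    # Clamp the IS weight before multiplying by the advantage
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps)) * advantage
    # Pessimistic bound: take the minimum of the two surrogates,
    # then negate so gradient descent maximizes the objective
    return -min(unclipped, clipped)
```

Clipping bounds the variance of the IS-weighted gradient but discards gradient information whenever the ratio leaves the trust region, which is the limitation this line of work targets.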
no code implementations • 12 Oct 2017 • Seungyul Han, Youngchul Sung
In this paper, a new adaptive multi-batch experience replay scheme is proposed for Proximal Policy Optimization (PPO) in continuous action control.
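A generic sketch of the multi-batch idea: keep the sample batches from the last few policy iterations and draw training samples across all of them. The class name, `n_batches` parameter, and uniform sampling are illustrative assumptions; the paper's scheme adapts which batches are reused, which is not reproduced here.

```python
import random
from collections import deque

class MultiBatchReplay:
    """Retain transition batches from the most recent policy iterations
    and sample uniformly across the pooled transitions."""

    def __init__(self, n_batches=4):
        # deque with maxlen evicts the oldest batch automatically
        self.batches = deque(maxlen=n_batches)

    def add_batch(self, transitions):
        """Store one iteration's batch of transitions."""
        self.batches.append(list(transitions))

    def sample(self, k):
        """Sample up to k transitions from the pooled recent batches."""
        pool = [t for batch in self.batches for t in batch]
        return random.sample(pool, min(k, len(pool)))
```

Reusing a few recent batches improves sample efficiency over on-policy PPO, at the cost of the older samples being slightly off-policy, which is where IS weighting re-enters.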