no code implementations • 13 Mar 2024 • Shangding Gu, Alois Knoll, Ming Jin
The development of Large Language Models (LLMs) often confronts challenges stemming from the heavy reliance on human annotators in the reinforcement learning from human feedback (RLHF) framework, or from the frequent and costly external queries tied to the self-instruct paradigm.
no code implementations • 12 Jan 2024 • Shangding Gu
In this study, we employ a teacher-student learning framework to tackle these problems, specifically by using RL models to offer feedback to LLMs and using LLMs to provide high-level information to RL models in a cooperative multi-agent setting.
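As a rough illustration of this cooperative loop, here is a minimal sketch; every class, method, and signal below is a hypothetical placeholder rather than the paper's actual implementation:

```python
# Minimal sketch of the cooperative teacher-student loop (all names hypothetical).

class RLTeacher:
    def feedback(self, llm_output: str) -> float:
        """Score an LLM output, e.g. via a learned reward model (placeholder)."""
        return 0.0

class LLMStudent:
    def generate(self, prompt: str) -> str:
        return "response"  # placeholder generation

    def update(self, prompt: str, response: str, reward: float) -> None:
        """Apply a policy-gradient-style update from the RL feedback (no-op here)."""

    def high_level_plan(self, env_state) -> str:
        """Return high-level guidance (e.g. a subgoal) for the RL agent."""
        return "reach the goal region, then stop"

def cooperative_step(teacher: RLTeacher, student: LLMStudent,
                     prompt: str, env_state) -> str:
    response = student.generate(prompt)        # the LLM acts
    reward = teacher.feedback(response)        # the RL model scores the output
    student.update(prompt, response, reward)   # the LLM learns from RL feedback
    return student.high_level_plan(env_state)  # the LLM guides the RL model
```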
no code implementations • 11 Dec 2023 • Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang
While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, their excessively burdensome communication layers prevent a single desktop from reaching the hardware's limits in final throughput and training performance.
no code implementations • 1 Nov 2023 • Jaafar Mhamed, Shangding Gu
In this study, we define the safety critic, a mechanism that nullifies rewards obtained by violating safety constraints.
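A minimal sketch of this reward-nullifying mechanism, assuming a Gymnasium-style environment and a hypothetical safety predicate (neither is the paper's actual interface):

```python
import gymnasium as gym  # assumption: a Gymnasium-style environment API

def violates_constraints(state, action) -> bool:
    """Hypothetical safety predicate, e.g. a distance-to-obstacle threshold."""
    return False  # placeholder

def safety_critic_reward(state, action, reward: float) -> float:
    """Nullify any reward obtained while a safety constraint is violated."""
    return 0.0 if violates_constraints(state, action) else reward

env = gym.make("CartPole-v1")
state, _ = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()
    next_state, reward, terminated, truncated, _ = env.step(action)
    reward = safety_critic_reward(state, action, reward)  # zeroed if unsafe
    state = next_state
    if terminated or truncated:
        break
```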
no code implementations • 25 Feb 2023 • Shangding Gu, Alap Kshirsagar, Yali Du, Guang Chen, Jan Peters, Alois Knoll
Deployment of Reinforcement Learning (RL) algorithms for robotics applications in the real world requires ensuring the safety of the robot and its environment.
1 code implementation • 20 May 2022 • Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, Alois Knoll
To establish a good foundation for future research in this area, this paper provides a review of safe RL from the perspectives of methods, theory, and applications.
3 code implementations • 6 Oct 2021 • Shangding Gu, Jakub Grudzien Kuba, Muning Wen, Ruiqing Chen, Ziyan Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang
To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
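In standard notation (a generic constrained-Markov-game form; the paper's exact symbols and algorithm may differ), the agents maximise the shared return subject to per-agent cost budgets:

```latex
\max_{\boldsymbol{\pi}} \; J(\boldsymbol{\pi})
  = \mathbb{E}_{\boldsymbol{\pi}}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, \mathbf{a}_t)\Big]
\quad \text{s.t.} \quad
J_c^{i}(\boldsymbol{\pi})
  = \mathbb{E}_{\boldsymbol{\pi}}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c^{i}(s_t, \mathbf{a}_t)\Big]
  \le d^{i}, \qquad i = 1, \dots, n,
```

where \(\mathbf{a}_t\) is the joint action, \(c^{i}\) the i-th agent's cost, and \(d^{i}\) its safety budget.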
1 code implementation • NeurIPS 2021 • Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
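To see why the variance compounds, consider the generic per-agent policy-gradient estimator (standard notation, not necessarily the paper's exact statement):

```latex
\nabla_{\theta_i} J(\boldsymbol{\theta})
  = \mathbb{E}_{\boldsymbol{\pi}}\!\left[
      \nabla_{\theta_i} \log \pi^{i}_{\theta_i}\!\big(a^{i}_t \mid s_t\big)\,
      Q^{\boldsymbol{\pi}}\!\big(s_t, a^{1}_t, \dots, a^{n}_t\big)
    \right]
```

Because the critic \(Q^{\boldsymbol{\pi}}\) depends on the joint action, noise from every other agent's sampled action enters each agent's gradient estimate, so the variance grows with the number of agents \(n\), motivating variance-reduction devices such as baselines.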