no code implementations • 15 Jan 2024 • Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yali Du
By using pre-trained LMs and eliminating the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints.
1 code implementation • 25 Dec 2023 • Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du
We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods.
1 code implementation • 16 Jan 2023 • Xingzhou Lou, Jiaxian Guo, Junge Zhang, Jun Wang, Kaiqi Huang, Yali Du
We conduct experiments on the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans.