Search Results for author: Xingzhou Lou

Found 4 papers, 2 papers with code

SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling

no code implementations • 21 May 2024 • Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang

Human preference alignment is critical in building powerful and reliable large language models (LLMs).

Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models

no code implementations • 15 Jan 2024 • Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, Yali Du

By using pre-trained LMs and eliminating the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived, free-form natural language constraints.

Tasks: Reinforcement Learning (RL) · Safe Reinforcement Learning
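The abstract above describes replacing a ground-truth cost with judgments from a pre-trained language model. The snippet below is a minimal, hypothetical sketch of that general idea, not the paper's actual method: it assumes a zero-shot NLI classifier (here, facebook/bart-large-mnli via the Hugging Face transformers pipeline) can judge whether a textual description of an agent's transition violates a free-form constraint, and treats that judgment as a cost signal.

```python
# Hypothetical sketch (not the paper's algorithm): use a pre-trained LM to
# decide whether a described transition violates a free-form constraint,
# and use that decision in place of a ground-truth cost.
from transformers import pipeline

# Zero-shot NLI classifier; the model choice is an illustrative assumption.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

CONSTRAINT = "The robot must never enter the water."  # free-form constraint

def lm_cost(transition_description: str) -> float:
    """Return 1.0 if the LM judges the transition to violate the constraint."""
    result = classifier(
        transition_description,
        candidate_labels=["violates the constraint", "satisfies the constraint"],
        hypothesis_template=f"Given the rule '{CONSTRAINT}', this behavior {{}}.",
    )
    # The pipeline returns labels sorted by score; take the top label.
    return 1.0 if result["labels"][0] == "violates the constraint" else 0.0

print(lm_cost("The robot walked across the bridge."))
print(lm_cost("The robot stepped into the lake."))
```

In a constrained RL loop, such a cost could enter the update as, e.g., a Lagrangian penalty on the return; how the paper actually integrates the LM signal is not specified in this snippet's scope.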

TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient

1 code implementation • 25 Dec 2023 • Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du

We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods.

PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination

1 code implementation • 16 Jan 2023 • Xingzhou Lou, Jiaxian Guo, Junge Zhang, Jun Wang, Kaiqi Huang, Yali Du

We conduct experiments in the Overcooked environment and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans.
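The evaluation protocol in the sentence above, pairing the trained agent with partners it never trained with, can be summarized in a short sketch. Everything here (the environment API, the policies, and the ToyCoordEnv stand-in) is an illustrative assumption, not the PECAN code:

```python
# Hypothetical sketch of zero-shot coordination evaluation: pair the trained
# agent with each held-out partner policy and average episodic returns.
import random

class ToyCoordEnv:
    """Trivial stand-in for a two-player environment such as Overcooked."""
    def reset(self):
        self.t = 0
        return 0, 0  # (obs_agent, obs_partner)

    def step(self, actions):
        self.t += 1
        reward = 1.0 if actions[0] == actions[1] else 0.0  # reward agreement
        done = self.t >= 20
        return (self.t, self.t), reward, done

def play_episode(env, agent_policy, partner_policy, max_steps=400):
    """Roll out one two-player episode and return the shared reward sum."""
    obs_agent, obs_partner = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = agent_policy(obs_agent)
        b = partner_policy(obs_partner)
        (obs_agent, obs_partner), reward, done = env.step((a, b))
        total_reward += reward
        if done:
            break
    return total_reward

def zero_shot_eval(env, agent_policy, partner_pool, episodes_per_partner=10):
    """Average return of agent_policy across unseen partner policies."""
    scores = [
        play_episode(env, agent_policy, partner)
        for partner in partner_pool
        for _ in range(episodes_per_partner)
    ]
    return sum(scores) / len(scores)

agent = lambda obs: 0
partners = [lambda obs: 0, lambda obs: random.choice([0, 1])]
print(zero_shot_eval(ToyCoordEnv(), agent, partners))
```

In the paper's setting, the partner pool would contain behavior-cloned human proxies rather than toy policies; real-human evaluation cannot be scripted this way.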
