1 code implementation • 29 Sep 2023 • Wanpeng Zhang, Zongqing Lu
Large Language Models (LLMs) have demonstrated significant success across various domains.
no code implementations • 5 Jun 2023 • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu
COREP primarily employs a guided updating mechanism to learn a stable graph representation for states, termed the causal-origin representation.
no code implementations • 25 Oct 2022 • Ziluo Ding, Wanpeng Zhang, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu
We investigate the use of natural language to drive the generalization of policies in multi-agent settings.
no code implementations • 26 Aug 2021 • Wanpeng Zhang, Xiaoyan Cao, Yao Yao, Zhicheng An, Xi Xiao, Dijun Luo
In this paper, we present a model-based robust RL framework for autonomous greenhouse control to meet the sample efficiency and safety challenges.
no code implementations • 4 Aug 2021 • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu
When one agent interacts with a multi-agent environment, it is challenging to deal with various opponents unseen before.
no code implementations • 3 Aug 2021 • Wanpeng Zhang, Xi Xiao, Yao Yao, Mingzhe Chen, Dijun Luo
MBDP consists of two kinds of dropout mechanisms, where the rollout-dropout aims to improve the robustness with a small cost of sample efficiency, while the model-dropout is designed to compensate for the lost efficiency at a slight expense of robustness.
1 code implementation • 6 Jul 2021 • Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Li Xiao, Shihui Guo, Xiaoyu Cao, Meihong Wu, Dijun Luo
However, the optimal control of autonomous greenhouses is challenging, requiring decision-making based on high-dimensional sensory data, and the scaling of production is limited by the scarcity of labor capable of handling this task.
1 code implementation • 5 Jul 2021 • Yao Yao, Li Xiao, Zhicheng An, Wanpeng Zhang, Dijun Luo
Model-based deep reinforcement learning has achieved success in various domains that require high sample efficiencies, such as Go and robotics.
no code implementations • 25 Sep 2019 • Junren Luo, Wei Gao, Zhiyong Liao, Weilin Yuan, Wanpeng Zhang, Shaofei Chen
Goal recognition based on the observations of the behaviors collected online has been used to model some potential applications.
no code implementations • 13 Apr 2019 • Bowen Zhao, Xi Xiao, Wanpeng Zhang, Bin Zhang, Shu-Tao Xia
There is a probabilistic version of PCA, known as Probabilistic PCA (PPCA).