no code implementations • 10 Mar 2024 • Rui Yan, Shuai Mi, Xiaoming Duan, Jintao Chen, Xiangyang Ji
The pursuers cooperate to protect a convex region from the evaders who try to reach the region.
no code implementations • 27 Dec 2023 • Chendi Qu, Jianping He, Xiaoming Duan, Jiming Chen
A simplistic model is less likely to contain the real reward function, while a model with high complexity leads to substantial computation cost and risks overfitting.
no code implementations • 27 Dec 2023 • Chendi Qu, Jianping He, Xiaoming Duan
Designing controllers to generate various trajectories has been studied for years, while recovering an optimal controller from trajectories has recently received increasing attention.
no code implementations • 4 Nov 2023 • Rui Yan, Xiaoming Duan, Rui Zou, Xin He, Zongying Shi, Francesco Bullo
We propose a cooperative strategy for the pursuers based on subgames for multiple pursuers against one evader and optimal task allocation.
no code implementations • 28 Aug 2023 • Yohan John, Gilberto Diaz-Garcia, Xiaoming Duan, Jason R. Marden, Francesco Bullo
Stochastic patrol routing is known to be advantageous in adversarial settings; however, the optimal choice of stochastic routing strategy depends on a model of the adversary.
no code implementations • 23 Jun 2023 • Yash Paliwal, Rajarshi Roy, Jean-Raphaël Gaglione, Nasim Baharisangari, Daniel Neider, Xiaoming Duan, Ufuk Topcu, Zhe Xu
We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals.
no code implementations • 20 Jan 2023 • Haoxuan Pan, Deheng Ye, Xiaoming Duan, Qiang Fu, Wei Yang, Jianping He, Mingfei Sun
We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization.
no code implementations • 26 Mar 2021 • Zhe Xu, Xiaoming Duan
We provide simulation results in two different scenarios for robust control of the COVID-19 pandemic: one for vaccination control, and another for shield immunity control, with the model parameters estimated from data in Lombardy, Italy.
no code implementations • 17 Jun 2020 • Rui Yan, Xiaoming Duan, Zongying Shi, Yisheng Zhong, Jason R. Marden, Francesco Bullo
With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory.
Multi-agent Reinforcement Learning, reinforcement-learning