no code implementations • 1 Nov 2023 • Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
We study the game modification problem: a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and attains a value within a target range, at minimum modification cost.
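As a toy illustration of the idea (not the paper's algorithm, and all names here are hypothetical), consider a zero-sum matrix game where the row player maximizes: shifting entries just enough that a target pure profile becomes a strict saddle point, with the total shift as the modification cost.

```python
# Toy sketch, not the paper's method: minimally shift entries of a
# zero-sum matrix game (row player maximizes) so the target pure
# profile (i_star, j_star) becomes a strict saddle point with a margin.
def enforce_saddle(A, i_star, j_star, margin=0.1):
    v = A[i_star][j_star]
    cost = 0.0
    # Column j_star: the target entry must strictly dominate rivals.
    for i in range(len(A)):
        if i != i_star and A[i][j_star] > v - margin:
            cost += abs(A[i][j_star] - (v - margin))
            A[i][j_star] = v - margin
    # Row i_star: the target entry must be the strict row minimum.
    for j in range(len(A[0])):
        if j != j_star and A[i_star][j] < v + margin:
            cost += abs((v + margin) - A[i_star][j])
            A[i_star][j] = v + margin
    return A, cost

G = [[0.0, -1.0],
     [1.0,  0.0]]
G, cost = enforce_saddle(G, 0, 0)
print(G, cost)
```

A strict saddle point is a pure Nash equilibrium of the zero-sum game; the paper's actual setting (Markov games, uniqueness of the Markov perfect equilibrium, value constraints) is substantially richer than this one-shot sketch.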
no code implementations • 20 Jul 2022 • Eric Pulick, Shubham Bharti, Yiding Chen, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki M. Bier
Existing benchmark environments for ML, such as board and video games, offer well-defined measures of progress, but their constituent tasks are often complex, and it is frequently unclear how task characteristics contribute to overall difficulty for the machine learner.
no code implementations • 1 Jun 2022 • Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.
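The communication pattern described here can be sketched minimally (all class and variable names below are hypothetical, not from the paper): independent agents generate exploration transitions and a central server pools them.

```python
import random

# Minimal sketch of the setting (names hypothetical): agents explore
# independently and send transition batches to a central server,
# which pools them into shared experience.

class CentralServer:
    def __init__(self):
        self.replay = []  # pooled experience from all agents

    def receive(self, transitions):
        # transitions: list of (state, action, reward, next_state)
        self.replay.extend(transitions)

class Agent:
    def __init__(self, n_states=5, n_actions=2, seed=0):
        self.rng = random.Random(seed)
        self.n_states, self.n_actions = n_states, n_actions

    def explore(self, steps):
        s, batch = 0, []
        for _ in range(steps):
            a = self.rng.randrange(self.n_actions)
            s_next = self.rng.randrange(self.n_states)
            r = 1.0 if s_next == self.n_states - 1 else 0.0
            batch.append((s, a, r, s_next))
            s = s_next
        return batch

server = CentralServer()
for i in range(3):
    server.receive(Agent(seed=i).explore(10))
print(len(server.replay))  # 3 agents x 10 steps = 30 transitions
```

The paper's contribution concerns how such agents should explore and communicate efficiently; this sketch only fixes the topology (agents pushing experience to one server).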
no code implementations • 11 Jun 2021 • Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun
Surprisingly, knowledge of $\epsilon$ is necessary in this case: we show that adapting to an unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.
1 code implementation • 11 Feb 2021 • Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.
no code implementations • 18 Oct 2019 • Zhiyan Ding, Yiding Chen, Qin Li, Xiaojin Zhu
To our knowledge, this is the first lower-bound analysis of SGD error without the strong convexity assumption.
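The setting can be illustrated concretely (this is only an illustration of the regime, not the paper's analysis): $f(x) = x^4$ is convex but not strongly convex, since $f''(0) = 0$, and SGD with noisy gradients and a diminishing step size converges only slowly near the flat minimum.

```python
import random

# Illustration only: SGD on f(x) = x^4, a convex objective that is
# NOT strongly convex (f''(0) = 0) -- the regime the lower bound
# concerns. Gradients are observed through a noisy oracle.
rng = random.Random(0)
x = 2.0
for t in range(1, 2001):
    grad = 4 * x**3 + rng.gauss(0.0, 0.1)  # stochastic gradient
    x -= 0.01 / t**0.5 * grad              # diminishing step size
print(abs(x))
```

Near $x = 0$ the gradient $4x^3$ vanishes faster than in the strongly convex case (where it shrinks only linearly), which is what makes the last stretch of convergence slow and a nontrivial lower bound possible.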
no code implementations • 1 Feb 2019 • Yiding Chen, Xiaojin Zhu
In the white-box setting, where the attacker knows the environment and the forecast models, we present the optimal attack, using a Linear Quadratic Regulator (LQR) for linear models and Model Predictive Control (MPC) for nonlinear models.
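The LQR machinery referenced here can be sketched generically (this is the standard finite-horizon Riccati recursion, not the paper's attack-specific formulation; the dynamics and cost matrices below are placeholders):

```python
import numpy as np

# Generic finite-horizon LQR via backward Riccati recursion -- a
# minimal sketch of the control tool; A, B, Q, R, Qf are toy
# placeholders, not the paper's attack formulation.
def lqr_gains(A, B, Q, R, Qf, T):
    P, gains = Qf, []
    for _ in range(T):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # K_0, ..., K_{T-1}

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # toy linear dynamics
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.eye(1), 10 * np.eye(2)
Ks = lqr_gains(A, B, Q, R, Qf, T=20)

x = np.array([[5.0], [0.0]])
for K in Ks:
    u = -K @ x          # optimal control at this step
    x = A @ x + B @ u
print(float(np.linalg.norm(x)))
```

In the attack view, the "control" is the perturbation the attacker injects and the quadratic cost trades off attack effectiveness against perturbation magnitude; the recursion above only shows how the optimal time-varying gains are computed.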