2 code implementations • 2 May 2024 • Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll
Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications.
no code implementations • 24 Feb 2023 • Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong
To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched.
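As context for the entry above: quantization maps full-precision weights and activations to low-bitwidth integers. A minimal sketch of symmetric uniform quantization (illustrative names are ours, not the paper's method):

```python
import numpy as np

def quantize(x: np.ndarray, bits: int = 4):
    """Quantize a float tensor to signed integers of the given bitwidth."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit signed
    scale = max(float(np.max(np.abs(x))), 1e-8) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map low-bitwidth integers back to an approximate float tensor."""
    return q.astype(np.float32) * scale

x = np.array([0.9, -0.35, 0.02, -1.0], dtype=np.float32)
q, s = quantize(x, bits=4)
x_hat = dequantize(q, s)   # low-precision approximation of x
```

With a per-tensor scale chosen from the max absolute value, the round-off error of each element is at most half a quantization step (scale / 2).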
no code implementations • 15 Feb 2023 • Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei
The objective is to find a localized policy that maximizes the average of the team's local utility functions, without requiring each agent to have full observability of the team.
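In symbols (notation ours, not the paper's), with $N$ agents and a shared policy parameter $\theta$, the cooperative objective described above is typically

```latex
\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}
  \mathbb{E}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\, r_i(s_t, a_t)\right],
```

where each agent $i$ selects its action $a_t^i$ from its own local observation only, rather than from the full state $s_t$.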
no code implementations • 19 Nov 2022 • Yuhao Ding, Ming Jin, Javad Lavaei
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs).
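For reference, the entropic risk measure used in this line of work is standardly defined, for risk parameter $\beta \neq 0$ and cumulative reward $R$, as

```latex
U_\beta(R) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta R}\right],
```

which is risk-seeking for $\beta > 0$, risk-averse for $\beta < 0$, and recovers the risk-neutral objective $\mathbb{E}[R]$ in the limit $\beta \to 0$.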
no code implementations • 22 May 2022 • Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen
We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure.
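In occupancy-measure form (notation ours), the convex CMDP described above can be written as

```latex
\max_{\lambda \in \Lambda}\; f(\lambda)
\quad \text{s.t.} \quad g(\lambda) \le 0,
```

where $\lambda(s,a)$ is the state-action occupancy measure, $\Lambda$ is the polytope of occupancy measures achievable by some policy, $f$ is concave, and $g$ is convex; the classical CMDP is the special case in which $f$ and $g$ are linear in $\lambda$.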
no code implementations • 28 Jan 2022 • Yuhao Ding, Javad Lavaei
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which plays a central role in ensuring the safety of RL in time-varying environments.
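The primal-dual approach mentioned above operates on the Lagrangian of the constrained problem (written here in generic notation, not necessarily the paper's):

```latex
L(\pi, \mu) \;=\; V_r^{\pi} \;+\; \mu\,\bigl(V_g^{\pi} - b\bigr),
\qquad \mu \ge 0,
```

with ascent on the policy $\pi$ and descent on the multiplier $\mu$; in the non-stationary setting, the reward $r$, the constraint utility $g$, and the threshold $b$ may all drift across episodes.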
no code implementations • 19 Oct 2021 • Yuhao Ding, Junzi Zhang, Javad Lavaei
Our results are the first global convergence and sample complexity guarantees for the stochastic entropy-regularized vanilla PG method.
no code implementations • 19 Oct 2021 • Yuhao Ding, Junzi Zhang, Javad Lavaei
For the generic Fisher-non-degenerate policy parametrizations, our result is the first single-loop and finite-batch PG algorithm achieving $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity.
no code implementations • 17 Oct 2021 • Donghao Ying, Yuhao Ding, Javad Lavaei
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility.
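Concretely, the entropy-regularized CMDP objective described above has the generic form (notation ours):

```latex
\max_{\pi}\; V_r^{\pi}(\rho) \;+\; \tau\,\mathbb{H}(\rho,\pi)
\quad \text{s.t.} \quad V_u^{\pi}(\rho) \ge b,
```

where $\mathbb{H}(\rho,\pi)$ is the discounted entropy of the soft-max policy $\pi$ under initial distribution $\rho$, $\tau > 0$ is the regularization weight, and $V_u^{\pi}$ is the expected total utility bounded below by $b$.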
no code implementations • 25 Aug 2021 • Yuhao Ding, Yik-Cheung Tam
In multi-domain task-oriented dialog systems, user utterances and system responses may mention multiple named entities and attribute values.