no code implementations • 21 Mar 2024 • Rohan Chitnis, Shentao Yang, Alborz Geramifard
In particular, we hypothesize that the objectives under which sequential decision-making can improve autocomplete systems are not tailored solely to text entry speed, but more broadly to metrics such as user satisfaction and convenience.
1 code implementation • 13 Feb 2024 • Shentao Yang, Tianqi Chen, Mingyuan Zhou
Aligning text-to-image (T2I) diffusion models with preferences has been gaining increasing research attention.
2 code implementations • 20 Feb 2023 • Yihao Feng, Shentao Yang, Shujian Zhang, JianGuo Zhang, Caiming Xiong, Mingyuan Zhou, Huan Wang
Prior work mainly focuses on adopting advanced RL techniques to train task-oriented dialogue (ToD) agents, while the design of the reward function is not well studied.
1 code implementation • 12 Oct 2022 • Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without further interacting with the environment.
1 code implementation • 14 Jun 2022 • Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to learning purely from static datasets, without interacting with the underlying environment during the learning process.
no code implementations • 19 Feb 2022 • Shentao Yang, Zhendong Wang, Huangjie Zheng, Yihao Feng, Mingyuan Zhou
For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
no code implementations • 29 Sep 2021 • Shentao Yang, Zhendong Wang, Huangjie Zheng, Mingyuan Zhou
For training more effective agents, we propose a framework that supports learning a flexible and well-regularized policy, which consists of a fully implicit policy and a regularization through the state-action visitation frequency induced by the current policy and that induced by the data-collecting behavior policy.