1 code implementation • 9 Feb 2024 • Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang
We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks.
no code implementations • 22 Jan 2024 • YuHang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, Wei Ai
Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications.
1 code implementation • 19 Jan 2024 • Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang
However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.
2 code implementations • 30 Oct 2023 • Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daumé III, Furong Huang, Huazhe Xu
To quantify this inactivity, we adopt the dormant ratio, a metric of inactivity in the RL agent's network.
no code implementations • 11 Oct 2023 • Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration.
1 code implementation • 7 Sep 2023 • Yuancheng Xu, ChengHao Deng, Yanchao Sun, Ruijie Zheng, Xiyao Wang, Jieyu Zhao, Furong Huang
Moreover, we show that the policy gradient of the Long-term Benefit Rate can be analytically reduced to the standard policy gradient.
1 code implementation • 22 Jun 2023 • Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.
no code implementations • 2 Feb 2023 • Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang
To test this hypothesis, we devise two practical robust training mechanisms, computing adversarial noise and regularizing the value network's spectral norm, to directly regularize the Lipschitz condition of the value functions.
1 code implementation • 25 Jul 2022 • Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang
Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning.
no code implementations • ICLR 2022 • Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g., an increased number of observable features).
1 code implementation • 18 Apr 2021 • Yankun Yu, Huan Liu, Minghan Fu, Jun Chen, Xiyao Wang, Keyan Wang
Recently, there has been rapid and significant progress on image dehazing.
no code implementations • 24 Oct 2020 • Xiyao Wang, Junge Zhang, Wenzhen Huang, Qiyue Yin
We give an upper bound on the trajectory reward estimation error and point out that increasing the agent's exploration ability is the key to reducing this error, thereby alleviating the dynamics bottleneck dilemma.