Search Results for author: Xiyao Wang

Found 12 papers, 7 papers with code

Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications

no code implementations • 22 Jan 2024 • YuHang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, Wei Ai

Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications.

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

1 code implementation • 19 Jan 2024 • Xiyao Wang, YuHang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated.

Language Modelling, Large Language Model

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

no code implementations • 11 Oct 2023 • Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang

In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration.

Continuous Control, Model-based Reinforcement Learning, +1

Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making

1 code implementation • 7 Sep 2023 • Yuancheng Xu, ChengHao Deng, Yanchao Sun, Ruijie Zheng, Xiyao Wang, Jieyu Zhao, Furong Huang

Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient.

Decision Making, Fairness

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

1 code implementation • 22 Jun 2023 • Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample inefficiency continues to present a substantial obstacle.

Continuous Control, Contrastive Learning, +3

Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function

no code implementations • 2 Feb 2023 • Ruijie Zheng, Xiyao Wang, Huazhe Xu, Furong Huang

To test this hypothesis, we devise two practical robust training mechanisms through computing the adversarial noise and regularizing the value network's spectral norm to directly regularize the Lipschitz condition of the value functions.

Model-based Reinforcement Learning

Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

1 code implementation • 25 Jul 2022 • Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang

Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning.

Continuous Control, Model-based Reinforcement Learning, +1

Transfer RL across Observation Feature Spaces via Model-Based Regularization

no code implementations • ICLR 2022 • Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew Cohen, Furong Huang

In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g., increased number of observable features).

Reinforcement Learning (RL)

Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning

no code implementations • 24 Oct 2020 • Xiyao Wang, Junge Zhang, Wenzhen Huang, Qiyue Yin

We give an upper bound on the trajectory reward estimation error and point out that increasing the agent's exploration ability is the key to reducing this error, thereby alleviating the dynamics bottleneck dilemma.

Continuous Control, Decision Making, +3
