no code implementations • 19 Mar 2024 • Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li
In this paper, we present a framework that learns structured states and symbolic policies simultaneously; its key idea is to overcome the efficiency bottleneck by distilling vision foundation models into a scalable perception module.
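The distillation idea mentioned above can be sketched minimally: train a small "student" perception module to reproduce the features of a frozen "teacher" standing in for a vision foundation model. Everything below (shapes, the linear student, the MSE objective) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 8, 4

# Frozen "teacher" projection, standing in for a foundation-model encoder.
W_teacher = rng.standard_normal((D_IN, D_FEAT))

# Lightweight "student" perception module (here just a linear map).
W_student = np.zeros((D_IN, D_FEAT))

def distill_step(x, lr=0.1):
    """One gradient step on the MSE between student and teacher features."""
    global W_student
    target = x @ W_teacher            # teacher features (treated as fixed)
    pred = x @ W_student              # student features
    grad = x.T @ (pred - target) / len(x)
    W_student -= lr * grad
    return float(np.mean((pred - target) ** 2))

x = rng.standard_normal((64, D_IN))
losses = [distill_step(x) for _ in range(200)]
```

After training, only the cheap student is queried at decision time, which is where the efficiency gain of distillation comes from.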
no code implementations • 15 Mar 2024 • Guoxi Zhang, Han Bao, Hisashi Kashima
To address this problem, the present study introduces a framework that consolidates offline preferences with "virtual preferences" for PbRL, i.e., comparisons between the agent's behaviors and the offline data.
1 code implementation • 25 Sep 2023 • Xiaofeng Lin, Guoxi Zhang, Xiaotian Lu, Han Bao, Koh Takeuchi, Hisashi Kashima
One popular application of this estimation lies in the prediction of the impact of a treatment (e.g., a promotion) on an outcome (e.g., sales) of a particular unit (e.g., an item), known as the individual treatment effect (ITE).
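The ITE for a unit is the difference between its outcome under treatment and under control, of which only one is ever observed. A common baseline, the T-learner, fits separate outcome models on treated and control units and takes their difference; the synthetic-data sketch below illustrates that idea only and is not the estimator proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 3
X = rng.standard_normal((n, d))
T = rng.integers(0, 2, n)                 # treatment indicator (e.g., a promotion)
tau = 2.0 + X[:, 0]                       # true heterogeneous effect per unit
y = X @ np.array([1.0, -1.0, 0.5]) + T * tau + 0.1 * rng.standard_normal(n)

def fit_linear(X, y):
    """Least-squares outcome model with an intercept."""
    Xb = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xq: np.c_[Xq, np.ones(len(Xq))] @ w

mu1 = fit_linear(X[T == 1], y[T == 1])    # outcome model for treated units
mu0 = fit_linear(X[T == 0], y[T == 0])    # outcome model for control units
ite_hat = mu1(X) - mu0(X)                 # estimated ITE per unit
```

Because treatment is randomized here, the difference of the two fitted models recovers the effect; with confounded observational data, more care is needed.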
no code implementations • 13 Feb 2023 • Guoxi Zhang, Xing Yao, Xuanji Xiao
An ultimate goal of recommender systems (RS) is to improve user engagement.
1 code implementation • 29 Nov 2022 • Guoxi Zhang, Hisashi Kashima
To overcome this drawback, the present study proposes a latent variable model to infer a set of policies from data, which allows an agent to use, as its behavior policy, the policy that best describes a particular trajectory.
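The idea of inferring a set of behavior policies with a latent variable model can be illustrated with a toy mixture: trajectories generated by one of two Bernoulli action policies, with the mixture fitted by EM and each trajectory assigned the policy that best explains it. The paper's model is far more general; this is a conceptual sketch only.

```python
import numpy as np

rng = np.random.default_rng(2)
true_p = np.array([0.9, 0.2])                    # P(action=1) per latent policy
z = rng.integers(0, 2, 200)                      # latent policy per trajectory
A = rng.random((200, 50)) < true_p[z][:, None]   # 50 binary actions per trajectory
counts = A.sum(axis=1)

p = np.array([0.6, 0.4])                         # initial policy parameters
for _ in range(50):
    # E-step: responsibility of each candidate policy for each trajectory
    ll = counts[:, None] * np.log(p) + (50 - counts)[:, None] * np.log(1 - p)
    r = np.exp(ll - ll.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: refit each policy's action probability from its soft assignments
    p = (r * counts[:, None]).sum(axis=0) / (r.sum(axis=0) * 50)

assign = r.argmax(axis=1)                        # most likely policy per trajectory
```

The hard assignment at the end mirrors using, per trajectory, the inferred policy that best describes it.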
no code implementations • 8 Nov 2021 • Guoxi Zhang, Hisashi Kashima
This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences.
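A standard way to learn a reward function from pairwise preferences is the Bradley-Terry model: the probability that trajectory i is preferred over trajectory j is sigmoid(R(i) - R(j)). The sketch below fits a linear reward on synthetic trajectory features by gradient descent on the resulting logistic loss; it illustrates the principle, not necessarily the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(3)
w_true = np.array([1.5, -2.0])                   # ground-truth reward weights
F = rng.standard_normal((500, 2, 2))             # 500 pairs of trajectory features
diff = F[:, 0] @ w_true - F[:, 1] @ w_true
prefs = (rng.random(500) < 1 / (1 + np.exp(-diff))).astype(float)

w = np.zeros(2)                                  # learned reward weights
for _ in range(500):
    d = (F[:, 0] - F[:, 1]) @ w                  # predicted reward gap per pair
    p = 1 / (1 + np.exp(-d))                     # P(first trajectory preferred)
    grad = ((p - prefs)[:, None] * (F[:, 0] - F[:, 1])).mean(axis=0)
    w -= 0.5 * grad                              # logistic-loss gradient step
```

Once fitted, the learned reward can relabel the batch data, after which any batch RL algorithm applies.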