no code implementations • 6 Feb 2024 • Bohao Qu, Xiaofeng Cao, Qing Guo, Yi Chang, Ivor W. Tsang, Chengqi Zhang
In this study, we present a transductive inference approach on the reward information propagation graph, which enables effective estimation of rewards for unlabelled data in offline reinforcement learning.
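The snippet does not spell out how rewards travel along the graph. The sketch below uses generic harmonic label propagation in the style of Zhu and Ghahramani as a stand-in. It is not the authors' method. The Gaussian similarity, the bandwidth `sigma`, the fixed iteration count, and the function name `propagate_rewards` are all illustrative assumptions. Transitions with known rewards stay clamped at those values. Each unlabelled transition repeatedly takes the similarity-weighted average of its neighbours' current estimates.

```python
import numpy as np

def propagate_rewards(features, rewards, labelled_mask, sigma=1.0, n_iter=200):
    """Hypothetical sketch: estimate missing rewards by harmonic label propagation.

    features: (n, d) array of transition features (e.g. concatenated state-action)
    rewards: (n,) array; values at unlabelled positions are ignored
    labelled_mask: (n,) boolean array, True where the reward is known
    """
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    labelled_mask = np.asarray(labelled_mask, dtype=bool)

    # Dense Gaussian affinity graph between all transitions (assumed similarity).
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Row-normalise so each step averages over neighbours.
    P = W / W.sum(axis=1, keepdims=True)

    # Initialise unlabelled entries with the mean known reward.
    f = np.where(labelled_mask, rewards, rewards[labelled_mask].mean())
    for _ in range(n_iter):
        f = P @ f
        # Clamp labelled transitions to their observed rewards.
        f[labelled_mask] = rewards[labelled_mask]
    return f
```

In this sketch, two tight clusters with one labelled transition each pass their reward to the unlabelled member of the same cluster. Clamping the labelled nodes yields the harmonic solution, so every estimate stays within the range of the observed rewards.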
no code implementations • 28 Feb 2023 • Bohao Qu, Xiaofeng Cao, Jielong Yang, Hechang Chen, Chang Yi, Ivor W. Tsang, Yew-Soon Ong
To resolve this problem, this paper learns diverse policies from the history of state-action pairs in a non-Markovian environment, using a policy dispersion scheme designed to seek diverse policy representations.