1 code implementation • ICLR 2022 • Jixian Guo, Mingming Gong, DaCheng Tao
However, because environments are not labelled, the extracted information inevitably contains redundancy unrelated to the dynamics in the transition segments, and thus fails to maintain a crucial property of $Z$: $Z$ should be similar within the same environment and dissimilar across different ones.
Model-based Reinforcement Learning • reinforcement-learning