Emergence of Collective Policies Inside Simulations with Biased Representations
We consider a setting in which agents form biased internal representations of an environment. Each agent has a different bias, and every bias leaves the agent with imperfect evidence for choosing optimal actions. Throughout the interactions, each agent asynchronously learns its own predictive model of the environment and forms a virtual simulation within which it plays out entire episodes. In this research, we focus on developing a collective policy that is trained solely inside the agents' simulations and can then be transferred to the real-world environment. The key idea is to let agents imagine together: they take turns hosting virtual episodes in which all agents participate and interact with their own biased representations. Because the agents' biases vary, a collective policy developed by sequentially visiting the internal simulations lets the agents compensate for one another's shortcomings. In our experiment, the collective policies consistently achieve significantly higher returns than the best individually trained policies.
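The turn-taking idea can be illustrated with a deliberately tiny sketch. This is not the method used in this work. It assumes a three-action bandit in place of an episodic environment, represents each agent's learned simulation as a fixed vector of predicted rewards, and assumes the biases fall on different actions so that they average out. All names (`TRUE_REWARDS`, `AGENT_MODELS`, `train_collective`, `greedy`) are hypothetical.

```python
from typing import List, Sequence

# Hypothetical toy setup: true per-action rewards, unknown to the agents.
TRUE_REWARDS = [1.0, 2.0, 3.0]  # action 2 is truly best

# Each agent's internal simulation, assumed here to be a biased reward table.
AGENT_MODELS = [
    [1.0, 3.5, 3.0],  # agent 0 overestimates action 1
    [3.5, 2.0, 3.0],  # agent 1 overestimates action 0
]


def greedy(values: Sequence[float]) -> int:
    """Return the index of the highest-valued action."""
    return max(range(len(values)), key=lambda a: values[a])


def train_collective(models: Sequence[Sequence[float]], rounds: int = 10) -> List[float]:
    """Train shared value estimates using only the agents' simulations.

    Agents take turns hosting: in each round, every host's simulation is
    visited once and the shared estimates are updated with a running mean.
    """
    n_actions = len(models[0])
    q = [0.0] * n_actions  # shared (collective) value estimates
    visits = 0
    for _ in range(rounds):
        for host_model in models:  # sequentially visit each host's simulation
            visits += 1
            for a in range(n_actions):
                # Reward is read from the host's biased model, never the real one.
                q[a] += (host_model[a] - q[a]) / visits
    return q


collective_q = train_collective(AGENT_MODELS)
collective_action = greedy(collective_q)
individual_actions = [greedy(m) for m in AGENT_MODELS]

# Transfer to the "real" environment: compare true returns.
print("collective return:", TRUE_REWARDS[collective_action])
print("individual returns:", [TRUE_REWARDS[a] for a in individual_actions])
```

In this toy case each individually greedy policy is misled by its own bias, while the estimates pooled across hosts favour the truly best action. The effect depends entirely on the assumed biases being complementary; biases shared by all agents would not cancel.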