NeurIPS 2009 • Chenghui Cai, Xuejun Liao, Lawrence Carin
In this paper, we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation in partially observable environments.