no code implementations • 6 Apr 2024 • Tianle Pu, Changjun Fan, Mutian Shen, Yizhou Lu, Li Zeng, Zohar Nussinov, Chao Chen, Zhong Liu
The technique originates from physics, yet it is very effective in enabling RL agents to explore and continuously improve their solutions at test time.