1 code implementation • 4 Oct 2023 • Minsu Kim, Joohwan Ko, Taeyoung Yun, Dinghuai Zhang, Ling Pan, Woochang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio
We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly.
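The idea in the sentence above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the temperature enters as an inverse temperature `beta`, and that the "learned function" is a hypothetical two-parameter softplus map (`w`, `b`) standing in for whatever network is actually trained. The names `learned_scale` and `temperature_conditional_policy` are illustrative only.

```python
import math

def softplus(x):
    # Numerically stable softplus; keeps the scale strictly positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def learned_scale(beta, w, b):
    # Hypothetical learned function of the temperature: a single affine
    # map followed by softplus. In practice w and b would be trained
    # (or replaced by a small network); here they are fixed by hand.
    return softplus(w * beta + b)

def softmax(xs):
    # Standard max-shifted softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_conditional_policy(logits, beta, w, b):
    # Scale the policy's logits directly by the learned function of beta,
    # then normalize into action probabilities.
    s = learned_scale(beta, w, b)
    return softmax([s * z for z in logits])

# Example: the same logits under a low and a high inverse temperature.
logits = [2.0, 1.0, 0.0]
p_low = temperature_conditional_policy(logits, beta=0.1, w=1.0, b=0.0)
p_high = temperature_conditional_policy(logits, beta=5.0, w=1.0, b=0.0)
```

With a positive `w`, a larger `beta` yields a larger scale and therefore a more peaked action distribution, while the base logits themselves do not depend on the temperature in this sketch.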