1 code implementation • 7 Nov 2023 • Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
Alignment with human preference is a desired property of large language models (LLMs).