1 code implementation • 23 Jan 2024 • Maxim Khanov, Jirayu Burapacheep, Yixuan Li
Aligning large language models with human objectives is paramount, yet common approaches, including RLHF, suffer from unstable and resource-intensive training.