no code implementations • 19 Feb 2024 • Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons
Do language models implicitly learn a concept of human wellbeing?
Ethics • Language Modelling