Search Results for author: Megha Roshan

Found 1 paper, 0 papers with code

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards

no code implementations • 20 Mar 2024 • Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, Golnoosh Farnadi

Despite growing mitigation efforts to develop safety safeguards, such as supervised safety-oriented fine-tuning and leveraging safe reinforcement learning from human feedback, multiple concerns regarding the safety and ingrained biases in these models remain.

Safe Reinforcement Learning
