3 code implementations • 15 Jun 2023 • Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, Dylan Hadfield-Menell
Using a pre-existing classifier does not allow for red-teaming to be tailored to the target model.