Search Results for author: Daniel Scalena

Found 1 papers, 1 papers with code

Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence

1 code implementation • 1 Sep 2023 • Daniel Scalena, Gabriele Sarti, Malvina Nissim, Elisabetta Fersini

Due to language models' propensity to generate toxic or hateful responses, several techniques have been developed to align model generations with users' preferences.

Language Modelling · Reinforcement Learning
