no code implementations • 21 Mar 2024 • Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh
To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms.
no code implementations • 26 Feb 2024 • Juan Felipe Gomez, Caio Vieira Machado, Lucas Monteiro Paes, Flavio P. Calmon
Our findings also contribute to content moderation and intermediary liability laws being discussed and passed in many countries, such as the Digital Services Act in the European Union, the Online Safety Act in the United Kingdom, and the Fake News Bill in Brazil.
no code implementations • 6 Dec 2023 • Lucas Monteiro Paes, Ananda Theertha Suresh, Alex Beutel, Flavio P. Calmon, Ahmad Beirami
Here, the sample complexity for estimating the worst-case performance gap across groups (e. g., the largest difference in error rates) increases exponentially with the number of group-denoting sensitive attributes.
no code implementations • 3 Jun 2023 • Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju
We introduce a method for computing scores for each word in the prompt; these scores represent its influence on biases in the model's output.