Contextualizing Hate Speech Classifiers with Post-hoc Explanation

ACL 2020. Brendan Kennedy, Xisen Jin, Aida Mostafazadeh Davani, Morteza Dehghani, Xiang Ren

Hate speech classifiers trained on imbalanced datasets struggle to determine if group identifiers like "gay" or "black" are used in offensive or prejudiced ways. Such biases manifest in false positives when these identifiers are present, due to models' inability to learn the contexts which constitute a hateful usage of identifiers...

