Sex, drugs, and violence

11 Aug 2016  ·  Stefania Raimondo, Frank Rudzicz ·

Automatically detecting inappropriate content can be a difficult NLP task, requiring understanding context and innuendo, not just identifying specific keywords. Due to the large quantity of online user-generated content, automatic detection is becoming increasingly necessary. We take a largely unsupervised approach using a large corpus of narratives from a community-based self-publishing website and a small segment of crowd-sourced annotations. We explore topic modelling using latent Dirichlet allocation (and a variation), and use these to regress appropriateness ratings, effectively automating rating for suitability. The results suggest that certain topics inferred may be useful in detecting latent inappropriateness -- yielding recall up to 96% and low regression errors.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here