Factorial LDA: Sparse Multi-Dimensional Text Models

NeurIPS 2012  ·  Michael Paul, Mark Dredze

Multi-dimensional latent variable models can capture the many latent factors in a text corpus, such as topic, author perspective, and sentiment. We introduce factorial LDA, a multi-dimensional latent variable model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (e.g., methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors.
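As a rough sketch of the setup the abstract describes (the notation below is ours, not the paper's, and the paper's specific prior construction, sparsity mechanism, and inference are not reproduced here), the generative story for a corpus with K factors, where factor k has $V_k$ components, can be outlined as follows. Each word token carries a K-dimensional tuple of assignments $z = (z_1, \dots, z_K)$ rather than a single topic label:

1. For each tuple $t = (t_1, \dots, t_K)$ in the product space, draw a word distribution $\phi_t \sim \mathrm{Dirichlet}(\omega_t)$, where the prior $\omega_t$ is structured so that tuples sharing a component along some factor share prior mass.
2. For each document $d$, draw a distribution $\theta_d$ over tuples, with sparsity encouraged so that only a small subset of the $\prod_k V_k$ possible tuples is active.
3. For each token $n$ in document $d$, draw $z_{d,n} \sim \theta_d$ and then the observed word $w_{d,n} \sim \phi_{z_{d,n}}$.

This is only a minimal outline consistent with the abstract; see the paper for the actual structured priors and the sparse factorization it uses.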
