Multinomial Variational Autoencoders can recover Principal Components

1 Jan 2021 · James Morton, Justin Silverman, Gleb Tikhonov, Harri Lähdesmäki, Rich Bonneau

Covariance estimation on high-dimensional data is a central challenge across multiple scientific disciplines. Sparse high-dimensional count data, frequently encountered in biological applications such as DNA sequencing and proteomics, are often well modeled using multinomial logistic-normal models. In many cases these datasets are also compositional: each item is reported as a fraction of a normalized total because of measurement and instrument constraints. Yet three key challenges limit covariance estimation with these models: (1) the computational complexity of inverting high-dimensional covariance matrices, (2) the non-exchangeability introduced by the summation constraint on multinomial parameters, and (3) the irreducibility of the component multinomial logistic-normal distribution, which necessitates the use of parameter augmentation, or similar techniques, during inference. We show that a variational autoencoder augmented with a fast Isometric Log-ratio (ILR) transform can address these issues and accurately estimate principal components from multinomial logistic-normal distributed data. This model can be optimized on GPUs and modified to handle mini-batching, with the ability to scale across thousands of dimensions and thousands of samples.
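The ILR transform and the multinomial logistic-normal sampling scheme mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Helmert-style orthonormal basis is one common choice assumed here for illustration, and the function names `helmert_basis`, `ilr`, and `ilr_inv` are hypothetical.

```python
import numpy as np


def helmert_basis(d):
    """Return a (d, d-1) matrix whose columns are orthonormal and sum to zero.

    Assumption: a Helmert-style contrast basis; any orthonormal basis of the
    subspace orthogonal to the all-ones vector would serve the same purpose.
    """
    basis = np.zeros((d, d - 1))
    for j in range(1, d):
        basis[:j, j - 1] = 1.0 / j          # first j parts share equal weight
        basis[j, j - 1] = -1.0              # contrasted against part j + 1
        basis[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale to unit norm
    return basis


def ilr(x, basis):
    """Map strictly positive compositions (rows of x) to d-1 real coordinates."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ basis


def ilr_inv(z, basis):
    """Map d-1 real coordinates back to the simplex (softmax of z @ basis.T)."""
    logits = z @ basis.T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)


# Draw from a toy multinomial logistic-normal model:
# latent Gaussian coordinates -> proportions on the simplex -> counts.
rng = np.random.default_rng(0)
d = 5
basis = helmert_basis(d)
z = rng.normal(size=(3, d - 1))            # latent ILR coordinates
p = ilr_inv(z, basis)                      # compositions, rows sum to 1
counts = np.stack([rng.multinomial(100, pi) for pi in p])
z_back = ilr(p, basis)                     # round trip recovers z
```

Because the basis columns are orthonormal and sum to zero, `ilr` and `ilr_inv` are inverses of each other on the simplex, so the latent coordinates are unconstrained even though the proportions must sum to one.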

Methods

AutoEncoder
