Multinomial Variational Autoencoders can recover Principal Components

1 Jan 2021  ·  James Morton, Justin Silverman, Gleb Tikhonov, Harri Lähdesmäki, Rich Bonneau

Covariance estimation on high-dimensional data is a central challenge across multiple scientific disciplines. Sparse high-dimensional count data, frequently encountered in biological applications such as DNA sequencing and proteomics, are often well modeled using multinomial logistic-normal models. In many cases these datasets are also compositional, presented item-wise as fractions of a normalized total, as necessitated by measurement and instrument constraints. Yet three key challenges limit covariance estimation with these models: (1) the computational complexity of inverting high-dimensional covariance matrices, (2) the non-exchangeability introduced by the summation constraint on multinomial parameters, and (3) the irreducibility of the component multinomial logistic-normal distribution, which necessitates the use of parameter augmentation, or similar techniques, during inference. We show that a variational autoencoder augmented with a fast Isometric Log-ratio (ILR) transform can address these issues and accurately estimate principal components from multinomial logistic-normal distributed data. This model can be optimized on GPUs and modified to handle mini-batching, with the ability to scale across thousands of dimensions and thousands of samples.
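
To make the ILR transform mentioned in the abstract concrete, here is a minimal NumPy sketch of an isometric log-ratio transform and its inverse. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which basis the "fast" ILR uses, so the sketch assumes a dense Helmert-type orthonormal basis, and the function names `helmert_basis`, `ilr`, and `ilr_inv` are hypothetical.

```python
import numpy as np


def helmert_basis(d):
    """Return a d x (d-1) matrix whose columns are orthonormal and sum to zero.

    Assumption: a Helmert-type basis is used purely for illustration;
    any orthonormal basis of the sum-to-zero subspace defines a valid ILR.
    """
    V = np.zeros((d, d - 1))
    for j in range(1, d):
        V[:j, j - 1] = 1.0 / j                    # first j parts, equal weight
        V[j, j - 1] = -1.0                        # contrasted against part j+1
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))     # scale column to unit norm
    return V


def ilr(x, V):
    """Map strictly positive compositions (rows of x) to d-1 real coordinates."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ V


def ilr_inv(z, V):
    """Map ILR coordinates back to compositions that sum to one."""
    y = z @ V.T
    y = y - y.max(axis=-1, keepdims=True)           # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    V = helmert_basis(4)
    x = np.array([[0.1, 0.2, 0.3, 0.4]])
    z = ilr(x, V)
    print(z.shape)          # three unconstrained coordinates per sample
    print(ilr_inv(z, V))    # recovers x up to floating-point error
```

Because the ILR coordinates are unconstrained, a Gaussian latent model can be placed on them directly, which is what removes the summation constraint from the multinomial parameters. Note that `ilr` requires strictly positive inputs; raw counts containing zeros would need to pass through the model's likelihood rather than being transformed directly.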
