Locally Masked Convolution for Autoregressive Models

22 Jun 2020  ·  Ajay Jain, Pieter Abbeel, Deepak Pathak ·

High-dimensional generative models have many applications including image compression, multimedia generation, anomaly detection and data completion. State-of-the-art estimators for natural images are autoregressive, decomposing the joint distribution over pixels into a product of conditionals parameterized by a deep neural network, e.g. a convolutional neural network such as the PixelCNN. However, PixelCNNs only model a single decomposition of the joint, and only a single generation order is efficient. For tasks such as image completion, these models are unable to use much of the observed context. To generate data in arbitrary orders, we introduce LMConv: a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image. Using LMConv, we learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation (2.89 bpd on unconditional CIFAR10), as well as globally coherent image completions. Our code is available at https://ajayjain.github.io/lmconv.
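The core idea, per the abstract, is a 2D convolution whose kernel weights are masked differently at every spatial location. A common way to realize this is to gather patches (im2col), zero out masked entries per location, and contract with the shared weights. The sketch below is a minimal, illustrative NumPy version under those assumptions, not the authors' released implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def locally_masked_conv2d(x, weight, mask):
    """Illustrative LMConv-style sketch: a 2D convolution whose kernel
    weights are masked differently at every spatial location.

    x:      input, shape (C_in, H, W)
    weight: shared kernel, shape (C_out, C_in, k, k)
    mask:   per-location 0/1 masks, shape (H*W, k*k)
    Returns output of shape (C_out, H, W) ('same' padding, stride 1).
    """
    C_in, H, W = x.shape
    C_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

    # im2col: extract the k x k patch at every location -> (H*W, C_in, k*k)
    patches = np.empty((H * W, C_in, k * k), dtype=x.dtype)
    idx = 0
    for i in range(H):
        for j in range(W):
            patches[idx] = xp[:, i:i + k, j:j + k].reshape(C_in, k * k)
            idx += 1

    # Apply a different mask at each location (broadcast over channels);
    # this is equivalent to masking the weights per location.
    patches = patches * mask[:, None, :]

    # Contract with the shared weights: (H*W, C_in*k*k) @ (C_in*k*k, C_out)
    w = weight.reshape(C_out, C_in * k * k).T
    out = patches.reshape(H * W, C_in * k * k) @ w
    return out.T.reshape(C_out, H, W)
```

Because the masks are an input rather than baked into the weights, the same parameters can be reused with masks derived from different generation orders, which is what enables the order-ensembling described above.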


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Image Generation | Binarized MNIST | Locally Masked PixelCNN (8 orders) | bits/dimension | 0.143 | #1 |
| Image Generation | Binarized MNIST | Locally Masked PixelCNN (8 orders) | nats | 77.58 | #2 |
| Image Generation | CelebA 256x256 | Locally Masked PixelCNN | bpd | 0.74 | #7 |
| Image Generation | CIFAR-10 | Locally Masked PixelCNN (8 orders) | bits/dimension | 2.89 | #20 |
| Image Generation | MNIST | Locally Masked PixelCNN (8 orders) | bits/dimension | 0.65 | #1 |
