ICLR 2020

On Mutual Information Maximization for Representation Learning

ICLR 2020 google-research/google-research

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data.

REPRESENTATION LEARNING · SELF-SUPERVISED IMAGE CLASSIFICATION
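
The MI estimates such methods maximize are tractable lower bounds such as InfoNCE. As a rough illustration of the kind of estimator the paper analyzes (the function name, temperature, and toy inputs below are assumptions, not the paper's setup), a minimal NumPy sketch:

import numpy as np

def infonce_lower_bound(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same examples."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                       # pairwise similarity scores
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matching views sit on the diagonal; the bound is log(batch) + mean positive log-prob
    return np.log(len(z1)) + np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
views = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
print(infonce_lower_bound(*views))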

ProtoAttend: Attention-Based Prototypical Learning

ICLR 2020 google-research/google-research

We propose a novel inherently interpretable machine learning method that bases decisions on few relevant examples that we call prototypes.

DECISION MAKING · INTERPRETABLE MACHINE LEARNING
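
A toy sketch of the attention-over-candidates idea: the prediction is a convex combination of candidate examples, and the largest attention weights point to the prototypes that explain the decision. The names, shapes, and temperature are illustrative assumptions, not the paper's exact architecture:

import numpy as np

def prototype_predict(query_emb, candidate_embs, candidate_labels, temperature=0.05, n_protos=3):
    """candidate_embs: (n, d); candidate_labels: (n, n_classes) one-hot rows."""
    scores = candidate_embs @ query_emb / temperature      # relevance of each candidate to the query
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                      # attention weights over candidates
    prediction = weights @ candidate_labels                # convex combination of candidate labels
    prototypes = np.argsort(weights)[::-1][:n_protos]      # the few most relevant examples
    return prediction, prototypes, weights[prototypes]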

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

ICLR 2020 taki0112/UGATIT

We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner.

UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION
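
The "new learnable normalization function" is Adaptive Layer-Instance Normalization (AdaLIN), which blends instance- and layer-normalized activations with a learnable ratio. A NumPy sketch of the forward pass, assuming NCHW feature maps (in the paper rho, gamma, and beta are learned; here they are plain inputs):

import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """x: (N, C, H, W) features; gamma, beta: (C,) style parameters; rho: blend ratio in [0, 1]."""
    in_mean = x.mean(axis=(2, 3), keepdims=True)       # instance-norm stats: per sample, per channel
    in_var = x.var(axis=(2, 3), keepdims=True)
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)    # layer-norm stats: per sample
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)
    x_hat = rho * x_in + (1.0 - rho) * x_ln            # learnable mix of IN and LN
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]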

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

ICLR 2020 facebookresearch/LASER

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

SENTENCE EMBEDDINGS
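
The general recipe is margin-based mining over multilingual (LASER-style) sentence embeddings: score each candidate pair by its cosine similarity relative to the average similarity of its nearest neighbours, and keep pairs above a threshold. A NumPy sketch; k and the threshold below are illustrative, not the paper's settings:

import numpy as np

def mine_parallel(src_embs, tgt_embs, k=4, threshold=1.04):
    """Return (src_index, tgt_index, score) for candidate parallel sentence pairs."""
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sim = src @ tgt.T                                         # cosine similarity of every pair
    src_knn = -np.sort(-sim, axis=1)[:, :k].mean(axis=1)      # mean similarity to k nearest targets
    tgt_knn = -np.sort(-sim.T, axis=1)[:, :k].mean(axis=1)    # mean similarity to k nearest sources
    margin = sim / (0.5 * (src_knn[:, None] + tgt_knn[None, :]))   # ratio margin score
    pairs = []
    for i in range(sim.shape[0]):
        j = int(margin[i].argmax())                           # best target for this source sentence
        if margin[i, j] > threshold:
            pairs.append((i, j, float(margin[i, j])))
    return pairs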

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

ICLR 2020 microsoft/DeepSpeed

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.

#9 best model for Question Answering on SQuAD1.1 dev (F1 metric)

QUESTION ANSWERING · STOCHASTIC OPTIMIZATION
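
The layerwise adaptation strategy (LAMB) scales each layer's Adam-style update by a trust ratio of parameter norm to update norm. A simplified per-layer NumPy sketch; bias correction is omitted and the hyperparameter values are illustrative:

import numpy as np

def lamb_step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One simplified LAMB-style update for a single layer's weight tensor."""
    m = beta1 * m + (1 - beta1) * grad                 # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2            # second moment
    update = m / (np.sqrt(v) + eps) + wd * w           # Adam direction plus decoupled weight decay
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust_ratio * update, m, v         # layerwise-scaled step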

Recurrent Independent Mechanisms

ICLR 2020 maximecb/gym-minigrid

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes.
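
A toy sketch of the mechanism: several independent recurrent modules compete via attention for the current input, and only the top-k most relevant modules update their state at each step. The module interfaces, k, and the simple tanh "mechanisms" below are assumptions for illustration:

import numpy as np

def rim_step(states, x, read_keys, update_fns, k=2):
    """states: per-module hidden states; x: current input; only the top-k modules update."""
    scores = np.array([key @ x for key in read_keys])   # how relevant the input is to each module
    active = np.argsort(scores)[::-1][:k]               # modules that win the competition for the input
    new_states = list(states)
    for i in active:
        new_states[i] = update_fns[i](states[i], x)     # inactive modules keep their state unchanged
    return new_states

# toy usage with 4 modules
rng = np.random.default_rng(0)
keys = [rng.normal(size=8) for _ in range(4)]
fns = [lambda h, x, W=rng.normal(size=(8, 8)): np.tanh(W @ x + h) for _ in range(4)]
states = rim_step([np.zeros(8) for _ in range(4)], rng.normal(size=8), keys, fns)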

AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

ICLR 2020 JiahuiYu/slimmable_networks

Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs).

NEURAL ARCHITECTURE SEARCH
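
A sketch of the greedy search over channel numbers: after training a single slimmable network, repeatedly slim whichever layer costs the least estimated accuracy until the FLOPs budget is met. evaluate() and flops() are hypothetical callbacks standing in for the slimmable-network evaluation and the FLOPs counter, and the slimming step size is illustrative:

def greedy_slim(channels, evaluate, flops, budget, step=0.1):
    """Greedily slim the layer whose reduction hurts accuracy the least until the budget is met."""
    while flops(channels) > budget:
        best_layer, best_acc = None, -1.0
        for layer in range(len(channels)):
            trial = list(channels)
            trial[layer] = max(1, int(trial[layer] * (1 - step)))   # try slimming this layer
            acc = evaluate(trial)                                   # accuracy at this channel config
            if acc > best_acc:
                best_layer, best_acc = layer, acc
        channels[best_layer] = max(1, int(channels[best_layer] * (1 - step)))
    return channels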

Decentralized Distributed PPO: Mastering PointGoal Navigation

ICLR 2020 facebookresearch/habitat-api

We leverage this scaling to train an agent for 2.5 Billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

AUTONOMOUS NAVIGATION · POINTGOAL NAVIGATION
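
A quick arithmetic check of the compute claim above, using only the numbers from the abstract:

wall_clock_days, num_gpus = 3, 64
gpu_days = wall_clock_days * num_gpus                       # 192 GPU-days of training
print(round(gpu_days / 30.4, 1), "GPU-months of training")  # about 6.3, i.e. over 6 months of GPU-time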

Generative Models for Effective ML on Private, Decentralized Datasets

ICLR 2020 tensorflow/gan

To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact.

Sparse Networks from Scratch: Faster Training without Losing Performance

ICLR 2020 TimDettmers/sparse_learning

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels.

IMAGE CLASSIFICATION · SPARSE LEARNING
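
A simplified NumPy sketch of the prune-and-regrow cycle behind this kind of dynamic sparse training: drop the smallest-magnitude active weights, then regrow the same number of connections where the momentum magnitude is largest. The paper's sparse momentum method additionally redistributes the weight budget across layers; this single-tensor version and the prune rate are illustrative assumptions:

import numpy as np

def prune_and_regrow(weights, momentum, mask, prune_rate=0.2):
    """weights, momentum, mask: flat arrays of the same shape; mask holds 0/1 entries."""
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(mask == 0)
    n = int(prune_rate * active.size)
    drop = active[np.argsort(np.abs(weights[active]))[:n]]        # smallest-magnitude active weights
    grow = inactive[np.argsort(-np.abs(momentum[inactive]))[:n]]  # missing weights with largest momentum
    mask[drop], weights[drop] = 0, 0.0
    mask[grow] = 1                                                # regrown connections start from zero
    return weights, mask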