Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data.
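A concrete instance of this family of objectives is the InfoNCE-style contrastive bound. The sketch below is illustrative only (function name, temperature, and the use of in-batch negatives are assumptions, not taken from any one of the papers here):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss between two views of the same batch.

    z1, z2: (batch, dim) embeddings of two augmented views; matching rows are
    positives, every other pair serves as a negative. Minimizing this loss
    maximizes a lower bound on the mutual information between the views
    (the bound is log(batch) minus the loss).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (batch, batch) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```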
We propose a novel, inherently interpretable machine learning method that bases its decisions on a small number of relevant examples that we call prototypes.
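A minimal sketch of what such a prototype-based decision head might look like; the class name, the exponential similarity, and the linear read-out are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Illustrative prototype head: predictions are driven by similarity to a
    small set of learned prototype vectors, which can later be projected onto
    nearby training examples for interpretation."""

    def __init__(self, feat_dim, n_prototypes, n_classes):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
        self.classifier = nn.Linear(n_prototypes, n_classes)

    def forward(self, features):                       # features: (batch, feat_dim)
        dists = torch.cdist(features, self.prototypes)  # (batch, n_prototypes)
        similarities = torch.exp(-dists)                # closer prototype -> stronger evidence
        return self.classifier(similarities), similarities
```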
We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner.
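One common form of such a learnable normalization mixes instance and layer normalization with a learned interpolation weight. The sketch below follows that idea; gamma and beta are assumed to be produced elsewhere in the network (e.g. from attention features) and are simply passed in, and all names and defaults are illustrative:

```python
import torch
import torch.nn as nn

class AdaptiveLayerInstanceNorm(nn.Module):
    """Sketch of a learnable blend of instance norm and layer norm,
    in the spirit of the normalization function described above."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # rho interpolates between instance-norm (rho=1) and layer-norm (rho=0) statistics.
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):                 # x: (batch, channels, H, W)
        in_mean = x.mean(dim=(2, 3), keepdim=True)     # per-sample, per-channel stats
        in_var = ((x - in_mean) ** 2).mean(dim=(2, 3), keepdim=True)
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)  # per-sample stats across all channels
        ln_var = ((x - ln_mean) ** 2).mean(dim=(1, 2, 3), keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        rho = self.rho.clamp(0, 1)
        out = rho * x_in + (1 - rho) * x_ln
        return out * gamma.view(x.size(0), -1, 1, 1) + beta.view(x.size(0), -1, 1, 1)
```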
We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.
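A typical way to score candidate pairs from such multilingual embeddings is a margin-based (ratio) criterion against nearest neighbours. The sketch below assumes pre-computed, L2-normalized sentence embeddings; the threshold and k values are illustrative, not the paper's settings:

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, threshold=1.05, k=4):
    """Margin-based scoring of candidate parallel sentences.

    src_emb: (n_src, d) and tgt_emb: (n_tgt, d), both L2-normalized.
    Each pair is scored by its cosine similarity divided by the average
    similarity of each side to its k nearest neighbours; pairs whose best
    match exceeds `threshold` are kept.
    """
    sim = src_emb @ tgt_emb.T                              # cosine similarity matrix
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)    # mean sim to k nearest targets
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)    # mean sim to k nearest sources
    margin = sim / ((knn_src[:, None] + knn_tgt[None, :]) / 2)
    best_tgt = margin.argmax(axis=1)
    scores = margin[np.arange(len(src_emb)), best_tgt]
    return [(i, j, s) for i, (j, s) in enumerate(zip(best_tgt, scores)) if s > threshold]
```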
In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.
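The core of layerwise adaptation strategies of this kind (as in LARS/LAMB-style optimizers) is a per-layer trust ratio that rescales each layer's step by the ratio of its weight norm to its update norm. A minimal sketch, assuming the base update direction has already been computed:

```python
import torch

def layerwise_adaptive_update(params, updates, lr=1e-3, eps=1e-6):
    """Apply a layerwise trust ratio: each layer's step is rescaled by
    ||w|| / ||update||, so no layer is dominated by a single global learning
    rate at very large batch sizes.

    `updates` stands in for whatever base direction the optimizer produced
    (raw gradients, or Adam-style normalized moments).
    """
    with torch.no_grad():
        for w, u in zip(params, updates):
            w_norm = w.norm()
            u_norm = u.norm()
            trust = w_norm / (u_norm + eps) if w_norm > 0 and u_norm > 0 else 1.0
            w.add_(u, alpha=-lr * float(trust))
```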
Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs).
We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience), amounting to over 6 months of GPU time, in under 3 days of wall-clock time with 64 GPUs.
To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact.
We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels.
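One way such methods keep weights sparse throughout training is to periodically prune low-magnitude connections and regrow an equal number where the gradient or momentum signal is strongest. The sketch below shows a single such prune/regrow step; the scoring criteria and update fraction are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def prune_and_regrow(weight, mask, momentum, prune_fraction=0.1):
    """One illustrative prune/regrow step for dynamic sparse training.

    weight, momentum: float tensors of the layer's weights and momentum.
    mask: float tensor of 0s and 1s, same shape, marking active connections.
    Drops the smallest-magnitude active weights, then reactivates the same
    number of inactive connections with the largest momentum magnitude.
    """
    n_active = int(mask.sum())
    n_update = int(prune_fraction * n_active)

    # Prune: deactivate the smallest-magnitude weights among the active ones.
    active_mag = (weight.abs() * mask).flatten()
    active_mag[mask.flatten() == 0] = float("inf")   # never "prune" inactive slots
    drop_idx = torch.topk(active_mag, n_update, largest=False).indices

    # Regrow: activate currently-masked positions with the largest momentum.
    grow_score = (momentum.abs() * (1 - mask)).flatten()
    grow_idx = torch.topk(grow_score, n_update, largest=True).indices

    new_mask = mask.flatten().clone()
    new_mask[drop_idx] = 0
    new_mask[grow_idx] = 1
    return new_mask.view_as(mask)
```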