Learning Representations by Maximizing Mutual Information Across Views

NeurIPS 2019 · Philip Bachman, R Devon Hjelm, William Buchwalter

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual)...
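Objectives of this kind are commonly instantiated with a noise-contrastive (InfoNCE-style) lower bound on mutual information: features from two views of the same context are scored against each other, with matched pairs as positives and all other pairings in the batch as negatives. A minimal NumPy sketch of such a loss, with hypothetical names and not the paper's exact formulation:

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def infonce_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style contrastive loss between two views.

    feats_a, feats_b: (N, D) arrays; row i of each is a feature
    extracted from a different view of the same context i.
    """
    # L2-normalize so pairwise scores are cosine similarities.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) similarity matrix
    # Positives sit on the diagonal; every other entry in a row
    # acts as a negative for that context.
    log_probs = logits - _logsumexp(logits, axis=1)
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pushes matched views together and unmatched views apart; for identical views the loss approaches zero, while for unrelated features it stays near log N.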


Results from the Paper

TASK                                 | DATASET  | MODEL         | METRIC             | VALUE | GLOBAL RANK
Self-Supervised Image Classification | ImageNet | AMDIM (small) | Top 1 Accuracy     | 63.5% | #29
Image Classification                 | STL-10   | AMDIM         | Percentage correct | 94.5  | #6

Results from Other Papers

TASK                                 | DATASET  | MODEL         | METRIC           | VALUE | GLOBAL RANK
Self-Supervised Image Classification | ImageNet | AMDIM (large) | Top 1 Accuracy   | 68.1% | #21
Self-Supervised Image Classification | ImageNet | AMDIM (large) | Number of Params | 626M  | #2
