Pre-training a neural network using unsupervised (self-supervised) auxiliary tasks on unlabeled data.
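A minimal sketch of the idea, assuming PyTorch; the encoder, the masked-reconstruction auxiliary task, and all sizes are illustrative placeholders rather than any specific published method:

```python
# Sketch: pre-train an encoder with a self-supervised auxiliary task on
# unlabeled data, then reuse the encoder for a supervised downstream task.
# Module names, masking scheme, and sizes are hypothetical.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
decoder = nn.Linear(256, 128)  # head used only during pre-training
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

def masked_reconstruction_loss(x):
    """Mask random input features and train the model to reconstruct them."""
    mask = (torch.rand_like(x) > 0.15).float()      # keep ~85% of inputs
    recon = decoder(encoder(x * mask))
    return ((recon - x) ** 2 * (1 - mask)).mean()   # penalize only masked positions

# Pre-training loop over unlabeled batches (random data stands in for a corpus).
for _ in range(100):
    x = torch.randn(32, 128)
    loss = masked_reconstruction_loss(x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fine-tuning: attach a task head to the pre-trained encoder on labeled data.
classifier = nn.Sequential(encoder, nn.Linear(256, 10))
```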
Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing.
Our experiments on WSJ reduce the WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data are available.
Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which is costly to obtain, and methods trained on massive raw datasets that are easily available.
Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning.
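A minimal sketch of a contrastive objective between two augmented views of the same batch (an InfoNCE / NT-Xent-style loss), assuming PyTorch; the encoder, the noise-based "augmentations", and the temperature are placeholder assumptions, not a particular published model:

```python
# Sketch: contrastive learning between two views; matching rows are positives,
# all other rows in the batch serve as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE-style loss: positive pairs sit on the diagonal of the similarity matrix."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # cosine similarities between views
    targets = torch.arange(z1.size(0))          # index of each row's positive
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32)
)
x = torch.randn(32, 128)                        # stands in for a raw batch
view1 = x + 0.1 * torch.randn_like(x)           # two stochastic "views" of the data
view2 = x + 0.1 * torch.randn_like(x)
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()
```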
We propose ways to improve the performance of fully connected networks.
We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions.
In this paper, we conduct a further study of MPC and focus on three important aspects: the effect of the speaking style of the pre-training data, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks.
We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth-independent learning times.
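A minimal sketch of such a random orthogonal weight initialization, assuming PyTorch; the layer sizes and network depth are arbitrary placeholders:

```python
# Sketch: initialize every linear layer of a deep network with a random
# orthogonal weight matrix, intended to keep signal propagation stable with depth.
import torch.nn as nn

def orthogonal_init(module):
    """Apply a random orthogonal initialization to linear layers."""
    if isinstance(module, nn.Linear):
        nn.init.orthogonal_(module.weight)   # orthonormal rows/columns
        nn.init.zeros_(module.bias)

deep_net = nn.Sequential(*[nn.Linear(256, 256) for _ in range(20)])
deep_net.apply(orthogonal_init)
```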