Transfer learning is a methodology in which weights from a model trained on one task are reused either (a) to construct a fixed feature extractor or (b) as weight initialization for fine-tuning on a new task.
(Image credit: Subodh Malgonde)
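A minimal PyTorch sketch of the two modes described above, using a torchvision ResNet as the pre-trained model (an illustrative choice; any pre-trained backbone works the same way):

```python
import torch
import torch.nn as nn
from torchvision import models

# (a) Fixed feature extractor: freeze all pre-trained weights, train only a new head.
feature_extractor = models.resnet18(weights="IMAGENET1K_V1")
for param in feature_extractor.parameters():
    param.requires_grad = False                                   # backbone stays fixed
feature_extractor.fc = nn.Linear(feature_extractor.fc.in_features, 10)  # new task head

# (b) Weight initialization / fine-tuning: start from pre-trained weights and
# update all parameters on the downstream task (often with a smaller learning rate).
fine_tuned = models.resnet18(weights="IMAGENET1K_V1")
fine_tuned.fc = nn.Linear(fine_tuned.fc.in_features, 10)
optimizer = torch.optim.SGD(fine_tuned.parameters(), lr=1e-3, momentum=0.9)
```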
GLDv2 (Google Landmarks Dataset v2) is the largest landmark recognition dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
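A minimal sketch of the head-mixing step described in the abstract, assuming queries, keys, and values are already split into heads; tensor shapes and function names are hypothetical, not the paper's reference implementation:

```python
import torch

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """q, k, v: [batch, heads, seq, dim_per_head].
    proj_logits, proj_weights: [heads, heads] mixing matrices applied across
    the head dimension before and after the softmax, respectively."""
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    # "Talking heads" step 1: linearly mix attention logits across heads.
    logits = torch.einsum("bhqk,hg->bgqk", logits, proj_logits)
    weights = logits.softmax(dim=-1)
    # "Talking heads" step 2: mix the attention weights across heads again.
    weights = torch.einsum("bhqk,hg->bgqk", weights, proj_weights)
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```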
The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users.
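A rough sketch of black-box aggregation over models trained on disjoint data partitions; the helper names and the majority-vote rule are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def train_disjoint_models(records, n_partitions, train_fn):
    """Split records into disjoint shards (e.g., by user) and train one model per shard."""
    shards = np.array_split(records, n_partitions)
    return [train_fn(shard) for shard in shards]

def aggregate_predictions(models, x):
    """Black-box combination: each model exposes only its predicted label."""
    votes = [model.predict(x) for model in models]          # hypothetical .predict interface
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]                         # majority vote across shards
```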
Humans read and write hundreds of billions of messages every day.
Ranked #13 on Natural Language Inference on RTE
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications.
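A minimal sketch of magnitude pruning for a single weight tensor, assuming the simplest keep-the-largest-magnitudes rule; the function name and interface are hypothetical:

```python
import torch

def magnitude_prune(weights: torch.Tensor, sparsity: float):
    """Zero out the smallest-magnitude entries, keeping a (1 - sparsity) fraction."""
    k = max(1, int(weights.numel() * (1.0 - sparsity)))       # number of weights to keep
    threshold = weights.abs().flatten().topk(k).values.min()  # smallest kept magnitude
    mask = (weights.abs() >= threshold).float()               # 1 = keep, 0 = prune
    return weights * mask, mask
```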
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
Ranked #1 on Sentiment Analysis on SST-2 Binary classification
Tasks: Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation
Transformer architectures have facilitated building higher-capacity models, and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks.
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
Ranked #6 on Semantic Textual Similarity on MRPC