Transfer learning is a methodology in which weights from a model trained on one task are reused on another task, either (a) to construct a fixed feature extractor or (b) as a weight initialization for fine-tuning.
(Image credit: Subodh Malgonde)
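A minimal PyTorch sketch of the two settings, using a torchvision ResNet-18; the 10-class head and the learning rate are illustrative assumptions, not anything prescribed here:

```python
import torch
import torch.nn as nn
from torchvision import models

# (a) Fixed feature extractor: freeze the pretrained weights and train
# only a new classification head on the target task.
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new head stays trainable

# (b) Weight initialization / fine-tuning: start from the pretrained
# weights and update every parameter, typically with a small learning rate.
finetuned = models.resnet18(weights="IMAGENET1K_V1")
finetuned.fc = nn.Linear(finetuned.fc.in_features, 10)
optimizer = torch.optim.SGD(finetuned.parameters(), lr=1e-3)
```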
GLDv2 (Google Landmarks Dataset v2) is the largest landmark recognition dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
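A minimal sketch of the mechanism as described, assuming the same number of heads at every stage and treating the two head-mixing matrices as given parameters:

```python
import torch
import torch.nn.functional as F

def talking_heads_attention(q, k, v, pre_proj, post_proj):
    """q, k, v: [batch, heads, seq, dim]. pre_proj / post_proj: [heads, heads]
    learned mixing matrices applied across the heads dimension immediately
    before and after the softmax (the extra step the abstract describes)."""
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    logits = torch.einsum("bhqk,hg->bgqk", logits, pre_proj)    # mix heads pre-softmax
    weights = F.softmax(logits, dim=-1)
    weights = torch.einsum("bhqk,hg->bgqk", weights, post_proj)  # mix heads post-softmax
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```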
The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users.
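A sketch of one standard way to combine such black-box teachers: noisy majority voting over their per-example predictions. The Laplace noise scale gamma is an assumed hyperparameter trading privacy against accuracy:

```python
import numpy as np

def noisy_vote(teacher_preds, num_classes, gamma=0.05, rng=None):
    """Aggregate black-box teacher predictions for one input by majority
    vote, with Laplace noise added to the per-class counts (smaller
    gamma means more noise and stronger privacy)."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# e.g. five teachers trained on disjoint user subsets, three classes
print(noisy_vote(np.array([0, 0, 1, 0, 2]), num_classes=3))
```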
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
SOTA for Linguistic Acceptability on CoLA
COMMON SENSE REASONING COREFERENCE RESOLUTION DOCUMENT SUMMARIZATION LINGUISTIC ACCEPTABILITY MACHINE TRANSLATION NATURAL LANGUAGE INFERENCE QUESTION ANSWERING SEMANTIC TEXTUAL SIMILARITY SENTIMENT ANALYSIS TEXT CLASSIFICATION TRANSFER LEARNING WORD SENSE DISAMBIGUATION
In this paper, we present HuggingFace's Transformers, a library for state-of-the-art NLP. It makes these developments available to the community by gathering state-of-the-art, general-purpose pretrained models under a unified API, together with an ecosystem of libraries, examples, tutorials and scripts targeting many downstream NLP tasks.
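An illustrative use of that unified API; the task and checkpoint name below are common defaults chosen for the example, not anything prescribed by the paper:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)

# One call loads a pretrained model with its tokenizer and task head,
# then runs a downstream task out of the box.
classifier = pipeline("sentiment-analysis")
print(classifier("Transfer learning makes NLP models far easier to reuse."))

# The same checkpoints can be loaded as objects for fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
```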
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models at the edge and/or under constrained computational training or inference budgets remains challenging.
#5 best model for Semantic Textual Similarity on MRPC
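A standard route to operating under such budgets is knowledge distillation, training a small student to match a large teacher. A minimal sketch of the soft-target loss, with the temperature T as an assumed hyperparameter:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target loss: KL divergence between temperature-softened
    teacher and student output distributions, scaled by T**2 so the
    gradient magnitude is roughly independent of the temperature."""
    log_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T
```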
Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions.
SOTA for Question Answering on WebQuestions
Since the source and the target domains are usually from different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions.
#5 best model for Domain Adaptation on ImageCLEF-DA
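One common penalty for adapting the marginal distributions is maximum mean discrepancy (MMD); a minimal sketch with an assumed Gaussian kernel bandwidth sigma:

```python
import torch

def gaussian_mmd(source, target, sigma=1.0):
    """Squared maximum mean discrepancy with a Gaussian kernel, a common
    penalty for pulling the source and target marginal feature
    distributions together. source / target: [n, dim] feature batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return (k(source, source).mean() + k(target, target).mean()
            - 2 * k(source, target).mean())
```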
To address the issue, we propose a novel transfer learning approach based on meta-learning that can automatically learn what knowledge to transfer from the source network and where to transfer it in the target network.
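A rough sketch of the idea, not the paper's algorithm: pairwise feature matching between source and target layers, with one learnable weight per (source, target) pair deciding how much knowledge flows along that connection. The shared feature dimension is an assumption, and in the meta-learning setup the weights themselves would be optimized on the target task rather than fixed:

```python
import torch
import torch.nn.functional as F

def weighted_transfer_loss(source_feats, target_feats, pair_weights):
    """source_feats / target_feats: lists of [batch, dim] tensors, assumed
    already projected to a shared dimension. pair_weights: [n_src, n_tgt]
    parameter tensor; each entry gates one source-to-target connection."""
    w = torch.sigmoid(pair_weights)  # keep each transfer amount in (0, 1)
    loss = source_feats[0].new_zeros(())
    for i, s in enumerate(source_feats):
        for j, t in enumerate(target_feats):
            loss = loss + w[i, j] * F.mse_loss(t, s.detach())
    return loss
```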