Transfer Learning

2819 papers with code • 7 benchmarks • 14 datasets

Transfer Learning is a machine learning technique in which a model trained on one task is re-purposed and fine-tuned for a related but different task. The idea is to leverage the knowledge captured by a pre-trained model to solve a new, related problem. This is useful when there is too little data to train a new model from scratch, or when the new task is similar enough to the original that the pre-trained model can be adapted with only minor modifications.
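
In practice the most common recipe is to take a backbone pre-trained on a large dataset (e.g. ImageNet), freeze most of its weights, and replace the output layer with one sized for the new task. The sketch below, using PyTorch and torchvision, is only an illustration of that recipe; num_target_classes and the optimizer settings are placeholders, not values from any specific paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a backbone pre-trained on ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained weights so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one for the target task
# (num_target_classes is a placeholder for your dataset).
num_target_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Optimize only the parameters of the new head; optionally unfreeze
# the backbone later for full fine-tuning at a lower learning rate.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```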

(Image credit: Subodh Malgonde)

Most implemented papers

Unsupervised Domain Adaptation by Backpropagation

PaddlePaddle/PaddleSpeech 26 Sep 2014

Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary).
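
The central mechanism in this paper is a gradient reversal layer placed between the feature extractor and a domain classifier. Below is a minimal PyTorch sketch of that idea; the feature extractor and domain classifier themselves are assumed and not shown, and lambda_ stands for the adaptation weight the paper schedules during training.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient flows into the shared feature extractor,
        # pushing it toward domain-invariant features.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradientReversal.apply(x, lambda_)

# Usage (assumed modules): domain_logits = domain_classifier(grad_reverse(features, lambda_))
```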

TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

huggingface/transfer-learning-conv-ai 23 Jan 2019

We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo, which is a combination of a transfer-learning-based training scheme and a high-capacity Transformer model.

Unsupervised Data Augmentation for Consistency Training

google-research/uda NeurIPS 2020

In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
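
A minimal sketch of the consistency term this describes: the model's prediction on an unlabeled example serves as a sharpened target for its prediction on a strongly augmented copy of the same example. The augmentation source, temperature, and confidence threshold below are illustrative placeholders, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, unlabeled_x, augmented_x, temperature=0.4, threshold=0.8):
    """KL consistency between predictions on clean and augmented unlabeled inputs."""
    with torch.no_grad():
        # Sharpened target distribution from the clean input.
        targets = F.softmax(model(unlabeled_x) / temperature, dim=-1)
        # Keep only confident examples (simple confidence masking).
        mask = (targets.max(dim=-1).values >= threshold).float()

    aug_log_probs = F.log_softmax(model(augmented_x), dim=-1)
    kl = F.kl_div(aug_log_probs, targets, reduction="none").sum(dim=-1)
    return (kl * mask).mean()
```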

Going deeper with Image Transformers

rwightman/pytorch-image-models ICCV 2021

In particular, we investigate the interplay of architecture and optimization of such dedicated transformers.

Parameter-Efficient Transfer Learning for NLP

google-research/adapter-bert 2 Feb 2019

On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task.
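
The approach adds small bottleneck "adapter" modules inside each Transformer layer and trains only those (plus a few task-specific parameters), leaving the pre-trained weights frozen. A minimal PyTorch sketch of such a module is below; the hidden and bottleneck sizes are placeholders, and the exact placement inside the layer follows the paper rather than this snippet.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""

    def __init__(self, hidden_size=768, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        # The residual connection keeps the layer close to identity at initialization.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```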

FNet: Mixing Tokens with Fourier Transforms

google-research/google-research NAACL 2022

At longer input lengths, our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all sequence lengths on GPUs (and across relatively shorter lengths on TPUs).
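
FNet replaces the self-attention sub-layer with an unparameterized Fourier transform that mixes tokens. Below is a minimal PyTorch sketch of that mixing sub-layer as it is commonly implemented; the rest of the Transformer block (feed-forward, layer norms) is assumed and not shown.

```python
import torch
import torch.nn as nn

class FourierMixing(nn.Module):
    """Token mixing via FFT: transform along the hidden and sequence
    dimensions and keep the real part. No learned parameters."""

    def forward(self, x):
        # x has shape (batch, seq_len, hidden).
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
```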

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

ofa-sys/ofa WS 2018

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset.

Rethinking Channel Dimensions for Efficient Model Design

clovaai/rexnet CVPR 2021

We then investigate the channel configuration of a model by searching network architectures concerning the channel configuration under the computational cost restriction.

DeiT III: Revenge of the ViT

facebookresearch/deit 14 Apr 2022

Our evaluations on image classification (ImageNet-1k with and without pre-training on ImageNet-21k), transfer learning and semantic segmentation show that our procedure outperforms by a large margin previous fully supervised training recipes for ViT.