Data Augmentation

2513 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that expand a dataset by creating modified versions of its existing examples. Augmentation not only grows the dataset but also increases its diversity, and when training machine learning models it acts as a regularizer that helps avoid overfitting.

Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include word swapping, random deletion, and random insertion, among others.
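
As a concrete illustration, here is a minimal sketch of an image-side pipeline in Python (the torchvision transform names are real; the crop size, rotation range, and flip probability are arbitrary choices). The text-side operations are sketched under the EDA entry below.

```python
from torchvision import transforms

# A typical image-augmentation pipeline: each transform is applied on
# the fly, so every epoch sees a slightly different view of each image.
image_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),     # random crop with zero-padding
    transforms.RandomHorizontalFlip(p=0.5),   # mirror half the images
    transforms.RandomRotation(degrees=15),    # rotate by up to +/-15 degrees
    transforms.ToTensor(),                    # PIL image -> float tensor
])
```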


Most implemented papers

Random Erasing Data Augmentation

zhunzhong07/Random-Erasing 16 Aug 2017

In this paper, we introduce Random Erasing, a new data augmentation method for training convolutional neural networks (CNNs).
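
The idea is simple to state: with some probability, pick a random rectangle in the image and overwrite its pixels with random values. A minimal NumPy sketch (the area and aspect-ratio ranges follow the paper's defaults; the retry loop is a simplification) — note that torchvision also ships this as `transforms.RandomErasing`:

```python
import random
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3)):
    # img: float array of shape (H, W, C) with values in [0, 1].
    # With probability p, erase a random rectangle with random pixels.
    if random.random() > p:
        return img
    h, w = img.shape[:2]
    for _ in range(100):  # retry until a sampled box fits inside the image
        area = random.uniform(*area_range) * h * w
        aspect = random.uniform(*aspect_range)
        eh = int(round((area * aspect) ** 0.5))
        ew = int(round((area / aspect) ** 0.5))
        if 0 < eh < h and 0 < ew < w:
            y = random.randint(0, h - eh)
            x = random.randint(0, w - ew)
            img[y:y + eh, x:x + ew] = np.random.uniform(0, 1, (eh, ew, img.shape[2]))
            return img
    return img
```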

DENSER: Deep Evolutionary Network Structured Representation

fillassuncao/denser-models 4 Jan 2018

Deep Evolutionary Network Structured Representation (DENSER) is a novel approach to automatically design Artificial Neural Networks (ANNs) using Evolutionary Computation.

RandAugment: Practical automated data augmentation with a reduced search space

rwightman/pytorch-image-models NeurIPS 2020

Due to their separate search phase, earlier automated augmentation approaches cannot adjust the regularization strength to the model or dataset size; RandAugment removes that phase, reducing the search space to two interpretable hyperparameters.
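
Those two hyperparameters are the number of operations N applied per image and a single global magnitude M shared by all operations. A toy sketch with a reduced op set (the three operations and their magnitude scalings are illustrative assumptions; the paper uses a pool of roughly 14 PIL-style ops):

```python
import random
from PIL import ImageEnhance, ImageOps

# Each op maps (PIL image, magnitude 0-10) -> PIL image.
def _rotate(img, m):
    return img.rotate(m * 3)                              # up to ~30 degrees

def _contrast(img, m):
    return ImageEnhance.Contrast(img).enhance(1 + m / 10)

def _posterize(img, m):
    return ImageOps.posterize(img, max(1, 8 - int(m * 0.4)))

OPS = [_rotate, _contrast, _posterize]

def rand_augment(img, n=2, m=9):
    # Core RandAugment: apply n ops drawn uniformly at random,
    # all sharing one global magnitude m.
    for op in random.choices(OPS, k=n):
        img = op(img, m)
    return img
```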

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

facebookresearch/swav NeurIPS 2020

In addition, we propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, with little increase in memory or compute requirements.
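
A sketch of the multi-crop idea: two high-resolution "global" crops plus several cheap low-resolution "local" crops of the same image (the 224/96 sizes and scale ranges loosely mirror the paper's ImageNet setup, but treat them as assumptions):

```python
from torchvision import transforms

global_crop = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),  # large view
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
local_crop = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),  # small, cheap view
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_crop(img, n_local=6):
    # Return 2 global views and n_local small views of one image.
    return ([global_crop(img) for _ in range(2)] +
            [local_crop(img) for _ in range(n_local)])
```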

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

rwightman/pytorch-image-models 18 Jun 2021

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

jasonwei20/eda_nlp IJCNLP 2019

We present EDA: easy data augmentation techniques for boosting performance on text classification tasks.
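
EDA consists of four word-level operations: synonym replacement, random insertion, random swap, and random deletion. A minimal sketch of three of them (synonym replacement uses NLTK's WordNet, which must be downloaded first; random insertion, omitted here, places a synonym of a random word at a random position):

```python
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def random_swap(words, n=1):
    # Swap the positions of two randomly chosen words, n times.
    if len(words) < 2:
        return words
    words = words[:]
    for _ in range(n):
        i, j = random.randrange(len(words)), random.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # Remove each word independently with probability p.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]  # never return empty

def synonym_replacement(words, n=1):
    # Replace up to n words that have WordNet synonyms.
    words = words[:]
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        synonyms = {l.name().replace('_', ' ')
                    for s in wordnet.synsets(words[i]) for l in s.lemmas()}
        synonyms.discard(words[i])
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return words
```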

ResMLP: Feedforward networks for image classification with data-efficient training

facebookresearch/deit NeurIPS 2021

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.

Sampling Generative Networks

dribnet/plat 14 Sep 2016

We introduce several techniques for sampling and visualizing the latent spaces of generative models.
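
One of the paper's best-known contributions is spherical linear interpolation (slerp) between latent codes: for Gaussian latents, slerp keeps interpolated points at a typical norm instead of cutting through low-probability regions the way straight-line interpolation does. A minimal NumPy sketch:

```python
import numpy as np

def slerp(z0, z1, t):
    # Spherical linear interpolation between latent vectors z0 and z1,
    # with t in [0, 1].
    u0, u1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between them
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1  # vectors nearly parallel: fall back
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```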

Self-training with Noisy Student improves ImageNet classification

tensorflow/tpu CVPR 2020

During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.
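
A highly simplified sketch of one student-update step under stated assumptions: hard pseudo-labels, a generic `augment` callable standing in for RandAugment, and dropout/stochastic depth assumed to live inside the student model. The `noisy_student_step` helper is hypothetical, not from the paper's code.

```python
import torch
import torch.nn.functional as F

def noisy_student_step(teacher, student, unlabeled_batch, optimizer, augment):
    # Teacher pseudo-labels clean images; student learns from noised ones.
    teacher.eval()
    with torch.no_grad():
        pseudo = teacher(unlabeled_batch).argmax(dim=1)  # hard pseudo-labels
    student.train()  # enables dropout / stochastic depth noise
    logits = student(augment(unlabeled_batch))           # input noise
    loss = F.cross_entropy(logits, pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```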

ResNet strikes back: An improved training procedure in timm

rwightman/pytorch-image-models NeurIPS Workshop ImageNet_PPF 2021

We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work.