Data Augmentation

2513 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that expand a dataset by creating modified versions of its existing examples. Augmentation not only grows the dataset but also increases its diversity, and when training machine learning models it acts as a regularizer that helps avoid overfitting.

Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include word swapping, random deletion, and random insertion, among others.
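
As a concrete illustration, here is a minimal sketch of an image-side pipeline in Python (the torchvision transform names are real; the crop size, rotation range, and flip probability are arbitrary choices). The text-side operations are sketched under the EDA entry below.

```python
from torchvision import transforms

# A typical image-augmentation pipeline: each transform is applied on
# the fly, so every epoch sees a slightly different view of each image.
image_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),     # random crop with zero-padding
    transforms.RandomHorizontalFlip(p=0.5),   # mirror half the images
    transforms.RandomRotation(degrees=15),    # rotate by up to +/-15 degrees
    transforms.ToTensor(),                    # PIL image -> float tensor
])
```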


Most implemented papers

Random Erasing Data Augmentation

zhunzhong07/Random-Erasing 16 Aug 2017

In this paper, we introduce Random Erasing, a new data augmentation method for training convolutional neural networks (CNNs).
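
The idea is simple to state: with some probability, pick a random rectangle in the image and overwrite its pixels with random values. A minimal NumPy sketch (the area and aspect-ratio ranges follow the paper's defaults; the retry loop is a simplification) — note that torchvision also ships this as `transforms.RandomErasing`:

```python
import random
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3)):
    # img: float array of shape (H, W, C) with values in [0, 1].
    # With probability p, erase a random rectangle with random pixels.
    if random.random() > p:
        return img
    h, w = img.shape[:2]
    for _ in range(100):  # retry until a sampled box fits inside the image
        area = random.uniform(*area_range) * h * w
        aspect = random.uniform(*aspect_range)
        eh = int(round((area * aspect) ** 0.5))
        ew = int(round((area / aspect) ** 0.5))
        if 0 < eh < h and 0 < ew < w:
            y = random.randint(0, h - eh)
            x = random.randint(0, w - ew)
            img[y:y + eh, x:x + ew] = np.random.uniform(0, 1, (eh, ew, img.shape[2]))
            return img
    return img
```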

DENSER: Deep Evolutionary Network Structured Representation

fillassuncao/denser-models 4 Jan 2018

Deep Evolutionary Network Structured Representation (DENSER) is a novel approach to automatically design Artificial Neural Networks (ANNs) using Evolutionary Computation.

RandAugment: Practical automated data augmentation with a reduced search space

rwightman/pytorch-image-models NeurIPS 2020

Due to their separate search phase, earlier automated augmentation approaches cannot adjust the regularization strength to the model or dataset size; RandAugment removes that phase, reducing the search space to two interpretable hyperparameters.
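
Those two hyperparameters are the number of operations N applied per image and a single global magnitude M shared by all operations. A toy sketch with a reduced op set (the three operations and their magnitude scalings are illustrative assumptions; the paper uses a pool of roughly 14 PIL-style ops):

```python
import random
from PIL import ImageEnhance, ImageOps

# Each op maps (PIL image, magnitude 0-10) -> PIL image.
def _rotate(img, m):
    return img.rotate(m * 3)                              # up to ~30 degrees

def _contrast(img, m):
    return ImageEnhance.Contrast(img).enhance(1 + m / 10)

def _posterize(img, m):
    return ImageOps.posterize(img, max(1, 8 - int(m * 0.4)))

OPS = [_rotate, _contrast, _posterize]

def rand_augment(img, n=2, m=9):
    # Core RandAugment: apply n ops drawn uniformly at random,
    # all sharing one global magnitude m.
    for op in random.choices(OPS, k=n):
        img = op(img, m)
    return img
```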

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

facebookresearch/swav NeurIPS 2020

In addition, we propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, with little increase in memory or compute requirements.
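
A sketch of the multi-crop idea: two high-resolution "global" crops plus several cheap low-resolution "local" crops of the same image (the 224/96 sizes and scale ranges loosely mirror the paper's ImageNet setup, but treat them as assumptions):

```python
from torchvision import transforms

global_crop = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),  # large view
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
local_crop = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),  # small, cheap view
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_crop(img, n_local=6):
    # Return 2 global views and n_local small views of one image.
    return ([global_crop(img) for _ in range(2)] +
            [local_crop(img) for _ in range(n_local)])
```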

How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

rwightman/pytorch-image-models 18 Jun 2021

Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range of vision applications, such as image classification, object detection and semantic image segmentation.

EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

jasonwei20/eda_nlp IJCNLP 2019

We present EDA: easy data augmentation techniques for boosting performance on text classification tasks.
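
EDA consists of four word-level operations: synonym replacement, random insertion, random swap, and random deletion. A minimal sketch of three of them (synonym replacement uses NLTK's WordNet, which must be downloaded first; random insertion, omitted here, places a synonym of a random word at a random position):

```python
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def random_swap(words, n=1):
    # Swap the positions of two randomly chosen words, n times.
    if len(words) < 2:
        return words
    words = words[:]
    for _ in range(n):
        i, j = random.randrange(len(words)), random.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    # Remove each word independently with probability p.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]  # never return empty

def synonym_replacement(words, n=1):
    # Replace up to n words that have WordNet synonyms.
    words = words[:]
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        synonyms = {l.name().replace('_', ' ')
                    for s in wordnet.synsets(words[i]) for l in s.lemmas()}
        synonyms.discard(words[i])
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return words
```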

ResMLP: Feedforward networks for image classification with data-efficient training

facebookresearch/deit NeurIPS 2021

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.

Sampling Generative Networks

dribnet/plat 14 Sep 2016

We introduce several techniques for sampling and visualizing the latent spaces of generative models.
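
One of the paper's best-known contributions is spherical linear interpolation (slerp) between latent codes: for Gaussian latents, slerp keeps interpolated points at a typical norm instead of cutting through low-probability regions the way straight-line interpolation does. A minimal NumPy sketch:

```python
import numpy as np

def slerp(z0, z1, t):
    # Spherical linear interpolation between latent vectors z0 and z1,
    # with t in [0, 1].
    u0, u1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between them
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1  # vectors nearly parallel: fall back
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)
```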

Self-training with Noisy Student improves ImageNet classification

tensorflow/tpu CVPR 2020

During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.
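
A highly simplified sketch of one student-update step under stated assumptions: hard pseudo-labels, a generic `augment` callable standing in for RandAugment, and dropout/stochastic depth assumed to live inside the student model. The `noisy_student_step` helper is hypothetical, not from the paper's code.

```python
import torch
import torch.nn.functional as F

def noisy_student_step(teacher, student, unlabeled_batch, optimizer, augment):
    # Teacher pseudo-labels clean images; student learns from noised ones.
    teacher.eval()
    with torch.no_grad():
        pseudo = teacher(unlabeled_batch).argmax(dim=1)  # hard pseudo-labels
    student.train()  # enables dropout / stochastic depth noise
    logits = student(augment(unlabeled_batch))           # input noise
    loss = F.cross_entropy(logits, pseudo)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```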

ResNet strikes back: An improved training procedure in timm

rwightman/pytorch-image-models NeurIPS Workshop ImageNet_PPF 2021

We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work.