Data Augmentation

2529 papers with code • 2 benchmarks • 63 datasets

Data augmentation covers techniques that increase the amount of training data by creating modified copies of the examples already in a dataset. It not only grows the dataset but also increases its diversity, and when training machine learning models it acts as a regularizer that helps avoid overfitting.

Data augmentation has proven useful in domains such as computer vision and NLP. In computer vision, typical transformations include cropping, flipping, and rotation. In NLP, common techniques include random word swapping, deletion, and insertion, among others, as sketched below.
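To make these operations concrete, here is a minimal sketch, assuming torchvision is installed for the image side; the `augment_tokens` helper is a hypothetical illustration of token-level swap, deletion, and insertion, not any specific library's API.

```python
import random

from torchvision import transforms  # assumed available

# Image side: standard crop / flip / rotate transforms.
image_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),       # random crop, resized to 224x224
    transforms.RandomHorizontalFlip(p=0.5),  # flip half of the images
    transforms.RandomRotation(degrees=15),   # rotate within +/- 15 degrees
])

# Text side: simple token-level swap, deletion, and insertion.
def augment_tokens(tokens, p=0.1):
    tokens = tokens[:]
    if len(tokens) > 1 and random.random() < p:   # swap adjacent tokens
        i = random.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    if len(tokens) > 1 and random.random() < p:   # delete a random token
        del tokens[random.randrange(len(tokens))]
    if tokens and random.random() < p:            # re-insert a random token
        tokens.insert(random.randrange(len(tokens) + 1),
                      random.choice(tokens))
    return tokens
```

Each augmented copy is a plausible variant of the original example, so the model sees more distinct inputs per label without any new data collection.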

(Image credit: Albumentations)

Most implemented papers

Recurrent Neural Networks with Top-k Gains for Session-based Recommendations

hidasib/GRU4Rec ICLR 2018

RNNs have been shown to be excellent models for sequential data, and in particular for data that is generated by users in a session-based manner.

Fast AutoAugment

kakaobrain/fast-autoaugment NeurIPS 2019

Data augmentation is an essential technique for improving the generalization ability of deep learning models.
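Fast AutoAugment's contribution is a cheaper search for augmentation policies; applying a pre-searched policy looks the same either way. As a loose illustration, torchvision ships the policies from the original AutoAugment paper (not the kakaobrain search itself):

```python
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# Training pipeline that applies a pre-searched ImageNet augmentation
# policy; Fast AutoAugment finds comparable policies at a fraction of
# the search cost, but the application step is identical in spirit.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    AutoAugment(policy=AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])
```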

NiftyNet: a deep-learning platform for medical imaging

NifTK/NiftyNet 11 Sep 2017

NiftyNet provides a modular deep-learning pipeline for a range of medical imaging applications, including segmentation, regression, image generation, and representation learning.

Camera Style Adaptation for Person Re-identification

zhunzhong07/CamStyle CVPR 2018

In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation.

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up

VITA-Group/TransGAN NeurIPS 2021

Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution, and correspondingly a multi-scale discriminator that simultaneously captures semantic contexts and low-level textures.
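The snippet below is a heavily simplified, hypothetical miniature of that generator design in PyTorch: a latent code becomes a grid of tokens, transformer blocks process the tokens, and PixelShuffle doubles the resolution between stages. Layer sizes are illustrative and the multi-scale discriminator is omitted.

```python
import torch
import torch.nn as nn

class TinyTransformerGenerator(nn.Module):
    """Token grid -> transformer stages -> progressive upsampling."""
    def __init__(self, dim=256, start=8, z_dim=128):
        super().__init__()
        self.start = start
        self.proj = nn.Linear(z_dim, start * start * dim)  # latent -> tokens
        self.pos = nn.Parameter(torch.zeros(1, start * start, dim))
        enc = lambda d: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), 2)
        self.stage1 = enc(dim)        # tokens at start x start
        self.stage2 = enc(dim // 4)   # tokens at 2*start x 2*start
        self.up = nn.PixelShuffle(2)  # doubles H, W; divides channels by 4
        self.to_rgb = nn.Conv2d(dim // 16, 3, kernel_size=1)

    def forward(self, z):             # z: (batch, z_dim)
        b, s = z.size(0), self.start
        x = self.proj(z).view(b, s * s, -1) + self.pos
        x = self.stage1(x)                                 # attend at 8x8
        x = self.up(x.transpose(1, 2).view(b, -1, s, s))   # -> 16x16 grid
        x = self.stage2(x.flatten(2).transpose(1, 2))      # attend at 16x16
        x = self.up(x.transpose(1, 2).view(b, -1, 2 * s, 2 * s))
        return torch.tanh(self.to_rgb(x))                  # (b, 3, 32, 32)
```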

DeiT III: Revenge of the ViT

facebookresearch/deit 14 Apr 2022

Our evaluations on image classification (ImageNet-1k with and without pre-training on ImageNet-21k), transfer learning, and semantic segmentation show that our procedure outperforms previous fully supervised training recipes for ViT by a large margin.

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

facebookresearch/pythia 26 Jul 2018

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.

Optimizing Millions of Hyperparameters by Implicit Differentiation

Guang000/Awesome-Dataset-Distillation 6 Nov 2019

We propose an algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations.
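In essence, the IFT hypergradient requires an inverse-Hessian-vector product, which the paper approximates cheaply. Below is a minimal sketch of that piece, using a truncated Neumann series H^{-1} v ~ alpha * sum_i (I - alpha*H)^i v and PyTorch double-backward for the Hessian-vector products; `alpha` and `K` are illustrative placeholders, not the paper's settings.

```python
import torch

def neumann_inverse_hvp(train_loss, params, v, alpha=0.01, K=20):
    """Approximate H^{-1} v, where H is the Hessian of train_loss w.r.t. params."""
    grads = torch.autograd.grad(train_loss, params, create_graph=True)
    p = [vi.clone() for vi in v]      # current series term, starts at v
    acc = [vi.clone() for vi in v]    # running sum of series terms
    for _ in range(K):
        # Hessian-vector product via a second backward pass
        hvp = torch.autograd.grad(grads, params, grad_outputs=p,
                                  retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]  # p <- (I - aH) p
        acc = [ai + pi for ai, pi in zip(acc, p)]
    return [alpha * ai for ai in acc]  # alpha * sum_i (I - aH)^i v
```

Replacing the exact inverse Hessian with this approximation is what makes gradient-based tuning of millions of hyperparameters tractable.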

Efficiently Modeling Long Sequences with Structured State Spaces

hazyresearch/state-spaces ICLR 2022

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.

ECG arrhythmia classification using a 2-D convolutional neural network

ankur219/ECG-Arrhythmia-classification 18 Apr 2018

In this paper, we propose an effective electrocardiogram (ECG) arrhythmia classification method using a deep two-dimensional convolutional neural network (CNN), which has recently shown outstanding performance in the field of pattern recognition.
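The key move is reshaping a 1-D signal into a 2-D input so that an image-style CNN applies. The paper renders each beat as a grayscale plot image; in the hypothetical sketch below a magnitude spectrogram stands in as the 2-D representation, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

def beat_to_image(beat, n_fft=64, hop=4):
    """Turn a 1-D heartbeat segment into a (1, freq, time) 'image'."""
    spec = torch.stft(beat, n_fft=n_fft, hop_length=hop,
                      return_complex=True).abs()
    return spec.unsqueeze(0)

# Small 2-D CNN classifier over the beat "image".
classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.LazyLinear(8),   # e.g. 8 arrhythmia classes
)

beat = torch.randn(256)                                 # one beat segment
logits = classifier(beat_to_image(beat).unsqueeze(0))   # add batch dim
```

Once the signal lives in 2-D, standard image augmentations such as cropping become applicable, which is how the method connects to data augmentation.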