Data Augmentation
2529 papers with code • 2 benchmarks • 63 datasets
Data augmentation encompasses techniques that increase the number of training examples by applying modifications to samples in the original dataset. Data augmentation not only grows the dataset but also increases its diversity. When training machine learning models, it acts as a regularizer and helps to avoid overfitting.
Data augmentation techniques have proven useful in domains like NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include word swapping, deletion, and random insertion, among others.
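To make these concrete, here is a minimal sketch of both kinds of augmentation, using the Albumentations library for the image transforms and plain Python for EDA-style word swapping and deletion; all parameter values are illustrative rather than recommended settings:

```python
import random

import albumentations as A
import numpy as np

# --- Image augmentation: crop, flip, rotate (illustrative parameters) ---
image_pipeline = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in image
augmented_image = image_pipeline(image=image)["image"]

# --- Text augmentation: random swap and random deletion ---
def random_swap(words, n_swaps=1):
    """Swap two randomly chosen words, n_swaps times."""
    words = words.copy()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

tokens = "data augmentation increases dataset diversity".split()
print(random_swap(tokens))
print(random_deletion(tokens))
```

Each call produces a different variant of the same underlying sample, which is what gives augmentation its regularizing effect.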
Further reading:
- A Survey of Data Augmentation Approaches for NLP
- A survey on Image Data Augmentation for Deep Learning
(Image credit: Albumentations)
Libraries
Use these libraries to find Data Augmentation models and implementations.
Latest papers
Aligning Actions and Walking to LLM-Generated Textual Descriptions
For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.
The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Consistency Training by Synthetic Question Generation for Conversational Question Answering
In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history.
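For readers unfamiliar with consistency training in general, here is a minimal sketch of a generic consistency objective, a symmetric KL divergence between predictions on original and augmented inputs, written in PyTorch. This is a common formulation, not necessarily the exact loss used in the paper:

```python
import torch.nn.functional as F

def consistency_loss(logits_real, logits_aug):
    """Symmetric KL between predictions on real and augmented inputs (generic form)."""
    p = F.log_softmax(logits_real, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))
```

Minimizing this term alongside the task loss encourages the model to make the same prediction whether it sees the real or the augmented history.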
Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?
Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential.
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings.
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector
First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training.
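As background, non-max suppression on a dense score map is often implemented via max pooling; the sketch below shows this generic formulation in PyTorch. The kernel size is illustrative, and this is not necessarily DeDoDe v2's exact procedure:

```python
import torch
import torch.nn.functional as F

def score_map_nms(scores: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Keep only local maxima of a (B, 1, H, W) score map; zero out everything else."""
    local_max = F.max_pool2d(scores, kernel_size, stride=1, padding=kernel_size // 2)
    return scores * (scores == local_max)
```

Suppressing non-maximal scores in a local window discourages keypoints from clustering, which is the failure mode the paper describes.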
MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images
In our work, we propose a new method to directly generate 2D human whole-body X-rays from human masking images.
An evaluation framework for synthetic data generation models
Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, significantly improving the model's capability to detect "unseen" attack types.
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements
To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.