Data Augmentation

2517 papers with code • 2 benchmarks • 63 datasets

Data augmentation covers techniques that increase the number of examples in a dataset by applying modifications to the existing ones. Besides growing the dataset, it also increases the dataset's diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include token swapping, deletion, and random insertion, among others.
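The transformations named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used by any library or paper listed on this page. It assumes an image is a plain list of pixel rows and a sentence is a list of tokens, and the function names (`horizontal_flip`, `random_crop`, `random_swap`, `random_deletion`) are hypothetical names chosen for this example.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, height, width, rng):
    """Cut a height x width window from a random position in the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)  # copy so the caller's list is not modified
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


# A seeded generator keeps the augmentations reproducible.
rng = random.Random(0)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
flipped = horizontal_flip(image)
cropped = random_crop(image, 2, 2, rng)

sentence = "data augmentation can act as a regularizer".split()
swapped = random_swap(sentence, 1, rng)
shortened = random_deletion(sentence, 0.2, rng)
```

In a training loop, transformations like these are typically applied on the fly to each sampled example, so the model rarely sees exactly the same input twice. Random insertion is omitted here because it usually relies on an external synonym source.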

Benchmarks

These leaderboards are used to track progress in data augmentation.

Dataset	Best Model
ImageNet	DeiT-B (+MixPro)
CIFAR-10	Shake-Shake (26 2×96d) (Faster AA)

Libraries

Use these libraries to find data augmentation models and implementations.

Westlake-AI/openmixup • 15 papers • 570 stars
rwightman/pytorch-image-models • 7 papers • 29,774 stars
makcedward/nlpaug • 7 papers • 4,298 stars
faceonlive/ai-research • 7 papers • 156 stars

See all 7 libraries.

Latest papers

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition • 213 stars • 12 Apr 2024

SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types.

Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements

faceonlive/ai-research • 156 stars • 12 Apr 2024

To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.

Data-Augmentation-Based Dialectal Adaptation for LLMs

faceonlive/ai-research • 156 stars • 11 Apr 2024

We propose an approach that combines the strengths of different types of language models and leverages data augmentation techniques to improve task performance on three South Slavic dialects: Chakavian, Cherkano, and Torlak.

AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports

faceonlive/ai-research • 156 stars • 11 Apr 2024

In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.

Leveraging Data Augmentation for Process Information Extraction

faceonlive/ai-research • 156 stars • 11 Apr 2024

Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still the state of the art.

Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation

faceonlive/ai-research • 156 stars • 11 Apr 2024

This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection.

MindBridge: A Cross-Subject Brain Decoding Framework

littlepure2333/mindbridge • 11 Apr 2024

Currently, brain decoding is confined to a per-subject-per-model paradigm, limiting its applicability to the same individual for whom the decoding model is trained.

GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data

DanielPlatnick/GANsemble • 10 Apr 2024

We experiment with the GANsemble framework on a small and imbalanced microplastics data set.

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

faceonlive/ai-research • 156 stars • 7 Apr 2024

Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data.

FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation

hilab-git/fpl-plus • 7 Apr 2024

Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability, and due to the expensive annotation process, Unsupervised Domain Adaptation (UDA) is appealing where only unlabeled images are needed for the adaptation.
