Data Augmentation

2517 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that enlarge a dataset by applying modifications to its existing examples. It increases not only the size of the dataset but also its diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.
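As a minimal sketch of how modifications expand a dataset, the snippet below appends one transformed copy of every example per transform while reusing the original label. The helpers `shift_brightness`, `add_noise`, and `expand_dataset` are hypothetical names chosen for illustration, images are assumed to be lists of pixel rows with values in [0, 1], and the transforms are assumed to be label-preserving.

```python
import random

def shift_brightness(image, delta=0.25):
    """Add a constant to every pixel, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in image]

def add_noise(image, rng, scale=0.05):
    """Add small Gaussian noise to every pixel, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, scale))) for p in row]
            for row in image]

def expand_dataset(samples, transforms):
    """Return the original (image, label) pairs plus one copy per transform.

    Labels are reused unchanged, which assumes every transform
    preserves the label of the example it modifies.
    """
    expanded = list(samples)
    for transform in transforms:
        expanded.extend((transform(image), label) for image, label in samples)
    return expanded

rng = random.Random(0)  # fixed seed so the noise is reproducible
samples = [
    ([[0.0, 0.5], [1.0, 0.25]], "cat"),
    ([[0.25, 0.75], [0.5, 0.0]], "dog"),
]
augmented = expand_dataset(
    samples, [shift_brightness, lambda img: add_noise(img, rng)]
)
print(len(samples), "->", len(augmented))  # 2 -> 6
```

In practice, augmentation is often applied on the fly during training rather than stored, so each epoch sees a different variant of every example; the stored version here only makes the growth in dataset size easy to see.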

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include swapping, deletion, and random insertion, among others.
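The transformations named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the API of any particular library: all function names are hypothetical, images are assumed to be lists of pixel rows, text is assumed to be already split into tokens, and the insertion step draws from a caller-supplied vocabulary rather than from synonyms.

```python
import random

def random_crop(image, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def random_swap(tokens, rng):
    """Swap two randomly chosen token positions."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept or [rng.choice(tokens)]

def random_insertion(tokens, vocabulary, rng):
    """Insert one word drawn from `vocabulary` at a random position."""
    tokens = list(tokens)
    tokens.insert(rng.randint(0, len(tokens)), rng.choice(vocabulary))
    return tokens

rng = random.Random(0)  # fixed seed for reproducibility
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(hflip(image))     # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotate90(image))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(random_crop(image, 2, 2, rng))

sentence = "data augmentation helps models generalize".split()
print(random_swap(sentence, rng))
print(random_deletion(sentence, 0.2, rng))
print(random_insertion(sentence, ["often", "really"], rng))
```

Every function returns a new object and leaves its input untouched, so the original example stays available alongside its augmented variants.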

(Image credit: Albumentations)

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition 12 Apr 2024

SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improves the capability of the model to detect "unseen" attack types.

213

Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements

faceonlive/ai-research 12 Apr 2024

To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.

156

Data-Augmentation-Based Dialectal Adaptation for LLMs

faceonlive/ai-research 11 Apr 2024

We propose an approach that combines the strengths of different types of language models and leverages data augmentation techniques to improve task performance on three South Slavic dialects: Chakavian, Cherkano, and Torlak.

156

AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports

faceonlive/ai-research 11 Apr 2024

In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.

156

Leveraging Data Augmentation for Process Information Extraction

faceonlive/ai-research 11 Apr 2024

Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still the state of the art.

156

Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation

faceonlive/ai-research 11 Apr 2024

This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection.

156

MindBridge: A Cross-Subject Brain Decoding Framework

littlepure2333/mindbridge 11 Apr 2024

Currently, brain decoding is confined to a per-subject-per-model paradigm, limiting its applicability to the same individual for whom the decoding model is trained.

36

GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data

DanielPlatnick/GANsemble 10 Apr 2024

We experiment with the GANsemble framework on a small and imbalanced microplastics data set.

2

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

faceonlive/ai-research 7 Apr 2024

Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data.

156

FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation

hilab-git/fpl-plus 7 Apr 2024

Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability. Because annotation is expensive, Unsupervised Domain Adaptation (UDA), which needs only unlabeled images for the adaptation, is appealing.

5