Data Augmentation

2555 papers with code • 2 benchmarks • 63 datasets

Data augmentation encompasses techniques that expand a training set by applying modifications to existing examples. It not only grows the dataset but also increases its diversity. During model training, data augmentation acts as a regularizer and helps prevent overfitting.

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, typical transformations include cropping, flipping, and rotation. In NLP, common techniques include word swapping, random deletion, and random insertion, among others.
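As an illustration, a few of these transformations can be sketched with NumPy and the standard library. This is a minimal sketch; the helper names below are our own, not from any particular augmentation library:

```python
import random
import numpy as np

def augment_image(img: np.ndarray) -> np.ndarray:
    """Apply a random horizontal flip and 90-degree rotation to an (H, W, C) array."""
    if random.random() < 0.5:
        img = np.fliplr(img)                          # horizontal flip
    img = np.rot90(img, k=random.randint(0, 3))       # random 90-degree rotation
    return img

def random_deletion(tokens: list[str], p: float = 0.1) -> list[str]:
    """Drop each token independently with probability p (keep at least one token)."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

def random_swap(tokens: list[str]) -> list[str]:
    """Swap two randomly chosen token positions."""
    out = tokens[:]
    if len(out) >= 2:
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out
```

Applied on the fly during training, such transformations expose the model to a different variant of each example every epoch, which is what gives augmentation its regularizing effect.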


( Image credit: Albumentations )


Latest papers with no code

A Comprehensive Survey on Data Augmentation

no code yet • 15 May 2024

Existing literature surveys focus only on a specific data modality and categorize methods from modality-specific, operation-centric perspectives; they lack a consistent summary of data augmentation methods across modalities, which limits understanding of how existing data samples serve the augmentation process.

Training Deep Learning Models with Hybrid Datasets for Robust Automatic Target Detection on real SAR images

no code yet • 15 May 2024

To address the lack of representative training data, we propose a Deep Learning approach to train ATD models with synthetic target signatures produced with the MOCEM simulator.

Targeted Augmentation for Low-Resource Event Extraction

no code yet • 14 May 2024

Addressing the challenge of low-resource information extraction remains an ongoing issue due to the inherent information scarcity within limited training examples.

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

no code yet • 14 May 2024

In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles.

Image to Pseudo-Episode: Boosting Few-Shot Segmentation by Unlabeled Data

no code yet • 14 May 2024

Few-shot segmentation (FSS) aims to train a model which can segment the object from novel classes with a few labeled samples.

Dynamic Feature Learning and Matching for Class-Incremental Learning

no code yet • 14 May 2024

The misalignment between dynamic feature and classifier constrains the capabilities of the model.

Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data

no code yet • 13 May 2024

The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data.

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

no code yet • 13 May 2024

To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code.

Feature Expansion and enhanced Compression for Class Incremental Learning

no code yet • 13 May 2024

In this context, we propose a new algorithm that enhances the compression of previous class knowledge by cutting and mixing patches of previous class samples with the new images during compression using our Rehearsal-CutMix method.

CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning

no code yet • 12 May 2024

In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead.