Data Augmentation
2555 papers with code • 2 benchmarks • 63 datasets
Data augmentation comprises techniques that expand a dataset by applying modifications to its existing examples. It not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps avoid overfitting.
Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, typical transformations include cropping, flipping, and rotation. In NLP, common techniques include word swapping, word deletion, and random insertion, among others.
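The operations above can be sketched in a few lines. This is a minimal, illustrative implementation using only NumPy and the standard library; the function names (`augment_image`, `augment_text`) and the crop/probability parameters are hypothetical choices, not from any particular library:

```python
import random

import numpy as np

def augment_image(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply a random horizontal flip, a random crop, and a random 90-degree rotation."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                                        # horizontal flip
    h, w = img.shape[:2]
    top = int(rng.integers(0, h // 4 + 1))
    left = int(rng.integers(0, w // 4 + 1))
    img = img[top:top + 3 * h // 4, left:left + 3 * w // 4]       # random crop to 3/4 size
    img = np.rot90(img, k=int(rng.integers(0, 4)))                # random rotation
    return img

def augment_text(words: list[str], rng: random.Random, p: float = 0.1) -> list[str]:
    """Random deletion, random swap, and random insertion over a token list."""
    words = [w for w in words if rng.random() > p] or list(words)  # random deletion
    if len(words) > 1:                                             # random swap
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    words.insert(rng.randrange(len(words) + 1), rng.choice(words))  # random insertion
    return words
```

In practice, production pipelines use libraries such as Albumentations or torchvision for images rather than hand-rolled transforms, but the underlying operations are the same.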
Further reading:
- A Survey of Data Augmentation Approaches for NLP
- A survey on Image Data Augmentation for Deep Learning
(Image credit: Albumentations)
Libraries
Use these libraries to find Data Augmentation models and implementations.

Latest papers with no code
A Comprehensive Survey on Data Augmentation
Existing literature surveys focus only on a specific data modality and categorize methods from modality-specific, operation-centric perspectives, which lacks a consistent summary of data augmentation methods across multiple modalities and limits understanding of how existing data samples serve the augmentation process.
Training Deep Learning Models with Hybrid Datasets for Robust Automatic Target Detection on real SAR images
To address the lack of representative training data, we propose a Deep Learning approach to train ATD models with synthetic target signatures produced with the MOCEM simulator.
Targeted Augmentation for Low-Resource Event Extraction
Addressing the challenge of low-resource information extraction remains an ongoing issue due to the inherent information scarcity within limited training examples.
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles.
Image to Pseudo-Episode: Boosting Few-Shot Segmentation by Unlabeled Data
Few-shot segmentation (FSS) aims to train a model that can segment objects from novel classes given only a few labeled samples.
Dynamic Feature Learning and Matching for Class-Incremental Learning
The misalignment between dynamic features and the classifier constrains the model's capabilities.
Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data
The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data.
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning
To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code.
Feature Expansion and enhanced Compression for Class Incremental Learning
In this context, we propose a new algorithm that enhances the compression of previous class knowledge by cutting and mixing patches of previous class samples with the new images during compression using our Rehearsal-CutMix method.
CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning
In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead.