Data Augmentation

2529 papers with code • 2 benchmarks • 63 datasets

Data augmentation encompasses techniques that expand a dataset by applying modifications to its existing examples. Data augmentation not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps to avoid overfitting.
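
One common way to realize this regularization effect is to apply random transforms on the fly each time a training example is fetched, so the model rarely sees an identical input twice. Below is a minimal PyTorch/torchvision sketch of that pattern; the image paths and labels are hypothetical placeholders, not part of any specific benchmark.

```python
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class AugmentedImageDataset(Dataset):
    """Applies random transforms on the fly, so every epoch sees new variants."""

    def __init__(self, image_paths, labels):
        self.image_paths = image_paths  # hypothetical list of image file paths
        self.labels = labels
        self.transform = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomRotation(degrees=15),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Transforms are re-sampled on every access, acting as an implicit regularizer.
        image = Image.open(self.image_paths[idx]).convert("RGB")
        return self.transform(image), self.labels[idx]
```

Because the transforms are stochastic, the effective dataset size grows without storing any extra copies on disk.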

Data augmentation techniques have proven useful in domains such as computer vision and NLP. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include random word swapping, deletion, and insertion, among others.
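
The token-level NLP operations mentioned above can be sketched in plain Python; the function names and probabilities here are illustrative rather than taken from any particular library.

```python
import random

def random_deletion(words, p=0.1):
    # Drop each word with probability p, but never return an empty sentence.
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

def random_swap(words, n=1):
    # Swap the words at two randomly chosen positions, n times.
    if len(words) < 2:
        return words
    words = words[:]
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_insertion(words, vocabulary, n=1):
    # Insert n words drawn from a given vocabulary at random positions.
    words = words[:]
    for _ in range(n):
        words.insert(random.randint(0, len(words)), random.choice(vocabulary))
    return words

sentence = "data augmentation increases the diversity of the training set".split()
print(" ".join(random_swap(random_deletion(sentence, p=0.1), n=1)))
```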

(Image credit: Albumentations)

Libraries

Use these libraries to find Data Augmentation models and implementations
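As one example, Albumentations (credited for the task image above) composes image transforms into a pipeline. A minimal usage sketch is shown below, with a random array standing in for a real image.

```python
import numpy as np
import albumentations as A

# Compose a pipeline of the transforms mentioned above: crop, flip, rotate.
transform = A.Compose([
    A.RandomCrop(width=224, height=224),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
])

# Stand-in for a real RGB image loaded with OpenCV or PIL.
image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
augmented_image = transform(image=image)["image"]
print(augmented_image.shape)  # (224, 224, 3)
```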

Latest papers with no code

A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data

no code yet • 23 Apr 2024

The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability.

EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification

no code yet • 23 Apr 2024

Brain-computer interfaces (BCIs) harness electroencephalographic signals for direct neural control of devices, offering a significant benefit for individuals with motor impairments.

A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation

no code yet • 22 Apr 2024

In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models.

DSDRNet: Disentangling Representation and Reconstruct Network for Domain Generalization

no code yet • 22 Apr 2024

Domain generalization faces challenges due to the distribution shift between training and testing sets, and the presence of unseen target domains.

SI-FID: Only One Objective Indicator for Evaluating Stitched Images

no code yet • 22 Apr 2024

We then evaluate the altered FID after introducing interference to the test set and examine if the noise can improve the consistency between objective and subjective evaluation results.

SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

no code yet • 19 Apr 2024

Seven pre-trained models were evaluated on two tasks: high versus low suicide risk classification, and fine-grained suicide risk classification on a scale of 0 to 10.

Privacy-Preserving Debiasing using Data Augmentation and Machine Unlearning

no code yet • 19 Apr 2024

Data augmentation is widely used to mitigate data bias in the training dataset.

Unlocking Robust Segmentation Across All Age Groups via Continual Learning

no code yet • 19 Apr 2024

Most deep learning models in medical imaging are trained on adult data with unclear performance on pediatric images.

Automatic Cranial Defect Reconstruction with Self-Supervised Deep Deformable Masked Autoencoders

no code yet • 19 Apr 2024

However, even synthetic ground-truth generation is time-consuming and limits data heterogeneity, and thus the deep models' generalizability.

MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

no code yet • 18 Apr 2024

Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera.