Data Augmentation
2529 papers with code • 2 benchmarks • 63 datasets
Data augmentation comprises techniques that expand a training set by creating modified copies of existing examples. It not only grows the dataset but also increases its diversity. During training, data augmentation acts as a regularizer and helps prevent overfitting.
Data augmentation has proven useful in domains such as computer vision and NLP. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include token swapping, deletion, and random insertion, among others.
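The transformations above can be sketched in a few lines. This is a minimal, illustrative example (not the API of any particular library such as Albumentations): a NumPy-based image augmenter applying flip, rotation, and crop, and a token-level text augmenter applying deletion and swapping.

```python
import numpy as np

def augment_image(img, rng):
    """Apply a random horizontal flip, a random 90-degree rotation,
    and a random crop to 90% size on an HxWxC array."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                      # horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))  # 0-3 quarter turns
    h, w = img.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)         # crop dimensions
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return img[top:top + ch, left:left + cw]

def augment_text(tokens, rng, p_delete=0.1):
    """Randomly delete tokens, then swap one adjacent pair."""
    kept = [t for t in tokens if rng.random() > p_delete] or tokens[:1]
    if len(kept) > 1:
        i = int(rng.integers(0, len(kept) - 1))
        kept[i], kept[i + 1] = kept[i + 1], kept[i]  # adjacent swap
    return kept

rng = np.random.default_rng(0)
img = np.zeros((100, 80, 3))
aug = augment_image(img, rng)          # e.g. shape (90, 72, 3) or (72, 90, 3)
sent = augment_text("the quick brown fox jumps".split(), rng)
```

In practice these operations are applied on the fly inside the data loader, so each epoch sees a different randomized view of every example.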
Further reading:
- A Survey of Data Augmentation Approaches for NLP
- A survey on Image Data Augmentation for Deep Learning
(Image credit: Albumentations)
Libraries
Use these libraries to find Data Augmentation models and implementations.
Latest papers with no code
A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data
The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability.
EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification
Brain-computer interfaces (BCIs) harness electroencephalographic signals for direct neural control of devices, offering a significant benefit for individuals with motor impairments.
A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation
In the ever-evolving landscape of social network advertising, the volume and accuracy of data play a critical role in the performance of predictive models.
DSDRNet: Disentangling Representation and Reconstruct Network for Domain Generalization
Domain generalization faces challenges due to the distribution shift between training and testing sets, and the presence of unseen target domains.
SI-FID: Only One Objective Indicator for Evaluating Stitched Images
We then evaluate the altered FID after introducing interference to the test set and examine if the noise can improve the consistency between objective and subjective evaluation results.
SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis
Seven pre-trained models were evaluated on two tasks: high- versus low-risk classification, and fine-grained suicide risk classification on a scale of 0 to 10.
Privacy-Preserving Debiasing using Data Augmentation and Machine Unlearning
Data augmentation is widely used to mitigate data bias in the training dataset.
Unlocking Robust Segmentation Across All Age Groups via Continual Learning
Most deep learning models in medical imaging are trained on adult data with unclear performance on pediatric images.
Automatic Cranial Defect Reconstruction with Self-Supervised Deep Deformable Masked Autoencoders
However, even the synthetic ground-truth generation is time-consuming and limits the data heterogeneity, thus the deep models' generalizability.
MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera.