Data Augmentation
2517 papers with code • 2 benchmarks • 63 datasets
Data augmentation refers to techniques that increase the amount of training data by applying modifications to the examples in the original dataset. It not only grows the dataset but also increases its diversity. When training machine learning models, data augmentation acts as a regularizer and helps reduce overfitting.
Data augmentation techniques have been found useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include token swapping, deletion, and random insertion, among others.
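The transformations named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any particular library: images are modelled as nested lists of pixel values, text as lists of tokens, and every function name and parameter here (`random_crop`, `random_deletion`, `vocabulary`, `p`, and so on) is a hypothetical choice made for the example.

```python
import random

# --- Image-style transformations on a 2D grid (a list of rows) ---

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the grid 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, height, width, rng):
    """Cut a height x width window at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]

# --- Token-level transformations for text ---

def random_swap(tokens, rng):
    """Swap the tokens at two randomly chosen positions."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_insertion(tokens, vocabulary, rng):
    """Insert one word drawn from `vocabulary` at a random position."""
    tokens = list(tokens)
    tokens.insert(rng.randint(0, len(tokens)), rng.choice(vocabulary))
    return tokens

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(flip_horizontal(image))      # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
    print(rotate_90_clockwise(image))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
    print(random_crop(image, 2, 2, rng))
    sentence = "data augmentation helps reduce overfitting".split()
    print(random_swap(sentence, rng))
    print(random_deletion(sentence, 0.2, rng))
    print(random_insertion(sentence, ["often", "usually"], rng))
```

Each function returns a new object and leaves its input unchanged, so the original example and its augmented variants can sit side by side in the training set. Passing an explicit `random.Random` instance keeps the randomness reproducible.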
Further reading:
- A Survey of Data Augmentation Approaches for NLP
- A survey on Image Data Augmentation for Deep Learning
Libraries
Use these libraries to find Data Augmentation models and implementations.
Latest papers
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improves the capability of the model to detect "unseen" attack types.
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements
To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.
Data-Augmentation-Based Dialectal Adaptation for LLMs
We propose an approach that combines the strengths of different types of language models and leverages data augmentation techniques to improve task performance on three South Slavic dialects: Chakavian, Cherkano, and Torlak.
AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports
In our few-shot scenario, we find that for identifying the MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text, concept descriptions from MITRE ATT&CK are an effective source for training data augmentation.
Leveraging Data Augmentation for Process Information Extraction
Our study shows that data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text, where rule-based systems are currently still mostly state of the art.
Nostra Domina at EvaLatin 2024: Improving Latin Polarity Detection through Data Augmentation
This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection.
MindBridge: A Cross-Subject Brain Decoding Framework
Currently, brain decoding is confined to a per-subject-per-model paradigm, limiting its applicability to the same individual for whom the decoding model is trained.
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data
We experiment with the GANsemble framework on a small and imbalanced microplastics data set.
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data.
FPL+: Filtered Pseudo Label-based Unsupervised Cross-Modality Adaptation for 3D Medical Image Segmentation
Adapting a medical image segmentation model to a new domain is important for improving its cross-domain transferability, and due to the expensive annotation process, Unsupervised Domain Adaptation (UDA) is appealing where only unlabeled images are needed for the adaptation.