Data Augmentation
2529 papers with code • 2 benchmarks • 63 datasets
Data augmentation encompasses techniques that increase the number of training examples by applying modifications to samples in the original dataset. Data augmentation not only grows the dataset but also increases its diversity. When training machine learning models, it acts as a regularizer and helps to avoid overfitting.
Data augmentation techniques have proven useful in domains like NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include word swapping, deletion, and random insertion, among others.
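To make these concrete, here is a minimal sketch of both kinds of augmentation, using the Albumentations library for the image transforms and plain Python for EDA-style word swapping and deletion; all parameter values are illustrative rather than recommended settings:

```python
import random

import albumentations as A
import numpy as np

# --- Image augmentation: crop, flip, rotate (illustrative parameters) ---
image_pipeline = A.Compose([
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=30, p=0.5),
])

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in image
augmented_image = image_pipeline(image=image)["image"]

# --- Text augmentation: random swap and random deletion ---
def random_swap(words, n_swaps=1):
    """Swap two randomly chosen words, n_swaps times."""
    words = words.copy()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

tokens = "data augmentation increases dataset diversity".split()
print(random_swap(tokens))
print(random_deletion(tokens))
```

Each call produces a different variant of the same underlying sample, which is what gives augmentation its regularizing effect.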
Further reading:
- A Survey of Data Augmentation Approaches for NLP
- A survey on Image Data Augmentation for Deep Learning
(Image credit: Albumentations)
Libraries
Use these libraries to find Data Augmentation models and implementations.
Latest papers
Aligning Actions and Walking to LLM-Generated Textual Descriptions
For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.
The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data
This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Consistency Training by Synthetic Question Generation for Conversational Question Answering
In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history.
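For readers unfamiliar with consistency training in general, here is a minimal sketch of a generic consistency objective, a symmetric KL divergence between predictions on original and augmented inputs, written in PyTorch. This is a common formulation, not necessarily the exact loss used in the paper:

```python
import torch.nn.functional as F

def consistency_loss(logits_real, logits_aug):
    """Symmetric KL between predictions on real and augmented inputs (generic form)."""
    p = F.log_softmax(logits_real, dim=-1)
    q = F.log_softmax(logits_aug, dim=-1)
    # F.kl_div(input, target, log_target=True) computes KL(target || input)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))
```

Minimizing this term alongside the task loss encourages the model to make the same prediction whether it sees the real or the augmented history.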
Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?
Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential.
RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings.
DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector
First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training.
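As background, non-max suppression on a dense score map is often implemented via max pooling; the sketch below shows this generic formulation in PyTorch. The kernel size is illustrative, and this is not necessarily DeDoDe v2's exact procedure:

```python
import torch
import torch.nn.functional as F

def score_map_nms(scores: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Keep only local maxima of a (B, 1, H, W) score map; zero out everything else."""
    local_max = F.max_pool2d(scores, kernel_size, stride=1, padding=kernel_size // 2)
    return scores * (scores == local_max)
```

Suppressing non-maximal scores in a local window discourages keypoints from clustering, which is the failure mode the paper describes.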
MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images
In our work, we propose a new method to directly generate 2D human whole-body X-rays from human masking images.
An evaluation framework for synthetic data generation models
Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, significantly improving the model's capability to detect "unseen" attack types.
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements
To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.