Data Augmentation

2513 papers with code • 2 benchmarks • 63 datasets

Data augmentation refers to techniques that increase the amount of training data by applying modifications to existing examples, expanding the original dataset. Besides growing the dataset, augmentation also increases its diversity. When training machine learning models, it acts as a regularizer and helps reduce overfitting.

Data augmentation techniques have proven useful in domains such as NLP and computer vision. In computer vision, common transformations include cropping, flipping, and rotation. In NLP, techniques include random swapping, random deletion, and random insertion, among others.
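The transformations named above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular library: the function names, the 80% crop ratio, the deletion probability `p`, and the `vocab` argument are all assumptions made for the example, and images are assumed to be NumPy arrays of shape height x width x channels.

```python
import random

import numpy as np


def augment_image(img, rng, crop_ratio=0.8):
    """Randomly crop, flip, and rotate an H x W x C image array (illustrative only)."""
    h, w = img.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top = rng.randint(0, h - ch)      # randint is inclusive on both ends
    left = rng.randint(0, w - cw)
    out = img[top:top + ch, left:left + cw]
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    out = np.rot90(out, rng.randrange(4))  # rotate by 0, 90, 180, or 270 degrees
    return out.copy()


def random_swap(tokens, rng):
    """Swap two randomly chosen token positions."""
    tokens = list(tokens)
    if len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_insertion(tokens, rng, vocab):
    """Insert one token drawn from a (hypothetical) vocabulary at a random position."""
    tokens = list(tokens)
    tokens.insert(rng.randint(0, len(tokens)), rng.choice(vocab))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    image = np.zeros((32, 32, 3), dtype=np.uint8)
    print(augment_image(image, rng).shape)
    sentence = "the cat sat on the mat".split()
    print(random_swap(sentence, rng))
    print(random_deletion(sentence, rng, p=0.2))
    print(random_insertion(sentence, rng, vocab=["small", "black"]))
```

In practice each training example would typically be passed through such functions on the fly, so the model sees a different variant in each epoch. Whether a given transformation preserves the label (for example, flipping digits or deleting a negation word) depends on the task and has to be checked case by case.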

( Image credit: Albumentations )

Aligning Actions and Walking to LLM-Generated Textual Descriptions

radu1999/walkandtext 18 Apr 2024

For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.

0
18 Apr 2024

The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data

zixuan-zhu/vab ICCV 2023

This inspires us to propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.

6
17 Apr 2024

Consistency Training by Synthetic Question Generation for Conversational Question Answering

hamedhematian/syncqg 17 Apr 2024

In our novel model-agnostic approach, referred to as CoTaH (Consistency-Trained augmented History), we augment the historical information with synthetic questions and subsequently employ consistency training to train a model that utilizes both real and augmented historical data to implicitly make the reasoning robust to irrelevant history.

0
17 Apr 2024

Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?

neurai-lab/ssl-prior 15 Apr 2024

Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential.

0
15 Apr 2024

RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion

kylelo/roofdiffusion 14 Apr 2024

Accurate completion and denoising of roof height maps are crucial to reconstructing high-quality 3D buildings.

2
14 Apr 2024

DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector

parskatt/dedode 13 Apr 2024

First, we find that DeDoDe keypoints tend to cluster together, which we fix by performing non-max suppression on the target distribution of the detector during training.

305
13 Apr 2024

MaSkel: A Model for Human Whole-body X-rays Generation from Human Masking Images

2022yingjie/maskel 13 Apr 2024

In our work, we proposed a new method to directly generate 2D human whole-body X-rays from human masking images.

2
13 Apr 2024

An evaluation framework for synthetic data generation models

novelcore/synthetic_data_evaluation_framework 13 Apr 2024

Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.

1
13 Apr 2024

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

FaceOnLive/Face-Liveness-Detection-SDK-Linux 12 Apr 2024

SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types.

205
12 Apr 2024

Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements

faceonlive/ai-research 12 Apr 2024

To make sense of their surroundings, intelligent systems must transform complex sensory inputs to structured codes that are reduced to task-relevant information such as object category.

144
12 Apr 2024