Synthetic Data Generation

180 papers with code • 1 benchmark • 5 datasets

The generation of tabular data by any means possible.

Latest papers with no code

Automated data processing and feature engineering for deep learning and big data applications: a survey

no code yet • 18 Mar 2024

In addition to automating specific data processing tasks, we discuss the use of AutoML methods and tools to simultaneously optimize all stages of the machine learning pipeline.

Structured Evaluation of Synthetic Tabular Data

no code yet • 15 Mar 2024

Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of the many metrics.

Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

no code yet • 7 Mar 2024

The recent surge in research focused on generating synthetic data from large language models (LLMs), especially for scenarios with limited data availability, marks a notable shift in Generative Artificial Intelligence (AI).

LAB: Large-Scale Alignment for ChatBots

no code yet • 2 Mar 2024

This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training.

A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE)

no code yet • 27 Feb 2024

The paper proposes the Quantum-SMOTE method, a novel solution that uses quantum computing techniques to solve the prevalent problem of class imbalance in machine learning datasets.
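For orientation, the classical SMOTE idea that this line of work builds on can be sketched in a few lines. This is not the Quantum-SMOTE method from the paper: it is a minimal, standard-library illustration of plain SMOTE-style interpolation, and the function name `smote_oversample`, its parameters, and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Classical SMOTE-style oversampling (illustrative, not Quantum-SMOTE).

    minority: list of numeric feature vectors from the under-represented class.
    Returns n_new synthetic vectors interpolated between minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies.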

Enhancement of 3D Camera Synthetic Training Data with Noise Models

no code yet • 26 Feb 2024

The goal of this paper is to assess the impact of noise in 3D camera-captured data by modeling the noise of the imaging process and applying it on synthetic training data.
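As a rough illustration of what applying a noise model to synthetic depth data can look like, the sketch below perturbs a clean depth map. The noise model here is an assumption chosen for the example (Gaussian noise whose spread grows with the square of depth, plus random pixel dropout), not the model used in the paper, and `add_depth_noise` and its parameters are hypothetical names.

```python
import random


def add_depth_noise(depth, sigma_base=0.001, sigma_quad=0.002, dropout=0.02, seed=0):
    """Apply an assumed sensor-noise model to a clean depth map (in metres).

    depth: list of rows, each a list of depth values.
    sigma_base, sigma_quad: assumed noise parameters; sigma = base + quad * z^2.
    dropout: assumed probability that a pixel returns no measurement.
    """
    rng = random.Random(seed)
    noisy = []
    for row in depth:
        out_row = []
        for z in row:
            if rng.random() < dropout:
                out_row.append(0.0)  # 0.0 marks a missing measurement
                continue
            sigma = sigma_base + sigma_quad * z * z
            # Clamp at zero so the noise never produces a negative depth.
            out_row.append(max(0.0, rng.gauss(z, sigma)))
        noisy.append(out_row)
    return noisy


# Hypothetical 2 x 3 synthetic depth map.
clean = [[1.0, 1.5, 2.0], [1.0, 1.5, 2.0]]
noisy = add_depth_noise(clean)
```

Training on `noisy` rather than `clean` is one way to expose a model to sensor-like imperfections before it sees real captures.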

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

no code yet • 23 Feb 2024

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks.

Protect and Extend -- Using GANs for Synthetic Data Generation of Time-Series Medical Records

no code yet • 21 Feb 2024

Preservation of private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly with services treating sensitive data, such as IT-based health services.

Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction

no code yet • 17 Feb 2024

We fine-tune a bidirectional Small Language Model (SLM) using these initial seeds to learn the relations for the target domain.

Generative Modeling for Tabular Data via Penalized Optimal Transport Network

no code yet • 16 Feb 2024

To this end, we propose POTNet (Penalized Optimal Transport Network), a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss.