Synthetic Data Generation

176 papers with code • 1 benchmark • 5 datasets

The generation of tabular data by any means possible.

Latest papers with no code

ViFu: Multiple 360° Objects Reconstruction with Clean Background via Visible Part Fusion

no code yet • 15 Apr 2024

In this paper, we propose a method to segment and recover a static, clean background and multiple 360° objects from observations of scenes at different timestamps.

SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models

no code yet • 4 Apr 2024

We introduce SiloFuse, a novel generative framework for high-quality synthesis from cross-silo tabular data.

Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation

no code yet • 25 Mar 2024

In this paper, we tackle the problem of code-mixed (Hinglish and Bengalish) to English machine translation.

Does Differentially Private Synthetic Data Lead to Synthetic Discoveries?

no code yet • 20 Mar 2024

Objectives: The aim of this study is to evaluate the Mann-Whitney U test on DP-synthetic biomedical data in terms of Type I and Type II errors, in order to establish whether statistical hypothesis testing performed on privacy-preserving synthetic data is likely to lead to a loss of the test's validity or decreased power.
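For readers unfamiliar with how a Type I error rate is estimated empirically, the following is a minimal sketch, not the paper's protocol. It assumes continuous data without ties, uses the normal approximation to the Mann-Whitney U statistic, and draws both samples from the same distribution so that every rejection is a false positive. It does not model differentially private synthesis; the function names and parameters are hypothetical.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum of the first sample; ranks start at 1, ties assumed absent.
    r1 = sum(rank for rank, (_, group) in enumerate(combined, start=1) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(n=30, trials=2000, alpha=0.05, seed=0):
    """Fraction of rejections when both samples share one distribution."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if mann_whitney_p(x, y) < alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"empirical Type I error rate: {rate:.3f}")
```

A valid test should reject in roughly a fraction alpha of the trials here. Running the same loop on synthetic copies of the two samples would show whether the synthesis step inflates that rate.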

Six Levels of Privacy: A Framework for Financial Synthetic Data

no code yet • 20 Mar 2024

In addition to the benefits that financial synthetic data provides, such as improved financial modeling and better testing procedures, it poses privacy risks as well.

Automated data processing and feature engineering for deep learning and big data applications: a survey

no code yet • 18 Mar 2024

In addition to automating specific data processing tasks, we discuss the use of AutoML methods and tools to simultaneously optimize all stages of the machine learning pipeline.

Structured Evaluation of Synthetic Tabular Data

no code yet • 15 Mar 2024

Many metrics exist for evaluating the quality of synthetic tabular data; however, we lack an objective, coherent interpretation of the many metrics.

Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

no code yet • 7 Mar 2024

The recent surge in research focused on generating synthetic data from large language models (LLMs), especially for scenarios with limited data availability, marks a notable shift in Generative Artificial Intelligence (AI).

LAB: Large-Scale Alignment for ChatBots

no code yet • 2 Mar 2024

This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training.

A Quantum Approach to Synthetic Minority Oversampling Technique (SMOTE)

no code yet • 27 Feb 2024

The paper proposes the Quantum-SMOTE method, a novel solution that uses quantum computing techniques to solve the prevalent problem of class imbalance in machine learning datasets.
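As background, classical SMOTE (not the quantum variant proposed in the paper) creates a synthetic minority sample by interpolating between a minority point and one of its k nearest minority neighbours. The sketch below illustrates only that classical step; the function name, parameters, and toy data are hypothetical.

```python
import random
from math import dist


def smote_sample(minority, k=3, n_new=5, rng=None):
    """Classical SMOTE: interpolate between a point and a near neighbour."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x), key=lambda p: dist(x, p)
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class with two features (made-up values for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority, k=2, n_new=4, rng=random.Random(42))
print(new_points)
```

Because each new point lies on a line segment between two existing minority points, the synthetic samples stay inside the region the minority class already occupies.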