Synthetic Data Generation
180 papers with code • 1 benchmark • 5 datasets
The generation of tabular data by any means possible.
Libraries
Use these libraries to find Synthetic Data Generation models and implementations
Datasets
Latest papers
Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models
In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance compared to naive single-prompt simulated conversations.
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Recent work has studied prompt-driven synthetic data generation using large language models, but these generated datasets tend to lack complexity and diversity.
Aligning Actions and Walking to LLM-Generated Textual Descriptions
For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.
An evaluation framework for synthetic data generation models
Two use case scenarios demonstrate the applicability of the proposed framework for evaluating the ability of synthetic data generation models to generate high-quality data.
Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data
Using GPT-3, we develop HEADROOM, a synthetic dataset of 3,120 posts about depression-triggering stressors, by controlling for race, gender, and time frame (before and after COVID-19).
SYNCS: Synthetic Data and Contrastive Self-Supervised Training for Central Sulcus Segmentation
Identifying risk markers early is crucial for understanding disease progression and enabling preventive measures.
Joint Selection: Adaptively Incorporating Public Information for Private Synthetic Data
This technique allows for public data to be included in a graphical-model-based mechanism.
Synthetic data generation for system identification: leveraging knowledge transfer from similar systems
This paper addresses the challenge of overfitting in the learning of dynamical systems by introducing a novel approach for the generation of synthetic data, aimed at enhancing model generalization and robustness in scenarios characterized by data scarcity.
IR2: Information Regularization for Information Retrieval
This approach, representing a novel application of regularization techniques in synthetic data creation for IR, is tested on three recent IR tasks characterized by complex queries: DORIS-MAE, ArguAna, and WhatsThatBook.
Synthetic location trajectory generation using categorical diffusion models
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data, for instance, for computer vision, audio, natural language processing, or biomolecule generation.