Data Valuation

29 papers with code • 0 benchmarks • 0 datasets

Data valuation in machine learning aims to determine the worth of data, or of whole datasets, for downstream tasks. Some methods are task-agnostic and consider datasets as a whole, mostly to support decision making in data markets; these typically measure distributional distances between samples. More often, methods look at how individual points affect the performance of specific machine learning models: they assign a scalar to each element of a training set, reflecting its contribution to the final performance of a model trained on it. Some concepts of value depend on a specific model of interest, while others are model-agnostic.
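As a rough illustration of assigning a scalar to each training point, the sketch below computes leave-one-out values: the value of a point is the drop in validation accuracy when that point is removed. The toy 1-D dataset, the 1-nearest-neighbour model, and the names `utility` and `leave_one_out_values` are assumptions made for this example only; they are not taken from any of the papers or libraries listed on this page.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label) in a toy 1-D problem


def utility(train: Sequence[Point], valid: Sequence[Point]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier fit on `train`."""
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        # Predict the label of the closest training point.
        nearest = min(train, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(valid)


def leave_one_out_values(train: List[Point], valid: List[Point]) -> List[float]:
    """Value of point i = utility(all points) - utility(all points except i)."""
    full = utility(train, valid)
    return [
        full - utility(train[:i] + train[i + 1:], valid)
        for i in range(len(train))
    ]


# Hypothetical data: the last training point carries label 0 but sits
# next to a class-1 validation point.
train = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (3.2, 0)]
valid = [(0.5, 0), (3.0, 1), (4.5, 1)]

values = leave_one_out_values(train, valid)
for point, value in zip(train, values):
    print(point, round(value, 3))
```

In this toy setting only the last point receives a negative value, because removing it raises validation accuracy. Leave-one-out is the simplest such scheme; it needs one retraining per point and tends to give many points a value of exactly zero, which is one motivation for the game-theoretic methods.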

Concepts of the usefulness of a datum, or of its influence on the outcome of a prediction, have a long history in statistics and ML, in particular through the notion of the influence function. However, rigorous and practical notions of value for data, and in particular for datasets, have only recently appeared in the ML literature. They are often based on concepts from cooperative game theory, but also draw on generalization estimates of neural networks and optimal transport theory, among others.
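One widely used game-theoretic notion of value is the Shapley value, which averages a point's marginal contribution over all orders in which the training set could be assembled. The sketch below approximates it by sampling random permutations. It is a minimal illustration under stated assumptions: the toy data, the 1-nearest-neighbour utility, and the names `utility` and `permutation_shapley` are hypothetical and do not reproduce the algorithm of any specific paper listed here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label) in a toy 1-D problem

# Hypothetical data, chosen only for illustration.
TRAIN: List[Point] = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1), (3.2, 0)]
VALID: List[Point] = [(0.5, 0), (3.0, 1), (4.5, 1)]


def utility(indices: Sequence[int]) -> float:
    """Validation accuracy of 1-NN trained on the subset given by `indices`."""
    # Sort so the result depends only on the set of indices, not their order.
    subset = [TRAIN[i] for i in sorted(indices)]
    if not subset:
        return 0.0
    correct = 0
    for x, y in VALID:
        nearest = min(subset, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == y)
    return correct / len(VALID)


def permutation_shapley(
    n_points: int,
    utility_fn: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo Shapley estimate via random permutations of the points."""
    rng = random.Random(seed)
    totals = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        coalition: List[int] = []
        previous = utility_fn(coalition)
        for i in order:
            # Marginal contribution of point i to the points preceding it.
            coalition.append(i)
            current = utility_fn(coalition)
            totals[i] += current - previous
            previous = current
    return [t / n_permutations for t in totals]


shapley = permutation_shapley(len(TRAIN), utility)
print([round(v, 3) for v in shapley])
```

Because the marginal contributions within each permutation telescope, the estimated values always sum to the utility of the full training set minus that of the empty set. The cost is one model evaluation per point per permutation, which is why practical methods rely on truncation, amortization, or other approximations.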

Libraries

Use these libraries to find Data Valuation models and implementations

aai-institute/pyDVL

8 papers

Subtasks

Data Interaction

Most implemented papers

2D-Shapley: A Framework for Fragmented Data Valuation

ruoxi-jia-group/2dshapley • 18 Jun 2023

Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing.

Exploring Data Redundancy in Real-world Image Classification through Data Selection

zhenyutang2023/data_selection • 25 Jun 2023

Deep learning models often require large amounts of data for training, leading to increased costs.

Data Valuation and Detections in Federated Learning

muz1lee/motdata • 9 Nov 2023

In scenarios involving numerous data clients within FL, it is often the case that only a subset of clients and datasets are pertinent to a specific learning task, while others might have either a negative or negligible impact on the model training process.

DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

snoidetx/derdava • 18 Dec 2023

Data valuation is concerned with determining a fair valuation of data from data sources to compensate them or to identify training examples that are the most or least useful for predictions.

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

iancovert/amortized-valuation • 29 Jan 2024

Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and can be intractable for large datasets.

Interpretable Machine Learning for TabPFN

david-rundel/tabpfn_iml • 16 Mar 2024

The recently developed Prior-Data Fitted Networks (PFNs) have shown very promising results for applications in low-data regimes.

Neural Dynamic Data Valuation

liangzhangyong/nddv • 30 Apr 2024

Data constitute the foundational component of the data economy and its marketplaces.

Data Valuation with Gradient Similarity

nathanieljevans/DVGS • 13 May 2024

High-quality data is crucial for accurate machine learning and actionable analytics; however, mislabeled or noisy data is a common problem in many domains.

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

logix-project/logix • 22 May 2024

Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited.
