Data Summarization

33 papers with code • 0 benchmarks • 2 datasets

Data Summarization is a central problem in the area of machine learning, where we want to compute a small summary of the data.

Source: How to Solve Fair k-Center in Massive Data Models
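As a rough illustration of what "a small summary of the data" can mean, the sketch below picks k representative points with the classical greedy farthest-first heuristic for k-center. It is not the fair k-center method from the source paper above; the function name `greedy_k_center` and the sample points are hypothetical and chosen only for illustration.

```python
import math


def greedy_k_center(points, k):
    """Return indices of up to k points chosen by farthest-first traversal.

    Illustrative sketch only: plain (unconstrained) k-center on an
    in-memory list of coordinate tuples, with no fairness constraints
    and no streaming or distributed processing.
    """
    if not points or k <= 0:
        return []
    # Start from the first point (an arbitrary choice).
    centers = [0]
    # dist[i] holds the distance from point i to its nearest chosen center.
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        # Pick the point that is currently farthest from every chosen center.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        # Refresh nearest-center distances with the newly added center.
        dist = [min(d, math.dist(points[nxt], p)) for d, p in zip(dist, points)]
    return centers


# Hypothetical toy data: two tight pairs and one isolated point.
sample = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
summary = [sample[i] for i in greedy_k_center(sample, 3)]
print(summary)
```

The selected points act as the summary: every remaining point lies close to at least one of them. Requires Python 3.8 or later for `math.dist`.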

Benchmarks

These leaderboards are used to track progress in Data Summarization.

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Libraries

Use these libraries to find Data Summarization models and implementations.

MikeJaredS/hermiter

2 papers

Datasets

Latest papers

Synthetic Dataset Generation of Driver Telematics

sstocksieker/dair • 30 Jan 2021

This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset.

Sequential estimation of Spearman rank correlation using Hermite series estimators

MikeJaredS/hermiter • 11 Dec 2020

To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.

Very Fast Streaming Submodular Function Maximization

sbuschjaeger/SubmodularStreamingMaximization • 20 Oct 2020

Data summarization has become a valuable tool in understanding even terabytes of data.

Semi-supervised Batch Active Learning via Bilevel Optimization

zalanborsos/bilevel_coresets • 19 Oct 2020

Active learning is an effective technique for reducing the labeling cost by improving data efficiency.

Fair and Representative Subset Selection from Data Streams

FraFabbri/fair-subset-datastream • 9 Oct 2020

We study the problem of extracting a small subset of representative items from a large data stream.

$β$-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

dionman/beta-cores • 31 Aug 2020

Modern machine learning applications should be able to address the intrinsic challenges arising over inference on massive real-world datasets, including scalability and robustness to outliers.

Understanding collections of related datasets using dependent MMD coresets

sinead/dmmd • 24 Jun 2020

Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets.

Flexible Dataset Distillation: Learn Labels Instead of Images

Guang000/Awesome-Dataset-Distillation • 15 Jun 2020

In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation.

Deuteros 2.0: Peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry

andymlau/Deuteros_2.0 • 17 May 2020

There are currently very few software packages available that offer quick and informative comparison of HDX-MS datasets and even fewer which offer statistical analysis and advanced visualization.

CO-Optimal Transport

PythonOT/COOT • NeurIPS 2020

Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions.

10 Feb 2020
