Data Summarization
33 papers with code • 0 benchmarks • 2 datasets
Data summarization is a central problem in machine learning: the goal is to compute a small summary that stands in for a much larger dataset.
Libraries
Use these libraries to find Data Summarization models and implementations.
Latest papers
Synthetic Dataset Generation of Driver Telematics
This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset.
Sequential estimation of Spearman rank correlation using Hermite series estimators
To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
Very Fast Streaming Submodular Function Maximization
Data summarization has become a valuable tool in understanding even terabytes of data.
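To make the streaming setting concrete, here is a minimal sketch of a single-threshold selection rule for a monotone submodular objective. It is not the algorithm from the paper above; it is a generic illustration, and the function name `threshold_stream_summary`, the coverage objective, and the parameters `k` and `tau` are all assumptions chosen for the example. Each item is seen once and is kept only if it adds at least `tau` new elements to what the summary already covers.

```python
from typing import Hashable, Iterable, List, Set


def threshold_stream_summary(
    stream: Iterable[Set[Hashable]], k: int, tau: int
) -> List[Set[Hashable]]:
    """One-pass summary under a set-coverage objective (illustrative only).

    k   -- maximum number of items kept in the summary (hypothetical budget)
    tau -- minimum marginal coverage gain required to keep an item
    """
    summary: List[Set[Hashable]] = []
    covered: Set[Hashable] = set()
    for item in stream:
        if len(summary) >= k:
            break  # budget exhausted, stop reading the stream
        # Marginal gain of coverage: elements not yet covered by the summary.
        gain = len(item - covered)
        if gain >= tau:
            summary.append(item)
            covered |= item
    return summary


stream = [{1, 2, 3}, {2, 3}, {4, 5}, {1}, {6, 7, 8}]
summary = threshold_stream_summary(stream, k=2, tau=2)
print(summary)
```

The rule stores only the current summary and the covered elements, so memory does not grow with the length of the stream. Choosing a good `tau` is the hard part in practice; sieve-style methods typically run several thresholds in parallel.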
Semi-supervised Batch Active Learning via Bilevel Optimization
Active learning is an effective technique for reducing the labeling cost by improving data efficiency.
Fair and Representative Subset Selection from Data Streams
We study the problem of extracting a small subset of representative items from a large data stream.
$β$-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers
Modern machine learning applications should be able to address the intrinsic challenges arising over inference on massive real-world datasets, including scalability and robustness to outliers.
Understanding collections of related datasets using dependent MMD coresets
Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets.
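As a rough illustration of how a kernel-based discrepancy can quantify the difference between two datasets, the sketch below computes the biased (V-statistic) estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel. This is the plain two-sample MMD, not the dependent MMD coreset construction of the paper; the function names and the `bandwidth` default are assumptions made for the example.

```python
import math
from typing import Sequence

Point = Sequence[float]


def rbf_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Point], ys: Sequence[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of MMD^2: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


base = [(0.0,), (1.0,)]
near = [(0.1,), (1.1,)]
far = [(5.0,), (6.0,)]
print(mmd_squared(base, near), mmd_squared(base, far))
```

A value near zero means the two samples look alike under the chosen kernel; the estimate grows as the samples move apart. The result depends on the bandwidth, which has to be chosen for the data at hand.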
Flexible Dataset Distillation: Learn Labels Instead of Images
In particular, we study the problem of label distillation (creating synthetic labels for a small set of real images) and show it to be more effective than the prior image-based approach to dataset distillation.
Deuteros 2.0: Peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry
There are currently very few software packages available that offer quick and informative comparison of HDX-MS datasets, and even fewer that offer statistical analysis and advanced visualization.
CO-Optimal Transport
Optimal transport (OT) is a powerful geometric and probabilistic tool for finding correspondences and measuring similarity between two distributions.
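The idea of "finding correspondences" is easiest to see in one dimension. For two samples of equal size with uniform weights and cost |x - y|, an optimal transport plan pairs the sorted values in order. The sketch below shows only this special case of classical OT, not the CO-Optimal Transport method of the paper; the function name `wasserstein_1d` is an assumption made for the example.

```python
from typing import List, Sequence, Tuple


def wasserstein_1d(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, List[Tuple[float, float]]]:
    """1-Wasserstein distance between two equal-size 1D samples.

    Returns the average transport cost and the matched pairs, which
    act as the correspondence between the two samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    if not xs:
        raise ValueError("samples must be non-empty")
    # In 1D with a convex cost, matching sorted values in order is optimal.
    pairs = list(zip(sorted(xs), sorted(ys)))
    cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return cost, pairs


cost, pairs = wasserstein_1d([0.0, 2.0, 1.0], [3.0, 1.0, 2.0])
print(cost, pairs)
```

In higher dimensions, or with unequal weights, no sorting shortcut exists and the plan is usually found by linear programming or entropic regularization.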