Data Summarization

33 papers with code • 0 benchmarks • 2 datasets

Data Summarization is a central problem in the area of machine learning, where we want to compute a small summary of the data.

Source: How to Solve Fair k-Center in Massive Data Models

Most implemented papers

Soft-Label Dataset Distillation and Text Dataset Distillation

ilia10000/dataset-distillation 6 Oct 2019

We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a "soft" label (a distribution of labels).

Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision

zaeemzadeh/IPM CVPR 2019

In our algorithm, each iteration selects the one sample that captures the most information about the structure of the data; that information is then excluded from subsequent iterations by projecting the data onto the null space of the previously selected samples.
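The select-then-project loop in this excerpt can be sketched in plain Python. This is only an illustration under assumptions: the "structure" of the data is taken to be the leading right singular vector (approximated here by power iteration), and the function names `top_right_singular_vector` and `select_representatives` are hypothetical, not taken from the zaeemzadeh/IPM repository.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_right_singular_vector(rows, iters=200):
    """Approximate the leading right singular vector by power iteration on A^T A."""
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim  # fixed start vector keeps the sketch deterministic
    for _ in range(iters):
        proj = [dot(r, v) for r in rows]                        # A v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]  # A^T (A v)
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to capture along this start direction
            break
        v = [x / norm for x in w]
    return v

def select_representatives(data, k):
    """Return indices of up to k rows chosen by matching, then projecting."""
    residual = [[float(x) for x in row] for row in data]
    chosen = []
    for _ in range(k):
        v = top_right_singular_vector(residual)
        # Matching step: pick the row most aligned with v (largest |cosine|).
        best, best_score = None, -1.0
        for i, r in enumerate(residual):
            n = math.sqrt(dot(r, r))
            if i in chosen or n < 1e-9:
                continue
            score = abs(dot(r, v)) / n
            if score > best_score:
                best, best_score = i, score
        if best is None:  # every remaining row is already explained
            break
        chosen.append(best)
        # Projection step: remove the chosen direction from every row.
        n = math.sqrt(dot(residual[best], residual[best]))
        u = [x / n for x in residual[best]]
        projected = []
        for r in residual:
            c = dot(r, u)
            projected.append([rj - c * uj for rj, uj in zip(r, u)])
        residual = projected
    return chosen

# Toy data: three rows along the first axis, one along each of the others.
toy = [[1, 0, 0], [2, 0, 0], [3, 0, 0], [0, 1, 0], [0, 0, 0.5]]
picked = select_representatives(toy, 3)
```

On the toy data, the first pick comes from the dominant first-axis group; after projection those three rows vanish, so the later picks are forced onto the remaining directions.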

Flexible Dataset Distillation: Learn Labels Instead of Images

ondrejbohdal/label-distillation 15 Jun 2020

In particular, we study the problem of label distillation (creating synthetic labels for a small set of real images) and show it to be more effective than the prior image-based approach to dataset distillation.

Sequential estimation of Spearman rank correlation using Hermite series estimators

MikeJaredS/hermiter 11 Dec 2020

To treat the non-stationary setting, we introduce a novel, exponentially weighted estimator for the Spearman rank correlation, which allows the local nonparametric correlation of a bivariate data stream to be tracked.

Sequential Quantiles via Hermite Series Density Estimation

MikeJaredS/hermiter 17 Jul 2015

These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time.

Scalable k-Means Clustering via Lightweight Coresets

webis-de/small-text 27 Feb 2017

Coresets, compact representations of a data set, have been successfully used to scale up clustering models to massive data sets.
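As a rough illustration of how a coreset for k-means might be drawn, the sketch below samples points from a mixture of a uniform distribution and one proportional to squared distance from the data mean, then attaches importance weights. The excerpt above does not spell out this sampling rule, so treat it as our reading of the lightweight-coreset idea; the function names `lightweight_coreset_probs` and `sample_coreset` are hypothetical.

```python
import random

def lightweight_coreset_probs(points):
    """Sampling probability per point: half uniform, half squared distance to the mean."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)
    if total == 0.0:  # all points identical: fall back to uniform sampling
        return [1.0 / n] * n
    return [0.5 / n + 0.5 * d / total for d in sq_dist]

def sample_coreset(points, m, rng=random):
    """Draw m points with replacement; weight each by 1 / (m * q) to stay unbiased."""
    q = lightweight_coreset_probs(points)
    idx = rng.choices(range(len(points)), weights=q, k=m)
    return [(points[i], 1.0 / (m * q[i])) for i in idx]

# Three identical points and one outlier: the outlier is sampled far more often.
toy = [[0.0], [0.0], [0.0], [4.0]]
probs = lightweight_coreset_probs(toy)
coreset = sample_coreset(toy, 3, random.Random(0))
```

The weighted sample can then be handed to any k-means routine that accepts per-point weights; the uniform half of the mixture keeps every point's probability bounded away from zero, so the weights cannot blow up.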

An Online Algorithm for Nonparametric Correlations

wxiao0421/onlineNPCORR 5 Dec 2017

This paper investigates the problem of computing nonparametric correlations on the fly for streaming data.

Fair and Diverse DPP-based Data Summarization

DamianStraszak/FairDiverseDPPSampling ICML 2018

Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization.
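A small sketch may make "proportional to its diversity" concrete. In a determinantal point process (DPP), a subset is weighted by the determinant of its similarity (Gram) matrix, so near-duplicate points earn a low weight. The code below enumerates subsets by brute force and adds a per-group quota as a stand-in for a fairness constraint. Both the quota formulation and the names `det` and `subset_weights` are assumptions made for illustration; this is not the sampler from DamianStraszak/FairDiverseDPPSampling.

```python
import itertools
import random

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return result

def subset_weights(points, groups, quota):
    """Map each quota-satisfying subset to det of its Gram matrix (linear kernel)."""
    k = sum(quota.values())
    weights = {}
    for subset in itertools.combinations(range(len(points)), k):
        counts = {}
        for i in subset:
            counts[groups[i]] = counts.get(groups[i], 0) + 1
        if any(counts.get(g, 0) != q for g, q in quota.items()):
            continue  # subset violates the per-group quota
        gram = [[sum(x * y for x, y in zip(points[i], points[j])) for j in subset]
                for i in subset]
        weights[subset] = det(gram)
    return weights

# Two groups of two points each; ask for one point from each group.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = ["a", "a", "b", "b"]
weights = subset_weights(points, groups, {"a": 1, "b": 1})

# Draw one summary with probability proportional to its determinant.
subsets = list(weights)
summary = random.Random(0).choices(subsets, weights=[weights[s] for s in subsets])[0]
```

Brute-force enumeration is only workable for tiny inputs; it is used here so that the weighting rule stays visible.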

A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization

parajain/StructuredData_To_Descriptions NAACL 2018

Structured data summarization involves generation of natural language summaries from structured input data.

Coverage-Based Designs Improve Sample Mining and Hyper-Parameter Optimization

gowthamasu/Coverage_based_sample_design 5 Sep 2018

Sampling one or more effective solutions from large search spaces is a recurring idea in machine learning, and sequential optimization has become a popular solution.