Data Summarization
33 papers with code • 0 benchmarks • 2 datasets
Data Summarization is a central problem in machine learning: the goal is to compute a small summary that faithfully represents a much larger data set.
Benchmarks
These leaderboards are used to track progress in Data Summarization
Libraries
Use these libraries to find Data Summarization models and implementations
Latest papers
Streaming Submodular Maximization under a $k$-Set System Constraint
In this paper, we propose a novel framework that converts streaming algorithms for monotone submodular maximization into streaming algorithms for non-monotone submodular maximization.
Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?
Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning, and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaptation.
Soft-Label Dataset Distillation and Text Dataset Distillation
We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a 'soft' label (a distribution of labels).
Fast and Accurate Least-Mean-Squares Solvers
Least-mean squares (LMS) solvers such as Linear / Ridge / Lasso-Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations.
apricot: Submodular selection for data summarization in Python
This paper presents an explanation of submodular selection, an overview of the features in apricot, and an application to several data sets.
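As a rough illustration of what submodular selection means in this setting, the sketch below runs plain greedy selection under a facility-location objective, one commonly used submodular function. It is a minimal stdlib-only example and does not use or reproduce apricot's API; the function name `facility_location_greedy` and the similarity-matrix input are assumptions made for illustration, and similarities are assumed to be non-negative.

```python
def facility_location_greedy(sim, k):
    """Greedily pick k items that maximize a facility-location objective.

    sim[i][j] is a non-negative similarity between item i and candidate j
    (hypothetical input format). The objective is the sum over items i of
    the best similarity between i and any selected candidate.
    """
    n = len(sim)
    selected = []
    best = [0.0] * n  # best similarity of each item to the current summary

    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much candidate j improves each item's coverage.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], sim[i][best_j]) for i in range(n)]

    # Return the chosen indices and the objective value they achieve.
    return selected, sum(best)
```

Calling `facility_location_greedy(sim, k)` with a square similarity matrix returns the indices of the chosen summary items together with the objective value. For monotone submodular objectives such as this one, the greedy rule carries the classical (1 - 1/e) approximation guarantee under a cardinality constraint.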
Fair k-Center Clustering for Data Summarization
In data summarization we want to choose $k$ prototypes in order to summarize a data set.
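To make the prototype-selection setup concrete, here is a minimal sketch of the standard greedy (farthest-first) heuristic for the unconstrained k-center problem. It deliberately omits the fairness constraint studied in the paper; the function name `greedy_k_center`, the Euclidean metric, and the choice of the first point as the starting center are assumptions made for illustration.

```python
import math


def greedy_k_center(points, k, first=0):
    """Farthest-first traversal: pick k prototype indices from `points`.

    Returns the chosen indices and the resulting radius, i.e. the largest
    distance from any point to its nearest chosen prototype.
    """
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]

    while len(centers) < min(k, len(points)):
        # The next prototype is the point currently worst covered.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]

    return centers, max(dist)
```

Points are given as equal-length coordinate tuples. This heuristic is known to give a 2-approximation for the unconstrained problem; a fair variant would additionally restrict how many prototypes may come from each group.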
Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision
In our algorithm, at each iteration, the maximum information from the structure of the data is captured by one selected sample, and the captured information is neglected in the next iterations by projection on the null-space of previously selected samples.
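The select-then-project loop described above can be sketched as follows. This is a simplified reading, not the authors' implementation: it assumes that "maximum information" means alignment with the leading right singular vector of the current residual data, approximated here by power iteration, and all names (`ipm_select`, `_leading_direction`) are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _leading_direction(rows, iters=100):
    """Approximate the leading right singular vector via power iteration on A^T A."""
    d = len(rows[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        av = [_dot(r, v) for r in rows]
        w = [sum(a * r[j] for a, r in zip(av, rows)) for j in range(d)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left to capture in the residual
            break
        v = [x / norm for x in w]
    return v


def ipm_select(data, k, eps=1e-9):
    """Select up to k row indices by iterative projection and matching."""
    residual = [[float(x) for x in row] for row in data]
    selected = []
    for _ in range(min(k, len(data))):
        v = _leading_direction(residual)
        # Matching step: the residual row most aligned with v (absolute cosine).
        best_i, best_score = None, -1.0
        for i, r in enumerate(residual):
            norm = math.sqrt(_dot(r, r))
            if i in selected or norm < eps:
                continue
            score = abs(_dot(r, v)) / norm
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # every remaining row is already fully explained
            break
        selected.append(best_i)
        # Projection step: remove the selected row's direction from all rows.
        s = residual[best_i]
        s_sq = _dot(s, s)
        residual = [
            [x - (_dot(r, s) / s_sq) * y for x, y in zip(r, s)]
            for r in residual
        ]
    return selected
```

Each row of `data` is one sample. Because every row is projected onto the null space of the sample just chosen, later iterations only see the structure that earlier selections did not capture.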
Coverage-Based Designs Improve Sample Mining and Hyper-Parameter Optimization
Sampling one or more effective solutions from large search spaces is a recurring idea in machine learning, and sequential optimization has become a popular solution.
A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization
Structured data summarization involves generating natural language summaries from structured input data.
Fair and Diverse DPP-based Data Summarization
Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization.