Data Summarization
33 papers with code • 0 benchmarks • 2 datasets
Data summarization is a central problem in machine learning: the goal is to compute a small summary that represents a much larger dataset.
Latest papers
DiffRed: Dimensionality Reduction guided by stable rank
We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on Stress and $O\left(\frac{1-p}{\sqrt{k_2 \cdot \rho(A^{*})}}\right)$ on M1, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^{*})$ is the stable rank of $A^{*}$.
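The two quantities in these bounds can be computed directly from singular values. The sketch below is only an illustration, not the DiffRed implementation: it assumes the standard definition of stable rank (squared Frobenius norm divided by squared spectral norm) and computes $p$ from a mean-centered data matrix. The function names are hypothetical, and since the excerpt does not say how $A^{*}$ is constructed, `stable_rank` is simply applied to whatever matrix is passed in.

```python
import numpy as np

def stable_rank(A):
    """Stable rank: squared Frobenius norm over squared spectral norm.

    Assumes A is a nonzero 2-D array.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)

def explained_variance_fraction(A, k1):
    """Fraction of total variance captured by the first k1 principal components.

    Rows of A are treated as samples (an assumption of this sketch).
    """
    centered = A - A.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return float(np.sum(s[:k1] ** 2) / np.sum(s ** 2))

# Toy checks: the identity has equal singular values, a rank-one matrix has one.
identity_rank = stable_rank(np.eye(3))
rank_one = stable_rank(np.outer([1.0, 2.0, 3.0], [1.0, 1.0]))

# All variance lies along the first coordinate, so one component explains it all.
toy = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]])
p = explained_variance_fraction(toy, k1=1)
```

A matrix whose singular values are all equal has stable rank equal to its rank, while a matrix dominated by one direction has stable rank close to 1, which is why a larger $\rho(A^{*})$ tightens the M1 bound.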
Time-to-Pattern: Information-Theoretic Unsupervised Learning for Scalable Time Series Summarization
Data summarization is the process of generating interpretable and representative subsets from a dataset.
ChartSumm: A Comprehensive Benchmark for Automatic Chart Summarization of Long and Short Summaries
Automatic chart-to-text summarization is an effective tool for visually impaired people and also gives users precise natural-language insights into tabular data.
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
Visual language data such as plots, charts, and infographics are ubiquitous in the human world.
Black-box Coreset Variational Inference
Recent advances in coreset methods have shown that a selection of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks.
Balancing Utility and Fairness in Submodular Maximization (Technical Report)
Submodular function maximization is a fundamental combinatorial optimization problem with many applications, including data summarization, influence maximization, and recommendation.
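As a point of reference, the classic greedy algorithm for maximizing a monotone submodular function under a cardinality constraint can be sketched as follows. This is the plain textbook greedy, not the fairness-aware method of the paper; the coverage objective, the toy sets, and the function names are assumptions chosen for illustration.

```python
def coverage(selected, sets):
    """Number of distinct items covered by the chosen sets (monotone submodular)."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_max(sets, k):
    """Pick up to k set indices, each time adding the largest marginal gain."""
    selected = []
    for _ in range(min(k, len(sets))):
        current = coverage(selected, sets)
        best, best_gain = None, 0
        for i in range(len(sets)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sets) - current
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining set adds anything new
            break
        selected.append(best)
    return selected

# Hypothetical toy instance: summarize 7 items with 2 of the 4 candidate sets.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
summary = greedy_max(sets, k=2)
covered_count = coverage(summary, sets)
```

On this instance the greedy first takes the largest set and then the one that adds the most uncovered items, so two sets already cover all seven items.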
Streaming Algorithms for Diversity Maximization with Fairness Constraints
Given a set $X$ of $n$ elements, the diversity maximization problem asks to select a subset $S$ of $k \ll n$ elements with maximum diversity, as quantified by the dissimilarities among the elements in $S$.
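To make the problem statement concrete, here is a minimal offline sketch of a common greedy heuristic (farthest-point selection). It assumes diversity is measured as the minimum pairwise Euclidean distance within $S$ and that $k \le n$; it ignores fairness constraints and is not a streaming algorithm, so it should not be read as the paper's method. All names are hypothetical.

```python
import numpy as np

def min_pairwise_distance(points):
    """Max-min diversity objective: smallest distance between any two points."""
    diffs = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # each unordered pair once
    return float(d[upper].min())

def farthest_point_subset(X, k, start=0):
    """Greedily add the point farthest from everything selected so far.

    Assumes 1 <= k <= len(X).
    """
    selected = [start]
    # dist[i] = distance from point i to its nearest selected point
    dist = np.linalg.norm(X - X[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dist))
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

# Toy data: two nearly identical points plus three well-separated ones.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = farthest_point_subset(X, k=3)
diversity = min_pairwise_distance(X[S])
```

The near-duplicate point at index 1 is never chosen, which is the behavior a diversity objective is meant to encourage.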
Towards Neural Numeric-To-Text Generation From Temporal Personal Health Data
We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data.
Group Equality in Adaptive Submodular Maximization
In this paper, we study the classic submodular maximization problem subject to a group equality constraint under both non-adaptive and adaptive settings.
Submodlib: A Submodular Optimization Library
Recent work has also used submodular functions to define submodular information measures, which have proved useful for guided subset selection and guided summarization.