Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

We propose a novel technique for sampling representatives from a large, unsupervised dataset. The approach is based on the concept of {\em self-rank}, defined as the minimum number of samples needed to reconstruct all samples with an accuracy proportional to that of the rank-$K$ approximation. Since the exact computation of self-rank requires a computationally expensive combinatorial search, we propose an efficient algorithm that jointly estimates self-rank and selects near-optimal samples. We derive a theoretical upper bound on the approximation ratio that becomes tight in two asymptotic cases. The best previous approximation ratio for self-representative low-rank approximation was presented at ICML 2017~\cite{Chierichetti-icml-2017} and was later improved to $\sqrt{1+K}$ at NeurIPS 2019~\cite{dan2019optimal}; both of these bounds depend solely on the number of selected samples. In this paper, for the first time, we present an adaptive approximation ratio that depends on spectral properties of the original dataset $\boldsymbol{A}\in \mathbb{R}^{N\times M}$, namely its condition number $\kappa(\boldsymbol{A})$. Our derived approximation ratio is $1+(\kappa(\boldsymbol{A})^2-1)/(N-K)$, which approaches $1$ in two asymptotic cases: as $\kappa(\boldsymbol{A})\to 1$ and as $N-K\to\infty$. In addition to evaluating the proposed algorithm on a synthetic dataset, we show that the proposed sampling scheme can be utilized in real-world applications such as graph node sampling to optimize a shortest-path criterion and learning a classifier from the sampled data.
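
To make the adaptive bound concrete, the sketch below evaluates $1+(\kappa(\boldsymbol{A})^2-1)/(N-K)$ on a synthetic matrix and compares it against the error ratio achieved by a generic stand-in selector (column-pivoted QR on $\boldsymbol{A}^\top$). Everything beyond the formula itself is an illustrative assumption: the synthetic data, the use of the Frobenius norm as the error measure, and the QR-based selector, which is not the paper's algorithm and whose achieved ratio need not satisfy the derived bound.

```python
# Minimal numerical sketch (NOT the authors' algorithm): evaluate the
# adaptive approximation ratio 1 + (kappa(A)^2 - 1)/(N - K) from the
# abstract and compare it with the error ratio of a generic heuristic.
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
N, M, K = 200, 50, 10

# Synthetic dataset with a controlled, gently decaying spectrum.
U, _ = np.linalg.qr(rng.standard_normal((N, M)))
V, _ = np.linalg.qr(rng.standard_normal((M, M)))
s = np.linspace(1.0, 0.5, M)          # singular values, so kappa(A) = 2
A = U @ np.diag(s) @ V.T

# Best rank-K approximation error (Frobenius norm assumed), via the SVD.
sv = np.linalg.svd(A, compute_uv=False)
err_rank_k = np.sqrt(np.sum(sv[K:] ** 2))

# Condition number and the adaptive ratio, which tends to 1 as
# kappa(A) -> 1 or as N - K -> infinity.
kappa = sv[0] / sv[-1]
bound = 1.0 + (kappa**2 - 1.0) / (N - K)

# Stand-in selector: column-pivoted QR on A^T picks K rows of A as
# representative samples; a heuristic, not the paper's joint estimator.
_, _, piv = qr(A.T, pivoting=True)
S = A[piv[:K], :]

# Reconstruct every sample as a linear combination of the selected ones.
coeffs, *_ = np.linalg.lstsq(S.T, A.T, rcond=None)
err_selected = np.linalg.norm(A - coeffs.T @ S, "fro")

print(f"best rank-{K} error : {err_rank_k:.4f}")
print(f"selected-rows error : {err_selected:.4f}")
print(f"achieved ratio      : {err_selected / err_rank_k:.4f}")
print(f"adaptive bound      : {bound:.4f}")
```

With the flat spectrum chosen here, $\kappa(\boldsymbol{A})=2$ and the bound evaluates to $1+3/190\approx 1.016$; a faster spectral decay (larger $\kappa$) or a smaller $N-K$ loosens the bound accordingly, matching the two asymptotic cases stated above.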
