# Deep Learning Meets Projective Clustering

A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A\in\mathbb{R}^{n\times d}$, compute its rank-$j$ approximation $A_j$ via SVD, and then factor $A_j$ into a pair of matrices that correspond to smaller fully-connected layers to replace the original embedding layer. Geometrically, the rows of $A$ represent points in $\mathbb{R}^d$, and the rows of $A_j$ represent their projections onto the $j$-dimensional subspace that minimizes the sum of squared distances ("errors") to the points... (read more)

PDF Abstract ICLR 2021 PDF (under review) ICLR 2021 Abstract (under review)

# Code Add Remove Mark official

No code implementations yet. Submit your code now