Scaling Hierarchical Coreference with Homomorphic Compression

AKBC 2019 · Michael Wick, Swetasudha Panda, Joseph Tassarotti, Jean-Baptiste Tristan ·

Locality sensitive hashing schemes such as \simhash provide compact representations of multisets from which similarity can be estimated. However, in certain applications, we need to estimate the similarity of dynamically changing sets. In this case, we need the representation to be a homomorphism so that the hash of unions and differences of sets can be computed directly from the hashes of operands. We propose two representations that have this property for cosine similarity (an extension of \simhash and angle-preserving random projections), and make substantial progress on a third representation for Jaccard similarity (an extension of \minhash). We employ these hashes to compress the sufficient statistics of a conditional random field (CRF) coreference model and study how this compression affects our ability to compute similarities as entities are split and merged during inference. \cut{We study these hashes in a conditional random field (CRF) hierarchical coreference model in order to compute the similarity of entities as they are merged and split during inference.} We also provide novel statistical analysis of \simhash to help justify it as an estimator inside a CRF, showing that the bias and variance reduce quickly with the number of bits. On a problem of author coreference, we find that our \simhash scheme allows scaling the hierarchical coreference algorithm by an order of magnitude without degrading its statistical performance or the model's coreference accuracy, as long as we employ at least 128 or 256 bits. Angle-preserving random projections further improve the coreference quality, potentially allowing even fewer dimensions to be used.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

No methods listed for this paper. Add relevant methods here

Edit Social Preview

Scaling Hierarchical Coreference with Homomorphic Compression

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove