Highly-Efficient Robinson-Foulds Distance Estimation with Matrix Correction

Phylogenetic trees are essential for studying evolutionary relationships, and the Robinson-Foulds (RF) distance is a widely used metric for measuring pairwise dissimilarity between phylogenetic trees, with applications in both the biology and computing communities. However, computing a precise RF distance matrix becomes difficult or even intractable when tree information is partially missing. To address this issue, we introduce a novel distance correction algorithm for estimating the RF distance matrix of incomplete phylogenetic trees. Our method relies on the assumption of Euclidean embedding to correct an approximate distance matrix into a valid distance metric that is guaranteed to be closer to the unknown ground truth. Despite its simplicity, our approach exhibits robust performance, efficiency, and scalability in empirical evaluations, outperforming classical distance correction algorithms and holding potential benefits for downstream applications. Our code is available at https://github.com/CUHKSZ-Yu/EMC.
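
For readers unfamiliar with the metric, the following is a minimal sketch of the standard RF distance between two complete trees on the same leaf set: the number of non-trivial leaf bipartitions (splits) present in one tree but not the other. It is not taken from the EMC repository and does not implement the correction algorithm. The nested-tuple tree encoding and the names `leaf_set`, `splits`, and `rf_distance` are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string label, an internal node is a
# tuple of subtrees. The root position is ignored (trees are treated as unrooted).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels in a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions induced by the tree's internal edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same bipartition always has the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits (a single leaf, or nothing, on one side).
        if len(below) >= 2 and len(other) >= 2:
            found.add(other if reference in below else below)
        return below

    visit(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Size of the symmetric difference of the two trees' split sets."""
    if leaf_set(t1) != leaf_set(t2):
        # With missing leaves the comparison is not directly defined,
        # which is the situation the abstract is concerned with.
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # → 2
```

The function raises an error when the leaf sets differ, which illustrates why a distance matrix over incomplete trees can only be approximated and then corrected, as the abstract describes.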
