Ollivier persistent Ricci curvature (OPRC) based molecular representation for drug design
Efficient molecular featurization is one of the major challenges for machine learning models in drug design. Here we propose, for the first time, persistent Ricci curvature (PRC), and in particular Ollivier persistent Ricci curvature (OPRC), for molecular featurization and feature engineering. The filtration process from persistent homology is employed to generate a series of nested molecular graphs. The persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as Ollivier persistent Ricci curvature. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors and combined with machine learning models, in particular gradient boosting trees (GBT). Our OPRC-GBT model is used to predict protein-ligand binding affinity, which is one of the key steps in drug design. Based on the three most commonly used datasets from the well-established protein-ligand binding databank PDBbind, we extensively test our model and compare it with existing models. We find that our model outperforms all machine learning models with traditional molecular descriptors.
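To make the construction concrete, the following is a minimal sketch of how curvature can be tracked along a filtration. It is not the authors' implementation, and it rests on several assumptions that the abstract does not specify: the filtration is a Euclidean distance cutoff on point coordinates, each node carries the uniform probability measure on its neighbours (no laziness parameter), the ground metric is the hop distance, and the "persistent attributes" are reduced to the minimum, maximum, and mean edge curvature at each cutoff. All function names (`cutoff_graph`, `ollivier_ricci`, `curvature_profile`, and so on) are hypothetical. The Ollivier Ricci curvature of an edge (x, y) is taken as 1 - W1(m_x, m_y), where W1 is the Wasserstein-1 distance, computed here exactly with a small min-cost-flow routine.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations
from math import dist, inf


def cutoff_graph(coords, cutoff):
    """Adjacency sets of the graph joining points at Euclidean distance <= cutoff."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return seen


def transport_cost(supply, demand, cost):
    """Minimum cost of moving integer supplies onto integer demands
    (successive shortest paths, Bellman-Ford on the residual graph)."""
    S, T = "S", "T"
    total = sum(supply.values())
    graph, edges = {}, []  # edges[k] = [head, capacity, cost]; reverse edge is k ^ 1

    def add(u, v, cap, c):
        graph.setdefault(u, []).append(len(edges))
        edges.append([v, cap, c])
        graph.setdefault(v, []).append(len(edges))
        edges.append([u, 0, -c])

    for u, s in supply.items():
        add(S, ("s", u), s, 0)
    for v, d in demand.items():
        add(("d", v), T, d, 0)
    for u in supply:
        for v in demand:
            add(("s", u), ("d", v), total, cost[u][v])

    total_cost = 0
    while True:
        best, prev, changed = {S: 0}, {}, True
        while changed:  # relax edges until no label improves
            changed = False
            for u in list(best):
                for k in graph[u]:
                    v, cap, c = edges[k]
                    if cap > 0 and best[u] + c < best.get(v, inf):
                        best[v], prev[v], changed = best[u] + c, k, True
        if T not in best:
            return total_cost
        push, v = inf, T
        while v != S:  # bottleneck capacity along the path
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = T
        while v != S:  # augment along the path
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total_cost += push * best[T]


def ollivier_ricci(adj, x, y):
    """Curvature 1 - W1(m_x, m_y) of the edge (x, y), uniform measures on neighbours."""
    dx, dy = len(adj[x]), len(adj[y])
    supply = {u: dy for u in adj[x]}  # masses scaled by dx * dy to stay integral
    demand = {v: dx for v in adj[y]}
    cost = {u: hop_distances(adj, u) for u in adj[x]}
    return 1 - Fraction(transport_cost(supply, demand, cost), dx * dy)


def curvature_profile(coords, cutoffs):
    """(min, max, mean) edge curvature at each cutoff; None when there are no edges."""
    profile = {}
    for r in cutoffs:
        adj = cutoff_graph(coords, r)
        ks = [ollivier_ricci(adj, i, j) for i in adj for j in adj[i] if i < j]
        profile[r] = (min(ks), max(ks), sum(ks) / len(ks)) if ks else None
    return profile


profile = curvature_profile([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [0.5, 1.0, 1.5])
```

In this toy example the three points form a right triangle with unit legs. At cutoff 1.0 the graph is a two-edge path and both edges have curvature 0; at cutoff 1.5 the graph closes into a triangle and every edge has curvature 1/2. A change of this kind across filtration values is what a curvature-based descriptor would record. A feature vector for a learner such as a gradient boosting tree could then be assembled by concatenating the per-cutoff statistics, although the exact descriptors used in the paper may differ.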