preon: Fast and accurate entity normalization for drug names and cancer types in precision oncology

Bioinformatics 2024 · Arik Ermshaus, Michael Piechotta, Gina Rüter, Ulrich Keilholz, Ulf Leser, Manuela Benary ·

Motivation In precision oncology (PO), clinicians aim to find the best treatment for any patient based on their molecular characterization. A major bottleneck is the manual annotation and evaluation of individual variants, for which usually a range of knowledge bases are screened. To incorporate and integrate the vast information of different databases, fast and accurate methods for harmonizing databases with different types of information are necessary. An essential step for harmonization in PO includes the normalization of tumor entities as well as therapy options for patients. Summary preon is a fast and accurate library for the normalization of drug names and cancer types in large-scale data integration. Availability and implementation preon is implemented in Python and freely available via the PyPI repository. Source code and the data underlying this article are available in GitHub at https://github.com/ermshaua/preon/.

PDF Abstract