Towards antigenic peptide discovery with better MHC-I binding prediction and improved benchmark methodology

Machine Learning for Drug Discovery Workshop, ICLR 2023 · Stanisław Giziński, Grzegorz Preibisch, Piotr Kucharski, Michał Tyrolski, Michał Rembalski, Piotr Grzegorczyk, Anna Gambin

The Major Histocompatibility Complex (MHC) is a crucial component of the cellular immune system in vertebrates, responsible for, among other functions, presenting peptides derived from intracellular proteins. MHC-I presentation is vital to the immune response and holds great promise for vaccine development and cancer immunotherapy. In this study, we analyze the limitations of existing methods and benchmarks for MHC-I presentation prediction. We introduce a new benchmark that measures crucial generalization properties and models’ reliability on unseen MHC molecules and peptides. Finally, we present HLABERT, a pre-trained language model that significantly surpasses prior methods on our benchmark and also sets a new state of the art on previous benchmarks.
