FLIP includes several benchmark datasets of protein sequences. Each sequence is labeled with a real-valued "fitness" score, which measures how well the protein performs a particular function. The task is to predict a protein's fitness from its sequence. Different representations of the sequences, such as learned embeddings from large protein language models, may help with this task.
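To illustrate how a sequence representation feeds into a fitness predictor, here is a minimal baseline sketch. It is not part of FLIP. It assumes equal-length sequences over the 20 canonical amino acids and uses a flattened one-hot encoding. It also uses k-nearest-neighbour regression as a simple stand-in for a learned model. The names `one_hot` and `knn_predict` are hypothetical helpers written for this example.

```python
from typing import Callable, List, Sequence

# Assumption: sequences use only the 20 canonical amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[float]:
    """Flattened one-hot encoding with 20 features per sequence position."""
    features = [0.0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        features[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return features


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(
    train_seqs: Sequence[str],
    train_fitness: Sequence[float],
    query: str,
    embed: Callable[[str], List[float]] = one_hot,
    k: int = 3,
) -> float:
    """Predict fitness as the mean label of the k nearest training sequences."""
    q = embed(query)
    ranked = sorted(
        (squared_distance(embed(s), q), y) for s, y in zip(train_seqs, train_fitness)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


train = ["ACDE", "ACDF", "GCWE", "WWWW"]
fitness = [1.0, 0.8, 0.2, -1.0]
prediction = knn_predict(train, fitness, "ACDE", k=2)  # mean of 1.0 and 0.8
```

The `embed` parameter lets you swap the one-hot encoding for any other function that maps a sequence to a fixed-length vector, such as a wrapper around a protein language model. The regression step does not need to change.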
Some benchmark datasets, such as thermostability, contain highly diverse sequences drawn from many different protein families. Others, such as AAV and GB1, consist entirely of mutants of a single parent sequence. Each benchmark dataset comes with multiple "splits". A split is a particular way of dividing the data into training and test sets, chosen to assess how well a model generalizes from limited information. The AAV benchmark, for example, includes a "mutant vs designed" split, in which a model is trained on randomly generated mutants and asked to predict the fitness of designed sequences. It also includes a "seven vs many" split, in which a model is trained on sequences with seven mutations and asked to predict the fitness of sequences with a different number of mutations.
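The idea behind a mutation-count split such as "seven vs many" can be sketched in a few lines. This sketch follows the description above: it trains on sequences with exactly `n_train` mutations and tests on everything else. It assumes that mutants are substitution-only and have the same length as the parent, so the mutation count is the Hamming distance to the parent. The function names are hypothetical. For reported results, use the benchmark's own split definitions rather than a re-derived split like this one.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, float]  # (sequence, fitness label)


def count_mutations(seq: str, parent: str) -> int:
    """Number of substitutions relative to the parent (Hamming distance).

    Assumption: substitution-only mutants with the same length as the parent.
    """
    if len(seq) != len(parent):
        raise ValueError("sequence and parent must have equal length")
    return sum(a != b for a, b in zip(seq, parent))


def split_by_mutation_count(
    records: Sequence[Record], parent: str, n_train: int
) -> Tuple[List[Record], List[Record]]:
    """Put sequences with exactly n_train mutations in train, the rest in test."""
    train: List[Record] = []
    test: List[Record] = []
    for seq, fitness in records:
        target = train if count_mutations(seq, parent) == n_train else test
        target.append((seq, fitness))
    return train, test


parent = "ACDEFG"
records = [("ACDEFG", 1.0), ("WCDEFG", 0.5), ("WWDEFG", 0.3), ("WWWEFG", 0.1)]
train, test = split_by_mutation_count(records, parent, n_train=2)
```

With `n_train=7` and the appropriate parent sequence, this helper would reproduce the "train on seven mutations, test on other mutation counts" pattern described above.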