FLIP -- AAV, Designed vs mutant (adeno-associated virus)

Introduced by Dallago et al. in FLIP: Benchmark tasks in fitness landscape inference for proteins

FLIP includes several benchmark datasets of protein sequences, each with a real-valued label indicating its "fitness" (how well the protein performs a particular function). The goal is to predict the fitness of a protein from its sequence alone. Different representations of protein sequences (e.g. learned embeddings from large language models) may prove helpful here.
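As an illustration of what a sequence representation can look like, the sketch below builds a flattened one-hot encoding. This is a simple baseline chosen for illustration; it is not a method named in the description above. The names `AMINO_ACIDS` and `one_hot`, the restriction to the 20 standard amino acids, and the zero-padding to a fixed `max_length` are all assumptions of this sketch.

```python
# Hypothetical baseline featurizer; names and padding scheme are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_length):
    """Return a flat 0/1 feature list of length max_length * 20.

    Positions beyond the end of the sequence stay all-zero (padding),
    so sequences of different lengths map to vectors of equal size.
    """
    if len(sequence) > max_length:
        raise ValueError("sequence is longer than max_length")
    width = len(AMINO_ACIDS)
    features = [0] * (max_length * width)
    for position, residue in enumerate(sequence):
        # Raises KeyError for any character outside the 20 standard residues.
        features[position * width + INDEX[residue]] = 1
    return features
```

A fixed-size vector like this can be passed to any standard regression model. Learned embeddings would replace `one_hot` with a model-specific encoder.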

This sub-dataset (AAV) contains 201,426 training sequences and 82,583 test sequences; the goal is to predict the fitness of mutants of the capsid protein from the adeno-associated virus (AAV). The training-set proteins were designed, while the test-set proteins are random mutants. The absolute value of the fitness matters less than its ranking: protein designers would like to be able to pick a sequence with high fitness relative to those in the training set. Performance is therefore usually assessed using Spearman's rank correlation coefficient (ρ).
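Because the metric depends only on ranks, any monotonic rescaling of the predictions leaves the score unchanged. The following is a minimal, dependency-free sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks. The function names `average_ranks` and `spearman` are illustrative and are not part of any FLIP tooling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)
```

For example, `spearman([0.1, 0.4, 0.9], [1.0, 2.0, 3.0])` and `spearman([10, 40, 90], [1.0, 2.0, 3.0])` give the same score, since both prediction lists order the three sequences identically. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.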

