Expanding functional protein sequence space using generative adversarial networks
De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering because of its broad spectrum of technological, scientific and medical applications. Currently, however, mapping protein sequence to protein function is neither computationally nor experimentally tractable [1, 2]. Here we developed ProteinGAN, a specialised variant of the generative adversarial network [3] that is able to 'learn' natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex, multidimensional amino acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase as a template enzyme, we show that 24% of the ProteinGAN-generated and experimentally tested sequences are soluble and display wild-type-level catalytic activity under the tested in vitro conditions, even in highly mutated sequences (>100 mutations). ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse, novel functional proteins within the allowed biological constraints of the sequence space.
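As a rough illustration of what a figure such as ">100 mutations" counts, the sketch below compares a generated sequence with natural reference sequences. It rests on assumptions not stated in this abstract: the sequences are already aligned to equal length, a mutation is any aligned position where the residues differ (a gap opposite a residue counts as one), and distance is measured to the closest reference. The function names `count_mutations` and `mutations_to_nearest` are hypothetical.

```python
def count_mutations(query: str, reference: str) -> int:
    """Count aligned positions where two sequences differ.

    Assumes both strings come from the same alignment (equal length).
    Positions where both sequences have a gap compare equal and are not
    counted; a gap opposite a residue is counted as a difference.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, r in zip(query, reference) if q != r)


def mutations_to_nearest(query: str, references: list[str]) -> int:
    """Return the mutation count to the closest aligned reference sequence."""
    return min(count_mutations(query, ref) for ref in references)


if __name__ == "__main__":
    # Toy aligned fragments, not real malate dehydrogenase data.
    print(count_mutations("MKVL-A", "MRVLGA"))
    print(mutations_to_nearest("MKVLA", ["MRVLG", "MKVLG"]))
```

Taking the minimum over references reflects the intuition that a generated variant is "highly mutated" only if it is far from every natural sequence, not just from one template.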