De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering, due to the broad spectrum of technological, scientific and medical applications. However, mapping protein sequence to protein function is currently neither computationally nor experimentally tangible [1,2]. Here we developed ProteinGAN, a specialised variant of the generative adversarial network [3] that is able to 'learn' natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase as a template enzyme, we show that 24% of the ProteinGAN-generated and experimentally tested sequences are soluble and display wild-type level catalytic activity under the tested conditions in vitro, even in highly mutated (>100 mutations) sequences. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse novel functional proteins within the allowed biological constraints of the sequence space.
