Conditional β-VAE for De Novo Molecular Generation

ChemRxiv 2022 · Ryan J Richards, Austen M Groener

Deep learning has significantly advanced and accelerated de novo molecular generation. Generative networks such as variational autoencoders (VAEs) can not only randomly generate new molecules but also alter molecular structures to optimize specific chemical properties that are pivotal for drug discovery. Although VAEs have been proposed and studied for pharmaceutical applications, they have deficiencies that limit their ability both to optimize properties and to decode syntactically valid molecules. We present a recurrent, conditional β-VAE that disentangles the latent space to enhance post hoc molecule optimization. We create a mutual-information-driven training protocol and data augmentations that both increase molecular validity and promote longer sequence generation. We demonstrate the efficacy of our framework on the ZINC-250k dataset, achieving SOTA unconstrained optimization results on the penalized LogP (pLogP) and QED scores, while also matching current SOTA results for validity, novelty, and uniqueness in random generation. We match the current SOTA on QED for the top-3 molecules at 0.948, set a new SOTA for pLogP optimization with top-3 scores of 104.29, 90.12, and 69.68, and demonstrate improved results on the constrained optimization task.
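The abstract builds on the β-VAE objective. As a hedged sketch, the snippet below shows the standard β-VAE loss: a reconstruction negative log-likelihood plus β times the KL divergence between a diagonal-Gaussian encoder posterior and a standard normal prior. In a conditional VAE the encoder and decoder also receive a condition vector (for example, target property values), which leaves this loss form unchanged when the prior is N(0, I). The paper's recurrent SMILES decoder, its β value, and its mutual-information-driven training protocol are not reproduced here; the function names and arguments are illustrative assumptions.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

    mu, logvar: arrays of shape (batch, latent_dim); returns shape (batch,).
    """
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(recon_nll, mu, logvar, beta=1.0):
    """Negative ELBO with the KL term weighted by beta, averaged over the batch.

    recon_nll: per-sample reconstruction negative log-likelihood, shape (batch,).
    In a (hypothetical) conditional setup, mu and logvar would come from an
    encoder that also sees the condition vector; the loss itself is unchanged.
    """
    kl = gaussian_kl(mu, logvar)
    return float(np.mean(recon_nll + beta * kl))

# Usage: one sample, 2-D latent, mu = [1, 0], unit variance -> KL = 0.5.
loss = beta_vae_loss(np.array([2.0]), np.array([[1.0, 0.0]]),
                     np.array([[0.0, 0.0]]), beta=4.0)
print(loss)  # 2.0 + 4.0 * 0.5 = 4.0
```

Setting `beta=1.0` recovers the ordinary VAE objective; `beta > 1` weights the KL term more heavily, which is the mechanism β-VAEs rely on to push the posterior toward a factorized prior and encourage a more disentangled latent space.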




Results from the Paper


 Ranked #1 on Molecular Graph Generation on ZINC (QED Top-3 metric)

Task: Molecular Graph Generation · Dataset: ZINC

Model    Metric               Metric Value            Global Rank
β-VAE    Validity             98.77                   #8
β-VAE    QED Top-3            0.948, 0.948, 0.948     #1
β-VAE    pLogP Top-3          6.53, 6.47, 6.44        #1
β-VAE    Validity w/o Check   98.28                   #2
β-VAE    Uniqueness           98.31                   #4
β-VAE    Novelty              99.75                   #4
β-VAE    NUV                  97.62                   #3
β-CVAE   Validity             99.64                   #7
β-CVAE   QED Top-3            0.948, 0.948, 0.948     #1
β-CVAE   pLogP Top-3          104.29, 90.12, 69.68    #1
β-CVAE   Validity w/o Check   99.44                   #1
β-CVAE   Uniqueness           88.69                   #5
β-CVAE   Novelty              99.42                   #5
β-CVAE   NUV                  80.11                   #4
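The random-generation metrics in the table (validity, uniqueness, novelty) and the top-3 optimization scores are commonly computed roughly as sketched below. This is a hedged illustration using widely used definitions (uniqueness over valid samples, novelty over unique valid samples), not the paper's exact protocol: the number of samples, the meaning of "Validity w/o Check", and the NUV metric are not specified here. The `is_valid` callable is a hypothetical stand-in; in practice it might wrap RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable SMILES. The sketch assumes SMILES are already canonicalized so that string equality means molecular identity.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

def generation_metrics(samples: Sequence[str],
                       training_set: Iterable[str],
                       is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Validity, uniqueness and novelty as percentages (common definitions).

    Assumes canonical SMILES strings; `is_valid` is a hypothetical checker.
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    valid = [s for s in samples if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # fraction of all generated strings that decode to valid molecules
        "validity": 100.0 * len(valid) / len(samples),
        # fraction of valid molecules that are distinct
        "uniqueness": 100.0 * len(unique) / len(valid) if valid else 0.0,
        # fraction of distinct valid molecules absent from the training data
        "novelty": 100.0 * len(novel) / len(unique) if unique else 0.0,
    }

def top_k(molecule_scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Highest-scoring distinct molecules (e.g. by pLogP or QED), best first."""
    return sorted(molecule_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Usage with toy data: "bad" stands in for an invalid SMILES string.
m = generation_metrics(["CCO", "CCO", "C1CC1", "bad", "CCN"], ["CCO"],
                       is_valid=lambda s: s != "bad")
print(m)  # validity 80.0, uniqueness 75.0, novelty ~66.67
```

Keying `top_k` by molecule keeps the top-k list free of duplicates, so repeated scores such as 0.948, 0.948, 0.948 would correspond to distinct molecules.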

Methods


HOC • VAE