Energy-based View of Retrosynthesis

14 Jul 2020 · Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
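To make the energy-based framing concrete, here is a minimal sketch, not the authors' implementation. It assumes a small, fixed set of candidate reactant sets X for a target y. Each candidate gets an energy E(X, y), with p(X | y) proportional to exp(-E(X, y)), so ranking by ascending energy yields top-k predictions. The "agreement between the two directions" is illustrated here as a KL divergence between the backward distribution and the Bayes posterior built from a prior p(X) and a forward model p(y | X). That choice of penalty, all function names, and all numbers are illustrative assumptions.

```python
import math

def softmax_neg_energy(energies):
    """Map energies E(X, y) over a candidate set to p(X | y) ∝ exp(-E)."""
    m = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - m)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def rank_candidates(candidates, energies, k):
    """Return the k candidates with the lowest energy (highest probability)."""
    order = sorted(range(len(candidates)), key=lambda i: energies[i])
    return [candidates[i] for i in order[:k]]

def bayes_posterior(log_prior, log_forward):
    """Posterior over candidates from p(X) * p(y | X), normalized over the set.

    Equivalent to an energy of -(log p(X) + log p(y | X)).
    """
    return softmax_neg_energy([-(a + b) for a, b in zip(log_prior, log_forward)])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical candidates and scores, for illustration only.
candidates = ["reactants_A", "reactants_B", "reactants_C"]
backward_energies = [0.5, 2.0, 1.0]                                   # E(X, y), backward model
log_prior = [math.log(0.5), math.log(0.3), math.log(0.2)]    # log p(X)
log_forward = [math.log(0.6), math.log(0.1), math.log(0.3)]  # log p(y | X)

backward = softmax_neg_energy(backward_energies)
forward_posterior = bayes_posterior(log_prior, log_forward)

# Assumed agreement penalty: zero only when both directions give the same distribution.
gap = kl_divergence(backward, forward_posterior)
top2 = rank_candidates(candidates, backward_energies, 2)
```

In a training setting, a term like `gap` would be added to the usual likelihood losses so that the forward and backward models are pushed toward consistent rankings. The exact form of the constraint used in the paper may differ from this sketch.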


Datasets


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Single-step retrosynthesis | USPTO-50k | Dual-TF | Top-1 accuracy | 55.3 | # 1 |
| | | | Top-3 accuracy | 69.7 | # 9 |
| | | | Top-5 accuracy | 73.0 | # 11 |
| | | | Top-10 accuracy | 75.0 | # 13 |
| Single-step retrosynthesis | USPTO-50k | Dual-TB | Top-1 accuracy | 55.2 | # 2 |
| | | | Top-3 accuracy | 74.6 | # 5 |
| | | | Top-5 accuracy | 80.5 | # 7 |
| | | | Top-10 accuracy | 86.9 | # 5 |

Methods