1 code implementation • 1 Apr 2024 • Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, Regina Barzilay
Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%.
no code implementations • 7 Mar 2024 • Joonyoung F. Joung, Mun Hong Fong, Jihye Roh, Zhengkai Tu, John Bradshaw, Connor W. Coley
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery.
1 code implementation • 19 Feb 2024 • Wenhao Gao, Priyanka Raghavan, Ron Shprints, Connor W. Coley
In this work, we introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity.
1 code implementation • 8 Dec 2023 • Yujie Qian, Zhening Li, Zhengkai Tu, Connor W. Coley, Regina Barzilay
Conventionally, chemoinformatics models are trained with extensive structured data manually extracted from the literature.
1 code implementation • 3 Nov 2023 • Jenna C. Fromer, Connor W. Coley
Small molecules exhibiting desirable property profiles are often discovered through an iterative process of designing, synthesizing, and testing sets of molecules.
no code implementations • 16 Oct 2023 • Jenna C. Fromer, David E. Graff, Connor W. Coley
The discovery of therapeutic molecules is fundamentally a multi-objective optimization problem.
1 code implementation • 29 Sep 2023 • Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, Connor W. Coley, Yizhou Sun, Wei Wang
Molecular Representation Learning (MRL) has proven impactful in numerous biochemical applications such as drug discovery and enzyme design.
1 code implementation • 17 Jul 2023 • Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, Yuqing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences.
1 code implementation • 17 Jul 2023 • Samuel Goldman, Jiayi Xin, Joules Provenzano, Connor W. Coley
Importantly, MIST-CF learns in a data-dependent fashion using a Formula Transformer neural network architecture and circumvents the need for fragmentation tree construction.
1 code implementation • 19 May 2023 • Yujie Qian, Jiang Guo, Zhengkai Tu, Connor W. Coley, Regina Barzilay
Reaction diagram parsing is the task of extracting reaction schemes from a diagram in the chemistry literature.
no code implementations • 14 May 2023 • David E. Graff, Edward O. Pyzer-Knapp, Kirk E. Jordan, Eugene I. Shakhnovich, Connor W. Coley
When the correlation between structure and property weakens, a dataset is described as "rough," but this characteristic is partly a function of the chosen representation.
1 code implementation • 25 Apr 2023 • Samuel Goldman, Janet Li, Connor W. Coley
The accurate prediction of tandem mass spectra from molecular structures has the potential to unlock new metabolomic discoveries by augmenting the community's libraries of experimental reference standards.
1 code implementation • NeurIPS 2023 • Samuel Goldman, John Bradshaw, Jiayi Xin, Connor W. Coley
Computational predictions of mass spectra from molecules have enabled the discovery of clinically relevant metabolites.
1 code implementation • 28 Nov 2022 • Tianfan Fu, Wenhao Gao, Connor W. Coley, Jimeng Sun
The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization.
1 code implementation • 4 Nov 2022 • Divya Nori, Connor W. Coley, Rocío Mercado
After fine-tuning, predicted activity against a challenging POI increases from 50% to >80% with near-perfect chemical validity for sampled compounds, suggesting this is a promising approach for the optimization of large, PROTAC-like molecules for targeted protein degradation.
no code implementations • 13 Oct 2022 • Jenna C. Fromer, Connor W. Coley
Molecular discovery is a multi-objective optimization problem that requires identifying a molecule or set of molecules that balance multiple, often competing, properties.
1 code implementation • 6 Oct 2022 • Keir Adams, Connor W. Coley
Shape-based virtual screening is widely employed in ligand-based drug design to search chemical libraries for molecules with similar 3D shapes yet novel 2D chemical structures compared to known ligands.
2 code implementations • 19 Jul 2022 • Matteo Aldeghi, David E. Graff, Nathan Frey, Joseph A. Morrone, Edward O. Pyzer-Knapp, Kirk E. Jordan, Connor W. Coley
In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space.
1 code implementation • 28 May 2022 • Yujie Qian, Jiang Guo, Zhengkai Tu, Zhening Li, Connor W. Coley, Regina Barzilay
Molecular structure recognition is the task of translating a molecular image into its graph structure.
1 code implementation • 17 May 2022 • Matteo Aldeghi, Connor W. Coley
While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches.
2 code implementations • 3 May 2022 • David E. Graff, Matteo Aldeghi, Joseph A. Morrone, Kirk E. Jordan, Edward O. Pyzer-Knapp, Connor W. Coley
In this study, we propose an extension to the framework of model-guided optimization that mitigates inference costs using a technique we refer to as design space pruning (DSP), which irreversibly removes poor-performing candidates from consideration.
1 code implementation • 17 Dec 2021 • David E. Graff, Connor W. Coley
pyscreener is a Python library that seeks to alleviate the challenges of large-scale structure-based design using computational docking.
no code implementations • NeurIPS Workshop AI4Scien 2021 • Nathan C. Frey, Siddharth Samsi, Bharath Ramsundar, Connor W. Coley, Vijay Gadepally
Artificial intelligence has not yet revolutionized the design of materials and molecules.
1 code implementation • NeurIPS Workshop AI4Scien 2021 • Nathan C. Frey, Siddharth Samsi, Joseph McDonald, Lin Li, Connor W. Coley, Vijay Gadepally
Deep learning in molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing.
1 code implementation • 19 Oct 2021 • Zhengkai Tu, Connor W. Coley
Synthesis planning and reaction outcome prediction are two fundamental problems in computer-aided organic chemistry for which a variety of data-driven approaches have emerged.
Ranked #10 on Single-step retrosynthesis on USPTO-50k
1 code implementation • ICLR 2022 • Wenhao Gao, Rocío Mercado, Connor W. Coley
Molecular design and synthesis planning are two critical steps in the process of molecular discovery that we propose to formulate as a single shared task of conditional synthetic pathway generation.
1 code implementation • ICLR 2022 • Keir Adams, Lagnajit Pattanaik, Connor W. Coley
Molecular chirality, a form of stereochemistry most often describing relative spatial arrangements of bonded neighbors around tetrahedral carbon centers, influences the set of 3D conformers accessible to the molecule without changing its 2D graph connectivity.
no code implementations • ICLR 2022 • Tianfan Fu, Wenhao Gao, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun
The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications, such as drug discovery.
no code implementations • 22 Sep 2021 • Tianfan Fu, Wenhao Gao, Cao Xiao, Jacob Yasonik, Connor W. Coley, Jimeng Sun
The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications, such as drug discovery.
1 code implementation • 8 Sep 2021 • Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley
However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
1 code implementation • 27 Aug 2021 • Katherine S. Lim, Andrew G. Reidenbach, Bruce K. Hua, Jeremy W. Mason, Christopher J. Gerry, Paul A. Clemons, Connor W. Coley
Further, this approach to uncertainty-aware regression is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
1 code implementation • NeurIPS 2021 • Octavian-Eugen Ganea, Lagnajit Pattanaik, Connor W. Coley, Regina Barzilay, Klavs F. Jensen, William H. Green, Tommi S. Jaakkola
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
no code implementations • arXiv 2021 • Vignesh Ram Somnath, Charlotte Bunne, Connor W. Coley, Andreas Krause, Regina Barzilay
Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to identify precursor molecules that can be used to synthesize a target molecule.
Ranked #6 on Single-step retrosynthesis on USPTO-50k
no code implementations • 26 May 2021 • Shuangjia Zheng, Tao Zeng, Chengtao Li, Binghong Chen, Connor W. Coley, Yuedong Yang, Ruibo Wu
Nature, a synthetic master, creates more than 300,000 natural products (NPs), which are the major constituents of FDA-approved drugs owing to the vast chemical space of NPs.
2 code implementations • 18 Feb 2021 • Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik
Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics.
1 code implementation • 13 Dec 2020 • David E. Graff, Eugene I. Shakhnovich, Connor W. Coley
Structure-based virtual screening is an important tool in early-stage drug discovery that scores the interactions between a target protein and candidate ligands.
1 code implementation • 24 Nov 2020 • Lagnajit Pattanaik, Octavian-Eugen Ganea, Ian Coley, Klavs F. Jensen, William H. Green, Connor W. Coley
Molecules with identical graph connectivity can exhibit different physical and biological properties if they exhibit stereochemistry, a spatial structural characteristic.
2 code implementations • NeurIPS 2021 • Vignesh Ram Somnath, Charlotte Bunne, Connor W. Coley, Andreas Krause, Regina Barzilay
Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to identify precursor molecules that can be used to synthesize a target molecule.
1 code implementation • 20 May 2020 • Lior Hirschfeld, Kyle Swanson, Kevin Yang, Regina Barzilay, Connor W. Coley
While we believe these results show that existing UQ methods are not sufficient for all common use-cases and demonstrate the benefits of further research, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
1 code implementation • 26 Apr 2020 • Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Hao-Ran Wei, Shengchao Liu, Karam M. J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models.
no code implementations • 30 Mar 2020 • Connor W. Coley, Natalie S. Eyke, Klavs F. Jensen
This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences.
no code implementations • 30 Mar 2020 • Connor W. Coley, Natalie S. Eyke, Klavs F. Jensen
This two-part review examines how automation has contributed to different aspects of discovery in the chemical sciences.
1 code implementation • 17 Feb 2020 • Wenhao Gao, Connor W. Coley
The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small molecule therapeutic discovery.
1 code implementation • NeurIPS 2019 • Hanjun Dai, Chengtao Li, Connor W. Coley, Bo Dai, Le Song
Retrosynthesis is one of the fundamental problems in organic chemistry.
Ranked #11 on Single-step retrosynthesis on USPTO-50k
no code implementations • 19 Jan 2019 • John S. Schreck, Connor W. Coley, Kyle J. M. Bishop
The problem of retrosynthetic planning can be framed as a one-player game, in which the chemist (or a computer program) works backwards from a molecular target to simpler starting materials through a series of choices regarding which reactions to perform.
no code implementations • Chemical Science 2018 • Connor W. Coley, Wengong Jin, Luke Rogers, Timothy F. Jamison, Tommi S. Jaakkola, William H. Green, Regina Barzilay, Klavs F. Jensen
We present a supervised learning approach to predict the products of organic reactions given their reactants, reagents, and solvent(s).
1 code implementation • NeurIPS 2017 • Wengong Jin, Connor W. Coley, Regina Barzilay, Tommi Jaakkola
The prediction of organic reaction outcomes is a fundamental problem in computational chemistry.