no code implementations • 4 Apr 2024 • Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das
Impressively, we find GP-MoLFormer is able to generate a significant fraction of novel, valid, and unique SMILES even when the number of generated molecules is in the 10 billion range and the reference set is over a billion.
no code implementations • 18 Mar 2024 • Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří, Navrátil, Soham Dan, Pin-Yu Chen
Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today.
no code implementations • 28 Feb 2024 • Amit Dhurandhar, Tejaswini Pedapati, Ronny Luss, Soham Dan, Aurelie Lozano, Payel Das, Georgios Kollias
Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP) due to their impressive performance on various tasks.
no code implementations • 12 Feb 2024 • Yunsheng Tian, Ane Zuniga, Xinwei Zhang, Johannes P. Dürholt, Payel Das, Jie Chen, Wojciech Matusik, Mina Konaković Luković
In this paper, we observe that in such scenarios optimal solution typically lies on the boundary between feasible and infeasible regions of the design space, making it considerably more difficult than that with interior optima.
no code implementations • 10 Feb 2024 • Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
Protein function annotation is an important yet challenging task in biology.
1 code implementation • 7 Feb 2024 • Zuobai Zhang, Jiarui Lu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language models without requiring explicit protein structures as input.
1 code implementation • 4 Sep 2023 • Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik
Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation.
2 code implementations • NeurIPS 2023 • Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy
In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5's Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE).
no code implementations • 22 May 2023 • Ioana Baldini, Chhavi Yadav, Payel Das, Kush R. Varshney
Bias auditing is further complicated by LM brittleness: when a presumably biased outcome is observed, is it due to model bias or model brittleness?
3 code implementations • 11 Mar 2023 • Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
no code implementations • 8 Jan 2023 • Pin-Yu Chen, Payel Das
With the advancements in machine learning (ML) methods and compute resources, artificial intelligence (AI) empowered systems are becoming a prevailing technology.
no code implementations • 5 Jan 2023 • Ria Vinod, Pin-Yu Chen, Payel Das
To this end, we reprogram an off-the-shelf pre-trained English language transformer and benchmark it on a set of protein physicochemical prediction tasks (secondary structure, stability, homology, stability) as well as on a biomedically relevant set of protein function prediction tasks (antimicrobial, toxicity, antibody affinity).
1 code implementation • 18 Nov 2022 • Igor Melnyk, Pierre Dognin, Payel Das
In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages.
Ranked #1 on Joint Entity and Relation Extraction on WebNLG 3.0
1 code implementation • 8 Nov 2022 • Jenna A. Bilbrey, Kristina M. Herman, Henry Sprueill, Soritis S. Xantheas, Payel Das, Manuel Lopez Roldan, Mike Kraus, Hatem Helal, Sutanay Choudhury
The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task.
no code implementations • 1 Nov 2022 • Chanakya Ekbote, Moksh Jain, Payel Das, Yoshua Bengio
We hypothesize that this can lead to incompatibility between the inductive optimization biases in training $R$ and in training the GFlowNet, potentially leading to worse samples and slow adaptation to changes in the distribution.
no code implementations • 13 Oct 2022 • Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das
We also provide a novel group-theoretic definition for fairness in NLG.
no code implementations • 6 Oct 2022 • Ching-Yun Ko, Pin-Yu Chen, Jeet Mohapatra, Payel Das, Luca Daniel
Given a pretrained model, the representations of data synthesized from the Gaussian mixture are used to compare with our reference to infer the quality.
1 code implementation • 5 Oct 2022 • Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
Results on antibody design benchmarks show that our model on low-resourced antibody sequence dataset provides highly diverse CDR sequences, up to more than a two-fold increase of diversity over the baselines, without losing structural integrity and naturalness.
1 code implementation • 5 Oct 2022 • Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan
This model can then be used as a structure consistency regularizer in training the inverse folding model.
no code implementations • 13 Aug 2022 • Brian Belgodere, Vijil Chenthamarakshan, Payel Das, Pierre Dognin, Toby Kurien, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed.
no code implementations • 10 Aug 2022 • Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
Additionally, each process $i\in\{1, \dots, K\}$ has a private parameter $\alpha_i$.
no code implementations • 14 Jul 2022 • Samuel C. Hoffman, Kahini Wadhawan, Payel Das, Prasanna Sattigeri, Karthikeyan Shanmugam
In this work, we provide a simple algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover a causal graph that is implied by the generative model.
1 code implementation • 8 Jul 2022 • Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. Smith
With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery.
no code implementations • 20 May 2022 • N. Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai
Massive molecular simulations of drug-target proteins have been used as a tool to understand disease mechanism and develop therapeutics.
no code implementations • 19 Apr 2022 • Vijil Chenthamarakshan, Samuel C. Hoffman, C. David Owen, Petra Lukacik, Claire Strain-Damerell, Daren Fearon, Tika R. Malla, Anthony Tumber, Christopher J. Schofield, Helen M. E. Duyvesteyn, Wanwisa Dejnirattisai, Loic Carrique, Thomas S. Walter, Gavin R. Screaton, Tetiana Matviiuk, Aleksandra Mojsilovic, Jason Crain, Martin A. Walsh, David I. Stuart, Payel Das
To perform target-aware design of novel inhibitor molecules, a protein sequence-conditioned sampling on the generative foundation model is performed.
1 code implementation • 13 Apr 2022 • Bhanushee Sharma, Vijil Chenthamarakshan, Amit Dhurandhar, Shiranee Pereira, James A. Hendler, Jonathan S. Dordick, Payel Das
Additionally, our multi-task approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms.
1 code implementation • ICLR 2022 • Minghao Guo, Veronika Thost, Beichen Li, Payel Das, Jie Chen, Wojciech Matusik
This is a non-trivial task for neural network-based generative models since the relevant chemical knowledge can only be extracted and generalized from the limited training data.
2 code implementations • 11 Mar 2022 • Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available in smaller numbers only, has not been explored for protein property prediction, though protein structures are known to be determinants of protein function.
1 code implementation • 2 Mar 2022 • Moksh Jain, Emmanuel Bengio, Alex-Hernandez Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio
In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round.
no code implementations • 1 Mar 2022 • Celia Cintas, Payel Das, Brian Quanz, Girmaw Abebe Tadesse, Skyler Speakman, Pin-Yu Chen
We propose group-based subset scanning to identify, quantify, and characterize creative processes by detecting a subset of anomalous node-activations in the hidden layers of the generative models.
no code implementations • 8 Feb 2022 • Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam, Payel Das
In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables.
no code implementations • 2 Dec 2021 • Samuel C. Hoffman, Vijil Chenthamarakshan, Dmitry Yu. Zubarev, Daniel P. Sanders, Payel Das
Photo-acid generators (PAGs) are compounds that release acids ($H^+$ ions) when exposed to light.
no code implementations • NeurIPS 2021 • Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable.
no code implementations • NeurIPS 2021 • Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das
Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable.
no code implementations • 12 Nov 2021 • Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano
Here we consider three recently proposed deep generative frameworks for protein design: (AR) the sequence-based autoregressive generative model, (GVP) the precise structure-based graph neural network, and Fold2Seq that leverages a fuzzy and scale-free representation of a three-dimensional fold, while enforcing structure-to-sequence (and vice versa) consistency.
no code implementations • 10 Nov 2021 • Raphaël Pestourie, Youssef Mroueh, Chris Rackauckas, Payel Das, Steven G. Johnson
Many physics and engineering applications demand Partial Differential Equations (PDE) property evaluations that are traditionally computed with resource-intensive high-fidelity numerical solvers.
1 code implementation • EMNLP 2021 • Pierre L. Dognin, Inkit Padhi, Igor Melnyk, Payel Das
Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning.
Ranked #1 on Joint Entity and Relation Extraction on TekGen
no code implementations • 18 Aug 2021 • Kahini Wadhawan, Payel Das, Barbara A. Han, Ilya R. Fischhoff, Adrian C. Castellanos, Arvind Varsani, Kush R. Varshney
Current methods for viral discovery target evolutionarily conserved proteins that accurately identify virus families but remain unable to distinguish the zoonotic potential of newly discovered viruses.
1 code implementation • 24 Jun 2021 • Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang shen
Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering.
1 code implementation • 17 Jun 2021 • Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design.
no code implementations • NeurIPS 2021 • Yair Schiff, Brian Quanz, Payel Das, Pin-Yu Chen
However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition suggests that there is a need for more robust and efficient measures of network generalization.
no code implementations • 8 Jun 2021 • Yair Schiff, Vijil Chenthamarakshan, Samuel Hoffman, Karthikeyan Natesan Ramamurthy, Payel Das
Deep generative models have emerged as a powerful tool for learning useful molecular representations and designing novel molecules with desired properties, with applications in drug discovery and material design.
no code implementations • 8 Apr 2021 • Yair Schiff, Brian Quanz, Payel Das, Pin-Yu Chen
The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of regression, classification, and control tasks.
no code implementations • 1 Apr 2021 • Celia Cintas, Payel Das, Brian Quanz, Skyler Speakman, Victor Akinwande, Pin-Yu Chen
We propose group-based subset scanning to quantify, detect, and characterize creative processes by detecting a subset of anomalous node-activations in the hidden layers of generative models.
no code implementations • 1 Jan 2021 • Norman Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan, Rongjie Lai
Empowered by the disentangled latent space learning, the extrinsic latent embedding is successfully used for classification or property prediction of different drugs bound to a specific protein.
1 code implementation • 22 Dec 2020 • Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das
Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy machine learning systems.
no code implementations • 7 Dec 2020 • Ria Vinod, Pin-Yu Chen, Payel Das
Recent advancements in transfer learning have made it a promising approach for domain adaptation via transfer of learned representations.
1 code implementation • 3 Nov 2020 • Samuel Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, Pin-Yu Chen, Payel Das
Machine learning based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery.
no code implementations • EMNLP 2020 • Pierre L. Dognin, Igor Melnyk, Inkit Padhi, Cicero Nogueira dos santos, Payel Das
In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs).
no code implementations • NeurIPS Workshop TDA_and_Beyond 2020 • Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy, Payel Das
In this work, we propose a method for measuring how well the latent space of deep generative models is able to encode structural and chemical features of molecular datasets by correlating latent space metrics with metrics from the field of topological data analysis (TDA).
no code implementations • 23 Sep 2020 • Kar Wai Lim, Bhanushee Sharma, Payel Das, Vijil Chenthamarakshan, Jonathan S. Dordick
Chemical toxicity prediction using machine learning is important in drug development to reduce repeated animal and human testing, thus saving cost and time.
1 code implementation • NeurIPS 2020 • N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai
Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations.
no code implementations • 24 Aug 2020 • Raphaël Pestourie, Youssef Mroueh, Thanh V. Nguyen, Payel Das, Steven G. Johnson
Surrogate models for partial-differential equations are widely used in the design of meta-materials to rapidly evaluate the behavior of composable components.
no code implementations • 6 Jun 2020 • Hamid Dadkhahi, Karthikeyan Shanmugam, Jesus Rios, Payel Das, Samuel Hoffman, Troy David Loeffler, Subramanian Sankaranarayanan
We consider the problem of black-box function optimization over the boolean hypercube.
2 code implementations • 22 May 2020 • Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, Aleksandra Mojsilovic
De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e. g., high broad-spectrum potency and low toxicity.
no code implementations • ACL 2020 • Inkit Padhi, Pierre Dognin, Ke Bai, Cicero Nogueira dos santos, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das
Generative feature matching network (GFMN) is an approach for training implicit generative models for images by performing moment matching on features from pre-trained neural networks.
3 code implementations • ICLR 2020 • Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, Xue Lin
In this work, we propose to employ mode connectivity in loss landscapes to study the adversarial robustness of deep neural networks, and provide novel methods for improving this robustness.
no code implementations • NeurIPS 2020 • Vijil Chenthamarakshan, Payel Das, Samuel C. Hoffman, Hendrik Strobelt, Inkit Padhi, Kar Wai Lim, Benjamin Hoover, Matteo Manica, Jannis Born, Teodoro Laino, Aleksandra Mojsilovic
CogMol also includes insilico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations.
no code implementations • ICLR Workshop DeepDiffEq 2019 • Thanh V. Nguyen, Youssef Mroueh, Samuel Hoffman, Payel Das, Pierre Dognin, Giuseppe Romano, Chinmay Hegde
We consider the problem of optimizing by sampling under multiple black-box constraints in nano-material design.
no code implementations • 4 Feb 2020 • Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny
Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Parallel SGD (AD-PSGD) is a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.
no code implementations • ICLR 2020 • Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei zhang, Xiaodong Cui, Payel Das, Tianbao Yang
Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.
no code implementations • NeurIPS 2020 • Mingrui Liu, Wei zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das
Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.
no code implementations • 25 Sep 2019 • Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, Payel Das
Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy and reliable machine learning systems.
no code implementations • 25 Sep 2019 • N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai
Empirically, this initialization is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes.
no code implementations • 25 Sep 2019 • Thanh V Nguyen, Youssef Mroueh, Samuel C. Hoffman, Payel Das, Pierre Dognin, Giuseppe Romano, Chinmay Hegde
We consider the problem of generating configurations that satisfy physical constraints for optimal material nano-pattern design, where multiple (and often conflicting) properties need to be simultaneously satisfied.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Tom Sercu, Sebastian Gehrmann, Hendrik Strobelt, Payel Das, Inkit Padhi, Cicero dos Santos, Kahini Wadhawan, Vijil Chenthamarakshan
We present the pipeline in an interactive visual tool to enable the exploration of the metrics, analysis of the learned latent space, and selection of the best model for a given task.
no code implementations • 6 Feb 2019 • Payel Das, Brian Quanz, Pin-Yu Chen, Jae-wook Ahn, Dhruv Shah
Creativity, a process that generates novel and meaningful ideas, involves increased association between task-positive (control) and task-negative (default) networks in the human brain.
no code implementations • 17 Oct 2018 • Payel Das, Kahini Wadhawan, Oscar Chang, Tom Sercu, Cicero dos Santos, Matthew Riemer, Vijil Chenthamarakshan, Inkit Padhi, Aleksandra Mojsilovic
Our model learns a rich latent space of the biological peptide context by taking advantage of abundant, unlabeled peptide sequences.
4 code implementations • NeurIPS 2018 • Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Pai-Shun Ting, Karthikeyan Shanmugam, Payel Das
important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily \emph{absent} (viz.
no code implementations • 21 Dec 2017 • Ravi Tejwani, Adam Liska, Hongyuan You, Jenna Reinen, Payel Das
The goal of the present study is to identify autism using machine learning techniques and resting-state brain imaging data, leveraging the temporal variability of the functional connections (FC) as the only information.
no code implementations • 16 Nov 2017 • Tejas Dharamsi, Payel Das, Tejaswini Pedapati, Gregory Bramble, Vinod Muthusamy, Horst Samulowitz, Kush R. Varshney, Yuvaraj Rajamanickam, John Thomas, Justin Dauwels
In this work, we present a cloud-based deep neural network approach to provide decision support for non-specialist physicians in EEG analysis and interpretation.