TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Text2Mol	59.3	# 1
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	BLEU	85.7	# 3
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Exact Match	28.0	# 6
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Levenshtein	17.14	# 9
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	MACCS FTS	90.3	# 2
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	RDK FTS	80.5	# 3
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Morgan FTS	73.9	# 4
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Frechet ChemNet Distance (FCD)	0.41	# 6
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Validity	89.9	# 9
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-4-0413)	Parameter Count	None	# 1
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	BLEU-2	60.7	# 6
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	BLEU-4	52.5	# 6
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	ROUGE-1	63.4	# 11
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	ROUGE-2	47.6	# 14
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	ROUGE-L	56.2	# 15
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	METEOR	61.0	# 9
Molecule Captioning	ChEBI-20	MolReGPT (GPT-4-0314)	Text2Mol	58.5	# 3
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Text2Mol	57.1	# 10
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	BLEU	79.0	# 12
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Exact Match	13.9	# 15
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Levenshtein	24.91	# 4
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	MACCS FTS	84.7	# 13
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	RDK FTS	70.8	# 13
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Morgan FTS	62.4	# 15
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Frechet ChemNet Distance (FCD)	0.57	# 10
Text-based de novo Molecule Generation	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Validity	88.7	# 11
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	BLEU-2	56.5	# 12
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	BLEU-4	48.2	# 12
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	ROUGE-1	45.0	# 18
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	ROUGE-2	54.3	# 3
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	ROUGE-L	58.5	# 10
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	METEOR	62.3	# 7
Molecule Captioning	ChEBI-20	MolReGPT (GPT-3.5-turbo)	Text2Mol	56.0	# 7
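
Two of the generation metrics in the table, Exact Match and Levenshtein, can be illustrated with plain string operations. The following is a minimal sketch, not the benchmark's evaluation script. It assumes predictions and references are compared as raw SMILES strings (real evaluations usually canonicalize molecules first) and that Exact Match is reported as a percentage. The function names `levenshtein` and `score` are hypothetical.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Aggregate (prediction, reference) SMILES pairs.

    Assumption: raw string comparison, no canonicalization.
    """
    n = len(pairs)
    exact = sum(pred == ref for pred, ref in pairs) / n
    mean_distance = sum(levenshtein(pred, ref) for pred, ref in pairs) / n
    return {"exact_match": 100 * exact, "levenshtein": mean_distance}


if __name__ == "__main__":
    toy_pairs = [("CCO", "CCO"), ("CCCO", "CCO")]  # toy data, not from the paper
    print(score(toy_pairs))
```

Because Levenshtein is a distance, smaller values mean the generated string is closer to the reference, whereas the other generation metrics listed here, apart from FCD, are similarity or accuracy scores.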

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/empowering-molecule-discovery-for-molecule/text-based-de-novo-molecule-generation-on)](https://paperswithcode.com/sota/text-based-de-novo-molecule-generation-on?p=empowering-molecule-discovery-for-molecule)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/empowering-molecule-discovery-for-molecule/molecule-captioning-on-chebi-20)](https://paperswithcode.com/sota/molecule-captioning-on-chebi-20?p=empowering-molecule-discovery-for-molecule)`

Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective

11 Jun 2023 · Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, Qing Li ·

Molecule discovery plays a crucial role in various scientific fields, advancing the design of tailored materials and drugs. However, most existing methods rely heavily on domain experts, incur excessive computational cost, or suffer from sub-optimal performance. Large Language Models (LLMs) such as ChatGPT, on the other hand, have shown remarkable performance in various cross-modal tasks thanks to their capabilities in natural language understanding, generalization, and in-context learning (ICL), which provides unprecedented opportunities to advance molecule discovery. Although several previous works have tried to apply LLMs to this task, the lack of domain-specific corpora and the difficulty of training specialized LLMs remain challenges. In this work, we propose a novel LLM-based framework (MolReGPT) for molecule-caption translation, which introduces an In-Context Few-Shot Molecule Learning paradigm that lets LLMs like ChatGPT apply their in-context learning capability to molecule discovery without domain-specific pre-training or fine-tuning. MolReGPT leverages the principle of molecular similarity to retrieve similar molecules and their text descriptions from a local database, so that LLMs can learn the task knowledge from context examples. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation. Experimental results show that, compared to fine-tuned models, MolReGPT outperforms MolT5-base and is comparable to MolT5-large without additional training. To the best of our knowledge, MolReGPT is the first work to leverage LLMs via in-context learning in molecule-caption translation for advancing molecule discovery. Our work expands the scope of LLM applications and provides a new paradigm for molecule discovery and design.
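
The retrieve-then-prompt idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. To stay dependency-free it swaps a real molecular fingerprint for a stand-in (the set of character bigrams of the SMILES string) and uses Tanimoto similarity over those sets. The `Example` class, the function names, the prompt wording, and the toy database are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    smiles: str
    caption: str


def ngram_fingerprint(smiles: str, n: int = 2) -> frozenset[str]:
    """Stand-in fingerprint (assumption): character n-grams of the SMILES.
    A real system would use a chemical fingerprint instead."""
    if len(smiles) < n:
        return frozenset({smiles})
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve(query: str, database: list[Example], k: int = 2) -> list[Example]:
    """Return the k database entries most similar to the query molecule."""
    q = ngram_fingerprint(query)
    ranked = sorted(
        database,
        key=lambda ex: tanimoto(q, ngram_fingerprint(ex.smiles)),
        reverse=True,
    )
    return ranked[:k]


def build_caption_prompt(query: str, database: list[Example], k: int = 2) -> str:
    """Assemble a few-shot prompt: retrieved examples first, query last.
    The instruction text is illustrative, not the paper's prompt."""
    lines = ["Describe the molecule given its SMILES string.", ""]
    for ex in retrieve(query, database, k):
        lines += [f"Molecule SMILES: {ex.smiles}",
                  f"Molecule caption: {ex.caption}", ""]
    lines += [f"Molecule SMILES: {query}", "Molecule caption:"]
    return "\n".join(lines)


# Toy local database (hypothetical entries, not taken from ChEBI-20).
DATABASE = [
    Example("c1ccccc1", "Benzene is an aromatic hydrocarbon."),
    Example("CCO", "Ethanol is a primary alcohol."),
    Example("CC(=O)O", "Acetic acid is a carboxylic acid."),
    Example("CCCO", "Propan-1-ol is a primary alcohol."),
]

if __name__ == "__main__":
    print(build_caption_prompt("CCCCO", DATABASE))
```

The resulting string would then be sent to a chat model as-is. No model weights are updated, which is what lets the approach skip domain-specific pre-training and fine-tuning.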

Code

phenixace/molregpt official

Tasks

In-Context Learning

Molecule Captioning

Natural Language Understanding

Text-based de novo Molecule Generation

Translation

Datasets

ChEBI-20

Results from the Paper

Ranked #3 on Text-based de novo Molecule Generation on ChEBI-20
