TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	BLEU-2	62.5	# 3
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	BLEU-4	54.2	# 3
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	ROUGE-1	68.2	# 3
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	ROUGE-2	54.3	# 3
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	ROUGE-L	62.2	# 3
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Base	METEOR	64.8	# 4
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	BLEU-2	58	# 11
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	BLEU-4	49	# 11
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	ROUGE-1	64.7	# 9
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	ROUGE-2	49.8	# 10
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	ROUGE-L	58.6	# 9
Molecule Captioning	ChEBI-20	Text+Chem T5-Base	METEOR	60.4	# 11
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	BLEU-2	56.0	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	BLEU-4	47.0	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	ROUGE-1	63.8	# 10
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	ROUGE-2	48.8	# 11
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	ROUGE-L	58	# 11
Molecule Captioning	ChEBI-20	Text+Chem T5-augm-Small	METEOR	58.8	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	BLEU-2	55.3	# 14
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	BLEU-4	46.2	# 14
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	ROUGE-1	63.3	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	ROUGE-2	48.1	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	ROUGE-L	57.4	# 13
Molecule Captioning	ChEBI-20	Text+Chem T5-Small	METEOR	58.3	# 14
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	BLEU	85.3	# 5
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Exact Match	32.2	# 3
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Levenshtein	16.87	# 12
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	MACCS FTS	90.1	# 3
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	RDK FTS	81.6	# 2
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Morgan FTS	75.7	# 3
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Frechet ChemNet Distance (FCD)	0.05	# 1
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Validity	94.3	# 6
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm base	Parameter Count	220000000	# 10
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	BLEU	75	# 16
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Exact Match	21.2	# 10
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Levenshtein	27.39	# 2
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	MACCS FTS	87.4	# 5
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	RDK FTS	76.7	# 7
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Morgan FTS	69.7	# 9
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Frechet ChemNet Distance (FCD)	0.061	# 3
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Validity	79.2	# 14
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 base	Parameter Count	220000000	# 10
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	BLEU	81.5	# 9
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Exact Match	19.1	# 12
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Levenshtein	21.78	# 6
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	MACCS FTS	86.4	# 8
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	RDK FTS	74.4	# 10
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Morgan FTS	67.2	# 12
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Frechet ChemNet Distance (FCD)	0.06	# 2
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Validity	95.1	# 5
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5-augm small	Parameter Count	60000000	# 5
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	BLEU	73.9	# 17
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Exact Match	15.7	# 14
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Levenshtein	28.54	# 1
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	MACCS FTS	85.9	# 9
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	RDK FTS	73.6	# 12
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Morgan FTS	66	# 14
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Frechet ChemNet Distance (FCD)	0.066	# 4
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Validity	77.6	# 16
Text-based de novo Molecule Generation	ChEBI-20	Text+Chem T5 small	Parameter Count	60000000	# 5
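
The rows above pair each model size with an "augm" and a non-"augm" variant, so the score gap between the two can be read off directly. The following is a minimal sketch of that arithmetic, using values copied from the table. The dictionary layout and the `augm_gain` helper are hypothetical names introduced for illustration; pairing rows by the "augm" label is an assumption about how the model names should be read, not something the page states.

```python
# Scores copied from the ChEBI-20 rows above.
# Key: (metric, model size); value: (score without "augm", score with "augm").
SCORES = {
    ("BLEU-2", "base"): (58.0, 62.5),        # Molecule Captioning
    ("BLEU-2", "small"): (55.3, 56.0),       # Molecule Captioning
    ("Exact Match", "base"): (21.2, 32.2),   # Text-based de novo Molecule Generation
    ("Exact Match", "small"): (15.7, 19.1),  # Text-based de novo Molecule Generation
    ("Validity", "base"): (79.2, 94.3),      # Text-based de novo Molecule Generation
    ("Validity", "small"): (77.6, 95.1),     # Text-based de novo Molecule Generation
}


def augm_gain(metric, size):
    """Return the "augm" score minus the non-"augm" score, rounded to one decimal."""
    plain, augm = SCORES[(metric, size)]
    return round(augm - plain, 1)


if __name__ == "__main__":
    for metric, size in SCORES:
        print(f"{metric:12s} {size:5s} {augm_gain(metric, size):+.1f}")
```

For these three metrics higher is better. The same subtraction would need its sign interpreted differently for metrics where lower is better, such as FCD.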

Unifying Molecular and Textual Representations via Multi-task Language Modelling

29 Jan 2023 · Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, Matteo Manica

The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains remarkably improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks gives rise to large improvements in cross-domain tasks, the magnitude of which increases with scale, as measured by more than a dozen relevant metrics. Our work suggests that such models can robustly and efficiently accelerate discovery in physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.

Code

gt4sd/multitask_text_and_chemistry_… official

Tasks

Language Modelling

Molecule Captioning

Multi-Task Learning

Text-based de novo Molecule Generation

Datasets

ChEBI-20

Results from the Paper

Ranked #3 on Molecule Captioning on ChEBI-20
