USPTO-30K

Introduced by Morin et al. in "MolGrapher: Graph-based Visual Recognition of Chemical Structures"

USPTO-30K is a large-scale benchmark dataset of annotated molecule images, introduced to address limitations of earlier benchmarks. It is built from pairs of images and MolFiles published by the United States Patent and Trademark Office (USPTO). Each molecule was independently selected from all available documents from 2001 to 2020. The dataset consists of three subsets, which decouple the study of clean molecules, molecules with abbreviations, and large molecules.

Each subset contains 10,000 molecules. USPTO-10K contains clean molecules, i.e., molecules without any abbreviated groups. USPTO-10K-abb contains molecules with superatom (abbreviated) groups. USPTO-10K-L contains clean molecules with more than 70 atoms.
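The two properties that separate the subsets (atom count and the presence of superatom groups) can be read directly from a MolFile. The following is a minimal sketch, assuming the annotations are V2000 MolFiles in which the fourth line is the counts line and superatom groups are declared with `M  STY` lines of type `SUP`. The helper names and the sample molecule are hypothetical, and this is not necessarily how the dataset authors assigned molecules to subsets; in particular, whether the 70-atom threshold counts hydrogens is not stated here.

```python
def count_atoms(molfile_text: str) -> int:
    """Return the atom count from the V2000 counts line (4th line, columns 1-3)."""
    lines = molfile_text.splitlines()
    return int(lines[3][0:3])


def has_superatom(molfile_text: str) -> bool:
    """Return True if any Sgroup of type SUP (superatom) is declared."""
    for line in molfile_text.splitlines():
        if line.startswith("M  STY"):
            # Fields after the entry count alternate: Sgroup index, Sgroup type.
            if "SUP" in line.split()[3:]:
                return True
    return False


# Hypothetical two-atom example with one superatom Sgroup (illustration only).
SAMPLE = "\n".join([
    "example",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.0000    0.0000    0.0000 C   0  0",
    "  1  2  1  0",
    "M  STY  1   1 SUP",
    "M  END",
])

if __name__ == "__main__":
    print(count_atoms(SAMPLE), has_superatom(SAMPLE))
```

A molecule with `has_superatom(...)` true would be a candidate for the abbreviation subset, and a clean molecule with `count_atoms(...) > 70` a candidate for the large-molecule subset, under the assumptions stated above.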

Similar Datasets

MolGrapher-Synthetic-300K

License

CDLA-Permissive-1.0
