MolGrapher-Synthetic-300K

Introduced by Morin et al. in MolGrapher: Graph-based Visual Recognition of Chemical Structures

The dataset is built from molecule SMILES retrieved from the PubChem database. Images are then generated from these SMILES with the molecule drawing library RDKit. The synthetic set is augmented at multiple levels:
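The SMILES-to-image step can be illustrated with a minimal sketch. It is not the authors' generation code: the function name `render_smiles`, the canvas size, and the choice of SVG output (used here only to avoid a Cairo dependency) are assumptions made for illustration.

```python
def render_smiles(smiles: str, width: int = 256, height: int = 256) -> str:
    """Return an SVG depiction of a SMILES string (illustrative helper, not from the paper)."""
    # RDKit is imported lazily so that this module can be loaded without it installed.
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None when the string cannot be parsed.
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    drawer = rdMolDraw2D.MolDraw2DSVG(width, height)
    # PrepareAndDrawMolecule computes 2D coordinates if needed and draws the molecule.
    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()
```

Calling `render_smiles("CCO")` would return the SVG markup for ethanol; a raster pipeline would swap in a PNG-producing drawer instead.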

Molecule level: Molecules are randomly transformed by: (1) displaying explicit hydrogens, (2) reducing the size of bonds connected to explicit hydrogens, (3) displaying explicit methyls, (4) displaying explicit carbons, (5) selecting a molecular conformation, (6) removing implicit hydrogens from atom labels, (7) rotating triple bonds, (8) displaying explicit carbons connected to triple bonds, adding artificial superatom groups with (9) single or (10) multiple attachment points, (11) displaying wedge bonds using solid or dashed bonds, and (12) displaying single bonds as wavy bonds.

Rendering level: The rendering parameters used in RDKit are randomly set: (1) the bond width, (2) the font, (3) the font size, (4) the atom label padding, (5) the molecule rotation, which does not rotate atom labels, (6) the display of atom indices and (7) their font size, (8) the hand-drawing style, (9) the charge positions, (10) the display of encircled charges and (11) their size, and (12) the display of aromatic cycles using circles.
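A few of these random choices can be sketched in code. The example below is a rough illustration under stated assumptions, not the MolGrapher pipeline: the names `RenderParams`, `sample_render_params` and `render_augmented` are hypothetical, the sampling ranges and probabilities are invented, and only a subset of the listed augmentations (explicit hydrogens, bond width, label padding, rotation, atom indices, hand-drawing style) is covered.

```python
import random
from dataclasses import dataclass


@dataclass
class RenderParams:
    """Hypothetical container for a handful of randomized settings."""
    explicit_hydrogens: bool   # molecule level: display explicit hydrogens
    bond_line_width: int       # rendering level: bond width
    label_padding: float       # rendering level: atom label padding
    rotation_deg: float        # rendering level: molecule rotation
    show_atom_indices: bool    # rendering level: display of atom indices
    hand_drawn: bool           # rendering level: hand-drawing style


def sample_render_params(rng: random.Random) -> RenderParams:
    """Draw one random configuration. All ranges here are assumed, not from the paper."""
    return RenderParams(
        explicit_hydrogens=rng.random() < 0.1,
        bond_line_width=rng.randint(1, 4),
        label_padding=rng.uniform(0.0, 0.2),
        rotation_deg=rng.uniform(0.0, 360.0),
        show_atom_indices=rng.random() < 0.05,
        hand_drawn=rng.random() < 0.1,
    )


def render_augmented(smiles: str, params: RenderParams, size: int = 256) -> str:
    """Render `smiles` to SVG with the sampled settings applied."""
    # RDKit is imported lazily so the sampler above works without it installed.
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    if params.explicit_hydrogens:
        mol = Chem.AddHs(mol)

    drawer = rdMolDraw2D.MolDraw2DSVG(size, size)
    opts = drawer.drawOptions()
    opts.bondLineWidth = params.bond_line_width
    opts.additionalAtomLabelPadding = params.label_padding
    opts.rotate = params.rotation_deg
    opts.addAtomIndices = params.show_atom_indices
    opts.comicMode = params.hand_drawn

    rdMolDraw2D.PrepareAndDrawMolecule(drawer, mol)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()
```

Passing a seeded `random.Random` to `sample_render_params` keeps the sampled configuration reproducible, which makes it easier to regenerate a given image from its SMILES and seed.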

License


  • CDLA-Permissive-1.0
