Table 2 Total number of each penalty score output by the Checker when applied to the ChEMBL Literature set, the SureChEMBL set and the PubChem set

From: An open source chemical structure curation pipeline using RDKit

| Penalty score | Penalty explanation | SureChEMBL | ChEMBL Literature | PubChem |
|---|---|---|---|---|
| 7 | Error-9986 (Cannot process aromatic bonds) | 4 | 0 | 0 |
| 7 | Illegal input | 0 | 1 | 0 |
| 7 | InChI: Unknown element(s) | 3 | 0 | 1355 |
| 6 | All atoms have zero coordinates | 0 | 0 | 12 |
| 6 | InChI: Accepted unusual valence(s) | 73 | 1 | 2155 |
| 6 | InChI: Empty structure | 0 | 1 | 5824 |
| 6 | Molecule has 3D coordinates | 0 | 1 | 1024 |
| 6 | Molecule has a radical that is not found in the known list | 187 | 1 | 252 |
| 6 | Molecule has six (or more) atoms with exactly the same coordinates | 3 | 0 | 206 |
| 6 | Number of atoms less than 1 | 0 | 1 | 5824 |
| 6 | Polymer information in mol file | 2 | 0 | 0 |
| 6 | V3000 mol file | 594 | 0 | 0 |
| 5 | InChI_RDKit/Mol stereo mismatch | 588 | 152 | 339 |
| 5 | Mol/Inchi/RDKit stereo mismatch | 0 | 0 | 28 |
| 5 | RDKit_Mol/InChI stereo mismatch | 23 | 22 | 1479 |
| 5 | Molecule has a bond with an illegal stereo flag | 1054 | 0 | 0 |
| 5 | Molecule has a bond with an illegal type | 6 | 0 | 0 |
| 5 | Molecule has a crossed bond in a ring | 34 | 36 | 134 |
| 5 | Molecule has two (or more) atoms with exactly the same coordinates | 4 | 5 | 2367 |
| 2 | InChI_Mol/RDKit stereo mismatch | 0 | 55 | 307 |
| 2 | Molecule has a stereo bond in a ring | 2359 | 5763 | 7061 |
| 2 | Molecule has an atom with multiple stereo bonds | 1493 | 52 | 3660 |
| 2 | Molecule has a stereo bond to a stereocenter | 331 | 27 | 983 |
| 2 | Molecule has the 3D flag set for a 2D conformer | 0 | 0 | 5 |
| 2 | Other InChI Warnings | 20188 | 34052 | 170678 |
| | No errors | 15015 | 111137 | 177815 |
Note: the number of penalty scores output is not the same as the number of compounds, because some compounds return multiple penalty scores.
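The point about counts can be illustrated with a small sketch. It assumes each compound yields a list of (penalty score, message) pairs and that a compound with no issues is tallied once under "No errors". The compound ids, the pairing of messages with compounds, the `tally_penalties` helper and the use of `None` as the score for "No errors" are all hypothetical; only the message strings and their scores are taken from the table.

```python
from collections import Counter

# Hypothetical checker output: compound id -> list of (penalty score, message).
# The ids and which compound gets which message are invented for illustration;
# only the message strings and their scores come from the table.
toy_results = {
    "cmpd-1": [(2, "Molecule has a stereo bond in a ring"),
               (2, "Other InChI Warnings")],
    "cmpd-2": [(5, "Molecule has a crossed bond in a ring"),
               (2, "Molecule has a stereo bond in a ring")],
    "cmpd-3": [],  # nothing reported for this compound
}


def tally_penalties(results):
    """Count every (score, message) pair across all compounds.

    A compound with an empty issue list is counted once as "No errors"
    (score None is an assumption; the table leaves that cell blank).
    """
    counts = Counter()
    for issues in results.values():
        if not issues:
            counts[(None, "No errors")] += 1
        for score, message in issues:
            counts[(score, message)] += 1
    return counts


counts = tally_penalties(toy_results)
n_compounds = len(toy_results)
n_scores = sum(counts.values())
print(f"{n_compounds} compounds produced {n_scores} tallied scores")
```

In this toy set, three compounds produce five tallied entries, which is why the column totals in the table can exceed the number of compounds in each data set.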