Table 5 Summary of the number of compounds whose InChIKeys changed following standardisation for the SureChEMBL, ChEMBL literature and PubChem deposited sets

From: An open source chemical structure curation pipeline using RDKit

| InChIKey layer change | SureChEMBL | ChEMBL Literature | PubChem |
| --- | --- | --- | --- |
| Connectivity | 15 | 13 | 67 |
| Connectivity and Protonation | 5 | 1 | 33 |
| Protonation | 67 | 297 | 4358 |
| Stereochemistry | 11 | 0 | 16 |
| Stereochemistry and Protonation | 0 | 0 | 4 |
| Total no. of changed InChIKeys after standardisation | 98 | 311 | 4478 |
| Total no. of compounds | 520174 | 147008 | 297864 |
| % changed InChIKeys | 0.19 | 0.21 | 1.50 |
  1. The table also includes the number of compounds in each dataset and the percentage of each set with changed InChIKeys
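
The totals and percentages in the table can be rechecked from the per-layer counts. The following is a minimal sketch, assuming the percentage is simply 100 × (changed InChIKeys) / (total compounds); the variable and function names are illustrative and do not come from the paper. Under that assumption the ChEMBL Literature and PubChem percentages agree with the table, whereas the SureChEMBL figure comes out near 0.02 rather than the tabulated 0.19.

```python
# Per-layer counts transcribed from the table, in the order
# (SureChEMBL, ChEMBL Literature, PubChem).
DATASETS = ("SureChEMBL", "ChEMBL Literature", "PubChem")

by_layer = {
    "Connectivity": (15, 13, 67),
    "Connectivity and Protonation": (5, 1, 33),
    "Protonation": (67, 297, 4358),
    "Stereochemistry": (11, 0, 16),
    "Stereochemistry and Protonation": (0, 0, 4),
}

# Totals as reported in the table.
reported_changed = {"SureChEMBL": 98, "ChEMBL Literature": 311, "PubChem": 4478}
total_compounds = {"SureChEMBL": 520174, "ChEMBL Literature": 147008, "PubChem": 297864}


def percent_changed(n_changed: int, n_total: int) -> float:
    """Percentage of compounds whose InChIKey changed (assumed definition)."""
    return 100.0 * n_changed / n_total


# Sum each dataset's column across all layer-change categories.
column_sums = {
    name: sum(counts[i] for counts in by_layer.values())
    for i, name in enumerate(DATASETS)
}

# Recompute the percentage row, rounded to two decimals as in the table.
percentages = {
    name: round(percent_changed(column_sums[name], total_compounds[name]), 2)
    for name in DATASETS
}

for name in DATASETS:
    print(f"{name}: {column_sums[name]} changed, {percentages[name]:.2f}%")
```

The per-layer rows sum to the reported totals for all three datasets, so the categories appear to be mutually exclusive.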