
Table 5 Summary of the number of compounds whose InChIKeys changed following standardisation for the SureChEMBL, ChEMBL literature and PubChem deposited sets

From: An open source chemical structure curation pipeline using RDKit

| InChIKey layer change | SureChEMBL | ChEMBL Literature | PubChem |
| --- | --- | --- | --- |
| Connectivity | 15 | 13 | 67 |
| Connectivity and Protonation | 5 | 1 | 33 |
| Protonation | 67 | 297 | 4358 |
| Stereochemistry | 11 | 0 | 16 |
| Stereochemistry and Protonation | 0 | 0 | 4 |
| Total no. of changed InChIKeys after standardisation | 98 | 311 | 4478 |
| Total no. of compounds | 520174 | 147008 | 297864 |
| % changed InChIKeys | 0.19 | 0.21 | 1.50 |

  1. The table also includes the total number of compounds in each dataset and the percentage of each set with changed InChIKeys
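
The row labels above correspond to the three hyphen-separated blocks of a standard InChIKey: a 14-character connectivity hash, a second block whose first 8 characters hash the remaining layers (stereochemistry among them), and a final single character that flags protonation. A minimal sketch of how a before/after pair could be assigned to one of these categories is shown below. The function name `classify_inchikey_change` is hypothetical, and this is not claimed to be the procedure the authors used; it only compares the blocks as plain strings and treats any change in the second block's hash as "Stereochemistry", which is an assumption because that hash also covers other layers such as isotopes.

```python
def classify_inchikey_change(before: str, after: str) -> str:
    """Label which InChIKey blocks differ between two keys.

    Hypothetical helper for illustration only. Assumes both inputs are
    well-formed standard InChIKeys of the form
    XXXXXXXXXXXXXX-YYYYYYYYFV-P (14, 10 and 1 characters).
    """
    conn_before, rest_before, prot_before = before.split("-")
    conn_after, rest_after, prot_after = after.split("-")

    changed = []
    # Block 1: hash of the molecular skeleton (connectivity).
    if conn_before != conn_after:
        changed.append("Connectivity")
    # Block 2, first 8 characters: hash of the remaining layers.
    # Assumption: a difference here is reported as a stereochemistry change.
    if rest_before[:8] != rest_after[:8]:
        changed.append("Stereochemistry")
    # Block 3: single character protonation flag.
    if prot_before != prot_after:
        changed.append("Protonation")

    return " and ".join(changed) if changed else "Unchanged"


# Illustrative inputs: acetic acid vs. acetate, and L-alanine vs. alanine
# without defined stereochemistry.
print(classify_inchikey_change("QTBSBXVTEAMEQO-UHFFFAOYSA-N",
                               "QTBSBXVTEAMEQO-UHFFFAOYSA-M"))
print(classify_inchikey_change("QNAYBMKLOCPYGJ-REOHGBHYSA-N",
                               "QNAYBMKLOCPYGJ-UHFFFAOYSA-N"))
```

Counting the labels returned over all before/after pairs in a dataset would give per-category tallies of the kind listed in the table.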