
Table 5 Detections of problematic structure (I)

From: canSAR chemistry registration and standardization pipeline

 

| | Sure_ChEMBL (SI1) | PubChem (SI2) | ChEMBL literature (SI3) |
|---|---|---|---|
| Number of structures | 52,074 | 297,864 | 147,008 |
| ChEMBL pipeline errors (structures not uploaded) | 849 (1.6%) | 10,692 (3.59%) | 0 |
| ChEMBL uploaded structures | 51,225 (98.37%) | 287,172 (96.41%) | 100% |
| canSAR pipeline errors (rejected structures) | 114 (0.22%) | 7,431 (2.5%) | 3 (0.002%) |
| - SDF parsing errors | 0 | 0 | 0 |
| - Sanitization errors | 110 | 1,540 | 2 |
| - Standardization errors | 4 | 67 | 0 |
| - Empty molblock | 0 | 5,824 | 1 |
| canSAR accepted structures | 51,960 (99.78%) | 290,433 (97.5%) | 147,005 (99.99%) |

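  1. Comparison with the ChEMBL Checker on the supplementary files (SI1 to SI3) provided with the ChEMBL chemical standardization pipeline paper [8]. Overall, the canSAR pipeline is more inclusive: it rejects 0.22% of SI1 and 2.5% of SI2 structures, compared with 1.6% and 3.59% not uploaded by the ChEMBL pipeline. For SI3, canSAR rejects 3 structures (0.002%), whereas the ChEMBL pipeline reports no errors.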
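
The percentages in the table follow directly from the counts, and the four canSAR error categories add up to the rejected totals. The snippet below is a minimal sketch of that arithmetic; the counts are transcribed from the table, while the dictionary and function names are illustrative assumptions and not part of either pipeline.

```python
# Counts transcribed from Table 5. All names here are illustrative only.
TOTAL = {"SI1": 52_074, "SI2": 297_864, "SI3": 147_008}

# Structures not uploaded by the ChEMBL pipeline.
CHEMBL_ERRORS = {"SI1": 849, "SI2": 10_692, "SI3": 0}

# Structures rejected by the canSAR pipeline, broken down by error category.
CANSAR_ERRORS = {
    "SI1": {"sdf_parsing": 0, "sanitization": 110, "standardization": 4, "empty_molblock": 0},
    "SI2": {"sdf_parsing": 0, "sanitization": 1_540, "standardization": 67, "empty_molblock": 5_824},
    "SI3": {"sdf_parsing": 0, "sanitization": 2, "standardization": 0, "empty_molblock": 1},
}


def rejection_rate(rejected: int, total: int) -> float:
    """Return the share of rejected structures as a percentage."""
    return 100.0 * rejected / total


def summarize(name: str) -> dict:
    """Recompute the per-dataset totals and rates from the raw counts."""
    total = TOTAL[name]
    cansar_rejected = sum(CANSAR_ERRORS[name].values())
    return {
        "cansar_rejected": cansar_rejected,
        "cansar_accepted": total - cansar_rejected,
        "cansar_rate": rejection_rate(cansar_rejected, total),
        "chembl_rate": rejection_rate(CHEMBL_ERRORS[name], total),
    }


if __name__ == "__main__":
    for name in TOTAL:
        s = summarize(name)
        print(
            f"{name}: canSAR rejected {s['cansar_rejected']} "
            f"({s['cansar_rate']:.3f}%), ChEMBL not uploaded "
            f"{CHEMBL_ERRORS[name]} ({s['chembl_rate']:.3f}%)"
        )
```

Running the script prints one line per dataset, which makes it easy to check the rounded percentages in the table against the underlying counts.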