Table 6 Detections of problematic structure (II)

From: canSAR chemistry registration and standardization pipeline

  

| | PubChem (Add_File_4) |
|---|---|
| Structures # | 375,397 |
| PubChem pipeline errors (rejected compounds) | 375,397 (100%) |
| ERRORS found | PubChem Checker |
| Invalid isotope specifications | 141 |
| Valence check | 364,946 (97.22%) |
| Identical charges on adjacent atoms or invalid valence after valence bond canonicalization | 10,243 |
| Exceeds the limit of 999 explicit atoms | 65 |
| canSAR pipeline errors (rejected compounds) | 285,552 (76.07%) |
| SDF parsing errors | 0 |
| Sanitization errors | 270,131 (71.96%) |
| Standardization errors | 2,954 (0.78%) |
| Empty molblock | 12,467 (3.37%) |
| canSAR accepted structures | 89,845 (23.93%) |

  1. Comparison with the PubChem Checker, using the supplementary files of the PubChem chemical standardization pipeline paper [9]; Additional file 4 was used. Overall, the canSAR pipeline is more inclusive, rejecting a lower percentage of structures, and performs better at correcting wrong structures before they are imported into the database.
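
The canSAR percentages in Table 6 are shares of the 375,397 input structures. The short Python sketch below recomputes them from the counts as a consistency check. The counts are transcribed from the table. The names `TOTAL`, `CANSAR_REJECTED`, `CANSAR_ACCEPTED` and `share` are illustrative only and are not part of the canSAR pipeline.

```python
# Counts transcribed from Table 6 (PubChem Additional file 4).
TOTAL = 375_397

# canSAR rejection categories and their counts, as listed in the table.
CANSAR_REJECTED = {
    "SDF parsing errors": 0,
    "Sanitization errors": 270_131,
    "Standardization errors": 2_954,
    "Empty molblock": 12_467,
}
CANSAR_ACCEPTED = 89_845


def share(count: int, total: int = TOTAL) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


rejected = sum(CANSAR_REJECTED.values())

# Rejected and accepted structures should partition the input file.
assert rejected + CANSAR_ACCEPTED == TOTAL

for label, count in CANSAR_REJECTED.items():
    print(f"{label}: {count:,} ({share(count):.2f}%)")
print(f"Rejected: {rejected:,} ({share(rejected):.2f}%)")
print(f"Accepted: {CANSAR_ACCEPTED:,} ({share(CANSAR_ACCEPTED):.2f}%)")
```

With these counts, the rejected and accepted totals reproduce the 76.07% and 23.93% reported in the table. Per-category shares recomputed this way can differ from the printed values in the second decimal place, so any row that disagrees is worth checking against the original supplementary file.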