Table 2 Standardization modification rates

From: PubChem chemical structure standardization

Successfully standardized substances per class: organic 102,177,263; inorganic 52,082; mixed 2,064,089; total 104,293,434.

| Standardization step | Organic: modified | Organic: exclusively modified | Inorganic: modified | Inorganic: exclusively modified | Mixed: modified | Mixed: exclusively modified | Total: modified | Total: exclusively modified |
|---|---|---|---|---|---|---|---|---|
| Verify element | | | | | | | 0 | 0 |
| Verify hydrogens | 228,654 | 49,436 | | | 68,629 | 2,598 | 297,283 | 52,034 |
| Verify functional groups | 226,890 | 66,911 | 1,680 | 1,678 | 296,446 | 121,551 | 525,016 | 190,140 |
| Verify valence | | | | | | | 0 | 0 |
| Standardize annotations | | | | | | | 0 | 0 |
| Standardize valence bond | 37,258,340 | 9,643,776 | | | 463,847 | 97,463 | 37,722,187 | 9,741,239 |
| Standardize aromaticity | 38,305,291 | 11,510,081 | 2 | | 444,851 | 80,666 | 38,750,144 | 11,590,747 |
| Standardize stereochemistry | 17,614,166 | 9,738,948 | | | 597,317 | 407,022 | 18,211,483 | 10,145,970 |
| Standardize explicit hydrogens | 3,190 | 8 | | | 3,580 | 13 | 6,770 | 21 |
| Modified substances | 45,307,338 | | 1,680 | | 1,064,295 | | 46,373,313 | |
| Modification rate | 44.34% | | 3.23% | | 51.56% | | 44.46% | |
  1. The table gives the number of substances in the PubChem Substance database that each standardization step modified, and the number that step modified exclusively (that is, no other step modified them). Empty cells correspond to zero. The total number of substances in each class (organic, inorganic, mixed) differs from Table 1 because structures rejected by standardization were excluded from the modification analysis.
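The "Modification rate" row is the "Modified substances" count divided by the number of successfully standardized substances in the same class. The Total column is the sum over the three classes. The short Python sketch below recomputes that row from the counts in the table. The constant and function names are illustrative only and are not part of any PubChem tool.

```python
# Counts copied from Table 2: substances modified by at least one
# standardization step, and successfully standardized substances per class.
MODIFIED = {"organic": 45_307_338, "inorganic": 1_680, "mixed": 1_064_295}
STANDARDIZED = {"organic": 102_177_263, "inorganic": 52_082, "mixed": 2_064_089}


def modification_rate(modified: int, standardized: int) -> float:
    """Percentage of successfully standardized substances that were modified."""
    return 100.0 * modified / standardized


def table_rates() -> dict[str, float]:
    """Recompute the 'Modification rate' row, including the Total column.

    The Total column is obtained by summing the three classes, which works
    because the class counts in the table add up to the totals.
    """
    rates = {cls: modification_rate(MODIFIED[cls], STANDARDIZED[cls]) for cls in MODIFIED}
    rates["total"] = modification_rate(sum(MODIFIED.values()), sum(STANDARDIZED.values()))
    return rates


if __name__ == "__main__":
    for cls, rate in table_rates().items():
        print(f"{cls:>9}: {rate:.2f}%")
```

When run as a script, it prints the four rates rounded to two decimals: 44.34%, 3.23%, 51.56%, and 44.46%. These match the table.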
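The footnote separates two counts. The first is the number of substances a step modified. The second is the number that step modified exclusively, with no other step changing them. The Python sketch below shows one way to tally both counts. It assumes each substance is recorded together with the set of steps that changed it. That record layout, the substance IDs, and the step labels are hypothetical; none of them is PubChem's data format. The sketch also explains why the per-step counts in a column sum to more than the "Modified substances" row. For organic substances, for example, the per-step counts add up to more than 93 million, while only 45,307,338 substances were modified. The difference arises because a substance changed by several steps is counted once for each step.

```python
from collections import Counter


def step_counts(steps_by_substance: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Return (modified, exclusively_modified) counts keyed by step name.

    steps_by_substance is a hypothetical layout: it maps a substance ID to
    the set of standardization steps that changed it (empty if unchanged).
    """
    modified = Counter()
    exclusive = Counter()
    for steps in steps_by_substance.values():
        modified.update(steps)       # counted once for every step that changed it
        if len(steps) == 1:
            exclusive.update(steps)  # changed by this step and by no other
    return modified, exclusive


def modified_total(steps_by_substance: dict[str, set[str]]) -> int:
    """Number of substances changed by at least one step ('Modified substances')."""
    return sum(1 for steps in steps_by_substance.values() if steps)


# Hypothetical example records; IDs and step labels are made up for illustration.
EXAMPLE = {
    "SID-A": {"valence bond", "aromaticity"},
    "SID-B": {"aromaticity"},
    "SID-C": {"stereochemistry"},
    "SID-D": set(),
}

if __name__ == "__main__":
    modified, exclusive = step_counts(EXAMPLE)
    print("modified:", dict(modified))
    print("exclusive:", dict(exclusive))
    print("modified substances:", modified_total(EXAMPLE))
```

In this toy data, "aromaticity" modifies two substances but only one of them exclusively. Three substances are modified in total, although the per-step counts sum to four.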