Table 2 Standardization modification rates

From: PubChem chemical structure standardization

Substance classes, with the number of successfully standardized substances in each: organic 102,177,263; inorganic 52,082; mixed 2,064,089; total 104,293,434.

| Standardization step | Organic: modified | Organic: exclusively modified | Inorganic: modified | Inorganic: exclusively modified | Mixed: modified | Mixed: exclusively modified | Total: modified | Total: exclusively modified |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Verify element | – | – | – | – | – | – | 0 | 0 |
| Verify hydrogens | 228,654 | 49,436 | – | – | 68,629 | 2598 | 297,283 | 52,034 |
| Verify functional groups | 226,890 | 66,911 | 1680 | 1678 | 296,446 | 121,551 | 525,016 | 190,140 |
| Verify valence | – | – | – | – | – | – | 0 | 0 |
| Standardize annotations | – | – | – | – | – | – | 0 | 0 |
| Standardize valence bond | 37,258,340 | 9,643,776 | – | – | 463,847 | 97,463 | 37,722,187 | 9,741,239 |
| Standardize aromaticity | 38,305,291 | 11,510,081 | 2 | – | 444,851 | 80,666 | 38,750,144 | 11,590,747 |
| Standardize stereochemistry | 17,614,166 | 9,738,948 | – | – | 597,317 | 407,022 | 18,211,483 | 10,145,970 |
| Standardize explicit hydrogens | 3190 | 8 | – | – | 3580 | 13 | 6770 | 21 |
| Modified substances (any step) | 45,307,338 | | 1680 | | 1,064,295 | | 46,373,313 | |
| Modification rate | 44.34% | | 3.23% | | 51.56% | | 44.46% | |
 
1. For each standardization step applied to the PubChem Substance database, the table gives the number of substances modified in that step and the number modified exclusively in that step (that is, by no other step). The modification rate is the number of modified substances divided by the number of successfully standardized substances in the class. The total numbers of substances for each substance class (organic, inorganic, mixed) differ from those in Table 1 because structures rejected by standardization were not included in the modification analysis.
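The distinction between "modified" and "exclusively modified" counts, and the modification-rate row, can be illustrated with a small Python sketch. The substance IDs and step sets in `steps_by_substance` are hypothetical toy data, and the helper names (`modification_counts`, `rate_percent`, `TABLE2`) are illustrative only, not part of PubChem's software. The sketch assumes the rate is "substances modified in any step / successfully standardized substances", which reproduces the published percentages from the table's own counts.

```python
from collections import Counter

# Hypothetical toy data (NOT from PubChem): for each substance ID, the set of
# standardization steps that changed its structure. An empty set = unmodified.
steps_by_substance = {
    "S1": {"Standardize aromaticity"},
    "S2": {"Standardize aromaticity", "Standardize valence bond"},
    "S3": set(),
    "S4": {"Verify functional groups"},
}


def modification_counts(steps_by_substance):
    """Per-step 'modified' and 'exclusively modified' counts, plus the number
    of substances modified by at least one step and the modification rate."""
    modified = Counter()   # changed by this step (possibly by other steps too)
    exclusive = Counter()  # changed by this step and by no other step
    for steps in steps_by_substance.values():
        modified.update(steps)
        if len(steps) == 1:
            exclusive.update(steps)
    any_step = sum(1 for steps in steps_by_substance.values() if steps)
    return modified, exclusive, any_step, any_step / len(steps_by_substance)


# (modified in any step, successfully standardized) per class, from Table 2.
TABLE2 = {
    "Organic": (45_307_338, 102_177_263),
    "Inorganic": (1_680, 52_082),
    "Mixed": (1_064_295, 2_064_089),
    "Total": (46_373_313, 104_293_434),
}


def rate_percent(modified, standardized):
    """Modification rate as a percentage rounded to two decimals."""
    return round(100 * modified / standardized, 2)


if __name__ == "__main__":
    modified, exclusive, any_step, rate = modification_counts(steps_by_substance)
    # Per-step counts sum to 4, but only 3 substances changed: S2 is counted twice.
    print(dict(modified), dict(exclusive), any_step, rate)
    for cls, (mod, n) in TABLE2.items():
        print(f"{cls}: {rate_percent(mod, n):.2f}%")
```

Because one substance can be changed by several steps, the per-step counts add up to more than the "Modified substances" row. In the inorganic class, for example, 1680 substances were modified by functional-group verification and 2 by aromaticity standardization, yet only 1680 inorganic substances were modified overall; the 2 aromaticity cases are among the functional-group cases, which is why only 1678 are exclusive to that step.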