
Table 1 Evaluation of molecular formula generators

From: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

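| Input | Mass tolerance (±Da) | Formulas generated: HR2 | Formulas generated: PFG | Formulas generated: CDK | Runtime (s): HR2 | Runtime (s): PFG | Runtime (s): CDK |
|---|---|---|---|---|---|---|---|
| 10,000 small masses | 0.001 | 616,846 | 616,846 | 616,843 | 669 | 168 | 41 |
| 10,000 small masses | 0.01 | 6,163,303 | 6,163,302 | 6,163,326 | 689 | 501 | 212 |
| 20 large masses | 0.001 | 4,912,939 | 4,912,939 | 4,912,904 | 26,370 | 1,292 | 177 |
| 20 large masses | 0.01 | 49,128,811 | 49,128,810 | 49,128,815 | 26,587 | 3,406 | 1,580 |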

Formula counts and runtimes of the HR2, PFG, and CDK chemical formula generators on two inputs, each tested with two mass tolerance settings. For the set of small masses, 10,000 mass values in the range of 0–500 Da were randomly selected from the Global Natural Products Social Molecular Networking database [64]. For the set of large masses, 20 mass values in the range of 1500–3500 Da were randomly selected from the same database. Formulas were generated using the chemical elements C, H, N, O, P, and S without effective bounds (the allowed atom count was set to 0–10,000 for each element). All heuristic filtering rules were disabled for the evaluation. The slight differences in the number of generated formulas were caused by the different isotope masses embedded in each program and/or by rounding errors during calculation. The runtimes are averages of three independent runs performed on three different 16-core Intel Xeon 2.9 GHz CPU workstations, each equipped with 189 GB RAM and running Ubuntu Linux 12.04.5 LTS with OpenJDK Java runtime version 1.7.0_101.
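To make the task in the footnote concrete, the following is a minimal Python sketch of unfiltered formula enumeration: given a target mass and a tolerance, it lists every combination of C, H, N, O, P, and S atoms whose monoisotopic mass falls within the window. It is a plain brute-force recursion written for illustration, not the algorithm used by HR2, PFG, or CDK. The function names (`formula_mass`, `generate_formulas`) are hypothetical, and the isotope masses in the table are the values assumed here; as the footnote notes, each program embeds its own mass values, which is one reason the counts differ slightly.

```python
import math

# Monoisotopic masses (Da) assumed for this sketch; real tools may embed
# slightly different values, which changes results near the tolerance edge.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def formula_mass(counts):
    """Return the monoisotopic mass of a formula given as {symbol: count}."""
    return sum(MONOISOTOPIC_MASS[sym] * n for sym, n in counts.items())


def generate_formulas(target, tol, elements=("S", "P", "O", "N", "C", "H"),
                      max_count=10000):
    """Enumerate all formulas with mass in [target - tol, target + tol].

    Elements are tried in the given order (heaviest first by default), and
    the count of the last element is solved for directly instead of looped.
    No heuristic filtering (valence rules, element ratios) is applied.
    """
    results = []

    def recurse(idx, remaining, counts):
        symbol = elements[idx]
        mass = MONOISOTOPIC_MASS[symbol]
        if idx == len(elements) - 1:
            # Last element: only counts that land inside the window qualify.
            lo = max(0, math.ceil((remaining - tol) / mass))
            hi = min(max_count, math.floor((remaining + tol) / mass))
            for n in range(lo, hi + 1):
                results.append({**counts, symbol: n})
            return
        # Largest count of this element that does not overshoot the window.
        max_n = min(max_count, int((remaining + tol) // mass))
        for n in range(max_n + 1):
            recurse(idx + 1, remaining - n * mass, {**counts, symbol: n})

    recurse(0, target, {})
    return results


if __name__ == "__main__":
    # Example: the mass of C6H12O6, searched with a ±0.001 Da window.
    glucose = formula_mass({"C": 6, "H": 12, "O": 6})
    hits = generate_formulas(glucose, 0.001)
    print(len(hits), "candidate formulas within tolerance")
```

Because no chemical plausibility rules are applied, the output includes formulas that could not correspond to real molecules, which mirrors the evaluation setting where all heuristic filtering was disabled. The brute-force recursion also scales poorly with mass, so it is only practical for small targets.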