

Table 1 Evaluation of molecular formula generators

From: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

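| Input | Mass tolerance (±Da) | Generated formulas: HR2 | Generated formulas: PFG | Generated formulas: CDK | Runtime (s): HR2 | Runtime (s): PFG | Runtime (s): CDK |
|---|---|---|---|---|---|---|---|
| 10,000 small masses | 0.001 | 616,846 | 616,846 | 616,843 | 669 | 168 | 41 |
| 10,000 small masses | 0.01 | 6,163,303 | 6,163,302 | 6,163,326 | 689 | 501 | 212 |
| 20 large masses | 0.001 | 4,912,939 | 4,912,939 | 4,912,904 | 26,370 | 1292 | 177 |
| 20 large masses | 0.01 | 49,128,811 | 49,128,810 | 49,128,815 | 26,587 | 3406 | 1580 |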
The table lists formula counts and runtimes of the HR2, PFG, and CDK molecular formula generators on two inputs, each run with two mass tolerance settings. For the set of small masses, 10,000 mass values in the range 0–500 Da were randomly selected from the Global Natural Products Social Molecular Networking database [64]. For the set of large masses, 20 mass values in the range 1500–3500 Da were randomly selected from the same database. Formulas were generated from the elements C, H, N, O, P, and S without bounds (the allowed atom count was set to 0–10,000 for each element). All heuristic filtering rules were disabled for this evaluation. The slight differences in formula counts were caused by the different isotope masses built into each program and/or by rounding errors during calculation. The runtimes are averages of three independent runs performed on three different workstations, each with 16-core Intel Xeon 2.9 GHz CPUs and 189 GB RAM, running Ubuntu Linux 12.04.5 LTS and OpenJDK Java runtime 1.7.0_101.
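To make the benchmark task concrete, the sketch below enumerates every combination of C, H, N, O, P, and S whose monoisotopic mass falls within target ± tolerance, with no heuristic filtering and a 0–10,000 cap per element, mirroring the unfiltered setting described above. It is a naive full enumeration with simple pruning, not the algorithm used by HR2, PFG, or CDK. The function names `generate_formulas` and `hill_notation` are hypothetical, and the code assumes neutral monoisotopic masses (standard values for the most abundant isotope, electron mass ignored). A generator that embeds slightly different isotope masses could accept or reject borderline formulas differently, which is the kind of discrepancy the footnote attributes the count differences to.

```python
from math import ceil, floor

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def generate_formulas(target_mass, tolerance,
                      elements=("C", "H", "N", "O", "P", "S"), max_count=10_000):
    """Hypothetical helper: enumerate every element-count combination whose
    monoisotopic mass lies within target_mass +/- tolerance (no filtering)."""
    # Heaviest element first, so the outermost loops are the shortest ones.
    ordered = sorted(elements, key=MONOISOTOPIC_MASS.__getitem__, reverse=True)
    results = []

    def recurse(index, remaining, chosen):
        element = ordered[index]
        mass = MONOISOTOPIC_MASS[element]
        # Largest count of this element that does not overshoot the window.
        high = min(max_count, floor((remaining + tolerance) / mass))
        if index == len(ordered) - 1:
            # Lightest element: keep only counts that land inside the window.
            low = max(0, ceil((remaining - tolerance) / mass))
            for n in range(low, high + 1):
                results.append(dict(chosen + ((element, n),)))
            return
        for n in range(high + 1):
            recurse(index + 1, remaining - n * mass, chosen + ((element, n),))

    recurse(0, target_mass, ())
    return results


def hill_notation(counts):
    """Hypothetical helper: format counts in Hill order (C, H, then the rest
    alphabetically; purely alphabetical when no carbon is present)."""
    if counts.get("C", 0) > 0:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    parts = []
    for e in order:
        n = counts.get(e, 0)
        if n > 0:
            parts.append(e if n == 1 else f"{e}{n}")
    return "".join(parts)


if __name__ == "__main__":
    # Glucose, C6H12O6, has a monoisotopic mass of about 180.063388 Da.
    candidates = generate_formulas(180.063388, 0.001)
    print(len(candidates), sorted(hill_notation(f) for f in candidates))
```

Because every extra heavy atom adds another level of nested loops, exhaustive enumeration like this grows quickly with the target mass. Widening the tolerance from 0.001 to 0.01 Da makes the mass window ten times wider, which is consistent with the roughly tenfold increase in formula counts between the two tolerance settings in the table.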