Table 5 Summary of systematic benchmark comparing v1.4.19 to v2.0

From: The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

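| Benchmark | Data set | Format | CDK v1.4.19 Skip | CDK v1.4.19 Time | CDK v1.4.19 Per min | CDK v2.0 Skip | CDK v2.0 Time | CDK v2.0 Per min | Improvement |
|---|---|---|---|---|---|---|---|---|---|
| countheavy | ChEBI 149 | smi | 2112 | 22.51s | 108.2K | 9 | 0.85s | 2.9M | 26.48 |
| countheavy | ChEBI 149 | sdf | 0 | 7.21s | 355.4K | 25 | 3s | 854.1K | 2.4 |
| countheavy | ChEMBL 22.1 | smi | 0 | 8m39.3s | 193.9K | 9 | 10.74s | 9.4M | 48.35 |
| countheavy | ChEMBL 22.1 | sdf | 0 | 3m17.29s | 510.4K | 0 | 53.27s | 1.9M | 3.7 |
| rings -mark | ChEBI 149 | smi | 2112 | 22.91s | 106.3K | 9 | 1.06s | 2.3M | 21.61 |
| rings -mark | ChEBI 149 | sdf | 0 | 8.71s | 294.2K | 25 | 3.11s | 823.9K | 2.8 |
| rings -mark | ChEMBL 22.1 | smi | 0 | 8m45.78s | 191.5K | 9 | 17.09s | 5.9M | 30.77 |
| rings -mark | ChEMBL 22.1 | sdf | 0 | 4m12.01s | 399.6K | 0 | 1m6.54s | 1.5M | 3.79 |
| rings -sssr | ChEBI 149 | smi | 2112 | 27.4s | 88.9K | 9 | 1.43s | 1.7M | 19.16 |
| rings -sssr | ChEBI 149 | sdf | 0 | 11.84s | 216.4K | 25 | 3.78s | 677.8K | 3.13 |
| rings -sssr | ChEMBL 22.1 | smi | 0 | 12m4.62s | 139K | 9 | 27.16s | 3.7M | 26.68 |
| rings -sssr | ChEMBL 22.1 | sdf | 0 | 7m9.58s | 234.4K | 0 | 1m8.17s | 1.5M | 6.3 |
| rings -all | ChEBI 149 | smi | 2126 | 45.28s | 53.8K | 26 | 1.26s | 1.9M | 35.94 |
| rings -all | ChEBI 149 | sdf | 16 | 36.56s | 70.1K | 40 | 3.51s | 730K | 10.42 |
| rings -all | ChEMBL 22.1 | smi | 88 | 12m40.2s | 132.5K | 9 | 24.97s | 4M | 30.44 |
| rings -all | ChEMBL 22.1 | sdf | 90 | 8m5.64s | 207.4K | 0 | 1m5.68s | 1.5M | 7.39 |
| cansmi | ChEBI 149 | smi | 2112 | 36.58s | 66.6K | 9 | 1.91s | 1.3M | 19.15 |
| cansmi | ChEBI 149 | sdf | 35 | 21.15s | 121.1K | 26 | 4.37s | 586.3K | 4.84 |
| cansmi | ChEMBL 22.1 | smi | 14 | 14m33.86s | 115.2K | 9 | 40.84s | 2.5M | 21.4 |
| cansmi | ChEMBL 22.1 | sdf | 0 | 8m59.82s | 186.6K | 0 | 1m29.33s | 1.1M | 6.04 |
| convert -ofmt smi | ChEBI 149 | smi | 2112 | 35.63s | 68.4K | 16 | 1.47s | 1.7M | 24.24 |
| convert -ofmt smi | ChEBI 149 | sdf | 35 | 20.91s | 122.5K | 25 | 4.55s | 563.1K | 4.6 |
| convert -ofmt smi | ChEMBL 22.1 | smi | 14 | 14m26.02s | 116.3K | 37 | 26.2s | 3.8M | 33.05 |
| convert -ofmt smi | ChEMBL 22.1 | sdf | 0 | 8m59.38s | 186.7K | 1 | 1m12.49s | 1.4M | 7.44 |
| convert -ofmt sdf | ChEBI 149 | smi | 2112 | 32.42s | 75.1K | 9 | 10.39s | 234.4K | 3.12 |
| convert -ofmt sdf | ChEBI 149 | sdf | 13 | 17s | 150.7K | 25 | 13.96s | 183.5K | 1.22 |
| convert -ofmt sdf | ChEMBL 22.1 | smi | 0 | 14m25.82s | 116.3K | 9 | 5m26.29s | 308.6K | 2.65 |
| convert -ofmt sdf | ChEMBL 22.1 | sdf | 1 | 8m51.33s | 189.5K | 0 | 6m34.5s | 255.3K | 1.35 |
| convert -gen2d -ofmt sdf | ChEBI 149 | smi | 2112 | 24m28.02s | 1.7K | 9 | 35.86s | 67.9K | 40.94 |
| convert -gen2d -ofmt sdf | ChEBI 149 | sdf | 13 | 35m12.03s | 1.2K | 25 | 42.43s | 60.4K | 49.78 |
| convert -gen2d -ofmt sdf | ChEMBL 22.1 | smi | 0 | 3h27m7s | 8.1K | 9 | 17m44.64s | 94.6K | 11.67 |
| convert -gen2d -ofmt sdf | ChEMBL 22.1 | sdf | 1 | 5h58m30s | 4.7K | 0 | 19m42.77s | 85.1K | 18.19 |
| fpgen -type path | ChEBI 149 | smi | 2112 | 1m38s | 24.9K | 9 | 10.28s | 236.9K | 9.53 |
| fpgen -type path | ChEBI 149 | sdf | 0 | 2m11.03s | 19.6K | 25 | 13.03s | 196.6K | 10.06 |
| fpgen -type path | ChEMBL 22.1 | smi | 0 | 42m56.15s | 39.1K | 9 | 6m34.67s | 255.2K | 6.53 |
| fpgen -type path | ChEMBL 22.1 | sdf | 0 | 47m5.58s | 35.6K | 0 | 7m52.32s | 213.2K | 5.98 |
| fpgen -type maccs | ChEBI 149 | smi | 2150 | 1h37m35s | 416 | 9 | 19.51s | 124.8K | 300.1 |
| fpgen -type maccs | ChEBI 149 | sdf | 48 | 1h44m17s | 409 | 25 | 21.25s | 120.6K | 294.45 |
| fpgen -type maccs | ChEMBL 22.1 | smi | 214 | 20h24m57s | 1.4K | 9 | 13m31.21s | 124.1K | 90.6 |
| fpgen -type maccs | ChEMBL 22.1 | sdf | 225 | 24h41m46s | 1.1K | 0 | 13m26.41s | 124.9K | 110.25 |
| fpgen -type circ | ChEBI 149 | smi | 0 |  | – | 9 | 4.37s | 557.4K | 0 |
| fpgen -type circ | ChEBI 149 | sdf | 0 |  | – | 25 | 6.81s | 376.2K | 0 |
| fpgen -type circ | ChEMBL 22.1 | smi | 0 |  | – | 9 | 2m43.45s | 616.1K | 0 |
| fpgen -type circ | ChEMBL 22.1 | sdf | 0 |  | – | 0 | 3m42.01s | 453.6K | 0 |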

  1. The total elapsed real time was measured with the Unix time utility. Throughput is reported in molecules per minute (K = thousand, M = million) as a relatable metric. It was calculated by dividing the number of molecules in the data set (42704 for ChEBI 149 and 1678393 for ChEMBL 22.1) by the total elapsed time. The ChEBI SMILES input contains 2107 blank (but valid) inputs, which account for the majority of entries skipped in v1.4.19; the throughput calculation was adjusted to account for this.
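
The throughput arithmetic described in the footnote can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' benchmark script: the function names (`parse_elapsed`, `per_minute`, `improvement`) are hypothetical, the adjustment is assumed to mean subtracting the 2107 blank entries from the ChEBI SMILES molecule count, and the Improvement column is assumed to be the ratio of the two elapsed times. Under those assumptions the sketch reproduces the countheavy ChEBI 149 values in the table.

```python
import re

# Data set sizes as stated in the table footnote.
MOLECULES = {"ChEBI 149": 42704, "ChEMBL 22.1": 1678393}
# Blank (but valid) entries in the ChEBI SMILES input, per the footnote.
CHEBI_SMI_BLANKS = 2107


def parse_elapsed(text):
    """Convert a time such as '22.51s', '8m39.3s' or '3h27m7s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if not text or not match:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def per_minute(dataset, fmt, elapsed):
    """Molecules processed per minute for one table cell."""
    count = MOLECULES[dataset]
    if dataset == "ChEBI 149" and fmt == "smi":
        # Assumed form of the footnote's adjustment: ignore blank inputs.
        count -= CHEBI_SMI_BLANKS
    return count / parse_elapsed(elapsed) * 60


def improvement(old_elapsed, new_elapsed):
    """Assumed definition: elapsed time in v1.4.19 divided by that in v2.0."""
    return parse_elapsed(old_elapsed) / parse_elapsed(new_elapsed)


if __name__ == "__main__":
    # countheavy, ChEBI 149, SMILES input, v1.4.19 versus v2.0
    print(round(per_minute("ChEBI 149", "smi", "22.51s") / 1000, 1), "K per min")
    print(round(improvement("22.51s", "0.85s"), 2), "x faster")
```

The same functions applied to the SDF rows use the unadjusted molecule count, since the footnote attributes the blank entries only to the ChEBI SMILES input.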