
Table 1 Examples of Chemical Compounds Databases

From: Bloom filters for molecules

| Name | Size (# of compounds) | Size (GB)\(^a\) | Bloom filter size needed (GB)\(^{b,c,d}\) |
| --- | --- | --- | --- |
| ZINC [2] | >2 billion | >100 | >2.56730 |
| ChemBL [18] | 2,354,965 | 0.117 | 0.003023 |
| Coconut [13] | 407,270 | 0.0204 | 0.000522 |
| BindingDB [19] | 566,000 | 0.0283 | 0.000727 |
| PubChem [20] | 113,993,087 | 5.700 | 0.146344 |
| SureChemBL [21] | 22,843,364 | 1.1422 | 0.02932 |
| Available Chemical Directory | >3.2 million | >0.16 | >0.004108 |
| ChemNavigator | 10,000,000 | 0.5 | 0.012837 |
| ChemBridge | 1.3 million | 0.065 | 0.001668 |
| ChemSpider [22] | >115,000,000 | >5.75 | >0.14763 |

\(^a\) Estimated text file sizes for SMILES, based on a 95 M sample; total sizes were inferred by linear scaling to avoid loading entire datasets.

\(^b\) Estimated Bloom filter size needed for a fixed false positive rate of 0.005.

\(^c\) With these specifications, all Bloom filters use 8 hash functions.

\(^d\) To get the length of a filter in bits, convert from GB to bits using \(1\,\texttt{GB} = 8 \times 10^{9}\,\texttt{Bits}\).

Finally, the indexing time for our Bloom filters is approximately 1 million elements per second.
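The arithmetic behind footnotes b, c and d can be sketched as follows. This is a minimal illustration and not necessarily how the table was produced: it assumes the standard optimal Bloom filter sizing, \(m = -n \ln p / (\ln 2)^2\) bits and \(k = (m/n) \ln 2\) hash functions, and the function names (`bloom_parameters`, `bits_to_gb`, `index_seconds`) are hypothetical helpers introduced here.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (number of bits, number of hash functions).

    Assumption: standard optimal sizing, m = -n ln(p) / (ln 2)^2
    and k = (m / n) ln 2, rounded to the nearest integer.
    """
    bits_per_item = -math.log(fp_rate) / (math.log(2) ** 2)
    n_bits = math.ceil(n_items * bits_per_item)
    n_hashes = round(bits_per_item * math.log(2))
    return n_bits, n_hashes


def bits_to_gb(n_bits: int, bytes_per_gb: float = 1e9) -> float:
    """Convert bits to GB; the default follows footnote d (1 GB = 8e9 bits)."""
    return n_bits / 8 / bytes_per_gb


def index_seconds(n_items: int, items_per_second: float = 1_000_000) -> float:
    """Rough indexing time at the quoted rate of ~1 million elements/s."""
    return n_items / items_per_second


if __name__ == "__main__":
    n = 2_354_965  # ChemBL row of Table 1
    n_bits, n_hashes = bloom_parameters(n, fp_rate=0.005)
    print("bits per compound:", n_bits / n)
    print("hash functions:", n_hashes)
    print("size in GB (10^9 bytes):", bits_to_gb(n_bits))
    print("size in GB (2^30 bytes):", bits_to_gb(n_bits, bytes_per_gb=2**30))
    print("indexing time (s):", index_seconds(n))
```

Under these assumptions a false positive rate of 0.005 costs roughly 11 bits per compound and gives 8 hash functions, in line with footnote c. For the ChemBL row the sketch yields about 0.00325 GB when a GB is taken as \(10^9\) bytes and about 0.00302 GB when it is taken as \(2^{30}\) bytes; the tabulated values appear closer to the latter convention.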