
Table 3 Fingerprint data set sizes in FPB format and largest chunk sizes

From: The chemfp project

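| Data set         | Bits | Fingerprints (millions) | FPB size (MiB) | AREN size (MiB) | FPID size (MiB) | HASH size (MiB) |
|------------------|-----:|------------------------:|---------------:|----------------:|----------------:|----------------:|
| chemfp benchmark |  166 |                    1.00 |           54.0 |            22.9 |            15.9 |            15.3 |
| chemfp benchmark |  881 |                    1.00 |            134 |             107 |            11.6 |            15.3 |
| chemfp benchmark | 1021 |                    1.00 |            153 |             122 |            15.9 |            15.3 |
| chemfp benchmark | 2048 |                    1.00 |            275 |             244 |            15.9 |            15.3 |
| ChEMBL 24        | 2048 |                    1.82 |            501 |             444 |            29.9 |            27.8 |
| PubChem          |  881 |                    96.9 |         13,000 |          10,300 |            1130 |            1480 |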

1. The AREN chunk contains the fingerprints, the FPID chunk contains the record identifiers indexed by position, and the HASH chunk contains a hash table that maps each identifier to its index.
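The roles of the three chunks can be illustrated with a toy in-memory model. The sketch below is hypothetical and is not chemfp's API or its on-disk FPB layout. `ToyFPB`, `padded_num_bytes` and `estimated_aren_mib` are illustrative names. The rule that each fingerprint is padded to a multiple of 8 bytes is an assumption, although it does reproduce the AREN column for the four chemfp benchmark rows if each set holds exactly 1,000,000 fingerprints. The AREN stand-in is one contiguous byte buffer with a fixed stride per fingerprint. The FPID stand-in is a list of identifiers indexed by position, and the HASH stand-in is a dict that maps each identifier back to that position.

```python
MIB = 1024 * 1024  # bytes per MiB, the unit used in the table


def padded_num_bytes(num_bits: int, alignment: int = 8) -> int:
    """Bytes per stored fingerprint, rounded up to a multiple of `alignment`.

    ASSUMPTION: 8-byte padding is illustrative, not taken from the FPB spec.
    """
    num_bytes = (num_bits + 7) // 8
    return -(-num_bytes // alignment) * alignment  # ceiling to the alignment


def estimated_aren_mib(num_bits: int, num_fingerprints: int, alignment: int = 8) -> float:
    """Estimated AREN size in MiB: fingerprint count times padded stride."""
    return num_fingerprints * padded_num_bytes(num_bits, alignment) / MIB


class ToyFPB:
    """Hypothetical stand-in for the AREN, FPID and HASH chunks (not chemfp's API)."""

    def __init__(self, num_bits, records, alignment=8):
        self.num_bits = num_bits
        self.num_bytes = (num_bits + 7) // 8
        self.stride = padded_num_bytes(num_bits, alignment)
        self.aren = bytearray()  # AREN: fingerprints stored back to back
        self.fpid = []           # FPID: identifier at each position
        for ident, fp in records:
            if len(fp) != self.num_bytes:
                raise ValueError(f"{ident!r}: expected {self.num_bytes} bytes")
            # Zero-pad each fingerprint up to the fixed stride.
            self.aren += fp + bytes(self.stride - len(fp))
            self.fpid.append(ident)
        # HASH: identifier -> index (if an identifier repeats, the last index wins).
        self.hash = {ident: i for i, ident in enumerate(self.fpid)}

    def fingerprint_at(self, index):
        """Return the unpadded fingerprint stored at `index`."""
        start = index * self.stride
        return bytes(self.aren[start:start + self.num_bytes])

    def get_by_id(self, ident):
        """Look up the index via HASH, then read the fingerprint from AREN."""
        return self.fingerprint_at(self.hash[ident])


if __name__ == "__main__":
    for bits in (166, 881, 1021, 2048):
        print(bits, round(estimated_aren_mib(bits, 1_000_000), 1))
    db = ToyFPB(16, [("a", b"\x01\x02"), ("b", b"\xff\x00")])
    print(db.fpid, db.hash, db.get_by_id("b"))
```

With fixed-stride storage, locating a fingerprint by position is simple arithmetic (`index * stride`), so only identifier lookups need the separate hash table. The estimate ignores any header or padding that the real AREN chunk may also contain.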