Table 2 Fingerprint target data set sizes in FPS format

From: The chemfp project

| Data set | #Bits | Fingerprint type | #Fingerprints (in millions) | Unique (%) | FPS size (in MiB) | FPS.gz size (in MiB) |
|---|---|---|---|---|---|---|
| chemfp benchmark ChEMBL 23 subset | 166 | OpenEye MACCS | 1.00 | 83.6 | 54 | 17.7 |
| chemfp benchmark PubChem subset | 881 | PubChem/CACTVS | 1.00 | 98.2 | 222 | 53.1 |
| chemfp benchmark ChEMBL 23 subset | 1021 | Open Babel FP2 | 1.00 | 96.0 | 258 | 80.5 |
| chemfp benchmark ChEMBL 23 subset | 2048 | RDKit Morgan | 1.00 | 90.6 | 502 | 59.9 |
| ChEMBL 24 | 2048 | RDKit Morgan | 1.82 | 94.1 | 914 | 99.7 |
| PubChem | 881 | PubChem/CACTVS | 96.9 | 65.3 | 21,500 | 2910 |

  1. “Unique” is the number of distinct fingerprints, expressed as a percentage of the total number of fingerprints.
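
As a small illustration of how the “Unique” column is defined, the sketch below computes the share of distinct fingerprints in a collection. The function name `unique_percentage` and the toy hex strings are hypothetical and are not taken from chemfp or from the data sets in the table.

```python
def unique_percentage(fingerprints):
    """Return the number of distinct fingerprints as a percentage of the total."""
    fingerprints = list(fingerprints)
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    # A set keeps one copy of each distinct fingerprint.
    return 100.0 * len(set(fingerprints)) / len(fingerprints)


# Hypothetical toy data: four hex-encoded fingerprints, two of them identical.
toy = ["00ff", "0a1b", "00ff", "ffff"]
print(f"{unique_percentage(toy):.1f}%")  # 3 distinct out of 4 -> 75.0%
```

Read this way, the 65.3 reported for the full PubChem set means that roughly a third of its fingerprints duplicate another fingerprint in the same set.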