
Table 2 Range of Tanimoto similarity values in similarity matrices

From: Statistical-based database fingerprint: chemical space dependent representation of compound databases

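| Representation | MACCS keys (166-bit): Minimum | MACCS keys (166-bit): Average | MACCS keys (166-bit): Maximum | MACCS keys (166-bit): Range | ECFP4 (2048-bit): Minimum | ECFP4 (2048-bit): Average | ECFP4 (2048-bit): Maximum | ECFP4 (2048-bit): Range |
|---|---|---|---|---|---|---|---|---|
| All compounds (a) | 0.293 | 0.407 | 0.804 | 0.511 | 0.059 | 0.114 | 0.553 | 0.494 |
| DFP | 0.254 | 0.540 | 1.000 | 0.746 | 0.070 | 0.408 | 1.000 | 0.930 |
| SB-DFP | 0.050 | 0.342 | 1.000 | 0.950 | 0.011 | 0.185 | 1.000 | 0.989 |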

(a) It should be noted that comparisons involving the self-similarity of data sets do not reach a value of 1, and in some cases the self-similarity is not the highest value in its matrix row. This could be misinterpreted as the existence of pairs of databases that are more similar to each other than to themselves, which makes no sense. The matrices constructed with DFP or SB-DFP do not present this problem: because each entry comes from a single fingerprint comparison, a maximum of 1 is guaranteed on the diagonal of the matrix.
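
The footnote's point can be illustrated with a small sketch. It assumes that an all-compounds matrix entry is the average Tanimoto similarity over every compound pair drawn from two data sets (self-pairs included), which this table does not state explicitly. The fingerprints are invented toy bit sets, and `db_fp` is a hypothetical stand-in for a single per-database fingerprint; how DFP or SB-DFP is actually derived is not shown here.

```python
from itertools import product


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention chosen here).
    return len(a & b) / union if union else 1.0


def mean_pairwise(db_x, db_y) -> float:
    """Average Tanimoto over every compound pair drawn from db_x and db_y."""
    sims = [tanimoto(a, b) for a, b in product(db_x, db_y)]
    return sum(sims) / len(sims)


# Toy data set of three compounds (made-up bit positions, not real MACCS/ECFP4 bits).
db = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]

# Diagonal entry under the assumed all-compounds scheme: averaging dissimilar
# compound pairs pulls the data set's self-similarity below 1.
self_sim_all = mean_pairwise(db, db)

# Diagonal entry when the data set is one fingerprint (hypothetical stand-in):
# a fingerprint compared with itself always scores exactly 1.
db_fp = frozenset({2, 3})
self_sim_fp = tanimoto(db_fp, db_fp)

print(f"all-compounds self-similarity: {self_sim_all:.3f}")
print(f"single-fingerprint self-similarity: {self_sim_fp:.3f}")
```

Under this assumption, the more structurally diverse a data set is, the lower its averaged self-similarity becomes, so a diverse data set can score lower against itself than two homogeneous, closely related data sets score against each other.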