Table 2 Range of Tanimoto similarity values in similarity matrices

From: Statistical-based database fingerprint: chemical space dependent representation of compound databases

Representation     MACCS keys (166-bit)                  ECFP4 (2048-bit)
                   Minimum  Average  Maximum  Range      Minimum  Average  Maximum  Range
All compounds (a)  0.293    0.407    0.804    0.511      0.059    0.114    0.553    0.494
DFP                0.254    0.540    1.000    0.746      0.070    0.408    1.000    0.930
SB-DFP             0.050    0.342    1.000    0.950      0.011    0.185    1.000    0.989
(a) It should be noted that, for this representation, the self-similarity of a data set does not reach a value of 1, and in some cases it is not even the highest value in its matrix row. This could be misinterpreted as the existence of pairs of databases that are more similar to each other than to themselves, which makes no sense. The matrices constructed with DFP or SB-DFP do not present this problem: because each entry comes from a single fingerprint comparison, a maximum of 1 is guaranteed on the diagonal of the matrix.
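
The diagonal behaviour described in the footnote can be illustrated with a minimal Python sketch. It assumes, as a guess not confirmed by the table itself, that the "All compounds" value for a pair of data sets is the mean Tanimoto similarity over all compound pairs. The toy fingerprints, `toy_db`, `db_fp`, and the helper names are hypothetical and are not taken from the article.

```python
from itertools import product


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention chosen here).
    return len(a & b) / union if union else 1.0


def mean_pairwise(db1, db2) -> float:
    """Mean Tanimoto over every compound pair drawn from two data sets.

    Assumption: this stands in for the 'All compounds' comparison.
    """
    sims = [tanimoto(x, y) for x, y in product(db1, db2)]
    return sum(sims) / len(sims)


# Hypothetical toy data set: three compounds, each a set of on-bit positions.
toy_db = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]

# Self-comparison through all compound pairs: diverse compounds pull the mean below 1.
self_all = mean_pairwise(toy_db, toy_db)

# Self-comparison through one fingerprint per data set (as with DFP or SB-DFP):
# a fingerprint compared with itself always gives exactly 1.
db_fp = frozenset({2, 3})  # hypothetical database-level fingerprint
self_single = tanimoto(db_fp, db_fp)

print(f"all-compounds self-similarity: {self_all:.3f}")
print(f"single-fingerprint self-similarity: {self_single:.3f}")
```

In this toy case the all-compounds self-similarity is 4/9, because only the three identical pairs score 1, whereas the single-fingerprint comparison yields 1 by construction.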