

Figure 4 | Journal of Cheminformatics

Figure 4

From: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?


Box-and-whisker plot of the sum of ranking differences (SRD) values for eight similarity (and distance) metrics in the SRDall dataset, with range scaling as the data pretreatment method. The uncertainties (distributions) of the SRD values reveal equivalent similarity metrics (e.g. Eucl and Manh). The high SRD values of the Euclidean, Manhattan and Substructure similarities indicate that their ranking behavior differs significantly from the consensus, i.e. the average of the eight metrics, while the Cosine, Dice, Soergel and Tanimoto similarities better represent the ranking based on the averages. The non-outlier range is drawn with a coefficient of 1; a coefficient of 1.5 is the limit for outliers, and points beyond 1.5 coefficients are marked as extreme values.
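
To make the quantity on the vertical axis concrete, the following is a minimal sketch of how an SRD value could be computed for each metric against the row-average consensus after range scaling. It is an illustration under stated assumptions, not the authors' implementation: the function names (`average_ranks`, `range_scale`, `srd`) and the toy data are hypothetical, all columns are assumed to be oriented the same way (distances already converted to similarities), and the validation and normalization steps of a full SRD analysis are omitted.

```python
import numpy as np


def average_ranks(values):
    """Rank values in ascending order starting at 1; ties share their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def range_scale(column):
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        return np.zeros_like(column)
    return (column - column.min()) / span


def srd(scores):
    """Sum of absolute rank differences between each column and the row average.

    scores: 2-D array, rows = objects (e.g. molecules), columns = metrics.
    Assumption: the reference (consensus) is the mean of the scaled columns.
    """
    scores = np.asarray(scores, dtype=float)
    scaled = np.column_stack(
        [range_scale(scores[:, j]) for j in range(scores.shape[1])]
    )
    reference_ranks = average_ranks(scaled.mean(axis=1))
    return np.array(
        [
            np.abs(average_ranks(scaled[:, j]) - reference_ranks).sum()
            for j in range(scaled.shape[1])
        ]
    )


# Hypothetical toy data: three objects scored by three metrics.
# The third column orders the objects in reverse, so it deviates most
# from the consensus ranking and receives the largest SRD.
toy_scores = np.array([[1, 2, 9], [5, 6, 5], [9, 10, 1]])
toy_srd = srd(toy_scores)
```

A smaller SRD means the metric ranks the objects more like the consensus, which is the sense in which Cosine, Dice, Soergel and Tanimoto "better represent" the average ranking in the figure.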
