
Table 2 Formulas for the various similarity and distance metrics

From: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?

| Distance metric | Formula for continuous variables ^a | Formula for dichotomous variables ^a |
| --- | --- | --- |
| Manhattan distance | \( D_{A,B} = \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert \) | \( D_{A,B} = a + b - 2c \) |
| Euclidean distance | \( D_{A,B} = \left[ \sum_{j=1}^{n} \left( x_{jA} - x_{jB} \right)^2 \right]^{1/2} \) | \( D_{A,B} = \left[ a + b - 2c \right]^{1/2} \) |
| Cosine coefficient | \( S_{A,B} = \left[ \sum_{j=1}^{n} x_{jA} x_{jB} \right] / \left[ \sum_{j=1}^{n} \left( x_{jA} \right)^2 \sum_{j=1}^{n} \left( x_{jB} \right)^2 \right]^{1/2} \) | \( S_{A,B} = \frac{c}{\left[ ab \right]^{1/2}} \) |
| Dice coefficient | \( S_{A,B} = \left[ 2 \sum_{j=1}^{n} x_{jA} x_{jB} \right] / \left[ \sum_{j=1}^{n} \left( x_{jA} \right)^2 + \sum_{j=1}^{n} \left( x_{jB} \right)^2 \right] \) | \( S_{A,B} = 2c / \left[ a + b \right] \) |
| Tanimoto coefficient | \( S_{A,B} = \frac{\sum_{j=1}^{n} x_{jA} x_{jB}}{\sum_{j=1}^{n} \left( x_{jA} \right)^2 + \sum_{j=1}^{n} \left( x_{jB} \right)^2 - \sum_{j=1}^{n} x_{jA} x_{jB}} \) | \( S_{A,B} = c / \left[ a + b - c \right] \) |
| Soergel distance ^b | \( D_{A,B} = \left[ \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert \right] / \left[ \sum_{j=1}^{n} \max \left( x_{jA}, x_{jB} \right) \right] \) | \( D_{A,B} = 1 - \frac{c}{\left[ a + b - c \right]} \) |
| Substructure similarity | See Ref [24] | |
| Superstructure similarity | See Ref [25] | |
  1. ^a \( S \) denotes similarities, while \( D \) denotes distances (according to the more commonly used formula for the given metric). Note that distances and similarities can be converted to one another using Equation 1. \( x_{jA} \) is the \( j \)-th feature of molecule A. \( a \) is the number of on bits in molecule A, \( b \) is the number of on bits in molecule B, and \( c \) is the number of bits that are on in both molecules.
  2. ^b The Soergel distance is the complement of the Tanimoto coefficient: for dichotomous variables, \( D_{A,B} = 1 - S_{A,B} \).
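
The formulas in Table 2 can be checked numerically. The following is a minimal Python sketch, not code from the article: the function names (`counts`, `dichotomous_metrics`, `continuous_metrics`) and the two 8-bit example fingerprints are hypothetical and chosen only for illustration. It evaluates both columns of the table on the same pair of binary fingerprints and confirms that the continuous formulas reduce to the a, b, c forms when every feature is 0 or 1.

```python
import math


def counts(fp_a, fp_b):
    """Return (a, b, c): on bits in A, on bits in B, bits on in both."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    return a, b, c


def dichotomous_metrics(fp_a, fp_b):
    """Dichotomous column of Table 2, computed from a, b and c.

    Assumes neither fingerprint is all zeros (otherwise some
    denominators below are zero).
    """
    a, b, c = counts(fp_a, fp_b)
    return {
        "manhattan": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "tanimoto": c / (a + b - c),
        "soergel": 1 - c / (a + b - c),
    }


def continuous_metrics(x_a, x_b):
    """Continuous column of Table 2, computed feature by feature."""
    pairs = list(zip(x_a, x_b))
    dot = sum(p * q for p, q in pairs)
    sq_a = sum(p * p for p in x_a)
    sq_b = sum(q * q for q in x_b)
    abs_diff = sum(abs(p - q) for p, q in pairs)
    return {
        "manhattan": abs_diff,
        "euclidean": math.sqrt(sum((p - q) ** 2 for p, q in pairs)),
        "cosine": dot / math.sqrt(sq_a * sq_b),
        "dice": 2 * dot / (sq_a + sq_b),
        "tanimoto": dot / (sq_a + sq_b - dot),
        "soergel": abs_diff / sum(max(p, q) for p, q in pairs),
    }


# Hypothetical 8-bit fingerprints, for illustration only.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 1]

binary = dichotomous_metrics(fp_a, fp_b)
general = continuous_metrics(fp_a, fp_b)

# On 0/1 data the two columns of the table must agree.
for name in binary:
    assert math.isclose(binary[name], general[name]), name
    print(f"{name:10s} {binary[name]:.4f}")
```

For these example fingerprints a = 4, b = 5 and c = 3, so the Tanimoto coefficient is 3/6 = 0.5 and the Soergel distance is its complement, also 0.5. The sketch uses plain Python lists to stay dependency-free; a real fingerprint workflow would more likely rely on a cheminformatics toolkit's bit-vector types.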