
Table 2 Formulas for the various similarity and distance metrics

From: Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?

| Distance metric | Formula for continuous variables (a) | Formula for dichotomous variables (a) |
| --- | --- | --- |
| Manhattan distance | \( D_{A,B} = \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert \) | \( D_{A,B} = a + b - 2c \) |
| Euclidean distance | \( D_{A,B} = \left[ \sum_{j=1}^{n} \left( x_{jA} - x_{jB} \right)^{2} \right]^{1/2} \) | \( D_{A,B} = \left[ a + b - 2c \right]^{1/2} \) |
| Cosine coefficient | \( S_{A,B} = \left[ \sum_{j=1}^{n} x_{jA} x_{jB} \right] / \left[ \sum_{j=1}^{n} \left( x_{jA} \right)^{2} \sum_{j=1}^{n} \left( x_{jB} \right)^{2} \right]^{1/2} \) | \( S_{A,B} = \frac{c}{\left[ ab \right]^{1/2}} \) |
| Dice coefficient | \( S_{A,B} = \left[ 2 \sum_{j=1}^{n} x_{jA} x_{jB} \right] / \left[ \sum_{j=1}^{n} \left( x_{jA} \right)^{2} + \sum_{j=1}^{n} \left( x_{jB} \right)^{2} \right] \) | \( S_{A,B} = 2c / \left[ a + b \right] \) |
| Tanimoto coefficient | \( S_{A,B} = \frac{\sum_{j=1}^{n} x_{jA} x_{jB}}{\sum_{j=1}^{n} \left( x_{jA} \right)^{2} + \sum_{j=1}^{n} \left( x_{jB} \right)^{2} - \sum_{j=1}^{n} x_{jA} x_{jB}} \) | \( S_{A,B} = c / \left[ a + b - c \right] \) |
| Soergel distance (b) | \( D_{A,B} = \left[ \sum_{j=1}^{n} \lvert x_{jA} - x_{jB} \rvert \right] / \left[ \sum_{j=1}^{n} \max\left( x_{jA}, x_{jB} \right) \right] \) | \( D_{A,B} = 1 - \frac{c}{\left[ a + b - c \right]} \) |
| Substructure similarity | See Ref [24] | |
| Superstructure similarity | See Ref [25] | |

  1. (a) S denotes similarities, while D denotes distances (according to the more commonly used formula for the given metric). Note that distances and similarities can be converted to one another using Equation 1. \( x_{jA} \) means the j-th feature of molecule A. a is the number of on bits in molecule A, b is the number of on bits in molecule B, while c is the number of bits that are on in both molecules.
  2. (b) The Soergel distance is the complement of the Tanimoto coefficient.
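
The dichotomous formulas in the table can be checked against the continuous ones with a short script. The following is a minimal Python sketch, not code from the article: the function names `bit_counts`, `dichotomous_metrics` and `continuous_metrics` are hypothetical, fingerprints are assumed to be equal-length sequences of 0/1 integers, and each fingerprint is assumed to have at least one on bit (otherwise several denominators are zero).

```python
import math


def bit_counts(fp_a, fp_b):
    """Return (a, b, c): on bits in A, on bits in B, on bits shared by both."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    return a, b, c


def dichotomous_metrics(fp_a, fp_b):
    """Evaluate the dichotomous-variable column of the table from a, b and c."""
    # Assumption: both fingerprints have at least one on bit.
    a, b, c = bit_counts(fp_a, fp_b)
    return {
        "manhattan": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "tanimoto": c / (a + b - c),
        "soergel": 1 - c / (a + b - c),
    }


def continuous_metrics(x_a, x_b):
    """Evaluate the continuous-variable column of the table feature by feature."""
    dot = sum(p * q for p, q in zip(x_a, x_b))
    sq_a = sum(p * p for p in x_a)
    sq_b = sum(q * q for q in x_b)
    abs_diff = sum(abs(p - q) for p, q in zip(x_a, x_b))
    max_sum = sum(max(p, q) for p, q in zip(x_a, x_b))
    return {
        "manhattan": abs_diff,
        "euclidean": math.sqrt(sum((p - q) ** 2 for p, q in zip(x_a, x_b))),
        "cosine": dot / math.sqrt(sq_a * sq_b),
        "dice": 2 * dot / (sq_a + sq_b),
        "tanimoto": dot / (sq_a + sq_b - dot),
        "soergel": abs_diff / max_sum,
    }


# Illustrative fingerprints (made up for this sketch): a = 4, b = 5, c = 3.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 1]

binary = dichotomous_metrics(fp_a, fp_b)
general = continuous_metrics(fp_a, fp_b)
for name in binary:
    print(f"{name:10s} {binary[name]:.4f} {general[name]:.4f}")
```

For these two example fingerprints both columns agree, and the Tanimoto coefficient and Soergel distance are each 0.5, which illustrates footnote (b): on binary data the Soergel distance equals one minus the Tanimoto coefficient.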