# Table 2. Formulas for the various similarity and distance metrics

| Distance metric | Formula for continuous variables<sup>a</sup> | Formula for dichotomous variables<sup>a</sup> |
| --- | --- | --- |
| Manhattan distance | $$D_{A,B}=\sum_{j=1}^{n}\lvert x_{jA}-x_{jB}\rvert$$ | $$D_{A,B}=a+b-2c$$ |
| Euclidean distance | $$D_{A,B}=\left[\sum_{j=1}^{n}\left(x_{jA}-x_{jB}\right)^{2}\right]^{1/2}$$ | $$D_{A,B}=\left[a+b-2c\right]^{1/2}$$ |
| Cosine coefficient | $$S_{A,B}=\frac{\sum_{j=1}^{n}x_{jA}x_{jB}}{\left[\sum_{j=1}^{n}\left(x_{jA}\right)^{2}\sum_{j=1}^{n}\left(x_{jB}\right)^{2}\right]^{1/2}}$$ | $$S_{A,B}=\frac{c}{\left[ab\right]^{1/2}}$$ |
| Dice coefficient | $$S_{A,B}=\frac{2\sum_{j=1}^{n}x_{jA}x_{jB}}{\sum_{j=1}^{n}\left(x_{jA}\right)^{2}+\sum_{j=1}^{n}\left(x_{jB}\right)^{2}}$$ | $$S_{A,B}=\frac{2c}{a+b}$$ |
| Tanimoto coefficient | $$S_{A,B}=\frac{\sum_{j=1}^{n}x_{jA}x_{jB}}{\sum_{j=1}^{n}\left(x_{jA}\right)^{2}+\sum_{j=1}^{n}\left(x_{jB}\right)^{2}-\sum_{j=1}^{n}x_{jA}x_{jB}}$$ | $$S_{A,B}=\frac{c}{a+b-c}$$ |
| Soergel distance<sup>b</sup> | $$D_{A,B}=\frac{\sum_{j=1}^{n}\lvert x_{jA}-x_{jB}\rvert}{\sum_{j=1}^{n}\max\left(x_{jA},x_{jB}\right)}$$ | $$D_{A,B}=1-\frac{c}{a+b-c}$$ |
| Substructure similarity | See Ref [24] | |
| Superstructure similarity | See Ref [25] | |
<sup>a</sup> S denotes similarities, while D denotes distances (according to the more commonly used formula for the given metric). Distances and similarities can be converted to one another using Equation 1. $$x_{jA}$$ denotes the j-th feature of molecule A; a is the number of on bits in molecule A, b is the number of on bits in molecule B, and c is the number of bits that are on in both molecules.

<sup>b</sup> The Soergel distance is the complement of the Tanimoto coefficient.
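
As an illustration of how the formulas in Table 2 might be evaluated, the following is a minimal Python sketch, not the implementation used in the study. The function names (`bit_counts`, `binary_metrics`, `continuous_metrics`) and the example fingerprints are hypothetical. Fingerprints are assumed to be equal-length lists, with each molecule having at least one on bit so that no denominator is zero.

```python
import math


def bit_counts(fp_a, fp_b):
    """Return (a, b, c): on bits in A, on bits in B, bits on in both."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    return a, b, c


def binary_metrics(fp_a, fp_b):
    """Dichotomous-variable formulas, expressed through a, b and c."""
    a, b, c = bit_counts(fp_a, fp_b)
    return {
        "manhattan": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "tanimoto": c / (a + b - c),
        "soergel": 1 - c / (a + b - c),
    }


def continuous_metrics(x_a, x_b):
    """Continuous-variable formulas, expressed through the features x_jA, x_jB."""
    dot = sum(p * q for p, q in zip(x_a, x_b))
    sq_a = sum(p * p for p in x_a)
    sq_b = sum(q * q for q in x_b)
    abs_diff = sum(abs(p - q) for p, q in zip(x_a, x_b))
    return {
        "manhattan": abs_diff,
        "euclidean": math.sqrt(sum((p - q) ** 2 for p, q in zip(x_a, x_b))),
        "cosine": dot / math.sqrt(sq_a * sq_b),
        "dice": 2 * dot / (sq_a + sq_b),
        "tanimoto": dot / (sq_a + sq_b - dot),
        "soergel": abs_diff / sum(max(p, q) for p, q in zip(x_a, x_b)),
    }


# Hypothetical 6-bit fingerprints: a = 3, b = 3, c = 2.
fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
result = binary_metrics(fp_a, fp_b)
print(result["tanimoto"])  # 0.5
print(result["soergel"])   # 0.5
```

For 0/1 vectors the continuous formulas reduce to the dichotomous ones, so `continuous_metrics(fp_a, fp_b)` should return the same values as `binary_metrics(fp_a, fp_b)`. In this example the Soergel distance equals one minus the Tanimoto coefficient, consistent with footnote b.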