Table 1 Metrics used to compute the “distance” between two atoms of a molecule

From: Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets

Metrics

Formula^a

Range^b

Average

Range

Minkowski (M1–M7)

p = 0.25, 0.5, 1, 1.5, 2, 2.5, 3, and ∞ [where p = 1 gives the Manhattan (city-block or taxicab) distance, which coincides with the Hamming distance between binary vectors, and p = 2 gives the Euclidean distance]

\(d_{XY} = \left( {\mathop \sum \limits_{j = 1}^{h} \left| {x_{j} - y_{j} } \right|^{p} } \right)^{{\frac{1}{p}}}\)

[0, ∞)

\(\bar{d} = \frac{{d_{XY} }}{{n^{1/p}}}\)

[0, ∞)

Chebyshev/Lagrange (M8) (Minkowski formula when p = ∞)

\(d_{XY} = \max_{j} \left\{ {\left| {x_{j} - y_{j} } \right|} \right\}\)

Canberra (M10)

\(d_{XY} = \mathop \sum \limits_{j = 1}^{h} \frac{{\left| {x_{j} - y_{j} } \right| }}{{\left| {x_{j} } \right| + \left| {y_{j} } \right|}}\)

[0, n]

\(\bar{d} = \frac{{d_{XY} }}{n}\)

[0, 1]

Lance–Williams/Bray–Curtis (M11)

\(d_{XY} = \frac{{\mathop \sum \nolimits_{j = 1}^{h} \left| {x_{j} - y_{j} } \right| }}{{\mathop \sum \nolimits_{j = 1}^{h} \left( {\left| {x_{j} } \right| + \left| {y_{j} } \right|} \right) }}\)

[0, 1]

\(\bar{d} = \frac{{d_{XY} }}{n}\)

\(\left[ {0,\frac{1}{n}} \right]\)

Clark/coefficient of divergence (M12)

\(d_{XY} = \sqrt {\mathop \sum \limits_{j = 1}^{h} \left( {\frac{{x_{j} - y_{j} }}{{\left| {x_{j} } \right| + \left| {y_{j} } \right|}}} \right)^{2} }\)

[0, n]

\(\bar{d} = \frac{{d_{XY} }}{\sqrt n }\)

\(\left[ {0,\sqrt n } \right]\)

Soergel (M13)

\(d_{XY} = \frac{1}{n}\mathop \sum \limits_{j = 1}^{h} \frac{{\left| {x_{j} - y_{j} } \right| }}{{\max \left\{ {x_{j} ,y_{j} } \right\}}}\)

[0, 1]

\(\bar{d} = \frac{{d_{XY} }}{n}\)

\(\left[ {0,\frac{1}{n}} \right]\)

Bhattacharyya (M14)

\(d_{XY} = \sqrt {\mathop \sum \limits_{j = 1}^{h} \left( {\sqrt {x_{j} } - \sqrt {y_{j} } } \right)^{2} }\)

[0, ∞)

\(\bar{d} = \frac{{d_{XY} }}{\sqrt n }\)

[0, ∞)

Wave–Edges (M15)

\(d_{XY} = \mathop \sum \limits_{j = 1}^{h} \left( {1 - \frac{{\min \left\{ {x_{j} ,y_{j} } \right\} }}{{\max \left\{ {x_{j} ,y_{j} } \right\}}}} \right)\)

[0, n]

\(\bar{d} = \frac{{d_{XY} }}{n}\)

[0, 1]

Angular separation/[1 − Cosine (Ochiai)] (M16)

\(d_{XY} = 1 - \mathrm{Cos}_{XY}\), where \(\mathrm{Cos}_{XY} = \frac{{\mathbf{X} \cdot \mathbf{Y}}}{{\left\| \mathbf{X} \right\| \left\| \mathbf{Y} \right\|}} = \frac{{\mathop \sum \nolimits_{j = 1}^{h} x_{j} y_{j} }}{{\sqrt {\mathop \sum \nolimits_{j = 1}^{h} x_{j}^{2} \mathop \sum \nolimits_{j = 1}^{h} y_{j}^{2} } }}\)

[0, 2]

  
  1. ^a The variables \(x_{j}\) and \(y_{j}\) are the values of coordinate j of atoms X and Y of a molecule, respectively. The value of h is 3, corresponding to the 3D Cartesian coordinates (x, y, z) of an atom. The p values used in the Minkowski metric are 0.25, 0.5, 1 (Manhattan), 1.5, 2 (Euclidean), 2.5 and 3 (Minkowski).
  2. ^b “Range” refers to “range” and not to “rank”, and is defined as \(\mathrm{Range} = \max \{ x_{j} \} - \min \{ x_{j} \}\).
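
To make the formulas in Table 1 concrete, the following is a minimal Python sketch of a subset of the metrics (Minkowski, Chebyshev, Canberra, Lance–Williams/Bray–Curtis and angular separation) applied to the Cartesian coordinates of two atoms. It is an illustration only, not the authors' implementation. The function names and the example coordinates are hypothetical. Two assumptions are made that the table does not state: the n in the "Average" column is taken to equal the number of coordinates h, and a Canberra term whose denominator is zero is taken to contribute zero.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p (p = 1: Manhattan, p = 2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def minkowski_average(x, y, p):
    """'Average' column for Minkowski: d_XY / n^(1/p).

    Assumption: n is the number of coordinates (n = h = 3 for 3D atoms).
    """
    n = len(x)
    return minkowski(x, y, p) / n ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """Canberra distance: sum of |x_j - y_j| / (|x_j| + |y_j|)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:  # assumption: a 0/0 term contributes 0
            total += abs(a - b) / denom
    return total


def bray_curtis(x, y):
    """Lance-Williams/Bray-Curtis: sum |x_j - y_j| / sum (|x_j| + |y_j|).

    Raises ZeroDivisionError if both atoms sit at the origin.
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(abs(a) + abs(b) for a, b in zip(x, y))
    return num / den


def angular_separation(x, y):
    """1 - cosine of the angle between the two coordinate vectors.

    Raises ZeroDivisionError if either atom sits at the origin.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - dot / norm


# Hypothetical coordinates (x, y, z) of two atoms X and Y.
X = (1.0, 2.0, 3.0)
Y = (4.0, 6.0, 3.0)

print(minkowski(X, Y, 1))   # Manhattan: 7.0
print(minkowski(X, Y, 2))   # Euclidean: 5.0
print(chebyshev(X, Y))      # 4.0
print(canberra(X, Y))
print(bray_curtis(X, Y))
print(angular_separation(X, Y))
```

The remaining metrics in the table (Clark, Soergel, Bhattacharyya and Wave–Edges) follow the same pattern, but their square roots and max-based denominators need extra care when coordinates are negative or zero, so they are left out of this sketch.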