Fig. 1 | Journal of Cheminformatics

Fig. 1

From: How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity

The four contrast functions are depicted schematically with an example in which the two graph-cluster contrasts yield different results from the two set-cluster contrasts. (1) Graph-cluster contrast: \(D_i\) is partitioned into its subtrees (graphs) \(g_1\) and \(g_2\), which are gathered in \(P(D_i)=\{g_1, g_2\}\). \(C\) is then contrasted with \(P(D_i)\) by assessing whether \(C\) is one of the elements of \(P(D_i)\) (red). In this case it is not, so \(CC_g(C,D_i)=0\). (2) Relaxed-graph-cluster contrast: to quantify the presence of the parts of \(P(C)\) in \(P(D_i)\), the parts of \(P(D_i)\) are expanded into their respective graph partition sets \(P(g_1)\) and \(P(g_2)\) (blue), and the graphs common to \(P(C)\) and \(P(g_1)\), and to \(P(C)\) and \(P(g_2)\), are determined, if any. There is no common graph in this example, thus \(CC_{rg}(C,D_i)=0\). (3) As a set, \(C\) is characterized by its elements (leaves), \(L(C)=\{a,c,d\}\). Graph \(D_i\), to be contrasted with \(L(C)\), is characterized by the collection \(N=\{s_1,s_2\}\), where \(s_j=L(g_j)\), i.e. the set of leaves of \(g_j\). \(L(C)\) is contrasted with \(N\) by evaluating whether \(L(C)\) is one of the elements of \(N\) (green). In this case \(s_2=L(C)\), so the cluster contrast as a set is \(CC_s(C,D_i)=1\). (4) \(L(C)\) is intersected with each element of \(N\) to quantify the extent to which the elements of \(L(C)\) are present in the subtrees of \(D_i\), yielding the sets \(\{a,c,d\}\) and \(\{c,d\}\) (purple). The first set indicates that there are three common elements between \(L(C)\) and \(\{a,c,d\}\) out of the three elements of \(L(C)\) and \(\{a,c,d\}\). The second set shows that there are two common elements between \(L(C)\) and \(\{c,d\}\) out of the three elements of \(L(C)\) and \(\{c,d\}\). The cluster contrast of \(C\) in \(D_i\) is therefore 1 (3/3), which is the maximum overlap between \(L(C)\) and \(N\) (purple).
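The two set-based contrasts in panels (3) and (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `set_contrast` and `max_overlap` are hypothetical. The leaf set \(s_1=\{c,d\}\) is assumed for illustration, since the caption gives only its intersection with \(L(C)\). Normalizing each overlap by the size of the union is also an assumption; for this example, dividing by the size of \(L(C)\) instead gives the same 2/3 and 3/3.

```python
from fractions import Fraction


def set_contrast(leaves_c, leaf_sets):
    """Return 1 if L(C) equals one of the leaf sets in N, otherwise 0."""
    return int(any(leaves_c == s for s in leaf_sets))


def overlaps(leaves_c, leaf_sets):
    """Overlap of L(C) with each leaf set in N.

    Assumption: overlap = |L(C) & s| / |L(C) | s| (union-normalized).
    """
    result = []
    for s in leaf_sets:
        union = leaves_c | s
        # Guard against an empty union (both sets empty).
        result.append(Fraction(len(leaves_c & s), len(union)) if union else Fraction(0))
    return result


def max_overlap(leaves_c, leaf_sets):
    """Maximum overlap between L(C) and the elements of N (0 if N is empty)."""
    return max(overlaps(leaves_c, leaf_sets), default=Fraction(0))


# Example from the caption: L(C) = {a, c, d}, N = {s_1, s_2}.
L_C = frozenset({"a", "c", "d"})
s_1 = frozenset({"c", "d"})        # assumed for illustration
s_2 = frozenset({"a", "c", "d"})   # stated in the caption: s_2 = L(C)
N = [s_1, s_2]

print(set_contrast(L_C, N))   # exact-match contrast from panel (3)
print(overlaps(L_C, N))       # per-subtree overlaps from panel (4)
print(max_overlap(L_C, N))    # maximum overlap from panel (4)
```

Exact fractions are used instead of floats so that the 2/3 and 3/3 values in the caption can be compared without rounding.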
