

Fig. 4 | Journal of Cheminformatics

Fig. 4

From: Mapping and classifying molecules from a high-throughput structural database

Representation of the similarity matrix for the protonated lysine dipeptide dataset using the agglomerative clustering algorithm (top) and the sketchmap algorithm (bottom; projection parameters are given following the scheme \(\sigma\)A_Ba_b). A few representative structures (see Eq. 7) of interesting clusters are shown (right), and their positions on the sketchmaps and on the dendrogram are highlighted. The six sketchmaps are colored according to the conformational energy, the minimal distance between \(\hbox {O}_{1}\) or \(\hbox {O}_{2}\) and \(\hbox {N}_{3}\), called \(\text {D}_{\text {ON}}\), and the backbone dihedral angles ϕ, ψ, \({{\upomega }}_1\) and \({{\upomega }}_2\). The dendrogram shows the clustering hierarchy of the structures in the dataset. Each structure is vertically aligned with its properties, which are shown as color bars below the dendrogram. The dendrogram is cut at a linkage distance of 0.1 because structural properties are very similar below this threshold; the clusters merged at this level are shown as thick gray bars separated by light-gray lines. Clusters composed of only one structure are drawn as a black line reaching the bottom of the dendrogram. The main structural motifs of this set of structures are governed by the dihedral angles \({{\upomega }}_1\) and \({{\upomega }}_2\) and the distance \(\text {D}_{\text {ON}}\). The two main clusters a and b show a global correlation with the angle \({{\upomega }}_2\), while the angle \({{\upomega }}_1\) splits them into well-correlated sub-clusters (e.g. sub-clusters d and e). The other important sub-clustering parameter is the distance \(\text {D}_{\text {ON}}\) (e.g. sub-clusters c and b), which also correlates well with the separation between low and high conformational energy shown on the sketchmaps. Two sub-clusters stand out: g is a clear ‘outlier’ due to a chemical change, and f features an H-bonding pattern in which the side chain NH\(_3^+\) points to both carboxy groups, which sets this cluster apart from all others.
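The dendrogram cut described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the caption does not state the linkage criterion, so average linkage is assumed here, the function name `cut_dendrogram` and the matrix `toy` are hypothetical, and the toy values are invented dissimilarities rather than data from the paper. The sketch also assumes the similarity matrix has already been converted into pairwise distances.

```python
from itertools import combinations


def cut_dendrogram(dist, threshold):
    """Naive average-linkage agglomerative clustering, stopped at `threshold`.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    Returns a sorted list of clusters, each a sorted list of structure indices.
    The linkage criterion is an assumption; the caption does not specify it.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        d, ia, ib = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break  # every remaining merge lies above the cut
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy dissimilarities for six structures (not from the paper).
toy = [
    [0.00, 0.02, 0.05, 0.50, 0.50, 0.90],
    [0.02, 0.00, 0.04, 0.50, 0.50, 0.90],
    [0.05, 0.04, 0.00, 0.50, 0.50, 0.90],
    [0.50, 0.50, 0.50, 0.00, 0.08, 0.90],
    [0.50, 0.50, 0.50, 0.08, 0.00, 0.90],
    [0.90, 0.90, 0.90, 0.90, 0.90, 0.00],
]

clusters = cut_dendrogram(toy, threshold=0.1)
print(clusters)  # structure 5 remains a single-structure cluster
```

With a cut at 0.1, structures that merge below the threshold end up in the same cluster (the thick gray bars in the figure), while a structure that only merges above it stays alone, like the single-structure clusters drawn as black lines. Stopping at the first merge above the threshold is equivalent to cutting the full dendrogram because average linkage produces monotonically increasing merge distances.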
