Fig. 3 | Journal of Cheminformatics

Fig. 3

From: Mapping and classifying molecules from a high-throughput structural database

Representation of the similarity matrix of the lysine dipeptide dataset using the agglomerative clustering algorithm (top) and the sketchmap algorithm (bottom; projection parameters are given following the scheme \(\sigma\)–A_B–a_b). A few representative structures (see Eq. (7)) of interesting clusters are shown (right), and their positions on the sketchmaps and the dendrogram are highlighted. The five sketchmaps are colored according to the conformational energy and the backbone dihedral angles ϕ, ψ, \(\omega_1\) and \(\omega_2\). The dendrogram shows the clustering hierarchy of the structures in the dataset. Each structure is vertically aligned with its properties, which are shown as color bars below the dendrogram. The dendrogram is cut at a linkage distance of 0.1 because structural properties are very similar below this threshold; the clusters merged at this level are shown as thick gray bars separated by light-gray lines. Clusters composed of only one structure are drawn as a black line reaching the bottom of the dendrogram. The main structural motifs of this set of structures are governed by the peptide bond dihedral angles \(\omega_1\) and \(\omega_2\). The two main clusters, a and b, show a global correlation with the angle \(\omega_2\), while the angle \(\omega_1\) splits them into two well-correlated sub-clusters, (d)–(g), respectively. Cluster c is highlighted as an example containing ‘outlier’ structures of low conformational energy.
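
The dendrogram cut described in the caption can be illustrated with a minimal sketch. It assumes a precomputed, symmetric dissimilarity matrix and uses SciPy's hierarchical clustering routines; the average-linkage method, the helper names `cut_dendrogram` and `singleton_clusters`, and the toy one-dimensional data are all assumptions made for illustration and are not taken from the article. Only the threshold of 0.1 comes from the caption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cut_dendrogram(dist_matrix, threshold=0.1, method="average"):
    """Cluster a square distance matrix and cut the tree at `threshold`.

    The linkage method is an assumption; the caption does not state it.
    """
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(dist_matrix, checks=True)
    tree = linkage(condensed, method=method)
    # Structures whose merge distance is <= threshold share a cluster label.
    return fcluster(tree, t=threshold, criterion="distance")


def singleton_clusters(labels):
    """Return the labels of clusters that contain exactly one structure."""
    ids, counts = np.unique(labels, return_counts=True)
    return ids[counts == 1]


# Hypothetical toy data: six "structures" placed on a line, so that the
# pairwise distance is simply the absolute difference of their positions.
points = np.array([0.00, 0.02, 0.04, 0.50, 0.53, 0.95])
toy_distances = np.abs(points[:, None] - points[None, :])

labels = cut_dendrogram(toy_distances, threshold=0.1)
singletons = singleton_clusters(labels)
print("cluster labels:", labels)
print("singleton cluster labels:", singletons)
```

In this toy case the first three points and the next two points merge below the threshold, while the last point stays alone, which corresponds to the single-structure clusters drawn as a black line in the figure.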
