Fig. 4 | Journal of Cheminformatics

Fig. 4

From: Visualization of very large high-dimensional data sets as minimum spanning trees

TMAP visualizations of the RCSB Protein Data Bank (PDB), PANCAN, and ProteomeHD data sets. For a and b, interactive versions that show the protein structure associated with each point are available at http://pdb-tmap.gdb.tools. The 3DP-encoded PDB entries were visualized using TMAP with weighted MinHash indexing; the color bars show the log–log distribution of the property values. a Colored according to macromolecular size (heavy atom count). The resulting map reflects the size sensitivity of the 3DP fingerprint. b Colored according to the fraction of negative charges in the molecules. Macromolecules with a high fraction of negatively charged atoms, predominantly nucleic acids, are visible as clusters of red branches. c The PANCAN data set (n = 801, d = 20,531) consists of gene expression data for five tumor types (PRAD, KIRC, LUAD, COAD, and BRCA) and was indexed using a weighted variant of the MinHash algorithm. d Visualization of the ProteomeHD data set (n = 5013, d = 5013) based on co-regulation scores of proteins. The data points are colored according to the associated cellular location.
