From: Are phylogenetic trees suitable for chemogenomics analyses of bioactivity data sets: the importance of shared active compounds and choosing a suitable data embedding method, as exemplified on Kinases

Mean-centered SAC score versus distance plot. Both axes were mean-centered to make the distribution of data points easier to see: the average distance was set to 0.5 and the average percentage of shared active compounds was set to 50%; the scaled percentage is referred to as the ‘SAC score’. A clear negative relationship was observed, with most data points (60%) clustered at SAC scores between 40 and 100 and distances between 0.2 and 0.6. Extreme SAC scores above 200 were observed for distances smaller than 0.3. Data points with distances larger than 1.0 were less common (only 4%) and showed relatively little variation in SAC score (values between 20 and 40) compared with data points at distances below 0.5 (values between 0 and 200).
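
The caption does not say how the averages were moved to 0.5 and 50%. The following is a minimal sketch that assumes a multiplicative rescaling, which would be consistent with SAC scores above 200 (a plain shift of percentages could not exceed 150). The function name `rescale_to_mean` and the toy values are hypothetical and are not taken from the article.

```python
from statistics import fmean


def rescale_to_mean(values, target_mean):
    """Multiply every value by one constant so the mean equals target_mean.

    Assumption: the article's 'mean centering' is a multiplicative rescaling.
    """
    current_mean = fmean(values)
    if current_mean == 0:
        raise ValueError("cannot rescale values whose mean is zero")
    factor = target_mean / current_mean
    return [v * factor for v in values]


# Hypothetical toy data: pairwise distances and percentage of shared actives.
distances = [0.1, 0.2, 0.4, 0.8]
shared_pct = [40.0, 20.0, 15.0, 5.0]

scaled_distance = rescale_to_mean(distances, 0.5)   # mean becomes 0.5
sac_score = rescale_to_mean(shared_pct, 50.0)       # mean becomes 50
```

Under this assumption, a pair sharing a much larger fraction of actives than the average can receive a SAC score well above 100, because the score is relative to the data set mean rather than an absolute percentage.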

