From: Predicting cytotoxicity from heterogeneous data sources with Bayesian learning

Histograms of the FCFP_6 Tanimoto internal similarity distribution. Toxic (red) and non-toxic (blue) compounds are shown for the Scripps IC50 set (left) and the NCGC IC50 set (right). For each compound, only the highest similarity score to any other compound in the same (toxic or non-toxic) class was kept. In the Scripps set, the toxic compounds are on average more similar to each other than the non-toxic compounds are. In the NCGC set the opposite holds: toxic compounds resemble each other less than non-toxic compounds do. Similar distributions were obtained with BCI fingerprints. A similarity of 1 does not necessarily imply that two compounds are identical, because distinct structures can yield the same fingerprint.
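The per-compound score summarized in these histograms can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a set of integer feature identifiers, which is a simplification: real FCFP_6 or BCI fingerprints would come from a cheminformatics toolkit. The function names and the toy fingerprints are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def nearest_neighbor_similarities(fps: List[FrozenSet[int]]) -> List[float]:
    """For each fingerprint, keep the highest Tanimoto score to any
    other fingerprint in the same list (i.e. the same class)."""
    scores = []
    for i, fp in enumerate(fps):
        others = (tanimoto(fp, other) for j, other in enumerate(fps) if j != i)
        scores.append(max(others, default=0.0))
    return scores


# Hypothetical toy fingerprints, one list per class (not real data).
toxic = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({7, 8})]
non_toxic = [frozenset({10, 11}), frozenset({10, 11}), frozenset({20, 21, 22})]

toxic_scores = nearest_neighbor_similarities(toxic)
non_toxic_scores = nearest_neighbor_similarities(non_toxic)
```

Each list of scores would then be binned into one histogram per class. In the toy `non_toxic` list, the first two entries have identical feature sets and therefore score 1 against each other, even though they could stand for two different compounds; this is the situation the last sentence of the caption refers to.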

