From: Comparing structural fingerprints using a literature-based similarity benchmark

Histogram showing the effect of successive filters on the pairwise similarity of structures in the same assay. Pairwise similarity was measured using the LECFP4 fingerprint for pairs of structures from each assay in the dataset, and a histogram was generated using a bin width of 0.05. The initial data (green) were for assays containing up to 25 structures. Successive filters were then applied to restrict the data to assays of size 8 or greater, to remove promiscuous molecules, and to remove molecules found in Wikipedia or with INNs. For comparison, the pairwise similarity of randomly chosen molecules from the entire dataset is shown as the dashed line. Histograms were normalised to 100 % over all bins, except for the histogram for the random data, which was scaled to 30 %.
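
The binning and normalisation described in the caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each fingerprint is represented as a Python set of on-bits and that similarity is the Tanimoto coefficient, which the caption does not state. The names `similarity_bin` and `histogram_percent` are hypothetical.

```python
from itertools import combinations

N_BINS = 20  # bin width 0.05 over the similarity range [0, 1]


def similarity_bin(fp_a, fp_b):
    """Return the histogram bin index for the similarity of two fingerprints.

    Assumption: fingerprints are sets of on-bits and similarity is the
    Tanimoto coefficient |A & B| / |A | B|. The bin index is computed with
    integer arithmetic to avoid floating-point errors at bin edges.
    """
    union = len(fp_a | fp_b)
    if union == 0:
        return 0  # two empty fingerprints: treated as similarity 0 here
    shared = len(fp_a & fp_b)
    # A similarity of exactly 1.0 is placed in the last bin.
    return min(shared * N_BINS // union, N_BINS - 1)


def histogram_percent(assays, total_percent=100.0):
    """Histogram of within-assay pairwise similarities, scaled to total_percent.

    `assays` is a list of assays, each a list of fingerprints. Only pairs
    drawn from the same assay are counted.
    """
    counts = [0] * N_BINS
    for fingerprints in assays:
        for fp_a, fp_b in combinations(fingerprints, 2):
            counts[similarity_bin(fp_a, fp_b)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * N_BINS
    return [total_percent * c / total for c in counts]


if __name__ == "__main__":
    # Toy data only; the bit sets are invented for illustration.
    toy_assays = [[{1, 2, 3}, {1, 2, 3}, {4, 5}], [{1, 2}, {2, 3}]]
    for i, pct in enumerate(histogram_percent(toy_assays)):
        if pct:
            print(f"[{i * 0.05:.2f}, {(i + 1) * 0.05:.2f}): {pct:.1f} %")
```

Passing `total_percent=30.0` gives the kind of rescaling the caption describes for the random-pair curve, where the pairs would instead be drawn at random from the whole dataset.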

