Mean performance of the benchmarked descriptor sets in the GPCR 70–30 validation experiments. The mean is calculated over all 32 receptors (each experiment performed 10 times), and the error bars represent the standard deviation. Shown are the MCC (A) and the sensitivity (B). The differences between individual descriptor sets are smaller (MCC difference < 0.030, sensitivity difference < 0.020) than in the ACE inhibitor experiments, likely because the models are based on both chemical and protein similarity. For individual receptors, larger performance differences occur (mean MCC difference 0.712, mean sensitivity difference 0.231); see Additional file 1: Figure S4 for details. Z-scales (3) perform best on this dataset, while ProtFP (Feature) performs worst.