Mean performance of the benchmarked descriptor sets in the GPCR LOSO validation experiments. The mean is calculated over all 32 receptors, and the error bars represent the standard deviation. Shown are the MCC (A) and the sensitivity (B). Note that the error bars are large because performance differs between models trained on different GPCRs, not between repeats of the individual models. Here, extrapolation takes place on the target side, as the test set contains unseen targets. The differences between individual descriptor sets are small. Again, larger performance differences occur for individual receptors (see main text and Additional file 1: Figure S11 for details). In this case, Z-Scales (3) and Z-Scales (Avg) are the descriptor sets exhibiting the best performance, while ProtFP (Feature) performs the worst.