Mean performance of the benchmarked descriptor sets in the NNRTI LOSO validation experiments. The mean is calculated over all 14 mutants, and the error bars represent the standard deviation. Shown are the R₀² (A) and the RMSE (B). Note that the error bars are large because performance differs between models trained on different mutants, not between repeats of the individual models. Extrapolation takes place on the target side, as the test set contains unseen targets. The differences between individual descriptor sets are still small, but the standard deviation increases. Again, larger performance differences occur for individual receptors (see main text and Additional file 1: Figure S10 for details). In this part of the study, ProtFP (Feature) shows very good performance, which indicates that a simplified representation on the protein side is favorable for this data set.
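
The aggregation behind the bars can be illustrated with a minimal sketch. It assumes that repeats are averaged within each held-out mutant first and that the sample standard deviation is then taken across mutants; the source does not state either detail. The function name `summarize_across_targets`, the mutant labels, and the score values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def summarize_across_targets(per_target_scores):
    """Return (mean, standard deviation) across held-out targets.

    per_target_scores maps each held-out target (hypothetical labels here)
    to the scores obtained from repeated runs of the model for that target.
    """
    # Assumption: average the repeats within each held-out target first,
    # so that repeat-to-repeat noise does not contribute to the error bar.
    per_target_means = [mean(scores) for scores in per_target_scores.values()]
    # Assumption: sample standard deviation across targets.
    return mean(per_target_means), stdev(per_target_means)


# Hypothetical scores for three held-out mutants, two repeats each.
example_scores = {
    "mutant_A": [0.50, 0.52],
    "mutant_B": [0.30, 0.30],
    "mutant_C": [0.70, 0.68],
}

mean_score, sd_score = summarize_across_targets(example_scores)
print(f"mean = {mean_score:.3f}, sd = {sd_score:.3f}")
```

In this toy input the repeats of each model agree closely, yet the spread across mutants is large, which is the situation the caption describes for the error bars.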