Mean performance of the benchmarked descriptor sets in the NNRTIs 70–30 validation experiments. The mean is calculated over all 14 mutants, with each experiment repeated 10 times, and the error bars represent the standard deviation. Shown are the $R_0^2$ (A) and the RMSE (B). (See Additional file 1: Figure S15 for details.) Slightly more variance is seen than in the GPCR experiments. In this case, BLOSUM performs the worst of all descriptor sets considered, while ProtFP (Feature) performs the best.