PCA plots of target similarity for the protease mutants. Shown are the (A) best- and (B) worst-performing descriptor sets. The feature-based descriptor encodes only the presence or absence of features. As a result, the points are scattered over a smaller area in PCA space, which could explain the decreased performance (B). However, this information is shown to have a synergistic effect when combined with a physicochemical-property-based descriptor (A).
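
The comparison of point spread in PCA space can be illustrated with a minimal sketch. It is not the procedure used to produce the figure: the descriptor matrices below are random placeholders, and the helper names `pca_project` and `spread`, the number of targets, the descriptor widths, and the bounding-box measure of spread are all assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each descriptor column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def spread(scores):
    """Area of the axis-aligned bounding box of the projected points."""
    extent = scores.max(axis=0) - scores.min(axis=0)
    return float(np.prod(extent))


# Placeholder data only (assumption): 20 hypothetical targets, 8 columns each.
rng = np.random.default_rng(0)
n_targets = 20
binary_desc = rng.integers(0, 2, size=(n_targets, 8)).astype(float)  # presence/absence
continuous_desc = rng.normal(size=(n_targets, 8))  # stand-in for physicochemical values
combined_desc = np.hstack([binary_desc, continuous_desc])

for name, X in [
    ("feature-based", binary_desc),
    ("physicochemical", continuous_desc),
    ("combined", combined_desc),
]:
    print(f"{name}: spread in PC1/PC2 = {spread(pca_project(X)):.3f}")
```

The printed values depend entirely on how the descriptors are scaled, so with real data the columns would need to be brought to comparable scales before the areas are compared.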