How to approach machine learning-based prediction of drug/compound–target interactions

Table 1 Protein family-based average Spearman scores of the best models and baseline models in each dataset split

Name of the descriptor set/representation (explanation)	Fully-dissimilar-split	Dissimilar-compound-split	Random-split
Best performing protein representation (compound: ECFP4)	0.363	0.518	0.868
random200 (protein: random continuous vectors, compound: ECFP4)	0.193	0.436	0.861
only-ecfp4 (no protein vector, compound: ECFP4)	0.302	0.379	0.709
random200-random-ecfp4 (protein: random continuous vectors, compound: random binary vectors)	0.056	0.272	0.504
only-random-ecfp4 (no protein vector, compound: random binary vectors)	0.002	−0.002	0.315
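
As an illustration of how a protein family-based average Spearman score of the kind reported in Table 1 could be computed, the following minimal sketch calculates the Spearman rank correlation between measured and predicted bioactivity values within each protein family and then takes the unweighted mean across families. The exact aggregation procedure is not stated in this excerpt, so the unweighted mean is an assumption, and the family names, values, and function names (`average_ranks`, `spearman`, `family_average_spearman`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either side is constant
    return cov / (var_x * var_y) ** 0.5


def family_average_spearman(records):
    """records: iterable of (family, measured, predicted) tuples.

    Assumption: families are weighted equally (unweighted mean).
    """
    grouped = defaultdict(lambda: ([], []))
    for family, measured, predicted in records:
        grouped[family][0].append(measured)
        grouped[family][1].append(predicted)
    per_family = {fam: spearman(t, p) for fam, (t, p) in grouped.items()}
    return mean(list(per_family.values())), per_family


# Hypothetical toy data: (protein family, measured value, predicted value).
records = [
    ("family_A", 5.0, 4.8), ("family_A", 6.0, 6.1), ("family_A", 7.0, 6.5),
    ("family_B", 5.0, 6.0), ("family_B", 6.0, 5.0), ("family_B", 7.0, 7.0),
]
overall, per_family = family_average_spearman(records)
print(per_family, overall)
```

Averaging per family rather than pooling all data points keeps large, well-studied families from dominating the score. A weighted mean would be an equally plausible choice if family sizes should count.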