
Table 1 Protein family-based average Spearman scores of the best models and baseline models in each dataset split

From: How to approach machine learning-based prediction of drug/compound–target interactions

| Name of the descriptor set/representation (explanation) | Fully-dissimilar-split | Dissimilar-compound-split | Random-split |
| --- | --- | --- | --- |
| Best performing protein representation (compound: ECFP4) | 0.363 | 0.518 | 0.868 |
| random200 (protein: random continuous vectors, compound: ECFP4) | 0.193 | 0.436 | 0.861 |
| only-ecfp4 (no protein vector, compound: ECFP4) | 0.302 | 0.379 | 0.709 |
| random200-random-ecfp4 (protein: random continuous vectors, compound: random binary vectors) | 0.056 | 0.272 | 0.504 |
| only-random-ecfp4 (no protein vector, compound: random binary vectors) | 0.002 | −0.002 | 0.315 |

  1. Please refer to “Methods” for details about the baseline models.
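
As a rough illustration of what a "protein family-based average Spearman score" could look like in practice, the sketch below computes a Spearman rank correlation separately for each protein family and then averages the per-family values. This is an assumption about the aggregation, not the procedure documented in the article's "Methods". The function names, the record layout and the toy bioactivity values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5


def family_average_spearman(records):
    """records: iterable of (family, true_value, predicted_value) tuples.

    Assumed aggregation: one Spearman score per family, then an
    unweighted mean over families.
    """
    by_family = defaultdict(lambda: ([], []))
    for family, y_true, y_pred in records:
        by_family[family][0].append(y_true)
        by_family[family][1].append(y_pred)
    per_family = {f: spearman(t, p) for f, (t, p) in by_family.items()}
    return mean(list(per_family.values())), per_family


# Hypothetical toy data: (family, measured bioactivity, predicted bioactivity).
records = [
    ("family_A", 5.0, 5.2), ("family_A", 6.0, 6.1), ("family_A", 7.0, 6.9),
    ("family_B", 5.0, 6.5), ("family_B", 6.0, 6.0), ("family_B", 7.0, 5.5),
]
overall, per_family = family_average_spearman(records)
```

In this toy case the predictions for `family_A` are perfectly rank-ordered and those for `family_B` are perfectly reversed, so the two per-family scores cancel in the unweighted mean. A weighted mean (for example, by the number of data points per family) would be an equally plausible alternative.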