Table 1 Overview of the proteochemometric datasets modeled in this work

|  | Adenosine receptors | Dengue virus NS3 proteases | Aminergic GPCRs |
| --- | --- | --- | --- |
| Datapoints | 10,999 | 199 | 24,593 |
| Sequences | 8 | 4 | 91 |
| Ligands | 4,419 | 56 | 11,121 |
| Source organisms | H. sapiens and Rattus norvegicus | Dengue virus | H. sapiens, Rattus norvegicus, Mus musculus, Bos taurus, Sus scrofa, Canis familiaris, Cavia porcellus, Chlorocebus aethiops, and Mesocricetus auratus |
| Bioactivity | pKi | Kcat | pKi |
| Matrix completeness (%) | 31.11 | 88.84 | 2.43 |
  1. Whereas the compound-target interaction matrix of the dengue virus NS3 proteases dataset is almost complete (88.84%), the adenosine receptor and aminergic GPCR datasets are more challenging to model because of (i) their sparsity (31.11% and 2.43% matrix completeness, respectively) and (ii) the inclusion of information from orthologues, which gives 8 and 91 distinct sequences, respectively.
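
The matrix completeness values in Table 1 can be reproduced from the other rows. The short sketch below assumes that completeness is the number of datapoints divided by the number of possible sequence-ligand pairs (sequences × ligands), and that each datapoint corresponds to a distinct pair; the table does not state this definition explicitly, but it matches the reported percentages. The function and variable names are illustrative only.

```python
def matrix_completeness(datapoints: int, sequences: int, ligands: int) -> float:
    """Percentage of the sequence x ligand matrix that has a measured value.

    Assumed definition: datapoints / (sequences * ligands) * 100.
    """
    return 100.0 * datapoints / (sequences * ligands)


# (datapoints, sequences, ligands) taken from Table 1
DATASETS = {
    "Adenosine receptors": (10_999, 8, 4_419),
    "Dengue virus NS3 proteases": (199, 4, 56),
    "Aminergic GPCRs": (24_593, 91, 11_121),
}

for name, (n_points, n_seqs, n_ligs) in DATASETS.items():
    pct = matrix_completeness(n_points, n_seqs, n_ligs)
    print(f"{name}: {pct:.2f}%")
```

Under this assumption, the aminergic GPCR matrix has just over one million possible sequence-ligand pairs (91 × 11,121), of which fewer than 25,000 are measured, which is what makes it the sparsest of the three datasets.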