
Table 2 Statistics of collected DTI datasets

From: Sequence-based prediction of protein binding regions and drug–target interactions

 

| Dataset | No. of compounds | No. of proteins | No. of positive DTIs | No. of negative DTIs |
| --- | --- | --- | --- | --- |
| **DTI training dataset** | | | | |
| DrugBank (v2020) | 5080 | 2685 | 14,679 | |
| KEGG (v2020) | 4033 | 772 | 11,835 | |
| IUPHAR (v2020) | 6295 | 2017 | 14,282 | |
| Total | 12,814 | 3789 | 36,152 | 72,304 |
| **DTI validation dataset** | | | | |
| MATADOR | 252 | 145 | 307 | |
| Liu et al. | 255 | 410 | | 508 |
| Total | 499 | 538 | 307 | 508 |
| **DTI test dataset** | | | | |
| DTI test dataset 1ᵃ | 21,459 | 1453 | 20,391 | 20,391 |
| DTI test dataset 2ᵇ | 4991 | 134 | 5001 | 5001 |

ᵃ PubChem Bioassays whose druggable proteins were from DGIdb

ᵇ Subset of DTI test dataset 1, whose proteins have the same SCOPe family as the BR training dataset