Table 1 Summary of the molecular datasets used in this study

From: Machine intelligence-driven framework for optimized hit selection in virtual screening

| Dataset name | No. of molecules | No. of active molecules (1) | No. of inactive molecules (0) |
| --- | --- | --- | --- |
| Protein class: CXC-chemokine receptor 4 (CXCR4) | | | |
| Training dataset* | 175 | 81 | 94 |
| Small independent validation dataset | 56 | 43 | 13 |
| Large independent benchmark dataset | 3415 | 115 | 3300 |
| Protein class: Androgen receptor (AR) | | | |
| Training dataset* | 303 | 146 | 157 |
| Independent test dataset | 1121 | 249 | 872 |

  1. *The training dataset was partitioned 7:3 to provide an internal test set (x') for both the CS- and PS-modules.
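
The 7:3 partition mentioned in the footnote can be illustrated with a short sketch. The example below is a minimal illustration in Python, assuming a stratified random split; the table does not state how the partition was actually performed. The function name `split_7_3` and the seed are hypothetical, and only the class counts (81 actives, 94 inactives for the CXCR4 training dataset) come from Table 1.

```python
import random


def split_7_3(labels, seed=0):
    """Return (train_idx, test_idx) for a stratified 70/30 split of binary labels.

    Assumption: stratified random sampling per class; the source only
    states that the training dataset was partitioned 7:3.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in (0, 1):
        # Collect and shuffle the indices belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Hold out roughly 30% of each class for the internal test set.
        n_test = round(0.3 * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# CXCR4 training dataset from Table 1: 81 actives (1) and 94 inactives (0).
cxcr4_labels = [1] * 81 + [0] * 94
train_idx, test_idx = split_7_3(cxcr4_labels)
print(len(train_idx), len(test_idx))
```

Under these assumptions the 175 CXCR4 training molecules divide into 123 for model fitting and 52 for the internal test set; the exact molecules chosen depend on the seed.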