Machine intelligence-driven framework for optimized hit selection in virtual screening

Table 1 Summary of the molecular datasets used in this study

Dataset Name	No. of molecules	No. of active molecules (1)	No. of inactive molecules (0)
Protein class: CXC-chemokine receptor 4 (CXCR4)
Training dataset*	175	81	94
Small independent validation dataset	56	43	13
Large independent benchmark dataset	3415	115	3300
Protein class: Androgen receptor (AR)
Training dataset*	303	146	157
Independent test dataset	1121	249	872

*The training dataset was partitioned in a 7:3 ratio, with the held-out portion used as the internal test set (x') for both the CS- and PS-modules.
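
As an illustration of the 7:3 partition described in the footnote, the following is a minimal sketch using the CXCR4 training counts from Table 1 (81 actives, 94 inactives). The source does not say whether the split was stratified by class or which random seed was used. The per-class stratification, the seed, and the helper name `stratified_split` are assumptions made here for illustration only, not the authors' procedure.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into (train, test), sampling each class separately.

    Hypothetical helper: stratification and the seed are assumptions,
    not details taken from the study.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Hold out roughly test_fraction of this class for the internal test set.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# CXCR4 training dataset from Table 1: 81 actives (1) and 94 inactives (0).
labels = [1] * 81 + [0] * 94
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately keeps the active-to-inactive ratio of the internal test set close to that of the full training dataset, which matters when the classes are of similar but unequal size, as they are here.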

ISSN: 1758-2946