Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding

Table 1 Performance of the RF_Morgan model on different datasets

Dataset	Random Accuracy	Accuracy	AU-PRC	Balanced Accuracy	Balanced AU-PRC
ZINC	0.504	0.52	0.53	N/A	N/A
DUDE-AA2AR	0.93	1.0	1.0	0.984	0.996
DUDE-DRD3	0.972	0.998	1.0	0.978	0.995
DUDE-FA10	0.97	1.0	1.0	0.992	1.0
DUDE-MK14	0.976	0.998	1.0	0.994	1.0
DUDE-VGFR2	0.984	1.0	1.0	0.99	0.999
LIT-ALDH1	0.613	0.76	0.809	0.768	0.806
LIT-FEN1	0.956	0.958	0.584	0.778	0.883
LIT-MAPK1	0.964	0.964	0.292	0.692	0.797
LIT-PKM2	0.944	0.952	0.755	0.79	0.901
LIT-VDR	0.928	0.942	0.6	0.772	0.87

Predictive accuracy substantially better than random suggests that the datasets may suffer from ligand-specific bias
Accuracy denotes the proportion of correctly classified examples, whereas Random Accuracy denotes the accuracy that would have been obtained by assigning all examples the most common label, \(\max(\%\text{ actives}, \%\text{ inactives})\). AU-PRC denotes the area under the Precision-Recall curve. Balanced Accuracy and Balanced AU-PRC denote the respective accuracy and area under the Precision-Recall curve when the model was trained using an equivalent number of actives and inactives.
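
The quantities defined in this footnote can be sketched in a few lines of plain Python. This is a minimal illustration on toy labels, not the authors' evaluation code. The function names are hypothetical, AU-PRC is estimated here with the step-wise average-precision formula (the paper may use a different estimator), and balancing is assumed to mean randomly downsampling the majority class, which the footnote does not specify.

```python
import random
from typing import List, Sequence


def random_accuracy(labels: Sequence[int]) -> float:
    """Accuracy obtained by assigning every example the most common label."""
    frac_active = sum(labels) / len(labels)
    return max(frac_active, 1.0 - frac_active)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Proportion of correctly classified examples."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the Precision-Recall curve.

    Examples are ranked by descending score; precision is averaged over
    the ranks at which an active (label 1) is retrieved.
    """
    n_active = sum(labels)
    if n_active == 0:
        raise ValueError("at least one active is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_active


def balanced_indices(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Indices of a subset with equally many actives and inactives.

    Assumption: balancing is done by downsampling the larger class.
    """
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(actives), len(inactives))
    return sorted(rng.sample(actives, n) + rng.sample(inactives, n))


if __name__ == "__main__":
    # Toy data: 2 actives among 10 compounds (not taken from the paper).
    toy_labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    all_inactive = [0] * len(toy_labels)
    print(random_accuracy(toy_labels))
    print(accuracy(toy_labels, all_inactive))
    print(balanced_indices(toy_labels))
```

On an imbalanced dataset, a model that predicts "inactive" for everything matches Random Accuracy exactly. This is why the table reports Accuracy alongside Random Accuracy: in the LIT-MAPK1 row both are 0.964, so the raw accuracy alone says little about whether actives are being recognised.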

ISSN: 1758-2946