Fig. 2 | Journal of Cheminformatics


From: Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability


Mean changes in AUROC and AUPRC when comparing filtered, folded and unprocessed fingerprints (ECFP4 with a bit-vector size of 2048). The machine learning algorithms applied are random forests (RF), support vector machines (SVM) and naive Bayes (NB). Note that the axis scale for the area under the ROC curve (AUROC) is half that of the area under the precision-recall curve (AUPRC). Both measures have a maximum value of 1, but the baseline for AUROC is 0.5, whereas the baseline for AUPRC is close to 0 on some datasets (on the MUV datasets, the ratio of active compounds is 0.002). Run-time is the speedup of feature mining plus model building with filtered or folded fingerprints, relative to unprocessed fragments. Filtering is only slightly slower than folding, because both methods yield the same number of features and the filtering routine itself is fast (5.1 s on average) compared with the entire model-building process (38.4 s for RF).
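The different baselines can be checked numerically. The following is a minimal sketch, not the authors' evaluation code. It assumes a hypothetical dataset of 10,000 compounds with the MUV-like active ratio of 0.002 (20 actives). It also assumes AUPRC is computed as step-wise average precision. The helpers `auroc` and `average_precision` are illustrative names introduced here. An uninformative scorer gives every compound the same score, and a perfect scorer ranks all actives first.

```python
"""Sketch: AUROC vs. AUPRC baselines for an uninformative and a perfect scorer."""


def auroc(labels, scores):
    """AUROC as the probability that a random active outranks a random inactive (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC: sum over distinct thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        selected = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(selected)
        precision = tp / len(selected)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical dataset: 20 actives among 10,000 compounds (active ratio 0.002).
n_total, n_active = 10_000, 20
labels = [1] * n_active + [0] * (n_total - n_active)

# Uninformative model: every compound gets the same score.
constant_scores = [0.5] * n_total
# Perfect model: all actives scored above all inactives.
perfect_scores = [1.0] * n_active + [0.0] * (n_total - n_active)

print(auroc(labels, constant_scores))              # 0.5
print(average_precision(labels, constant_scores))  # 0.002
print(auroc(labels, perfect_scores))               # 1.0
print(average_precision(labels, perfect_scores))   # 1.0
```

Between the uninformative and the perfect model, AUROC spans only 0.5 to 1, while AUPRC spans roughly 0.002 to 1. Plotting AUROC changes on half the scale of AUPRC changes compensates for this difference.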
