Table 3 Cross-validation and testing metrics for the single and ensemble PCM models trained on the COX dataset

From: Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules

| Algorithm | \(R^{2}_{CV}\) | RMSECV | \(R^{2}_{0\ test}\) | RMSEtest |
|---|---|---|---|---|
| A | | | | |
| GBM | 0.59 | 0.77 | 0.60 | 0.76 |
| RF | 0.60 | 0.78 | 0.61 | 0.79 |
| SVM | 0.61 | 0.75 | 0.60 | 0.76 |
| B | | | | |
| Greedy ensemble | | 0.73 | 0.63 | 0.73 |
| Linear stacking | 0.63 | 0.73 | 0.63 | 0.73 |
| EN stacking | 0.63 | 0.72 | 0.62 | 0.72 |
| SVM linear stacking | 0.63 | 0.73 | 0.62 | 0.73 |
| SVM radial stacking | 0.63 | 0.73 | 0.63 | 0.73 |
| RF stacking | 0.61 | 0.76 | 0.58 | 0.77 |
  1. Combining single models trained with different algorithms into model ensembles can increase predictive ability. The highest \(R^{2}_{0\ test}\) value, 0.63, with a corresponding RMSEtest of 0.73 pIC50 units, was obtained with the greedy ensemble and with two model stacking techniques: (1) linear and (2) SVM radial.
  2. EN: Elastic Net; GBM: Gradient Boosting Machine; RF: Random Forest; RMSE: root mean square error in prediction; SVM: Support Vector Machines.
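
The greedy ensemble mentioned in the first footnote can be illustrated with a small sketch. The code below is not camb's implementation (camb is an R package); it is a minimal Python illustration, assuming a greedy forward selection with replacement in which models are added one at a time to an averaged prediction and the ensemble size with the lowest RMSE is kept. The function names (`rmse`, `greedy_ensemble`, `ensemble_predict`) and the toy values are hypothetical and are not taken from the table.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def greedy_ensemble(observed, model_preds, n_iter=20):
    """Greedy forward selection with replacement (illustrative assumption).

    model_preds maps a model name to its hold-out predictions.
    At each step the model whose addition gives the lowest RMSE of the
    averaged prediction is added; the best ensemble seen is returned
    as per-model weights that sum to 1.
    """
    names = sorted(model_preds)
    counts = dict.fromkeys(names, 0)
    running_sum = [0.0] * len(observed)
    best_err, best_counts, best_size = float("inf"), None, 0
    for size in range(1, n_iter + 1):
        step_name, step_err = None, float("inf")
        for name in names:
            candidate = [(s + p) / size
                         for s, p in zip(running_sum, model_preds[name])]
            err = rmse(observed, candidate)
            if err < step_err:
                step_name, step_err = name, err
        counts[step_name] += 1
        running_sum = [s + p
                       for s, p in zip(running_sum, model_preds[step_name])]
        # Remember the ensemble size with the lowest error so far.
        if step_err < best_err:
            best_err, best_counts, best_size = step_err, dict(counts), size
    return {name: best_counts[name] / best_size for name in names}


def ensemble_predict(weights, model_preds):
    """Weighted average of the single-model predictions."""
    n = len(next(iter(model_preds.values())))
    return [sum(weights[m] * model_preds[m][i] for m in weights)
            for i in range(n)]


# Hypothetical toy data: three models with different error patterns.
observed = [5.0, 6.0, 7.0, 8.0]
preds = {
    "GBM": [5.4, 6.4, 7.4, 8.4],  # constant positive bias
    "RF": [4.4, 5.4, 6.4, 7.4],   # constant negative bias
    "SVM": [5.0, 6.0, 7.0, 9.0],  # one large error
}

weights = greedy_ensemble(observed, preds, n_iter=10)
blend = ensemble_predict(weights, preds)
best_single = min(rmse(observed, p) for p in preds.values())
print(weights, rmse(observed, blend), best_single)
```

Because the first step always selects the best single model, the ensemble returned by this sketch can never have a higher RMSE on the selection data than that model. Whether the gain carries over to a held-out test set, as in the table, has to be checked separately.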