Table 2 Internal and external validation metrics (mean values +/- one standard deviation) for the PCM models (A), the Family QSAM and Family QSAR models (B), the individual QSAR models (C), the ensemble PCM models combining the most predictive models (D), and the ensemble PCM models combining the whole model library (E)

From: Prediction of the potency of mammalian cyclooxygenase inhibitors with ensemble proteochemometric modeling

| Group | Model | \({q^{2}_{\textit {int}}}\) | RMSE int | \({R^{2}_{0\ test}}\) | RMSE test | \({q^{2}_{\textit {test}}}\) | CCC |
|---|---|---|---|---|---|---|---|
| A | GBM | 0.59 +/- 0.02 | 0.77 +/- 0.01 | 0.60 +/- 0.03 | 0.76 +/- 0.02 | 0.60 +/- 0.03 | 0.76 +/- 0.02 |
| | RF | 0.60 +/- 0.03 | 0.78 +/- 0.02 | 0.61 +/- 0.03 | 0.79 +/- 0.03 | 0.61 +/- 0.03 | 0.74 +/- 0.02 |
| | SVM | 0.61 +/- 0.03 | 0.75 +/- 0.03 | 0.60 +/- 0.03 | 0.76 +/- 0.03 | 0.60 +/- 0.03 | 0.76 +/- 0.02 |
| B | Family QSAR | 0.17 +/- 0.02 | 1.13 +/- 0.02 | 0.17 +/- 0.04 | 1.09 +/- 0.03 | 0.17 +/- 0.04 | 0.43 +/- 0.03 |
| | Family QSAM | 0.16 +/- 0.02 | 1.10 +/- 0.02 | 0.16 +/- 0.03 | 1.10 +/- 0.02 | 0.16 +/- 0.03 | 0.28 +/- 0.02 |
| C | Ind. QSAR human COX-1 | 0.31 +/- 0.04 | 0.75 +/- 0.05 | 0.30 +/- 0.06 | 0.74 +/- 0.04 | 0.30 +/- 0.06 | 0.45 +/- 0.05 |
| | Ind. QSAR human COX-2 | 0.60 +/- 0.24 | 0.78 +/- 0.03 | 0.54 +/- 0.04 | 0.78 +/- 0.04 | 0.53 +/- 0.04 | 0.68 +/- 0.03 |
| | Ind. QSAR ovine COX-1 | 0.28 +/- 0.11 | 0.83 +/- 0.08 | 0.35 +/- 0.08 | 0.71 +/- 0.08 | 0.09 +/- 0.09 | 0.50 +/- 0.07 |
| | Ind. QSAR ovine COX-2 | 0.53 +/- 0.07 | 0.78 +/- 0.06 | 0.57 +/- 0.13 | 0.79 +/- 0.08 | 0.57 +/- 0.13 | 0.74 +/- 0.09 |
| | Ind. QSAR mouse COX-2 | 0.49 +/- 0.08 | 0.84 +/- 0.10 | 0.57 +/- 0.10 | 0.81 +/- 0.10 | 0.57 +/- 0.11 | 0.71 +/- 0.07 |
| D | Greedy Ensemble Best | - | 0.73 +/- 0.01 | 0.63 +/- 0.05 | 0.73 +/- 0.03 | 0.63 +/- 0.05 | 0.77 +/- 0.02 |
| | MS Linear Ensemble Best | 0.63 +/- 0.02 | 0.73 +/- 0.01 | 0.63 +/- 0.05 | 0.73 +/- 0.03 | 0.63 +/- 0.05 | 0.78 +/- 0.02 |
| | MS EN Ensemble Best | 0.63 +/- 0.02 | 0.72 +/- 0.02 | 0.62 +/- 0.05 | 0.72 +/- 0.03 | 0.62 +/- 0.05 | 0.78 +/- 0.02 |
| | MS SVM Linear Ensemble Best | 0.63 +/- 0.01 | 0.73 +/- 0.02 | 0.62 +/- 0.04 | 0.73 +/- 0.03 | 0.63 +/- 0.05 | 0.78 +/- 0.02 |
| | MS SVM Radial Ensemble Best | 0.63 +/- 0.02 | 0.73 +/- 0.02 | 0.63 +/- 0.05 | 0.73 +/- 0.03 | 0.63 +/- 0.05 | 0.78 +/- 0.02 |
| | MS RF Ensemble Best | 0.61 +/- 0.01 | 0.76 +/- 0.01 | 0.58 +/- 0.05 | 0.77 +/- 0.03 | 0.58 +/- 0.05 | 0.75 +/- 0.02 |
| E | Greedy Ensemble | - | 0.73 +/- 0.01 | 0.64 +/- 0.05 | 0.72 +/- 0.03 | 0.64 +/- 0.05 | 0.78 +/- 0.02 |
| | MS Linear Ensemble | 0.63 +/- 0.02 | 0.73 +/- 0.02 | 0.64 +/- 0.05 | 0.72 +/- 0.02 | 0.64 +/- 0.05 | 0.78 +/- 0.02 |
| | MS EN Ensemble | 0.64 +/- 0.01 | 0.73 +/- 0.01 | 0.63 +/- 0.04 | 0.73 +/- 0.02 | 0.63 +/- 0.04 | 0.78 +/- 0.02 |
| | MS SVM Linear Ensemble | 0.64 +/- 0.03 | 0.73 +/- 0.04 | 0.64 +/- 0.04 | 0.71 +/- 0.03 | 0.64 +/- 0.04 | 0.80 +/- 0.02 |
| | MS SVM Radial Ensemble | 0.64 +/- 0.02 | 0.73 +/- 0.02 | 0.65 +/- 0.04 | 0.71 +/- 0.03 | 0.65 +/- 0.04 | 0.80 +/- 0.02 |
| | MS RF Ensemble | 0.64 +/- 0.02 | 0.73 +/- 0.02 | 0.63 +/- 0.05 | 0.73 +/- 0.03 | 0.63 +/- 0.05 | 0.78 +/- 0.02 |
  1. “Best” refers to the ensembles trained on only the three most predictive RF, GBM and SVM models. Stacking models trained with different algorithms into a model ensemble increases predictive ability: the highest \({R^{2}_{0\ test}}\) value (0.652) and the lowest RMSE test value (0.706 pIC50 units) were obtained with the “MS SVM Radial Ensemble”. The standard deviation of each metric was calculated with the bootstrap method [74].
  2. Abbreviations: CCC, Concordance Correlation Coefficient [75,76]; EN, Elastic Net; GBM, Gradient Boosting Machine; Ind., Individual; MS, Model Stacking; RF, Random Forest; RMSE, root mean square error in prediction; SVM, Support Vector Machines.
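As an illustration of how two of the reported metrics and their bootstrap standard deviations could be computed, the following is a minimal sketch, not the authors' implementation. It assumes the standard definitions of RMSE and of Lin's concordance correlation coefficient (with population moments), and it assumes that the bootstrap resamples observed/predicted pairs with replacement; the table notes do not specify the exact procedure. The function names, the number of resamples, and the pIC50 values are hypothetical.

```python
import math
import random
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient, using population moments."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def bootstrap_sd(metric, observed, predicted, n_resamples=1000, seed=0):
    """Standard deviation of a metric over bootstrap resamples of the pairs.

    Assumption: pairs are resampled with replacement; the resample count
    and seed are arbitrary choices for this sketch.
    """
    rng = random.Random(seed)
    n = len(observed)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([observed[i] for i in idx],
                             [predicted[i] for i in idx]))
    return statistics.stdev(values)


# Hypothetical pIC50 values, for illustration only (not data from the paper).
observed = [5.2, 6.1, 7.4, 4.8, 6.9, 5.5, 8.0, 6.3]
predicted = [5.0, 6.4, 7.0, 5.1, 6.5, 5.9, 7.6, 6.2]

for name, metric in (("RMSE", rmse), ("CCC", ccc)):
    value = metric(observed, predicted)
    sd = bootstrap_sd(metric, observed, predicted)
    print(f"{name} = {value:.2f} +/- {sd:.2f}")
```

Unlike a plain correlation coefficient, CCC penalizes a systematic offset between predictions and observations, which is why a model can have a moderate \({R^{2}_{0\ test}}\) yet a noticeably lower CCC.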