Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules

Table 3 Cross-validation and testing metrics for the single and ensemble PCM models trained on the COX dataset

Algorithm	\(R^{2}_{CV}\)	RMSE_CV	\(R^{2}_{0\ test}\)	RMSE_test
A
GBM	0.59	0.77	0.60	0.76
RF	0.60	0.78	0.61	0.79
SVM	0.61	0.75	0.60	0.76
B
Greedy ensemble	–	0.73	0.63	0.73
Linear stacking	0.63	0.73	0.63	0.73
EN stacking	0.63	0.72	0.62	0.72
SVM linear stacking	0.63	0.73	0.62	0.73
SVM radial stacking	0.63	0.73	0.63	0.73
RF stacking	0.61	0.76	0.58	0.77

Combining single models trained with different algorithms into model ensembles can increase predictive ability. The highest \(R^{2}_{0\ test}\) value, 0.63, was obtained together with an RMSE_test of 0.73 pIC₅₀ units by the greedy ensemble and by two model stacking techniques: (1) linear and (2) SVM radial.
EN: Elastic Net; GBM: Gradient Boosting Machine; RF: Random Forest; RMSE: root mean square error in prediction; SVM: Support Vector Machines.
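
To illustrate how a greedy ensemble of the kind listed in part B of the table can combine single models, the following is a minimal Python sketch of forward model selection with replacement, scored by RMSE on held-out predictions. It is not the camb implementation, and whether the package uses exactly this procedure is an assumption. The function names (rmse, greedy_ensemble, ensemble_predict), the n_iter parameter and the toy numbers are hypothetical and are not taken from the COX dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def greedy_ensemble(preds, y_true, n_iter=20):
    """Forward selection with replacement (assumed variant of a greedy ensemble).

    preds maps a model name to its held-out predictions. At each step the
    model whose addition gives the lowest RMSE of the running average is
    added. Returns weights proportional to how often each model was chosen.
    """
    running_sum = [0.0] * len(y_true)
    counts = {name: 0 for name in preds}
    for step in range(1, n_iter + 1):
        best_name, best_err = None, float("inf")
        for name, p in preds.items():
            # Average of the models picked so far plus this candidate
            candidate = [(s + v) / step for s, v in zip(running_sum, p)]
            err = rmse(y_true, candidate)
            if err < best_err:
                best_name, best_err = name, err
        running_sum = [s + v for s, v in zip(running_sum, preds[best_name])]
        counts[best_name] += 1
    return {name: c / n_iter for name, c in counts.items()}


def ensemble_predict(preds, weights):
    """Weighted average of the single-model predictions."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]


# Toy, made-up pIC50-like values for illustration only
y = [5.0, 6.0, 7.0, 8.0]
toy_preds = {
    "GBM": [5.4, 6.4, 7.4, 8.4],  # systematically too high
    "RF": [4.6, 5.6, 6.6, 7.6],   # systematically too low
    "SVM": [5.5, 5.5, 7.5, 7.5],  # alternating errors
}
weights = greedy_ensemble(toy_preds, y, n_iter=10)
blend = ensemble_predict(toy_preds, weights)
print(weights, rmse(y, blend))
```

In this toy case the errors of two of the models cancel, so the blended RMSE falls below that of every single model. With real models whose errors are strongly correlated, as the near-identical values in the table suggest, the gain would be expected to be much smaller.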

ISSN: 1758-2946