Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models

Table 1 Summary of 2D model performance

Model	ANNE	SVM	MLR	KPLS	RF	PLS	ANNE AZ	ANNE Random
Training set
MAE	0.19	0.22	0.20	0.19	0.08	0.22	0.20	0.19
Kendall τ	0.63	0.58	0.61	0.63	0.86	0.60	0.60	0.62
SCI	0.12	0.20	0.90	0.48	0.94	0.12	0.17	-0.13
S(0)	0.63	0.57	0.60	0.62	0.86	0.58	0.59	0.62
S(1)	1.00	1.00	1.00	1.00	1.00	1.00	1.00	-1.00
Test set
MAE	0.22	0.25	0.23	0.24	0.22	0.25	0.21	0.22
Kendall τ	0.51	0.45	0.48	0.48	0.51	0.45	0.53	0.56
SCI	0.83	0.93	0.93	0.94	0.94	0.75	0.96	-0.67
S(0)	0.52	0.46	0.50	0.50	0.52	0.46	0.54	0.56
S(1)	1.00	1.00	1.00	1.00	1.00	1.00	1.00	-1.00
Prospective set
MAE	0.36	0.32	0.52	0.19	*	0.33	0.36	0.35
Kendall τ	0.34	0.36	0.14	0.37	*	0.32	0.36	0.33
SCI	0.98	0.98	0.72	0.78	*	0.97	0.77	0.98
S(0)	0.35	0.37	0.15	0.38	*	0.33	0.37	0.35
S(1)	1.00	1.00	1.00	1.00	*	1.00	1.00	1.00

Models using ADMET Predictor 2D descriptors and a Kohonen map: ANNE, ADMET Predictor neural net; SVM, ADMET Predictor support vector machine; MLR, ADMET Predictor multiple linear regression; KPLS, ADMET Predictor kernel partial least squares; RF, Pipeline Pilot random forest; PLS, SIMCA-P+ partial least squares; ANNE AZ, ADMET Predictor neural net with AZ descriptors; ANNE Random, ADMET Predictor neural net with a randomized choice of training/test sets. Performance measures: MAE, mean absolute error; Kendall τ, Kendall rank correlation coefficient between observed and predicted values; SCI, structure-activity landscape index (SALI) curve integral; S(0) and S(1), values of the SALI curve S(X) at X = 0 and X = 1. All measures were calculated as described in CALCULATIONS AND STATISTICS. Prospective-set measures were not calculated for RF (marked *) because its prediction outliers could not be identified.
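The caption defers the definitions of these measures to CALCULATIONS AND STATISTICS, which is not reproduced here. As a rough illustration only, the Python sketch below computes the Table 1 columns for a single model under assumptions that are not taken from this table: the structure-activity landscape index of a compound pair is SALI_ij = |A_i - A_j| / (1 - sim_ij); pairs with similarity 1 are skipped; S(X) is the mean pair-ordering score (+1 if the prediction orders the pair like the observations, -1 if it reverses them, 0 for a tie) over the pairs whose SALI, divided by the largest SALI, is at least X; SCI is the trapezoidal area under S(X) for X from 0 to 1; and Kendall τ is the tau-a variant. All function names are hypothetical, and the authors' implementation may differ in detail.

```python
from itertools import combinations


def mean_absolute_error(observed, predicted):
    """MAE: mean absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def _pair_score(observed, predicted, i, j):
    """+1 if the prediction orders compounds i and j like the observations,
    -1 if it reverses them, 0 if either difference is a tie."""
    return _sign(observed[i] - observed[j]) * _sign(predicted[i] - predicted[j])


def kendall_tau_a(observed, predicted):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    pairs = list(combinations(range(len(observed)), 2))
    return sum(_pair_score(observed, predicted, i, j) for i, j in pairs) / len(pairs)


def sali_edges(observed, similarity):
    """Return (i, j, SALI) for each pair, with SALI = |A_i - A_j| / (1 - sim_ij).

    Assumption: pairs with similarity >= 1 are skipped because their SALI
    is undefined (division by zero).
    """
    edges = []
    for i, j in combinations(range(len(observed)), 2):
        sim = similarity[i][j]
        if sim < 1.0:
            edges.append((i, j, abs(observed[i] - observed[j]) / (1.0 - sim)))
    return edges


def sali_curve(observed, predicted, similarity, thresholds):
    """S(X) for each threshold X in [0, 1]: mean pair score over the pairs
    whose SALI, divided by the largest SALI, is at least X."""
    edges = sali_edges(observed, similarity)
    max_sali = max((s for _, _, s in edges), default=0.0)
    if max_sali == 0.0:
        raise ValueError("need at least one pair with nonzero SALI")
    curve = []
    for x in thresholds:
        # The pair with the largest SALI has ratio exactly 1.0, so for
        # X <= 1 the kept list is never empty.
        kept = [(i, j) for i, j, s in edges if s / max_sali >= x]
        total = sum(_pair_score(observed, predicted, i, j) for i, j in kept)
        curve.append(total / len(kept))
    return curve


def sali_curve_integral(observed, predicted, similarity, n_points=101):
    """SCI: area under S(X) on [0, 1], estimated with the trapezoidal rule."""
    xs = [k / (n_points - 1) for k in range(n_points)]  # last point is exactly 1.0
    ys = sali_curve(observed, predicted, similarity, xs)
    step = 1.0 / (n_points - 1)
    return sum((ys[k] + ys[k + 1]) * step / 2.0 for k in range(n_points - 1))
```

Under these assumptions S(0) reduces to Kendall τ-a (when no pair has similarity 1), and S(1) scores only the single pair with the largest SALI, so it can only be -1, 0 or 1. In a four-compound toy example (observed activities 5.0, 6.0, 7.5 and 8.0, with the last two compounds forming the most similar pair at similarity 0.95 and all other similarities at most 0.8), swapping the predictions for those two compounds still gives τ ≈ 0.67, but S(1) = -1 and the SCI drops below zero, a pattern resembling the ANNE Random training-set row.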

ISSN: 1758-2946