Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity

Table 4 Summary of Random Forest classifier performance across the three test sets and the four descriptor combinations

Test set	Descriptors	ROC AUC	Average precision	Sensitivity	Specificity	CCR (correct classification rate)
Random	Molecular	0.92	0.83	0.92	0.78	0.85
	Protein target	0.85	0.71	0.81	0.73	0.77
	Tox21 assay	0.60	0.40	0.47	0.67	0.57
	Molecular and protein target	0.91	0.82	0.85	0.79	0.82
Rare scaffolds	Molecular	0.80	0.68	0.64	0.81	0.72
	Protein target	0.70	0.51	0.70	0.59	0.65
	Tox21 assay	0.57	0.36	0.67	0.43	0.55
	Molecular and protein target	0.80	0.68	0.83	0.63	0.73
Single source	Molecular	0.83	0.65	0.70	0.81	0.75
	Protein target	0.79	0.63	0.76	0.67	0.72
	Tox21 assay	0.61	0.39	0.43	0.73	0.58
	Molecular and protein target	0.85	0.69	0.77	0.76	0.76

Generally, the best-performing models were those trained on molecular descriptors, either alone or in combination with protein target descriptors. Classifiers found the random test set less challenging to predict than the rare scaffolds and single source test sets.
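The five metrics reported in Table 4 can be computed from a classifier's predicted probabilities. The following is a minimal, dependency-free sketch, not the authors' evaluation code. It assumes binary labels with 1 marking the toxic (positive) class and a hypothetical decision threshold of 0.5; the helper names (`evaluate`, `roc_auc`, `average_precision`, `ccr`) are illustrative. CCR is taken as the mean of sensitivity and specificity, which reproduces the CCR column of Table 4 from its sensitivity and specificity columns to within rounding.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for binary labels, where 1 is the toxic (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def ccr(sensitivity: float, specificity: float) -> float:
    """Correct classification rate: the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AP: sum of (R_k - R_{k-1}) * P_k over distinct thresholds, high to low."""
    n_pos = sum(y_true)
    pairs = sorted(zip(scores, y_true), reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores enter the positive predictions together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict[str, float]:
    """Compute the Table 4 metrics; threshold is an assumed cut-off on the predicted probability."""
    if 0 in (sum(y_true), len(y_true) - sum(y_true)):
        raise ValueError("need at least one positive and one negative sample")
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "roc_auc": roc_auc(y_true, scores),
        "average_precision": average_precision(y_true, scores),
        "sensitivity": sens,
        "specificity": spec,
        "ccr": ccr(sens, spec),
    }


if __name__ == "__main__":
    # Sensitivity, specificity and reported CCR for the "Random" test set rows of Table 4.
    random_rows = {
        "Molecular": (0.92, 0.78, 0.85),
        "Protein target": (0.81, 0.73, 0.77),
        "Tox21 assay": (0.47, 0.67, 0.57),
        "Molecular and protein target": (0.85, 0.79, 0.82),
    }
    for name, (sens, spec, reported) in random_rows.items():
        print(f"{name}: recomputed CCR {ccr(sens, spec):.2f}, reported {reported:.2f}")
```

In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same two ranking metrics, and `balanced_accuracy_score` equals CCR for binary labels; the hand-written versions here only make the definitions explicit. Note that ROC AUC and average precision are threshold-free, whereas sensitivity, specificity and CCR depend on the chosen cut-off.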

ISSN: 1758-2946