
Table 4 Summary of Random Forest classifier performances across the three different test sets and the four different combinations of descriptors

From: Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity

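| Test set | Descriptors | ROC AUC | Average precision | Sensitivity | Specificity | CCR |
|---|---|---|---|---|---|---|
| Random | Molecular | 0.92 | 0.83 | 0.92 | 0.78 | 0.85 |
| Random | Protein target | 0.85 | 0.71 | 0.81 | 0.73 | 0.77 |
| Random | Tox21 assay | 0.60 | 0.40 | 0.47 | 0.67 | 0.57 |
| Random | Molecular and protein target | 0.91 | 0.82 | 0.85 | 0.79 | 0.82 |
| Rare scaffolds | Molecular | 0.80 | 0.68 | 0.64 | 0.81 | 0.72 |
| Rare scaffolds | Protein target | 0.70 | 0.51 | 0.70 | 0.59 | 0.65 |
| Rare scaffolds | Tox21 assay | 0.57 | 0.36 | 0.67 | 0.43 | 0.55 |
| Rare scaffolds | Molecular and protein target | 0.80 | 0.68 | 0.83 | 0.63 | 0.73 |
| Single source | Molecular | 0.83 | 0.65 | 0.70 | 0.81 | 0.75 |
| Single source | Protein target | 0.79 | 0.63 | 0.76 | 0.67 | 0.72 |
| Single source | Tox21 assay | 0.61 | 0.39 | 0.43 | 0.73 | 0.58 |
| Single source | Molecular and protein target | 0.85 | 0.69 | 0.77 | 0.76 | 0.76 |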
  1. Generally, the best-performing models were those trained on molecular descriptors, either alone or in combination with protein target descriptors. Classifiers found the random test set easier to predict than the rare scaffolds and single source test sets.
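
The CCR column is not defined in this excerpt. Assuming it denotes the correct classification rate, that is, the mean of sensitivity and specificity (also called balanced accuracy), the tabulated values can be checked against the sensitivity and specificity columns with a short sketch. The names `ccr`, `TABLE_4` and `max_deviation` are illustrative and do not come from the paper.

```python
def ccr(sensitivity: float, specificity: float) -> float:
    """Assumed definition: mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# (test set, descriptors) -> (sensitivity, specificity, reported CCR),
# copied from Table 4.
TABLE_4 = {
    ("Random", "Molecular"): (0.92, 0.78, 0.85),
    ("Random", "Protein target"): (0.81, 0.73, 0.77),
    ("Random", "Tox21 assay"): (0.47, 0.67, 0.57),
    ("Random", "Molecular and protein target"): (0.85, 0.79, 0.82),
    ("Rare scaffolds", "Molecular"): (0.64, 0.81, 0.72),
    ("Rare scaffolds", "Protein target"): (0.70, 0.59, 0.65),
    ("Rare scaffolds", "Tox21 assay"): (0.67, 0.43, 0.55),
    ("Rare scaffolds", "Molecular and protein target"): (0.83, 0.63, 0.73),
    ("Single source", "Molecular"): (0.70, 0.81, 0.75),
    ("Single source", "Protein target"): (0.76, 0.67, 0.72),
    ("Single source", "Tox21 assay"): (0.43, 0.73, 0.58),
    ("Single source", "Molecular and protein target"): (0.77, 0.76, 0.76),
}


def max_deviation(table: dict) -> float:
    """Largest absolute gap between recomputed and reported CCR."""
    return max(abs(ccr(sens, spec) - reported)
               for sens, spec, reported in table.values())


if __name__ == "__main__":
    # Every input is rounded to two decimals, so gaps of up to 0.01
    # are compatible with the assumed definition.
    print(max_deviation(TABLE_4) <= 0.01)
```

Under this assumption, every reported CCR agrees with the recomputed value to within the rounding of the published figures, which supports reading CCR as balanced accuracy here.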