Table 4 Summary of Random Forest classifier performance across the three test sets and the four descriptor combinations

From: Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity

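| Test set | Descriptors | ROC AUC | Average precision | Sensitivity | Specificity | CCR |
|---|---|---|---|---|---|---|
| Random | Molecular | 0.92 | 0.83 | 0.92 | 0.78 | 0.85 |
| Random | Protein target | 0.85 | 0.71 | 0.81 | 0.73 | 0.77 |
| Random | Tox21 assay | 0.60 | 0.40 | 0.47 | 0.67 | 0.57 |
| Random | Molecular and protein target | 0.91 | 0.82 | 0.85 | 0.79 | 0.82 |
| Rare scaffolds | Molecular | 0.80 | 0.68 | 0.64 | 0.81 | 0.72 |
| Rare scaffolds | Protein target | 0.70 | 0.51 | 0.70 | 0.59 | 0.65 |
| Rare scaffolds | Tox21 assay | 0.57 | 0.36 | 0.67 | 0.43 | 0.55 |
| Rare scaffolds | Molecular and protein target | 0.80 | 0.68 | 0.83 | 0.63 | 0.73 |
| Single source | Molecular | 0.83 | 0.65 | 0.70 | 0.81 | 0.75 |
| Single source | Protein target | 0.79 | 0.63 | 0.76 | 0.67 | 0.72 |
| Single source | Tox21 assay | 0.61 | 0.39 | 0.43 | 0.73 | 0.58 |
| Single source | Molecular and protein target | 0.85 | 0.69 | 0.77 | 0.76 | 0.76 |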

  1. Generally, the best-performing models were those trained on molecular descriptors, either alone or combined with protein target descriptors. Classifiers found the random test set easier to predict than the more challenging rare-scaffold and single-source test sets.
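The five columns in Table 4 are standard binary-classification metrics. The minimal, dependency-free sketch below shows one way to compute them. It assumes 0/1 labels (1 = positive class), positive-class probability scores and a 0.5 decision threshold; the threshold and all function names are hypothetical and not taken from the source. It also assumes CCR is the mean of sensitivity and specificity. Every row of the table agrees with that reading to within rounding, and `ccr_matches_table` checks this directly against the published values.

```python
"""Hypothetical sketch of the Table 4 metrics (not the authors' code).

Assumptions: labels are 0/1 with 1 = positive class, scores are predicted
probabilities for the positive class, and the decision threshold is 0.5.
"""
from collections.abc import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random positive outscores a random negative (ties count 1/2).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # Non-interpolated AP: mean precision at the rank of each positive,
    # ranking samples from highest to lowest score (assumes no tied scores).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def summarise(y_true: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> dict[str, float]:
    # Threshold the scores, then derive sensitivity, specificity and CCR.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "roc_auc": roc_auc(y_true, scores),
        "average_precision": average_precision(y_true, scores),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": (sensitivity + specificity) / 2,  # assumed CCR definition
    }


# (sensitivity, specificity, CCR) for each of the 12 rows of Table 4, in table order.
TABLE_4 = [
    (0.92, 0.78, 0.85), (0.81, 0.73, 0.77), (0.47, 0.67, 0.57), (0.85, 0.79, 0.82),
    (0.64, 0.81, 0.72), (0.70, 0.59, 0.65), (0.67, 0.43, 0.55), (0.83, 0.63, 0.73),
    (0.70, 0.81, 0.75), (0.76, 0.67, 0.72), (0.43, 0.73, 0.58), (0.77, 0.76, 0.76),
]


def ccr_matches_table(tol: float = 0.0051) -> bool:
    # True if every reported CCR equals (sensitivity + specificity) / 2 up to 2-dp rounding.
    return all(abs((se + sp) / 2 - ccr) <= tol for se, sp, ccr in TABLE_4)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    print(summarise([1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]))
    print(ccr_matches_table())
```

With the toy data, one positive (score 0.4) is missed and one negative (score 0.6) is flagged. That gives a sensitivity of 2/3, a specificity of 3/4 and a CCR of 17/24. Averaging sensitivity and specificity keeps CCR from rewarding a model that simply predicts the majority class, which matters when positives are in the minority.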