Table 2 Nine metrics for evaluating the performance of four classification methods (RF, RUS, SMO and SMN) with twelve Tox21 qHTS assay datasets

From: Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets

| Metrics | Classifier | NR-AR | NR-AR-LBD | NR-AhR | NR-Aromatase | NR-ER | NR-ER-LBD | NR-PPAR-γ | SR-ARE | SR-ATAD5 | SR-HSE | SR-MMP | SR-p53 | Mean | CVᵃ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F1 score | RF | 0.1538 | 0.0000 | 0.4340 | 0.2326 | 0.2727 | 0.2400 | 0.0606 | 0.3359 | 0.2500 | 0.2500 | 0.5106 | 0.1364 | 0.2397 | 60 |
| | RUS | 0.1176 | 0.1667 | 0.4507 | 0.2222 | 0.2605 | 0.1849 | 0.4051 | 0.4185 | 0.2063 | 0.1058 | 0.5867 | 0.2527 | 0.2815 | 53 |
| | SMO | 0.2500 | 0.0000 | 0.3883 | 0.1905 | 0.3692 | 0.2857 | 0.1765 | 0.2927 | 0.2439 | 0.1905 | 0.3902 | 0.1395 | 0.2431 | 47 |
| | SMN | 0.1951 | 0.1111 | 0.5856 | 0.5070 | 0.6078 | 0.3636 | 0.3929 | 0.6791 | 0.3636 | 0.2400 | 0.5850 | 0.4225 | 0.4211 | 42 |
| MCC | RF | 0.2859 | −0.0050 | 0.4101 | 0.3202 | 0.2726 | 0.2891 | 0.0767 | 0.2770 | 0.3377 | 0.2619 | 0.4701 | 0.1801 | 0.2647 | 49 |
| | RUS | 0.1056 | 0.1602 | 0.4209 | 0.1914 | 0.1816 | 0.1908 | 0.3810 | 0.2950 | 0.2049 | 0.1190 | 0.5537 | 0.2769 | 0.2568 | 53 |
| | SMO | 0.2805 | −0.0071 | 0.3669 | 0.2792 | 0.3990 | 0.3018 | 0.2355 | 0.2498 | 0.3091 | 0.2327 | 0.3662 | 0.2019 | 0.2679 | 39 |
| | SMN | 0.1886 | 0.0975 | 0.5342 | 0.4711 | 0.5643 | 0.3404 | 0.3627 | 0.6177 | 0.3261 | 0.2226 | 0.5492 | 0.3872 | 0.3885 | 42 |
| AUROC | RF | 0.8232 | 0.7963 | 0.9063 | 0.7356 | 0.7601 | 0.6963 | 0.6640 | 0.7867 | 0.7827 | 0.7610 | 0.9194 | 0.7443 | 0.7813 | 10 |
| | RUS | 0.6785 | 0.9133 | 0.8852 | 0.7627 | 0.7174 | 0.7619 | 0.7937 | 0.7698 | 0.7791 | 0.7065 | 0.9295 | 0.8168 | 0.7929 | 10 |
| | SMO | 0.7780 | 0.7509 | 0.8936 | 0.8112 | 0.7296 | 0.8072 | 0.7872 | 0.7714 | 0.8151 | 0.7983 | 0.8893 | 0.8510 | 0.8069 | 6 |
| | SMN | 0.6810 | 0.7969 | 0.9196 | 0.8500 | 0.8628 | 0.8233 | 0.7713 | 0.8910 | 0.8093 | 0.8483 | 0.9294 | 0.8785 | 0.8384 | 8 |
| AUPRC | RF | 0.3521 | 0.0565 | 0.5846 | 0.2825 | 0.3203 | 0.1887 | 0.1120 | 0.4224 | 0.2881 | 0.1608 | 0.5632 | 0.1881 | 0.2933 | 57 |
| | RUS | 0.1444 | 0.1068 | 0.4836 | 0.2043 | 0.2420 | 0.1545 | 0.5067 | 0.4140 | 0.2423 | 0.0622 | 0.5237 | 0.2295 | 0.2762 | 59 |
| | SMO | 0.3290 | 0.0821 | 0.5065 | 0.3504 | 0.3895 | 0.2658 | 0.2806 | 0.4052 | 0.3350 | 0.1993 | 0.4928 | 0.2913 | 0.3273 | 36 |
| | SMN | 0.0685 | 0.0639 | 0.5660 | 0.3845 | 0.5688 | 0.2018 | 0.3736 | 0.6443 | 0.2422 | 0.1134 | 0.5234 | 0.3254 | 0.3396 | 60 |
| Balanced accuracy (BA) | RF | 0.5417 | 0.4991 | 0.6518 | 0.5665 | 0.5830 | 0.5732 | 0.5146 | 0.6016 | 0.5726 | 0.5847 | 0.7053 | 0.5368 | 0.5776 | 10 |
| | RUS | 0.5929 | 0.6124 | 0.8129 | 0.6828 | 0.6513 | 0.6968 | 0.7454 | 0.6977 | 0.7133 | 0.6665 | 0.8523 | 0.7777 | 0.7085 | 11 |
| | SMO | 0.5815 | 0.4982 | 0.6304 | 0.5530 | 0.6181 | 0.5964 | 0.5499 | 0.5833 | 0.5718 | 0.5571 | 0.6354 | 0.5377 | 0.5761 | 7 |
| | SMN | 0.6443 | 0.5544 | 0.8228 | 0.7265 | 0.7922 | 0.6858 | 0.6753 | 0.8545 | 0.7018 | 0.6529 | 0.8452 | 0.6812 | 0.7198 | 13 |
| Precision | RF | 1.0000 | 0.0000 | 0.6389 | 0.8333 | 0.5294 | 0.6000 | 0.2500 | 0.5116 | 0.8333 | 0.4286 | 0.6000 | 0.5000 | 0.5604 | 48 |
| | RUS | 0.0769 | 0.1250 | 0.2991 | 0.1302 | 0.1604 | 0.1111 | 0.3200 | 0.2869 | 0.1193 | 0.0576 | 0.4583 | 0.1464 | 0.1909 | 64 |
| | SMO | 0.5000 | 0.0000 | 0.6061 | 0.8000 | 0.7500 | 0.5000 | 0.6000 | 0.5143 | 0.7143 | 0.5000 | 0.5714 | 0.6000 | 0.5547 | 36 |
| | SMN | 0.1379 | 0.1000 | 0.4775 | 0.5294 | 0.5849 | 0.3333 | 0.4074 | 0.5748 | 0.2963 | 0.1818 | 0.4624 | 0.4545 | 0.3784 | 44 |
| Recall or Sensitivity | RF | 0.0833 | 0.0000 | 0.3286 | 0.1351 | 0.1837 | 0.1500 | 0.0345 | 0.2500 | 0.1471 | 0.1765 | 0.4444 | 0.0789 | 0.1677 | 75 |
| | RUS | 0.2500 | 0.2500 | 0.9143 | 0.7568 | 0.6939 | 0.5500 | 0.5517 | 0.7727 | 0.7647 | 0.6471 | 0.8148 | 0.9211 | 0.6573 | 34 |
| | SMO | 0.1667 | 0.0000 | 0.2857 | 0.1081 | 0.2449 | 0.2000 | 0.1034 | 0.2045 | 0.1471 | 0.1176 | 0.2963 | 0.0789 | 0.1628 | 54 |
| | SMN | 0.3333 | 0.1250 | 0.7571 | 0.4865 | 0.6327 | 0.4000 | 0.3793 | 0.8295 | 0.4706 | 0.3529 | 0.7963 | 0.3947 | 0.4965 | 43 |
| Brier score (BS) | RF | 0.3817 | 0.5425 | 0.3404 | 0.3997 | 0.3883 | 0.4163 | 0.3961 | 0.3725 | 0.3947 | 0.4257 | 0.3215 | 0.3810 | 0.3967 | 14 |
| | RUS | 0.4461 | 0.3874 | 0.3104 | 0.3724 | 0.3793 | 0.4299 | 0.3204 | 0.3735 | 0.3829 | 0.4871 | 0.3892 | 0.3936 | 0.3894 | 13 |
| | SMO | 0.4263 | 0.6739 | 0.3281 | 0.3379 | 0.4205 | 0.4067 | 0.4138 | 0.3881 | 0.3924 | 0.4146 | 0.3467 | 0.3814 | 0.4109 | 22 |
| | SMN | 0.4303 | 0.4156 | 0.2583 | 0.3327 | 0.3134 | 0.3670 | 0.3503 | 0.2761 | 0.3431 | 0.3491 | 0.2371 | 0.3014 | 0.3312 | 18 |
| Sensitivity–specificity gap (SSG)ᵇ | RF | 0.9167 | 0.9982 | 0.6464 | 0.8628 | 0.7987 | 0.8464 | 0.9601 | 0.7031 | 0.8511 | 0.8165 | 0.5217 | 0.9157 | 0.8198 | 17 |
| | RUS | 0.6857 | 0.7249 | 0.2028 | 0.1480 | 0.0851 | 0.2937 | 0.3874 | 0.1499 | 0.1027 | 0.0388 | 0.0750 | 0.2867 | 0.2651 | 87 |
| | SMO | 0.8297 | 0.9964 | 0.6893 | 0.8898 | 0.7463 | 0.7929 | 0.8930 | 0.7576 | 0.8494 | 0.8789 | 0.6783 | 0.9175 | 0.8266 | 12 |
| | SMN | 0.6221 | 0.8588 | 0.1314 | 0.4800 | 0.3189 | 0.5716 | 0.5920 | 0.0500 | 0.4625 | 0.6000 | 0.0978 | 0.5730 | 0.4465 | 55 |
| Averageᶜ | RF | 0.2157 | −0.0215 | 0.3297 | 0.2048 | 0.1928 | 0.1638 | 0.0396 | 0.2344 | 0.2184 | 0.1535 | 0.3744 | 0.1187 | 0.1854 | 59 |
| | RUS | 0.0927 | 0.1358 | 0.4171 | 0.2700 | 0.2714 | 0.2140 | 0.3329 | 0.3479 | 0.2827 | 0.2043 | 0.4728 | 0.3045 | 0.2788 | 39 |
| | SMO | 0.1811 | −0.0385 | 0.2956 | 0.2072 | 0.2593 | 0.1953 | 0.1585 | 0.2084 | 0.2105 | 0.1447 | 0.2907 | 0.1557 | 0.1890 | 46 |
| | SMN | 0.1329 | 0.0638 | 0.4748 | 0.3491 | 0.4424 | 0.2455 | 0.2689 | 0.5294 | 0.2671 | 0.1848 | 0.4840 | 0.2966 | 0.3116 | 47 |
  1. The metrics were calculated using the test datasets (see Table 1). The best performer among the four classifiers is highlighted in bold for each assay and each evaluation metric. The highest value represents the best performer, except for the Brier score and the sensitivity–specificity gap, for which lower values are better. See Additional file 1: Table S1 for the specificity values.
  2. ᵃ Coefficient of variation (CV) = standard deviation/mean of 12 assays
  3. ᵇ SSG = absolute value of (Specificity − Sensitivity)
  4. ᶜ Average (of 9 metrics) = (F1 + MCC + AUROC + AUPRC + BA + Precision + Recall − BS − SSG)/9. The values of BS and SSG are subtracted from the sum (instead of added) because BS and SSG are negatively correlated with model performance
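
The three derived quantities defined in the footnotes (CV, SSG and the nine-metric average) can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the authors' code: the function names and dictionary keys are hypothetical, and the use of the sample standard deviation is an assumption, since the footnote does not say which estimator was used (it does appear to reproduce the 60% reported for the F1/RF row). The example values are copied from the RF rows of the table.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: standard deviation / mean * 100.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100.0 * stdev(values) / mean(values)


def ssg(specificity, sensitivity):
    """Sensitivity-specificity gap: |specificity - sensitivity|."""
    return abs(specificity - sensitivity)


def average_of_nine(m):
    """Composite score: seven 'higher is better' metrics are added,
    BS and SSG ('lower is better') are subtracted, and the sum is divided by 9."""
    higher_is_better = ("F1", "MCC", "AUROC", "AUPRC", "BA", "Precision", "Recall")
    lower_is_better = ("BS", "SSG")
    total = sum(m[k] for k in higher_is_better) - sum(m[k] for k in lower_is_better)
    return total / 9


# RF classifier on the NR-AR assay (values from the table).
rf_nr_ar = {
    "F1": 0.1538, "MCC": 0.2859, "AUROC": 0.8232, "AUPRC": 0.3521,
    "BA": 0.5417, "Precision": 1.0000, "Recall": 0.0833,
    "BS": 0.3817, "SSG": 0.9167,
}

# F1 scores of the RF classifier across the twelve assays (values from the table).
rf_f1 = [0.1538, 0.0000, 0.4340, 0.2326, 0.2727, 0.2400,
         0.0606, 0.3359, 0.2500, 0.2500, 0.5106, 0.1364]

print(round(average_of_nine(rf_nr_ar), 4))  # 0.2157, as in the Average row
print(round(cv_percent(rf_f1)))             # 60, as in the CV column
```

Because BS and SSG enter with a negative sign, the composite can fall below zero when a classifier detects almost no actives, which is consistent with the negative NR-AR-LBD entries for RF and SMO.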