Table 2 Nine metrics for evaluating the performance of four classification methods (RF, RUS, SMO and SMN) with twelve Tox21 qHTS assay datasets

From: Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets

| Metrics | Classifier | NR-AR | NR-AR-LBD | NR-AhR | NR-Aromatase | NR-ER | NR-ER-LBD | NR-PPAR-γ | SR-ARE | SR-ATAD5 | SR-HSE | SR-MMP | SR-p53 | Mean | CVᵃ (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F1 score | RF | 0.1538 | 0.0000 | 0.4340 | 0.2326 | 0.2727 | 0.2400 | 0.0606 | 0.3359 | 0.2500 | 0.2500 | 0.5106 | 0.1364 | 0.2397 | 60 |
| F1 score | RUS | 0.1176 | 0.1667 | 0.4507 | 0.2222 | 0.2605 | 0.1849 | 0.4051 | 0.4185 | 0.2063 | 0.1058 | 0.5867 | 0.2527 | 0.2815 | 53 |
| F1 score | SMO | 0.2500 | 0.0000 | 0.3883 | 0.1905 | 0.3692 | 0.2857 | 0.1765 | 0.2927 | 0.2439 | 0.1905 | 0.3902 | 0.1395 | 0.2431 | 47 |
| F1 score | SMN | 0.1951 | 0.1111 | 0.5856 | 0.5070 | 0.6078 | 0.3636 | 0.3929 | 0.6791 | 0.3636 | 0.2400 | 0.5850 | 0.4225 | 0.4211 | 42 |
| MCC | RF | 0.2859 | −0.0050 | 0.4101 | 0.3202 | 0.2726 | 0.2891 | 0.0767 | 0.2770 | 0.3377 | 0.2619 | 0.4701 | 0.1801 | 0.2647 | 49 |
| MCC | RUS | 0.1056 | 0.1602 | 0.4209 | 0.1914 | 0.1816 | 0.1908 | 0.3810 | 0.2950 | 0.2049 | 0.1190 | 0.5537 | 0.2769 | 0.2568 | 53 |
| MCC | SMO | 0.2805 | −0.0071 | 0.3669 | 0.2792 | 0.3990 | 0.3018 | 0.2355 | 0.2498 | 0.3091 | 0.2327 | 0.3662 | 0.2019 | 0.2679 | 39 |
| MCC | SMN | 0.1886 | 0.0975 | 0.5342 | 0.4711 | 0.5643 | 0.3404 | 0.3627 | 0.6177 | 0.3261 | 0.2226 | 0.5492 | 0.3872 | 0.3885 | 42 |
| AUROC | RF | 0.8232 | 0.7963 | 0.9063 | 0.7356 | 0.7601 | 0.6963 | 0.6640 | 0.7867 | 0.7827 | 0.7610 | 0.9194 | 0.7443 | 0.7813 | 10 |
| AUROC | RUS | 0.6785 | 0.9133 | 0.8852 | 0.7627 | 0.7174 | 0.7619 | 0.7937 | 0.7698 | 0.7791 | 0.7065 | 0.9295 | 0.8168 | 0.7929 | 10 |
| AUROC | SMO | 0.7780 | 0.7509 | 0.8936 | 0.8112 | 0.7296 | 0.8072 | 0.7872 | 0.7714 | 0.8151 | 0.7983 | 0.8893 | 0.8510 | 0.8069 | 6 |
| AUROC | SMN | 0.6810 | 0.7969 | 0.9196 | 0.8500 | 0.8628 | 0.8233 | 0.7713 | 0.8910 | 0.8093 | 0.8483 | 0.9294 | 0.8785 | 0.8384 | 8 |
| AUPRC | RF | 0.3521 | 0.0565 | 0.5846 | 0.2825 | 0.3203 | 0.1887 | 0.1120 | 0.4224 | 0.2881 | 0.1608 | 0.5632 | 0.1881 | 0.2933 | 57 |
| AUPRC | RUS | 0.1444 | 0.1068 | 0.4836 | 0.2043 | 0.2420 | 0.1545 | 0.5067 | 0.4140 | 0.2423 | 0.0622 | 0.5237 | 0.2295 | 0.2762 | 59 |
| AUPRC | SMO | 0.3290 | 0.0821 | 0.5065 | 0.3504 | 0.3895 | 0.2658 | 0.2806 | 0.4052 | 0.3350 | 0.1993 | 0.4928 | 0.2913 | 0.3273 | 36 |
| AUPRC | SMN | 0.0685 | 0.0639 | 0.5660 | 0.3845 | 0.5688 | 0.2018 | 0.3736 | 0.6443 | 0.2422 | 0.1134 | 0.5234 | 0.3254 | 0.3396 | 60 |
| Balanced accuracy (BA) | RF | 0.5417 | 0.4991 | 0.6518 | 0.5665 | 0.5830 | 0.5732 | 0.5146 | 0.6016 | 0.5726 | 0.5847 | 0.7053 | 0.5368 | 0.5776 | 10 |
| Balanced accuracy (BA) | RUS | 0.5929 | 0.6124 | 0.8129 | 0.6828 | 0.6513 | 0.6968 | 0.7454 | 0.6977 | 0.7133 | 0.6665 | 0.8523 | 0.7777 | 0.7085 | 11 |
| Balanced accuracy (BA) | SMO | 0.5815 | 0.4982 | 0.6304 | 0.5530 | 0.6181 | 0.5964 | 0.5499 | 0.5833 | 0.5718 | 0.5571 | 0.6354 | 0.5377 | 0.5761 | 7 |
| Balanced accuracy (BA) | SMN | 0.6443 | 0.5544 | 0.8228 | 0.7265 | 0.7922 | 0.6858 | 0.6753 | 0.8545 | 0.7018 | 0.6529 | 0.8452 | 0.6812 | 0.7198 | 13 |
| Precision | RF | 1.0000 | 0.0000 | 0.6389 | 0.8333 | 0.5294 | 0.6000 | 0.2500 | 0.5116 | 0.8333 | 0.4286 | 0.6000 | 0.5000 | 0.5604 | 48 |
| Precision | RUS | 0.0769 | 0.1250 | 0.2991 | 0.1302 | 0.1604 | 0.1111 | 0.3200 | 0.2869 | 0.1193 | 0.0576 | 0.4583 | 0.1464 | 0.1909 | 64 |
| Precision | SMO | 0.5000 | 0.0000 | 0.6061 | 0.8000 | 0.7500 | 0.5000 | 0.6000 | 0.5143 | 0.7143 | 0.5000 | 0.5714 | 0.6000 | 0.5547 | 36 |
| Precision | SMN | 0.1379 | 0.1000 | 0.4775 | 0.5294 | 0.5849 | 0.3333 | 0.4074 | 0.5748 | 0.2963 | 0.1818 | 0.4624 | 0.4545 | 0.3784 | 44 |
| Recall or Sensitivity | RF | 0.0833 | 0.0000 | 0.3286 | 0.1351 | 0.1837 | 0.1500 | 0.0345 | 0.2500 | 0.1471 | 0.1765 | 0.4444 | 0.0789 | 0.1677 | 75 |
| Recall or Sensitivity | RUS | 0.2500 | 0.2500 | 0.9143 | 0.7568 | 0.6939 | 0.5500 | 0.5517 | 0.7727 | 0.7647 | 0.6471 | 0.8148 | 0.9211 | 0.6573 | 34 |
| Recall or Sensitivity | SMO | 0.1667 | 0.0000 | 0.2857 | 0.1081 | 0.2449 | 0.2000 | 0.1034 | 0.2045 | 0.1471 | 0.1176 | 0.2963 | 0.0789 | 0.1628 | 54 |
| Recall or Sensitivity | SMN | 0.3333 | 0.1250 | 0.7571 | 0.4865 | 0.6327 | 0.4000 | 0.3793 | 0.8295 | 0.4706 | 0.3529 | 0.7963 | 0.3947 | 0.4965 | 43 |
| Brier score (BS) | RF | 0.3817 | 0.5425 | 0.3404 | 0.3997 | 0.3883 | 0.4163 | 0.3961 | 0.3725 | 0.3947 | 0.4257 | 0.3215 | 0.3810 | 0.3967 | 14 |
| Brier score (BS) | RUS | 0.4461 | 0.3874 | 0.3104 | 0.3724 | 0.3793 | 0.4299 | 0.3204 | 0.3735 | 0.3829 | 0.4871 | 0.3892 | 0.3936 | 0.3894 | 13 |
| Brier score (BS) | SMO | 0.4263 | 0.6739 | 0.3281 | 0.3379 | 0.4205 | 0.4067 | 0.4138 | 0.3881 | 0.3924 | 0.4146 | 0.3467 | 0.3814 | 0.4109 | 22 |
| Brier score (BS) | SMN | 0.4303 | 0.4156 | 0.2583 | 0.3327 | 0.3134 | 0.3670 | 0.3503 | 0.2761 | 0.3431 | 0.3491 | 0.2371 | 0.3014 | 0.3312 | 18 |
| Sensitivity–specificity gap (SSG)ᵇ | RF | 0.9167 | 0.9982 | 0.6464 | 0.8628 | 0.7987 | 0.8464 | 0.9601 | 0.7031 | 0.8511 | 0.8165 | 0.5217 | 0.9157 | 0.8198 | 17 |
| Sensitivity–specificity gap (SSG)ᵇ | RUS | 0.6857 | 0.7249 | 0.2028 | 0.1480 | 0.0851 | 0.2937 | 0.3874 | 0.1499 | 0.1027 | 0.0388 | 0.0750 | 0.2867 | 0.2651 | 87 |
| Sensitivity–specificity gap (SSG)ᵇ | SMO | 0.8297 | 0.9964 | 0.6893 | 0.8898 | 0.7463 | 0.7929 | 0.8930 | 0.7576 | 0.8494 | 0.8789 | 0.6783 | 0.9175 | 0.8266 | 12 |
| Sensitivity–specificity gap (SSG)ᵇ | SMN | 0.6221 | 0.8588 | 0.1314 | 0.4800 | 0.3189 | 0.5716 | 0.5920 | 0.0500 | 0.4625 | 0.6000 | 0.0978 | 0.5730 | 0.4465 | 55 |
| Averageᶜ | RF | 0.2157 | −0.0215 | 0.3297 | 0.2048 | 0.1928 | 0.1638 | 0.0396 | 0.2344 | 0.2184 | 0.1535 | 0.3744 | 0.1187 | 0.1854 | 59 |
| Averageᶜ | RUS | 0.0927 | 0.1358 | 0.4171 | 0.2700 | 0.2714 | 0.2140 | 0.3329 | 0.3479 | 0.2827 | 0.2043 | 0.4728 | 0.3045 | 0.2788 | 39 |
| Averageᶜ | SMO | 0.1811 | −0.0385 | 0.2956 | 0.2072 | 0.2593 | 0.1953 | 0.1585 | 0.2084 | 0.2105 | 0.1447 | 0.2907 | 0.1557 | 0.1890 | 46 |
| Averageᶜ | SMN | 0.1329 | 0.0638 | 0.4748 | 0.3491 | 0.4424 | 0.2455 | 0.2689 | 0.5294 | 0.2671 | 0.1848 | 0.4840 | 0.2966 | 0.3116 | 47 |

  1. The metrics were calculated using the test datasets (see Table 1). The best performer among the four classifiers is highlighted in bold for each assay and each evaluation metric. The highest value represents the best performer except for Brier score and sensitivity–specificity gap which are the opposite (i.e., the lower the better). See Additional file 1: Table S1 for the specificity values
  2. ᵃ Coefficient of variation (CV) = standard deviation / mean across the 12 assays
  3. ᵇ SSG = |Specificity − Sensitivity|, i.e., the absolute value of the difference
  4. ᶜ Average (of 9 metrics) = (F1 + MCC + AUROC + AUPRC + BA + Precision + Recall − BS − SSG)/9. The values of BS and SSG are subtracted from the sum (instead of added) because BS and SSG are negatively correlated with model performance
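
The footnote formulas can be checked against the table with a short sketch. The Python below is illustrative only, not the authors' code: the function names (`cv_percent`, `ssg`, `composite_average`) are hypothetical, and the use of the sample standard deviation (n − 1 denominator) for the CV is an assumption, since the footnote does not say which estimator was used. Under that assumption, the RF F1 row and the RF NR-AR column reproduce the tabulated Mean, CV and Average.

```python
from statistics import mean, stdev

# RF F1 scores for the 12 assays (NR-AR ... SR-p53), copied from Table 2.
RF_F1 = [0.1538, 0.0000, 0.4340, 0.2326, 0.2727, 0.2400,
         0.0606, 0.3359, 0.2500, 0.2500, 0.5106, 0.1364]

# RF metrics for the NR-AR assay, copied from Table 2.
RF_NR_AR = {
    "F1": 0.1538, "MCC": 0.2859, "AUROC": 0.8232, "AUPRC": 0.3521,
    "BA": 0.5417, "Precision": 1.0000, "Recall": 0.0833,
    "BS": 0.3817, "SSG": 0.9167,
}


def cv_percent(values):
    """Coefficient of variation as a percentage: 100 * SD / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100 * stdev(values) / mean(values)


def ssg(specificity, sensitivity):
    """Sensitivity-specificity gap: |specificity - sensitivity|."""
    return abs(specificity - sensitivity)


def composite_average(metrics):
    """Average of the 9 metrics, with BS and SSG subtracted, not added."""
    higher_is_better = ("F1", "MCC", "AUROC", "AUPRC",
                        "BA", "Precision", "Recall")
    lower_is_better = ("BS", "SSG")
    total = sum(metrics[k] for k in higher_is_better)
    total -= sum(metrics[k] for k in lower_is_better)
    return total / 9


print(round(mean(RF_F1), 4))                   # 0.2397 (Mean column, F1/RF)
print(round(cv_percent(RF_F1)))                # 60 (CV column, F1/RF)
print(round(composite_average(RF_NR_AR), 4))   # 0.2157 (Average row, RF/NR-AR)
```

Because BS and SSG enter with a negative sign, the composite Average can fall below zero when a classifier detects almost no actives, as in the NR-AR-LBD column for RF and SMO.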