
Table 11 AUROC, accuracy, F1, MCC, precision, and recall scores with bootstrap variability of the TransformerCNN models transfer-learned on Ames data

From: Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition

| Architecture | Training | AUROC \(\uparrow\) | Accuracy \(\uparrow\) | F1 \(\uparrow\) | MCC \(\uparrow\) | Precision \(\uparrow\) | Recall \(\uparrow\) |
|---|---|---|---|---|---|---|---|
| No training | Untrained | 0.565 ± 0.006 | 0.502 ± 0.002 | 0.665 ± 0.001 | 0.019 ± 0.016 | 0.990 ± 0.002 | 0.501 ± 0.001 |
| | Native | 0.702 ± 0.005 | 0.659 ± 0.006 | 0.669 ± 0.006 | 0.318 ± 0.012 | 0.689 ± 0.007 | 0.650 ± 0.006 |
| Encoder only | C2C | 0.697 ± 0.005 | 0.654 ± 0.007 | 0.654 ± 0.007 | 0.309 ± 0.013 | 0.652 ± 0.007 | 0.655 ± 0.007 |
| | R2C | 0.714 ± 0.005 | 0.652 ± 0.007 | 0.660 ± 0.007 | 0.304 ± 0.013 | 0.676 ± 0.007 | 0.645 ± 0.006 |
| | E2C | 0.692 ± 0.005 | 0.649 ± 0.007 | 0.660 ± 0.007 | 0.298 ± 0.014 | 0.683 ± 0.007 | 0.639 ± 0.007 |
| | MC2C | 0.725 ± 0.006 | 0.660 ± 0.008 | 0.663 ± 0.008 | 0.320 ± 0.015 | 0.668 ± 0.008 | 0.657 ± 0.008 |
| | MR2C | 0.695 ± 0.005 | 0.643 ± 0.007 | 0.649 ± 0.007 | 0.287 ± 0.014 | 0.660 ± 0.007 | 0.638 ± 0.007 |
| | ME2C | 0.740 ± 0.006 | 0.680 ± 0.008 | 0.686 ± 0.007 | 0.360 ± 0.015 | 0.701 ± 0.008 | 0.673 ± 0.007 |
| Encoder-decoder | C2C | 0.711 ± 0.004 | 0.659 ± 0.006 | 0.657 ± 0.006 | 0.319 ± 0.012 | 0.652 ± 0.007 | 0.662 ± 0.006 |
| | R2C | 0.680 ± 0.006 | 0.634 ± 0.007 | 0.633 ± 0.008 | 0.269 ± 0.015 | 0.631 ± 0.008 | 0.635 ± 0.008 |
| | E2C | 0.713 ± 0.004 | 0.653 ± 0.006 | 0.652 ± 0.006 | 0.306 ± 0.012 | 0.649 ± 0.006 | 0.655 ± 0.006 |
| | MC2C | 0.726 ± 0.006 | 0.663 ± 0.008 | 0.667 ± 0.007 | 0.326 ± 0.015 | 0.676 ± 0.008 | 0.659 ± 0.007 |
| | MR2C | 0.644 ± 0.004 | 0.584 ± 0.001 | 0.583 ± 0.001 | 0.167 ± 0.001 | 0.583 ± 0.001 | 0.584 ± 0.001 |
| | ME2C | 0.721 ± 0.006 | 0.663 ± 0.008 | 0.668 ± 0.008 | 0.326 ± 0.016 | 0.678 ± 0.008 | 0.658 ± 0.008 |

1. Values are based on the scaffold split. The ± values were determined using 1000-fold test-time bootstrapping.
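
The ± values report bootstrap variability on the held-out test set. The sketch below (assumptions: illustrative function and variable names, scikit-learn metrics, a 0.5 decision threshold; it is not the authors' code) shows one common way to obtain such estimates: resample the test-set predictions with replacement 1000 times and report the mean and standard deviation of each metric.

```python
# Minimal sketch (not the authors' code): estimate bootstrap variability of
# classification metrics by resampling test-set predictions 1000 times.
import numpy as np
from sklearn.metrics import (
    roc_auc_score, accuracy_score, f1_score,
    matthews_corrcoef, precision_score, recall_score,
)

def bootstrap_metrics(y_true, y_prob, n_boot=1000, threshold=0.5, seed=0):
    """Return (mean, std) of each metric over n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)  # hard labels for threshold metrics
    scores = {name: [] for name in
              ["AUROC", "Accuracy", "F1", "MCC", "Precision", "Recall"]}
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample the test set with replacement
        if len(np.unique(y_true[idx])) < 2:     # AUROC needs both classes present
            continue
        scores["AUROC"].append(roc_auc_score(y_true[idx], y_prob[idx]))
        scores["Accuracy"].append(accuracy_score(y_true[idx], y_pred[idx]))
        scores["F1"].append(f1_score(y_true[idx], y_pred[idx]))
        scores["MCC"].append(matthews_corrcoef(y_true[idx], y_pred[idx]))
        scores["Precision"].append(precision_score(y_true[idx], y_pred[idx]))
        scores["Recall"].append(recall_score(y_true[idx], y_pred[idx]))
    return {name: (float(np.mean(v)), float(np.std(v))) for name, v in scores.items()}

# Example usage with dummy predictions:
# stats = bootstrap_metrics(y_true=[0, 1, 1, 0, 1], y_prob=[0.2, 0.8, 0.6, 0.4, 0.9])
# for name, (mean, std) in stats.items():
#     print(f"{name}: {mean:.3f} ± {std:.3f}")
```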