Table 8 Model statistics of pretrained transformer models

From: Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition

| Architecture | Training | Canonical: character accuracy | Canonical: sequence accuracy | Enumerated: character accuracy | Enumerated: sequence accuracy | Time (hh:mm) | Size |
|---|---|---|---|---|---|---|---|
| Encoder only | C2C | 1.000 | 1.000 | 1.000 | 1.000 | 01:16 | 6.7M |
| | R2C | 0.202 | 0.000 | 0.195 | 0.000 | 00:42 | 6.7M |
| | E2C | 1.000 | 1.000 | 1.000 | 1.000 | 03:33 | 6.7M |
| | MC2C | 0.999 | 0.993 | 0.994 | 0.819 | 01:25 | 6.7M |
| | MR2C | 0.202 | 0.000 | 0.196 | 0.000 | 00:46 | 6.7M |
| | ME2C | 1.000 | 0.991 | 0.999 | 0.962 | 10:31 | 6.7M |
| Encoder-decoder | C2C | 0.998 / **0.995** | 0.994 / **0.994** | 0.998 / **0.996** | 0.994 / **0.994** | 01:45 | 15.9M |
| | R2C | 0.813 / **0.089** | 0.000 / **0.000** | 0.554 / **0.082** | 0.000 / **0.000** | 02:44 | 15.9M |
| | E2C | 0.998 / **0.995** | 0.995 / **0.995** | 0.998 / **0.995** | 0.994 / **0.994** | 08:43 | 15.9M |
| | MC2C | 0.998 / **0.996** | 0.993 / **0.987** | 0.976 / **0.965** | 0.463 / **0.452** | 02:53 | 15.9M |
| | MR2C | 0.812 / **0.040** | 0.000 / **0.000** | 0.549 / **0.040** | 0.000 / **0.000** | 05:56 | 15.9M |
| | ME2C | 0.998 / **0.994** | 0.983 / **0.958** | 0.998 / **0.994** | 0.979 / **0.936** | 24:04 | 15.9M |

Bold values are accuracies calculated without previous token information; in the encoder-decoder rows each accuracy cell therefore lists the value with previous-token information first, followed by the bold value. Models were tested on canonical SMILES to canonical SMILES (canonical) and enumerated SMILES to canonical SMILES (enumerated).
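As a reading aid, the sketch below shows one plausible way to compute the two metrics reported in this table from decoded SMILES strings: character accuracy as the fraction of matching aligned characters and sequence accuracy as the fraction of exact-match strings. It is not the authors' code; the function names, the padding convention for length mismatches, and the example strings are all assumptions.

```python
# Minimal sketch (not the authors' code) of character- and sequence-level
# accuracy over decoded SMILES strings.
from itertools import zip_longest

def character_accuracy(predictions, references):
    """Fraction of aligned characters that match; strings of unequal
    length are padded so every extra character counts as an error
    (one assumed convention among several possible ones)."""
    correct = total = 0
    for pred, ref in zip(predictions, references):
        for p, r in zip_longest(pred, ref):
            correct += p == r
            total += 1
    return correct / total

def sequence_accuracy(predictions, references):
    """Fraction of predictions that match their reference exactly."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical example: first sequence perfect, second has one wrong character.
preds = ["CCO", "c1ccccc1O"]
refs  = ["CCO", "c1ccccc1N"]
print(character_accuracy(preds, refs))  # 11/12 ≈ 0.917
print(sequence_accuracy(preds, refs))   # 1/2 = 0.5
```

The example illustrates why sequence accuracy is always the stricter metric in the table: a single wrong character costs the whole sequence, which is consistent with rows such as R2C, where character accuracy is well above zero while sequence accuracy is 0.000.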