Table 8 Model statistics of pretrained transformer models

From: Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition

| Architecture | Training | Canonical: character accuracy | Canonical: sequence accuracy | Enumerated: character accuracy | Enumerated: sequence accuracy | Time (hh:mm) | Size |
|---|---|---|---|---|---|---|---|
| Encoder only | C2C | 1.000 | 1.000 | 1.000 | 1.000 | 01:16 | 6.7M |
| | R2C | 0.202 | 0.000 | 0.195 | 0.000 | 00:42 | 6.7M |
| | E2C | 1.000 | 1.000 | 1.000 | 1.000 | 03:33 | 6.7M |
| | MC2C | 0.999 | 0.993 | 0.994 | 0.819 | 01:25 | 6.7M |
| | MR2C | 0.202 | 0.000 | 0.196 | 0.000 | 00:46 | 6.7M |
| | ME2C | 1.000 | 0.991 | 0.999 | 0.962 | 10:31 | 6.7M |
| Encoder-decoder | C2C | 0.998 / **0.995** | 0.994 / **0.994** | 0.998 / **0.996** | 0.994 / **0.994** | 01:45 | 15.9M |
| | R2C | 0.813 / **0.089** | 0.000 / **0.000** | 0.554 / **0.082** | 0.000 / **0.000** | 02:44 | 15.9M |
| | E2C | 0.998 / **0.995** | 0.995 / **0.995** | 0.998 / **0.995** | 0.994 / **0.994** | 08:43 | 15.9M |
| | MC2C | 0.998 / **0.996** | 0.993 / **0.987** | 0.976 / **0.965** | 0.463 / **0.452** | 02:53 | 15.9M |
| | MR2C | 0.812 / **0.040** | 0.000 / **0.000** | 0.549 / **0.040** | 0.000 / **0.000** | 05:56 | 15.9M |
| | ME2C | 0.998 / **0.994** | 0.983 / **0.958** | 0.998 / **0.994** | 0.979 / **0.936** | 24:04 | 15.9M |

Bold values are accuracies calculated without previous token information; in the encoder-decoder rows each accuracy cell therefore lists the value with previous-token information first, followed by the bold value. Models were tested on canonical SMILES to canonical SMILES (canonical) and enumerated SMILES to canonical SMILES (enumerated).
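As a reading aid, the sketch below shows one plausible way to compute the two metrics reported in this table from decoded SMILES strings: character accuracy as the fraction of matching aligned characters and sequence accuracy as the fraction of exact-match strings. It is not the authors' code; the function names, the padding convention for length mismatches, and the example strings are all assumptions.

```python
# Minimal sketch (not the authors' code) of character- and sequence-level
# accuracy over decoded SMILES strings.
from itertools import zip_longest

def character_accuracy(predictions, references):
    """Fraction of aligned characters that match; strings of unequal
    length are padded so every extra character counts as an error
    (one assumed convention among several possible ones)."""
    correct = total = 0
    for pred, ref in zip(predictions, references):
        for p, r in zip_longest(pred, ref):
            correct += p == r
            total += 1
    return correct / total

def sequence_accuracy(predictions, references):
    """Fraction of predictions that match their reference exactly."""
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical example: first sequence perfect, second has one wrong character.
preds = ["CCO", "c1ccccc1O"]
refs  = ["CCO", "c1ccccc1N"]
print(character_accuracy(preds, refs))  # 11/12 ≈ 0.917
print(sequence_accuracy(preds, refs))   # 1/2 = 0.5
```

The example illustrates why sequence accuracy is always the stricter metric in the table: a single wrong character costs the whole sequence, which is consistent with rows such as R2C, where character accuracy is well above zero while sequence accuracy is 0.000.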