An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification

Table 2 Ablation study on models trained on Rhea. For each run, a fraction (0.01 to 0.5) of the labels was shuffled to simulate real-world conditions, in which non-curated data contain misclassifications

Shuffled fraction/Model	ECX_Rhea	ECXY_Rhea	ECXYZ_Rhea
–	\(0.98\pm 0.00\) (\(0.97\pm 0.01\))	\(0.96\pm 0.01\) (\(0.88\pm 0.02\))	\(0.95\pm 0.00\) (\(0.87\pm 0.02\))
0.01	\(0.98\pm 0.01\) (\(0.96\pm 0.01\))	\(0.95\pm 0.00\) (\(0.86\pm 0.00\))	\(0.94\pm 0.01\) (\(0.86\pm 0.02\))
0.05	\(0.96\pm 0.01\) (\(0.94\pm 0.01\))	\(0.91\pm 0.00\) (\(0.83\pm 0.02\))	\(0.91\pm 0.01\) (\(0.81\pm 0.01\))
0.10	\(0.93\pm 0.00\) (\(0.91\pm 0.01\))	\(0.88\pm 0.00\) (\(0.77\pm 0.02\))	\(0.88\pm 0.01\) (\(0.75\pm 0.01\))
0.20	\(0.88\pm 0.01\) (\(0.86\pm 0.01\))	\(0.88\pm 0.01\) (\(0.86\pm 0.01\))	\(0.82\pm 0.01\) (\(0.68\pm 0.02\))
0.50	\(0.74\pm 0.02\) (\(0.66\pm 0.01\))	\(0.64\pm 0.01\) (\(0.49\pm 0.03\))	\(0.60\pm 0.03\) (\(0.44\pm 0.01\))

In addition, the statistics for the model trained on the non-shuffled data are shown (–). The metrics reported are accuracies, with F-scores shown in parentheses. Runtimes and energy usage are identical to those reported in the main text
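
The label shuffling described in the caption of Table 2 could be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `shuffle_label_fraction`, the `seed` parameter, and the choice to permute labels among the selected positions (rather than redraw them from the set of classes) are all assumptions, and the toy labels stand in for EC classes.

```python
import random


def shuffle_label_fraction(labels, fraction, seed=0):
    """Return a copy of `labels` with a random `fraction` of entries shuffled.

    Hypothetical helper: the selected labels are permuted among themselves,
    so the overall class distribution is preserved.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_shuffled = round(fraction * len(labels))
    # Pick which positions take part in the shuffle.
    positions = rng.sample(range(len(labels)), n_shuffled)
    # Permute the labels found at those positions.
    picked = [labels[i] for i in positions]
    rng.shuffle(picked)
    noisy = list(labels)
    for i, label in zip(positions, picked):
        noisy[i] = label
    return noisy


# Toy data: 300 reactions spread evenly over six top-level classes.
labels = ["1", "2", "3", "4", "5", "6"] * 50
noisy = shuffle_label_fraction(labels, fraction=0.10, seed=42)
changed = sum(a != b for a, b in zip(labels, noisy))
print(f"{changed} of {len(labels)} labels differ after shuffling")
```

Because a shuffled label can land on a position that already held the same class, the number of labels that actually change is at most the shuffled fraction, so the effective noise rate in such a scheme is somewhat lower than the nominal one.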

ISSN: 1758-2946