Table 2 Ablation study on models trained on Rhea. For each run, a fraction (0.01 to 0.5) of the labels was shuffled to simulate real-world conditions, in which non-curated data contain misclassifications.

From: An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification

| Shuffled fraction / Model | ECXRhea | ECXYRhea | ECXYZRhea |
| --- | --- | --- | --- |
| – | \(0.98\pm 0.00\) (\(0.97\pm 0.01\)) | \(0.96\pm 0.01\) (\(0.88\pm 0.02\)) | \(0.95\pm 0.00\) (\(0.87\pm 0.02\)) |
| 0.01 | \(0.98\pm 0.01\) (\(0.96\pm 0.01\)) | \(0.95\pm 0.00\) (\(0.86\pm 0.00\)) | \(0.94\pm 0.01\) (\(0.86\pm 0.02\)) |
| 0.05 | \(0.96\pm 0.01\) (\(0.94\pm 0.01\)) | \(0.91\pm 0.00\) (\(0.83\pm 0.02\)) | \(0.91\pm 0.01\) (\(0.81\pm 0.01\)) |
| 0.10 | \(0.93\pm 0.00\) (\(0.91\pm 0.01\)) | \(0.88\pm 0.00\) (\(0.77\pm 0.02\)) | \(0.88\pm 0.01\) (\(0.75\pm 0.01\)) |
| 0.20 | \(0.88\pm 0.01\) (\(0.86\pm 0.01\)) | \(0.88\pm 0.01\) (\(0.86\pm 0.01\)) | \(0.82\pm 0.01\) (\(0.68\pm 0.02\)) |
| 0.50 | \(0.74\pm 0.02\) (\(0.66\pm 0.01\)) | \(0.64\pm 0.01\) (\(0.49\pm 0.03\)) | \(0.60\pm 0.03\) (\(0.44\pm 0.01\)) |

The row marked (–) shows the statistics for the model trained on the non-shuffled data. The reported metrics are accuracies, with F-scores in parentheses. Runtimes and energy usage are identical to those reported in the main text.
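
The caption describes shuffling a fraction of the labels but does not say how the shuffle was carried out. The following is a minimal sketch of one plausible reading, not the authors' procedure: it assumes a random subset of positions is chosen and the labels at those positions are permuted among themselves. The function name `shuffle_label_fraction`, the `seed` parameter, and the example labels are all hypothetical.

```python
import random


def shuffle_label_fraction(labels, fraction, seed=0):
    """Return a copy of `labels` with a random `fraction` of entries shuffled.

    Assumption: "shuffled" means the labels at the chosen positions are
    permuted among themselves, so the overall class distribution is kept.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")

    rng = random.Random(seed)
    n_shuffled = round(fraction * len(labels))

    # Choose which positions take part in the shuffle.
    chosen = rng.sample(range(len(labels)), n_shuffled)

    # Permute the labels found at those positions.
    permuted = [labels[i] for i in chosen]
    rng.shuffle(permuted)

    noisy = list(labels)
    for index, label in zip(chosen, permuted):
        noisy[index] = label
    return noisy


# Hypothetical example: 100 reactions with two class labels.
example_labels = ["1"] * 50 + ["2"] * 50
noisy_labels = shuffle_label_fraction(example_labels, fraction=0.10, seed=1)
changed = sum(a != b for a, b in zip(example_labels, noisy_labels))
print(f"{changed} of {len(example_labels)} labels changed")
```

Under this reading, a shuffled entry can receive a label of its original class, so the fraction of labels that actually become wrong is at most the nominal shuffled fraction.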