From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Tokenization schemes | \(\text{rep-l}\vert_{P} - \text{rep-l}\vert_{GT} \ge 2\) | Acc. (%) greedy, string exact | Acc. (%) greedy, Tc exact |
---|---|---|---|
Atom-wise baseline [57] | – | 42.00 | – |
Atom-wise (reproduced from ref. [57]) | 801 | 42.05 | 44.72 |
SmilesPE (ref. [21]) | 821 | 19.82 | 22.74 |
SELFIES (ref. [17]) | 886 | 28.82 | 30.76 |
DeepSMILES (ref. [16]) | 902 | 38.63 | 41.20 |
Atom-in-SMILES | 727 | 46.32 | 47.62 |