From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Prediction accuracy (%) by tokenization scheme (Atom-wise vs. Atom-in-SMILES) at the ×10, ×30 and ×50 settings, on cumulative GDB-13 subsets [33].

| GDB-13 subsets [33] (cumulative) | Atom-wise ×10 | Atom-wise ×30 | Atom-wise ×50 | Atom-in-SMILES ×10 | Atom-in-SMILES ×30 | Atom-in-SMILES ×50 |
|---|---|---|---|---|---|---|
| ab | 34.2 | 34.3 | 33.2 | 37.3 | 35.9 | 34.1 |
| abc | 31.0 | 30.8 | 29.6 | 33.7 | 32.1 | 30.4 |
| abcd | 30.8 | 30.4 | 29.2 | 34.3 | 32.3 | 30.5 |
| abcde | 48.7 | 47.6 | 45.5 | 53.6 | 50.0 | 47.0 |
| abcdef | 41.8 | 40.6 | 39.1 | 52.5 | 49.6 | 46.9 |
| abcdefg | 50.9 | 50.9 | 50.0 | 59.9 | 58.6 | 56.8 |
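The differences between the two tokenizations can be read directly from the table above. The small Python sketch below computes the per-cell gain of Atom-in-SMILES over Atom-wise in percentage points. The values are transcribed by hand from the table. The dictionary names and the `gains` helper are illustrative and do not come from the paper.

```python
# Prediction accuracy (%) transcribed from the table; tuples are (x10, x30, x50).
# ATOM_WISE, ATOM_IN_SMILES and gains() are hypothetical names for illustration.
ATOM_WISE = {
    "ab": (34.2, 34.3, 33.2),
    "abc": (31.0, 30.8, 29.6),
    "abcd": (30.8, 30.4, 29.2),
    "abcde": (48.7, 47.6, 45.5),
    "abcdef": (41.8, 40.6, 39.1),
    "abcdefg": (50.9, 50.9, 50.0),
}
ATOM_IN_SMILES = {
    "ab": (37.3, 35.9, 34.1),
    "abc": (33.7, 32.1, 30.4),
    "abcd": (34.3, 32.3, 30.5),
    "abcde": (53.6, 50.0, 47.0),
    "abcdef": (52.5, 49.6, 46.9),
    "abcdefg": (59.9, 58.6, 56.8),
}


def gains(baseline, candidate):
    """Return candidate minus baseline for each subset, in percentage points (1 decimal)."""
    return {
        subset: tuple(round(c - b, 1) for b, c in zip(baseline[subset], candidate[subset]))
        for subset in baseline
    }


if __name__ == "__main__":
    for subset, delta in gains(ATOM_WISE, ATOM_IN_SMILES).items():
        print(f"{subset:<8} " + "  ".join(f"{d:+.1f}" for d in delta))
```

On these numbers, Atom-in-SMILES is ahead in every cell, by 0.8 to 10.7 points. Within each subset, the gap narrows from ×10 to ×50.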