From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization
Dataset | SMILES | DeepSMILES | SELFIES | SmilesPE | AIS |
---|---|---|---|---|---|
Regression datasets: RMSE (lower is better) | | | | | |
ESOL | 0.628 | 0.631 | 0.675 | 0.689 | 0.553 |
FreeSolv | 0.545 | 0.544 | 0.564 | 0.761 | 0.441 |
Lipophilicity (Lip) | 0.924 | 0.895 | 0.938 | 0.800 | 0.683 |
Classification datasets: ROC-AUC (higher is better) | | | | | |
BBBP | 0.758 | 0.777 | 0.799 | 0.847 | 0.885 |
BACE | 0.740 | 0.774 | 0.746 | 0.837 | 0.835 |
HIV | 0.649 | 0.648 | 0.653 | 0.739 | 0.729 |