Journal of Cheminformatics

Table 1 Comparison of the important aspects of natural and chemical languages within the NLP framework

From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

Aspects	Natural language	SMILES language
Sequence length	15-20 words	\(\sim\) 3 times longer
Token space	>100K	\(\sim\) 1000 times smaller
Token order	Tone, meaning, fluency	\({}_n C_{2}\) alternatives*
Meaning-wise	isolation \(\equiv\) context	isolation \(\equiv\) context

*In practice fewer, owing to the rules of chemistry

ISSN: 1758-2946
