Fig. 5 | Journal of Cheminformatics

Fig. 5

From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

Performance of atom-wise (blue) and atom-in-SMILES (purple) tokenization schemes tested on various restricted GDB-13 test sets [33]. a Test results for the \(\times\)10 augmented training set. b Model overview. c Test results for the \(\times\)50 augmented training set. Training uses one million molecules randomly sampled from GDB-13, combined with a randomly sampled 150K subset of the strictest cumulative abcdefgh data, which we augmented at different levels (\(\times\)10, \(\times\)30, and \(\times\)50).
