Fig. 5
From: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

Performance of atom-wise (blue) and atom-in-SMILES (purple) tokenization schemes tested on various restricted GDB-13 test sets [33]. a Test results for the \(\times\)10 augmented training set. b Model overview. c Test results for the \(\times\)50 augmented training set. Training was conducted with one million molecules randomly sampled from GDB-13, combined with a randomly sampled 150K subset of the strictest cumulative abcdefgh data, which we augmented at different levels (\(\times\)10, \(\times\)30, and \(\times\)50).