From: Exploring the GDB-13 chemical space using deep generative models

Plots of the frequency (left y axis) and the percentage in the database (right y axis) of 1-grams and 2-grams in the canonical SMILES of all GDB-13 molecules. Both plots are sorted by the percentage present in the database. a Plot of the 1-grams (tokens). The mean frequency is shown in blue and the percentage of 1-grams in the database in orange. Numeric tokens are highlighted in red. b Plot of the 2-gram mean frequency (blue) and percentage (dashed orange). Because there are too many 2-grams (287) to label individually, the x axis has been intentionally left blank, and the mean frequency has been smoothed with an averaging window of size 8
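
The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: the regular-expression tokenizer is a simplified stand-in for a SMILES tokenizer, "percentage in the database" is read here as the share of molecules whose token sequence contains a given n-gram, and the smoothing is taken to be a plain moving average. The names `tokenize`, `ngrams`, `percent_containing` and `moving_average` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a simplified SMILES tokenizer. Bracket atoms and two-letter
# halogens are single tokens; every other non-space character is its own token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[A-Za-z]|\d|[^A-Za-z\d\s]")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens (1-grams)."""
    return TOKEN_RE.findall(smiles)


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def percent_containing(smiles_list, n):
    """Percentage of molecules containing each n-gram at least once.

    Assumption: this is one possible reading of 'percent in database'.
    """
    counts = Counter()
    for smi in smiles_list:
        # A set ensures each molecule is counted once per n-gram.
        counts.update(set(ngrams(tokenize(smi), n)))
    total = len(smiles_list)
    return {gram: 100.0 * c / total for gram, c in counts.items()}


def moving_average(values, window=8):
    """Plain moving average; returns len(values) - window + 1 points."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out
```

With a window of 8, a curve over 287 sorted 2-grams would be reduced to 280 smoothed points under this scheme; other edge-handling choices (padding, centered windows) are equally plausible.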

