Fig. 7 | Journal of Cheminformatics

Fig. 7

From: On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data

Average single nearest neighbour similarity (aSNN) between generated compounds and middle- or late-stage test compounds. For all projects in the reinforcement learning (RL) setting, the aSNN between generated compounds and the real compounds from the middle stage (a to c) or late stage (d to f) is shown for (a, d) all 5,000 generated compounds, and for (b, e) the 500 and (c, f) the 100 compounds scored highest by an in silico classification model. Panels a to c show that selection by the activity model generally increases aSNN, although the magnitude of the effect varies widely across projects. Values in panels d to f are generally lower than those in panels a to c (middle-stage compounds), indicating that long-term compound evolution is much more difficult to model than short-term compound evolution. The aSNN cut-off for considering compounds similar was set to 0.3.
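As a rough illustration of the metric in the caption, the sketch below computes an aSNN value and selects the top-scored subset used in panels b, c, e and f. It is not the authors' implementation. It rests on several assumptions not stated in the caption: compounds are represented as sets of fingerprint on-bit indices, similarity is Tanimoto, the nearest neighbour is searched from each generated compound into the real compound set, and the names `tanimoto`, `asnn`, `top_k_by_score` and `SIMILARITY_CUTOFF` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a compound is represented by the set of its fingerprint on-bits.
Fingerprint = FrozenSet[int]

# Cut-off stated in the caption; how it is applied (>= or >) is an assumption.
SIMILARITY_CUTOFF = 0.3


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two on-bit sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def asnn(generated: Sequence[Fingerprint], real: Sequence[Fingerprint]) -> float:
    """Average, over generated compounds, of the similarity to the nearest real compound."""
    if not generated or not real:
        raise ValueError("both compound sets must be non-empty")
    nearest = [max(tanimoto(g, r) for r in real) for g in generated]
    return sum(nearest) / len(nearest)


def top_k_by_score(
    compounds: Sequence[Fingerprint], scores: Sequence[float], k: int
) -> List[Fingerprint]:
    """Keep the k compounds with the highest (hypothetical) classifier scores."""
    if len(compounds) != len(scores):
        raise ValueError("compounds and scores must have the same length")
    order = sorted(range(len(compounds)), key=lambda i: scores[i], reverse=True)
    return [compounds[i] for i in order[:k]]
```

Under these assumptions, panels a and d would correspond to `asnn(generated, real)` on the full generated set, and the remaining panels to `asnn(top_k_by_score(generated, scores, k), real)` with `k` set to 500 or 100. The result can then be compared against `SIMILARITY_CUTOFF`.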
