Table 1 Comparison of fixed, generated, and training set molecules

From: UnCorrupt SMILES: a novel approach to de novo design

| Case – reference | Uniqueness (a) | Novelty | SNN similarity | Fragment similarity | Scaffold similarity | KL divergence |
|---|---|---|---|---|---|---|
| Fixed – generated | 0.97 ± 0.05 | 0.97 ± 0.06 | 0.45 ± 0.09 | 0.93 ± 0.10 | 0.49 ± 0.31 | 0.77 ± 0.15 |
| Fixed – train | | 1.00 ± 0.00 | 0.41 ± 0.02 | 0.92 ± 0.12 | 0.30 ± 0.06 | 0.80 ± 0.23 |
| Generated – train | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.39 ± 0.10 | 0.88 ± 0.18 | 0.40 ± 0.17 | 0.75 ± 0.28 |

  1. Values are the mean ± standard deviation over the 3 general generative model case studies. Uniqueness is the fraction of unique molecules in a sample of 10,000 valid molecules. Novelty is the fraction of 10,000 molecules that are not present in a sample of 100,000 molecules from the reference set. For the similarity metrics, 10,000 molecules were compared to 100,000 molecules from the reference set. SNN is the similarity to the nearest neighbor. Fragment and scaffold similarity compare the frequency distributions of fragments or scaffolds with those of the reference set. KL divergence describes the similarity of the physicochemical property distributions of 10,000 molecules to those of the reference set.
  2. (a) Uniqueness is calculated for the fixed and generated sets.
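
The uniqueness and novelty definitions in the table notes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes molecules are already represented as canonical SMILES strings, so string equality stands in for molecule identity. The function names and toy data are hypothetical, and the use of the sample standard deviation in `summarize` is an assumption, since the notes do not say which estimator was used.

```python
from statistics import mean, stdev


def uniqueness(smiles):
    """Fraction of distinct entries in a sample of valid molecules."""
    return len(set(smiles)) / len(smiles)


def novelty(smiles, reference):
    """Fraction of sampled molecules that do not occur in the reference sample."""
    reference_set = set(reference)
    return sum(s not in reference_set for s in smiles) / len(smiles)


def summarize(values):
    """Format per-case-study scores as 'mean ± standard deviation'.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


# Toy data (hypothetical), far smaller than the 10,000 / 100,000 samples in the notes.
sample = ["CCO", "CCO", "c1ccccc1", "CCN"]
reference = ["CCO", "CC(=O)O"]

print(uniqueness(sample))          # 3 distinct strings out of 4
print(novelty(sample, reference))  # 2 of 4 strings are absent from the reference
print(summarize([0.5, 0.7, 0.9]))  # one made-up score per case study
```

The similarity metrics (SNN, fragment, scaffold) and the KL divergence need a cheminformatics toolkit to compute fingerprints, fragments, and property distributions, so they are left out of this sketch.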