Table 3 Best models trained on subsets of GDB-13 after the hyperparameter optimization

From: Randomized SMILES strings improve the quality of molecular generative models

| Set | SMILES | Time (hh:mm) | % GDB-13 | Valid | Unif | Comp | Closed | UCC |
|---|---|---|---|---|---|---|---|---|
| 1M | Canonical | 4:08 | 72.8 | 0.994 | 0.879 | 0.836 | 0.861 | 0.633 |
| | Rand. unr. | 31:47 | 80.9 | 0.995 | 0.970 | 0.929 | 0.876 | 0.790 |
| | Rand. unr. no DA | 1:37 | 77.0 | 0.987 | 0.957 | 0.795 | 0.883 | 0.672 |
| | Rand. rest. | 7:19 | 83.0 | 0.999 | 0.977 | 0.953 | 0.925 | 0.860 |
| | Rand. rest. no DA | 1:21 | 78.2 | 0.992 | 0.957 | 0.829 | 0.898 | 0.712 |
| | DS branch | 1:33 | 72.1 | 0.987 | 0.881 | 0.828 | 0.834 | 0.608 |
| | DS rings | 1:11 | 68.6 | 0.979 | 0.852 | 0.788 | 0.798 | 0.535 |
| | DS both | 1:05 | 68.4 | 0.979 | 0.851 | 0.785 | 0.796 | 0.532 |
| 10K | Canonical | 0:04 | 38.8 | 0.905 | 0.666 | 0.445 | 0.426 | 0.126 |
| | Rand. rest. | 0:36 | 62.3 | 0.974 | 0.882 | 0.715 | 0.598 | 0.377 |
| 1K | Canonical | 0:01 | 14.5 | 0.504 | 0.611 | 0.167 | 0.133 | 0.014 |
| | Rand. rest. | 0:04 | 34.1 | 0.812 | 0.790 | 0.392 | 0.276 | 0.085 |

  1. See “Methods” section for a description of the ratios
  2. The best results for each training set size are indicated in italics
  3. Set: benchmark training set size; SMILES: SMILES variant, including randomized variants with and without data augmentation (DA); Time: training time in hh:mm; % GDB-13: percentage of unique GDB-13 molecules generated in a sample of 2 billion drawn with replacement; Valid: fraction of valid SMILES; Unif: uniformity ratio; Comp: completeness ratio; Closed: closedness ratio; UCC: UCC ratio
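
The tabulated values appear consistent with the UCC ratio being the product of the Unif, Comp, and Closed ratios. This relation is an assumption inferred from the numbers in the table; the "Methods" section of the article remains the authoritative definition. The minimal Python sketch below checks that reading against every row, allowing for rounding to three decimals. The names `ROWS`, `ucc_product`, and `max_abs_deviation` are illustrative and do not come from the article.

```python
# Rows copied from Table 3: (set, SMILES variant, Unif, Comp, Closed, reported UCC).
ROWS = [
    ("1M", "Canonical", 0.879, 0.836, 0.861, 0.633),
    ("1M", "Rand. unr.", 0.970, 0.929, 0.876, 0.790),
    ("1M", "Rand. unr. no DA", 0.957, 0.795, 0.883, 0.672),
    ("1M", "Rand. rest.", 0.977, 0.953, 0.925, 0.860),
    ("1M", "Rand. rest. no DA", 0.957, 0.829, 0.898, 0.712),
    ("1M", "DS branch", 0.881, 0.828, 0.834, 0.608),
    ("1M", "DS rings", 0.852, 0.788, 0.798, 0.535),
    ("1M", "DS both", 0.851, 0.785, 0.796, 0.532),
    ("10K", "Canonical", 0.666, 0.445, 0.426, 0.126),
    ("10K", "Rand. rest.", 0.882, 0.715, 0.598, 0.377),
    ("1K", "Canonical", 0.611, 0.167, 0.133, 0.014),
    ("1K", "Rand. rest.", 0.790, 0.392, 0.276, 0.085),
]


def ucc_product(unif, comp, closed):
    """Assumed relation (not stated in the table itself): UCC = Unif * Comp * Closed."""
    return unif * comp * closed


def max_abs_deviation(rows):
    """Largest absolute gap between the assumed product and the reported UCC."""
    return max(
        abs(ucc_product(unif, comp, closed) - ucc)
        for _, _, unif, comp, closed, ucc in rows
    )


if __name__ == "__main__":
    for set_size, variant, unif, comp, closed, ucc in ROWS:
        product = ucc_product(unif, comp, closed)
        print(f"{set_size:>3} {variant:<18} product={product:.3f} reported={ucc:.3f}")
    print(f"largest deviation: {max_abs_deviation(ROWS):.4f}")
```

Because each input ratio is itself rounded to three decimals, small gaps between the recomputed product and the reported UCC are expected and do not on their own contradict the assumed relation.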