
Table 3 Best models trained on subsets of GDB-13 after the hyperparameter optimization

From: Randomized SMILES strings improve the quality of molecular generative models

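| Set | SMILES | Time (hh:mm) | % GDB-13 | Valid | Unif | Comp | Closed | UCC |
|---|---|---|---|---|---|---|---|---|
| 1M | Canonical | 4:08 | 72.8 | 0.994 | 0.879 | 0.836 | 0.861 | 0.633 |
| 1M | Rand. unr. | 31:47 | 80.9 | 0.995 | 0.970 | 0.929 | 0.876 | 0.790 |
| 1M | Rand. unr. no DA | 1:37 | 77.0 | 0.987 | 0.957 | 0.795 | 0.883 | 0.672 |
| 1M | Rand. rest. | 7:19 | 83.0 | 0.999 | 0.977 | 0.953 | 0.925 | 0.860 |
| 1M | Rand. rest. no DA | 1:21 | 78.2 | 0.992 | 0.957 | 0.829 | 0.898 | 0.712 |
| 1M | DS branch | 1:33 | 72.1 | 0.987 | 0.881 | 0.828 | 0.834 | 0.608 |
| 1M | DS rings | 1:11 | 68.6 | 0.979 | 0.852 | 0.788 | 0.798 | 0.535 |
| 1M | DS both | 1:05 | 68.4 | 0.979 | 0.851 | 0.785 | 0.796 | 0.532 |
| 10K | Canonical | 0:04 | 38.8 | 0.905 | 0.666 | 0.445 | 0.426 | 0.126 |
| 10K | Rand. rest. | 0:36 | 62.3 | 0.974 | 0.882 | 0.715 | 0.598 | 0.377 |
| 1K | Canonical | 0:01 | 14.5 | 0.504 | 0.611 | 0.167 | 0.133 | 0.014 |
| 1K | Rand. rest. | 0:04 | 34.1 | 0.812 | 0.790 | 0.392 | 0.276 | 0.085 |
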
  1. See “Methods” section for a description of the ratios
  2. The best result for each training set size is indicated in italics
  3. Set: benchmark training set size; SMILES: SMILES variant, including randomized variants with and without data augmentation (DA); Time: training time in hh:mm; % GDB-13: percentage of unique molecules from GDB-13 generated in a sample of 2 billion drawn with replacement; Valid: fraction of valid SMILES; Unif: uniformity ratio; Comp: completeness ratio; Closed: closedness ratio; UCC: UCC ratio
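
The tabulated values suggest that the UCC column is the product of the Unif, Comp and Closed columns. This reading is an assumption inferred from the numbers in the table, not a definition stated here; the "Methods" section of the article remains the reference for how the ratios are defined. A minimal Python sketch that checks the assumption against the rows above (the names `ROWS`, `ucc_product` and `max_deviation` are illustrative only):

```python
# Values copied from Table 3: (set, SMILES variant, Unif, Comp, Closed, UCC).
ROWS = [
    ("1M", "Canonical", 0.879, 0.836, 0.861, 0.633),
    ("1M", "Rand. unr.", 0.970, 0.929, 0.876, 0.790),
    ("1M", "Rand. unr. no DA", 0.957, 0.795, 0.883, 0.672),
    ("1M", "Rand. rest.", 0.977, 0.953, 0.925, 0.860),
    ("1M", "Rand. rest. no DA", 0.957, 0.829, 0.898, 0.712),
    ("1M", "DS branch", 0.881, 0.828, 0.834, 0.608),
    ("1M", "DS rings", 0.852, 0.788, 0.798, 0.535),
    ("1M", "DS both", 0.851, 0.785, 0.796, 0.532),
    ("10K", "Canonical", 0.666, 0.445, 0.426, 0.126),
    ("10K", "Rand. rest.", 0.882, 0.715, 0.598, 0.377),
    ("1K", "Canonical", 0.611, 0.167, 0.133, 0.014),
    ("1K", "Rand. rest.", 0.790, 0.392, 0.276, 0.085),
]


def ucc_product(unif, comp, closed):
    """Assumed relation (inferred from the table): UCC = Unif * Comp * Closed."""
    return unif * comp * closed


def max_deviation(rows):
    """Largest absolute gap between the reported UCC and the assumed product."""
    return max(
        abs(ucc_product(unif, comp, closed) - ucc)
        for _, _, unif, comp, closed, ucc in rows
    )


if __name__ == "__main__":
    for set_size, variant, unif, comp, closed, ucc in ROWS:
        product = ucc_product(unif, comp, closed)
        print(f"{set_size:>3} {variant:<18} reported={ucc:.3f} product={product:.3f}")
    # Inputs are rounded to three decimals, so small gaps are expected.
    print(f"max deviation: {max_deviation(ROWS):.4f}")
```

Every row agrees with the product to within 0.002, which is about what rounding each factor to three decimals would produce. The check supports the assumed relation but does not replace the definition given in the article.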