Table 1 Training and validation set sizes for the different benchmarks

From: Randomized SMILES strings improve the quality of molecular generative models

Model        Training set size   Validation set size
GDB-13 1M    1,000,000           10,000
GDB-13 10K   10,000              1,000
GDB-13 1K    1,000               1,000
ChEMBL       1,483,943           78,102
  1. Note that different training-to-validation ratios were used, depending on the expected size of the target chemical space and the total number of molecules.
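
The ratios referred to in the note can be worked out directly from the set sizes in Table 1. The following is a minimal Python sketch of that arithmetic; the names `SET_SIZES` and `split_summary` are illustrative and do not come from the article.

```python
# Set sizes copied from Table 1 as (training, validation) pairs.
# SET_SIZES and split_summary are hypothetical names used for illustration.
SET_SIZES = {
    "GDB-13 1M": (1_000_000, 10_000),
    "GDB-13 10K": (10_000, 1_000),
    "GDB-13 1K": (1_000, 1_000),
    "ChEMBL": (1_483_943, 78_102),
}


def split_summary(train, valid):
    """Return (training/validation ratio, validation share of train + valid)."""
    return train / valid, valid / (train + valid)


for name, (train, valid) in SET_SIZES.items():
    ratio, share = split_summary(train, valid)
    print(f"{name:<11} ratio {ratio:6.1f}:1  validation share {share:6.2%}")
```

With these numbers the training-to-validation ratio is 100:1 for GDB-13 1M, 10:1 for GDB-13 10K, 1:1 for GDB-13 1K, and roughly 19:1 for ChEMBL, where the validation set is about 5% of the combined molecules.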