Table 1 Scaffolds and IFG statistics for the datasets at different stages of the workflow

From: Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization

| Dataset | Size | Distinct generic scaffolds | Distinct scaffolds | Acyclic graphs | Distinct IFG |
| --- | --- | --- | --- | --- | --- |
| QM9 step 3 | 122,227 | 1964 | 14,060 | 12,615 | 6981 |
| PC9 step 3 | 77,790 | 2772 | 6566 | 31,542 | 13,887 |
| OD9_0 step 3 | 184,158 | 3798 | 18,850 | 40,103 | 20,075 |
| OD9_1 step 1 | 1,023,624 | 9163 | 460,978 | 28,725 | 461,247 |
| OD9_1 step 2 | 854,059 | 108,832 | 334,256 | 66,078 | 428,136 |
| OD9_1 step 3 | 250,874 | 4858 | 88,094 | 15,956 | 124,396 |
| OD9 step 1 | 1,276,171 | 12,929 | 480,464 | 90,965 | 482,009 |
| OD9 step 2 | 1,088,773 | 109,573 | 351,845 | 122,771 | 446,367 |
| OD9 step 3 | 435,032 | 6776 | 104,529 | 56,059 | 141,090 |
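
Because the datasets differ widely in size, the raw scaffold counts are easier to compare once normalized. The sketch below divides the "Distinct scaffolds" column by the "Size" column for the step 3 rows. The ratio is an illustrative assumption for reading the table, not a metric defined in the source, and the names `step3` and `scaffold_ratio` are hypothetical.

```python
# Step 3 rows copied from Table 1: dataset -> (size, distinct scaffolds).
step3 = {
    "QM9": (122_227, 14_060),
    "PC9": (77_790, 6_566),
    "OD9_0": (184_158, 18_850),
    "OD9_1": (250_874, 88_094),
    "OD9": (435_032, 104_529),
}


def scaffold_ratio(size: int, scaffolds: int) -> float:
    """Distinct scaffolds per molecule (illustrative, not a metric from the paper)."""
    return scaffolds / size


# Compute the ratio for every dataset and list them from highest to lowest.
ratios = {name: scaffold_ratio(*row) for name, row in step3.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {ratio:.3f}")
```

The same normalization could be applied to the other columns. A ratio of this kind says nothing about how the scaffolds are distributed among molecules, so it should be read only as a rough size-adjusted comparison.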