Table 1 Scaffolds and IFG statistics for the datasets at different stages of the workflow

From: Scalable estimator of the diversity for de novo molecular generation resulting in a more robust QM dataset (OD9) and a more efficient molecular optimization

| Dataset | Size | Distinct generic scaffolds | Distinct scaffolds | Acyclic graphs | Distinct IFG |
| --- | --- | --- | --- | --- | --- |
| QM9 step 3 | 122,227 | 1964 | 14,060 | 12,615 | 6981 |
| PC9 step 3 | 77,790 | 2772 | 6566 | 31,542 | 13,887 |
| OD9_0 step 3 | 184,158 | 3798 | 18,850 | 40,103 | 20,075 |
| OD9_1 step 1 | 1,023,624 | 9163 | 460,978 | 28,725 | 461,247 |
| OD9_1 step 2 | 854,059 | 108,832 | 334,256 | 66,078 | 428,136 |
| OD9_1 step 3 | 250,874 | 4858 | 88,094 | 15,956 | 124,396 |
| OD9 step 1 | 1,276,171 | 12,929 | 480,464 | 90,965 | 482,009 |
| OD9 step 2 | 1,088,773 | 109,573 | 351,845 | 122,771 | 446,367 |
| OD9 step 3 | 435,032 | 6776 | 104,529 | 56,059 | 141,090 |