Table 1 Compound sets used in synthetic accessibility assessment

From: Profiling and analysis of chemical compounds using pointwise mutual information

Compound set       Type   Number of compounds
nonpher            HS     693,353
savi               ES     610,245
scubidoo           ES     999,794
zinc_random        ES     693,353
nonpher_complex    HS     161
savi_complex       HS     2,930
scubidoo_complex   HS     104
gdb_complex        HS     3,581
  1. ES compounds are easy to synthesize; HS compounds are hard to synthesize. The nonpher compound set corresponds to the S- data set from the SYBA publication [37], where its construction is described in detail. savi compounds form the alpha version of the Synthetically Accessible Virtual Inventory (SAVI) Database [38, 39], released in July 2015. scubidoo compounds form the L representative sample of the Screenable Chemical Universe Based on Intuitive Data OrganizatiOn (SCUBIDOO) database [40]. zinc_random compounds are randomly selected from the ZINC15 database [25] so that their molecular weight distribution matches that of the nonpher compound set; this set corresponds to the S+ data set in the SYBA publication [37]. Compounds in the _complex sets exceed all four complexity thresholds simultaneously, as given by the Bertz [41], Whitlock [42], BC [43] and SMCM [44] indices.
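
The selection rule for the _complex sets (a compound qualifies only if it exceeds all four complexity thresholds at the same time) can be sketched as below. This is a minimal illustration, not the authors' implementation: the cutoff values in `HYPOTHETICAL_THRESHOLDS`, the function names, and the assumption that the four index values are already computed per compound are all placeholders introduced here, since the table does not state the actual thresholds.

```python
from typing import Dict, List, Mapping

# Placeholder cutoffs for illustration only; the actual threshold
# values are not given in the table and must be taken from the paper.
HYPOTHETICAL_THRESHOLDS: Dict[str, float] = {
    "bertz": 1000.0,
    "whitlock": 40.0,
    "bc": 1000.0,
    "smcm": 60.0,
}


def is_complex(
    scores: Mapping[str, float],
    thresholds: Mapping[str, float] = HYPOTHETICAL_THRESHOLDS,
) -> bool:
    """Return True only if every index strictly exceeds its threshold."""
    return all(scores[name] > cutoff for name, cutoff in thresholds.items())


def select_complex(
    compounds: Mapping[str, Mapping[str, float]],
    thresholds: Mapping[str, float] = HYPOTHETICAL_THRESHOLDS,
) -> List[str]:
    """Keep the identifiers of compounds that pass all four thresholds."""
    return [cid for cid, scores in compounds.items() if is_complex(scores, thresholds)]


if __name__ == "__main__":
    # Made-up index values for two hypothetical compounds.
    demo = {
        "cmpd_a": {"bertz": 1500.0, "whitlock": 55.0, "bc": 2000.0, "smcm": 80.0},
        "cmpd_b": {"bertz": 1500.0, "whitlock": 10.0, "bc": 2000.0, "smcm": 80.0},
    }
    print(select_complex(demo))
```

Requiring all four indices to pass, rather than any one of them, makes the filter conservative, which is consistent with the small sizes of the _complex sets relative to their parent sets.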