From: Structural diversity of biologically interesting datasets: a scaffold analysis approach
Category | Dataset | Number of molecules | Clustered dataset | Reference |
---|---|---|---|---|
Drugs | DrugBank | 1372 | 3788 | [22] |
 | KEGG drugs | 7057 |  | [23] |
Metabolites | HMDB | 7888 | 6124, 2072* | [24] |
 | HumanCYC | 984 |  | [25] |
 | BiGG | 730 |  | [26] |
Toxics | DSSTox | 582 | 2166 | [27] |
 | FDA Carcinogenicity | 125 |  | [28] |
 | ITER | 514 |  | [30] |
 | SuperToxic | 1097 |  | [31] |
NPs | ZINC NP database | 89425 | 61972 | [32] |
Leads | BioNET | 42699 | 67983 | [33] |
 | Maybridge | 60550 |  | [34] |
NCI | NCI database | 260071 | 161336 | [39] |
ChEMBL | ChEMBL dataset | 600625 | 379827 | [36] |