Table 5 Databases used in this study

From: Structural diversity of biologically interesting datasets: a scaffold analysis approach

| Category | Dataset | Number of molecules | Clustered dataset | Reference |
|---|---|---|---|---|
| Drugs | DrugBank | 1372 | 3788 | [22] |
| | KEGG drugs | 7057 | | [23] |
| Metabolites | HMDB | 7888 | 6124, 2072* | [24] |
| | HumanCYC | 984 | | [25] |
| | BiGG | 730 | | [26] |
| Toxics | DSSTox | 582 | 2166 | [27] |
| | FDA Carcinogenicity | 125 | | [28] |
| | ITER | 514 | | [30] |
| | SuperToxic | 1097 | | [31] |
| NPs | ZINC NP database | 89425 | 61972 | [32] |
| Leads | BioNET | 42699 | 67983 | [33] |
| | Maybridge | 60550 | | [34] |
| NCI | NCI database | 260071 | 161336 | [39] |
| ChEMBL | ChEMBL dataset | 600625 | 379827 | [36] |

*Metabolite dataset excluding lipids and large molecules (details in the Methods section).