Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Table 2 Detailed information on the datasets used in this study

Dataset	Task type	Compounds	Tasks	Metric	Description
ESOL	Regression	1127	1	RMSE	Water solubility for organic small molecules
FreeSolv	Regression	639	1	RMSE	Hydration free energy of small molecules in water
Lipop	Regression	4200	1	RMSE	Octanol/water distribution coefficient (logD at pH = 7.4)
HIV	Classification	40748	1	AUC-ROC	Inhibition of HIV replication
BACE	Classification	1513	1	AUC-ROC	Inhibition of human β-secretase 1 (BACE-1)
BBBP	Classification	2035	1	AUC-ROC	Binary labels of blood–brain barrier penetration
ClinTox	Classification	1475	2	AUC-ROC	Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons
SIDER	Classification	1366	27	AUC-ROC	Database of marketed drugs and adverse drug reactions (ADR), grouped into 27 system organ classes
Tox21	Classification	7811	12	AUC-ROC	Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways
ToxCast	Classification	8539	182	AUC-ROC	Toxicology data for a large library of compounds based on in vitro high-throughput screening, including experiments on over 600 tasks
MUV	Classification	93087	17	AUC-PRC	Subset of PubChem BioAssay selected by applying a refined nearest neighbor analysis, designed for the validation of virtual screening techniques
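
The Metric column names three scores: RMSE for the regression datasets, AUC-ROC for most classification datasets, and AUC-PRC for MUV. As a reminder of what each one measures, the following is a minimal sketch in plain Python for a single task. The function names `rmse`, `auc_roc`, and `average_precision` are hypothetical and are not taken from the study's code. AUC-PRC is approximated here by average precision, which is one common estimator and may differ from the one the authors used.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, the metric listed for the regression datasets."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive is scored higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k that hold a
    positive. Assumption: used here as a stand-in for AUC-PRC."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(auc_roc(labels, scores))
    print(average_precision(labels, scores))
```

For datasets with more than one task (for example SIDER or Tox21), a per-task score like these would need to be aggregated across tasks; averaging is a common choice, but that is an assumption here rather than something stated in the table.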

ISSN: 1758-2946