Table 4 Average mean squared error on the test set of the best-performing model for each representation and dataset

From: Extended study on atomic featurization in graph neural networks for molecular property prediction

| Representation | Rat \(\downarrow\) | Human \(\downarrow\) | QM9 \(\downarrow\) | ESOL (random) \(\downarrow\) | ESOL (scaffold) \(\downarrow\) |
|---|---|---|---|---|---|
| F | 0.182 | 0.218 | 9.193 | 0.118 | 0.166 |
| A | 0.214 | 0.246 | 26.369 | 0.159 | 0.235 |
| A + N | 0.188 | 0.225 | 46.386 | **0.113** | 0.242 |
| A + H | 0.196 | 0.248 | 41.047 | 0.131 | 0.215 |
| A + C | 0.215 | 0.246 | 52.825 | 0.174 | 0.229 |
| A + R | 0.194 | 0.235 | 89.794 | 0.115 | 0.237 |
| A + A | 0.203 | 0.241 | 27.365 | 0.187 | 0.212 |
| F-N | 0.200 | 0.220 | 39.243 | 0.190 | 0.189 |
| F-H | 0.183 | 0.220 | 60.035 | **0.113** | 0.202 |
| F-C | 0.180 | **0.213** | 9.698 | 0.123 | 0.201 |
| F-R | 0.181 | 0.223 | **8.278** | 0.119 | **0.185** |
| F-A | **0.178** | 0.216 | 23.786 | 0.120 | 0.221 |
| Tree-based baseline | 0.207 | 0.235 | 699.125 | 0.432 | 0.801 |
| XGBoost baseline | 0.216 | 0.233 | 803.153 | 0.483 | 0.452 |

  1. Two baselines based on ECFP fingerprints are included. The best results are in bold. The error variance is below 0.001 for all datasets except QM9 and is therefore not reported. Graph models perform better when trained with representations that include more features, and they usually outperform the baseline models trained on traditional fingerprints