
Table 1 Mean absolute errors for atomisation energies \(U_0\) (in kcal/mol) and HOMO and LUMO energies (in eV) for several models reported in the literature, listed from oldest to most recent: kernel ridge regression (KRR), elastic net (EN), Gaussian process regression (GPR), and neural networks (NN)

From: Dataset’s chemical diversity limits the generalizability of machine learning predictions

| References | ML method/descriptor | Dataset (Training–Test sizes) | \(U_0\) (kcal/mol) | HOMO (eV) | LUMO (eV) |
|---|---|---|---|---|---|
| Rupp [12] | KRR/CM | QM7 (7000–165) | 10.0 | | |
| Montavon [21] | multitask NN | QM7b (CV 5000–2211) | 3.7 | 0.15 | 0.13 |
| Hansen [14] | KRR/BoB | QM7 (CV 5732–1433) | 1.5 | | |
| Huang [16] | KRR/BoB | QM7b (5011–2200) | 1.8 | 0.15 | 0.16 |
| Huang [16] | KRR/BAML | QM7b (5011–2200) | 1.2 | 0.10 | 0.11 |
| Faber [17] | EN/CM | QM9 (CV 118k–13k) | 21.0 | 0.34 | 0.63 |
| Faber [17] | EN/BoB | QM9 (CV 118k–13k) | 13.9 | 0.28 | 0.52 |
| Faber [17] | KRR/CM | QM9 (CV 118k–13k) | 3.0 | 0.13 | 0.18 |
| Faber [17] | KRR/BoB | QM9 (CV 118k–13k) | 1.5 | 0.09 | 0.12 |
| Faber [17] | KRR/BAML | QM9 (CV 118k–13k) | 1.2 | 0.09 | 0.12 |
| Bartók [19] | GPR/SOAP-GAP | QM7b (5411–1800) | 0.40 | | |
| Bartók [19] | GPR/SOAP-GAP | QM9 (100k–31k) | 0.28 | | |
| Gilmer [23] | NMP NN | QM9 (120k–10k) | 0.45 | 0.04 | 0.04 |
| Smith [22] | ANI-1 NN | ANI (13.7M–1.7M) | <1.5 | | |
| Hou [26] | multitask NN | QM9 (119k–13k) | 44.0 | 0.38 | 0.63 |
| Schütt [24] | SchNet NN | QM9 (CV 110k–10k) | 0.32 | 0.04 | 0.03 |
| Lubbers [27] | HIP-NN | QM9 (CV 110k–20k) | 0.26 | | |
| Unke [28] | HDNN | QM9 (CV 100k–30k) | 0.41 | | |
| Willatt [30] | KRR/SOAP | QM9 (CV 100k–30k) | 0.14 | | |
| Unke [2] | PhysNet NN | QM9 (CV 110k–20k) | 0.14 | | |
  1. CV denotes a cross-validation procedure. Descriptors for the NN models are omitted because they can be quite complex.
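
The table reports \(U_0\) errors in kcal/mol and orbital-energy errors in eV, so comparing the two takes a unit conversion. The following is a minimal sketch of how a mean absolute error is computed and converted between these units. It is not taken from any of the cited works: the function names and the toy energy values are hypothetical, chosen only for illustration, and the conversion factor is the standard 1 eV ≈ 23.0605 kcal/mol.

```python
# Conversion factor: 1 eV is approximately 23.0605 kcal/mol.
KCAL_PER_MOL_PER_EV = 23.0605


def mean_absolute_error(reference, predicted):
    """Return the mean of |reference_i - predicted_i| over all samples."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - p) for r, p in zip(reference, predicted))
    return total / len(reference)


def kcal_per_mol_to_ev(value):
    """Convert an energy (or an energy error) from kcal/mol to eV."""
    return value / KCAL_PER_MOL_PER_EV


# Hypothetical toy data (kcal/mol), not values from any dataset in the table.
reference_u0 = [-400.0, -512.5, -623.0, -701.25]
predicted_u0 = [-401.0, -512.0, -621.0, -701.75]

mae_kcal = mean_absolute_error(reference_u0, predicted_u0)
mae_ev = kcal_per_mol_to_ev(mae_kcal)
print(f"MAE = {mae_kcal:.2f} kcal/mol = {mae_ev:.4f} eV")
```

Under this conversion, an error of 1 kcal/mol corresponds to roughly 0.043 eV, which is the same order of magnitude as the smallest HOMO and LUMO errors listed.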