From: Dataset’s chemical diversity limits the generalizability of machine learning predictions
References | ML method/descriptor | Dataset | Training–test sizes | \(U_0\) MAE (kcal/mol) | HOMO MAE (eV) | LUMO MAE (eV) |
---|---|---|---|---|---|---|
Rupp [12] | KRR/CM | QM7 | (7000–165) | 10.0 | – | – |
Montavon [21] | multitask NN | QM7b | (CV 5000–2211) | 3.7 | 0.15 | 0.13 |
Hansen [14] | KRR/BoB | QM7 | (CV 5732–1433) | 1.5 | – | – |
Huang [16] | KRR/BoB | QM7b | (5011–2200) | 1.8 | 0.15 | 0.16 |
Huang [16] | KRR/BAML | QM7b | (5011–2200) | 1.2 | 0.10 | 0.11 |
Faber [17] | EN/CM | QM9 | (CV 118k–13k) | 21.0 | 0.34 | 0.63 |
Faber [17] | EN/BoB | QM9 | (CV 118k–13k) | 13.9 | 0.28 | 0.52 |
Faber [17] | KRR/CM | QM9 | (CV 118k–13k) | 3.0 | 0.13 | 0.18 |
Faber [17] | KRR/BoB | QM9 | (CV 118k–13k) | 1.5 | 0.09 | 0.12 |
Faber [17] | KRR/BAML | QM9 | (CV 118k–13k) | 1.2 | 0.09 | 0.12 |
Bartók [19] | GPR/SOAP-GAP | QM7b | (5411–1800) | 0.40 | – | – |
Bartók [19] | GPR/SOAP-GAP | QM9 | (100k–31k) | 0.28 | – | – |
Gilmer [23] | NMP NN | QM9 | (120k–10k) | 0.45 | 0.04 | 0.04 |
Smith [22] | ANI-1 NN | ANI | (13.7M–1.7M) | <1.5 | – | – |
Hou [26] | multitask NN | QM9 | (119k–13k) | 44.0 | 0.38 | 0.63 |
Schütt [24] | SchNet NN | QM9 | (CV 110k–10k) | 0.32 | 0.04 | 0.03 |
Lubbers [27] | HIP-NN | QM9 | (CV 110k–20k) | 0.26 | – | – |
Unke [28] | HDNN | QM9 | (CV 100k–30k) | 0.41 | – | – |
Willatt [30] | KRR/SOAP | QM9 | (CV 100k–30k) | 0.14 | – | – |
Unke [2] | PhysNet NN | QM9 | (CV 110k–20k) | 0.14 | – | – |