From: Dataset’s chemical diversity limits the generalizability of machine learning predictions
| Selection | Functional group class | PC9 data: occurrences | PC9 data: QM9 model | QM9 data: occurrences | QM9 data: PC9 model |
|---|---|---|---|---|---|
| CF and NN bonds | Azide | 358 | 11.5 | 0 | – |
| | Azo compound | 272 | 5.3 | 10 | 4.1 |
| | Acyl fluoride | 105 | 6.4 | 0 | – |
| | Aryl fluoride | 936 | 9.4 | 1562 | 4.1 |
| | Alkyl fluoride | 4576 | 6.8 | 52 | 4.3 |
| Abundant in QM9 | Carbonitrile | 4624 | 5.4 | 10315 | 4.7 |
| | Secondary alcohol | 6282 | 6.2 | 10668 | 4.3 |
| | Trialkylamine | 3301 | 6.4 | 10687 | 4.2 |
| | Alkyne | 3906 | 6.1 | 10873 | 4.5 |
| | Tertiary amine | 3388 | 6.4 | 11057 | 4.2 |
| | Aromatic compound | 12728 | 8.2 | 15863 | 4.3 |
| | Dialkyl ether | 9275 | 6.2 | 24012 | 4.3 |
| | Heterocyclic compound | 42665 | 7.0 | 61904 | 4.3 |
| QM9 model focus | Hydroperoxide | 717 | 45.4 | 0 | – |
| | Diaryl ether | 22 | 39.1 | 0 | – |
| | Peroxide | 430 | 32.7 | 0 | – |
| | Diarylamine | 11 | 32.6 | 0 | – |
| | Carbamic acid halide | 16 | 24.4 | 0 | – |
| | Nitrite | 1 | 22.7 | 0 | – |
| | Hydroxamic acid | 46 | 19.4 | 0 | – |
| | Nitroso compound | 48 | 17.0 | 0 | – |
| | Hydroxylamine | 805 | 16.2 | 13 | 3.4 |
| | Hemiacetal | 819 | 2.8 | 0 | – |
| PC9 model focus | Enol ether | 1977 | 4.4 | 2 | 16.0 |
| | Amide acetal | 59 | 14.2 | 28 | 7.1 |
| | Carboxylic acid | 4213 | 6.6 | 106 | 6.2 |
| | Hemiaminal | 3034 | 6.5 | 207 | 6.0 |
| | Acyl cyanide | 67 | 4.0 | 281 | 6.0 |