Table 3 Selection of functional groups detected by Checkmol, with the corresponding number of molecules in the PC9 and QM9 datasets and the MAE under generalization conditions

From: Dataset’s chemical diversity limits the generalizability of machine learning predictions

| Selection | Functional group class | PC9 data: occurrences | PC9 data: MAE of QM9 model | QM9 data: occurrences | QM9 data: MAE of PC9 model |
| --- | --- | --- | --- | --- | --- |
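| CF and NN bonds | Azide | 358 | 11.5 | 0 | |
| CF and NN bonds | Azo compound | 27 | 25.3 | 10 | 4.1 |
| CF and NN bonds | Acyl fluoride | 105 | 6.4 | 0 | |
| CF and NN bonds | Aryl fluoride | 936 | 9.4 | 1562 | 4.1 |
| CF and NN bonds | Alkyl fluoride | 4576 | 6.8 | 52 | 4.3 |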
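| Abundant in QM9 | Carbonitrile | 4624 | 5.4 | 10315 | 4.7 |
| Abundant in QM9 | Secondary alcohol | 6282 | 6.2 | 10668 | 4.3 |
| Abundant in QM9 | Trialkylamine | 3301 | 6.4 | 10687 | 4.2 |
| Abundant in QM9 | Alkyne | 3906 | 6.1 | 10873 | 4.5 |
| Abundant in QM9 | Tertiary amine | 3388 | 6.4 | 11057 | 4.2 |
| Abundant in QM9 | Aromatic compound | 12728 | 8.2 | 15863 | 4.3 |
| Abundant in QM9 | Dialkyl ether | 9275 | 6.2 | 24012 | 4.3 |
| Abundant in QM9 | Heterocyclic compound | 42665 | 7.0 | 61904 | 4.3 |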
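| QM9 model focus | Hydroperoxide | 717 | 45.4 | 0 | |
| QM9 model focus | Diaryl ether | 22 | 39.1 | 0 | |
| QM9 model focus | Peroxide | 430 | 32.7 | 0 | |
| QM9 model focus | Diarylamine | 11 | 32.6 | 0 | |
| QM9 model focus | Carbamic acid halide | 16 | 24.4 | 0 | |
| QM9 model focus | Nitrite | 1 | 22.7 | 0 | |
| QM9 model focus | Hydroxamic acid | 46 | 19.4 | 0 | |
| QM9 model focus | Nitroso compound | 48 | 17.0 | 0 | |
| QM9 model focus | Hydroxylamine | 805 | 16.2 | 13 | 3.4 |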
| QM9 model focus | Hemiacetal | 819 | 2.8 | 0 | |
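| PC9 model focus | Enol ether | 1977 | 4.4 | 2 | 16.0 |
| PC9 model focus | Amide acetal | 59 | 14.2 | 28 | 7.1 |
| PC9 model focus | Carboxylic acid | 4213 | 6.6 | 106 | 6.2 |
| PC9 model focus | Hemiaminal | 3034 | 6.5 | 207 | 6.0 |
| PC9 model focus | Acyl cyanide | 67 | 4.0 | 281 | 6.0 |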
  1. The selection covers groups with CF and NN interactions, the most prominent groups in QM9, the groups with the largest MAE for the model trained on QM9, and the groups with the largest MAE for the model trained on PC9