
Table 3 Selection of functional groups detected by Checkmol with their corresponding number of molecules in PC9 and QM9 datasets and MAE in generalization conditions

From: Dataset’s chemical diversity limits the generalizability of machine learning predictions

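| Selection | Functional group class | PC9 data: occurrences | PC9 data: MAE of QM9 model | QM9 data: occurrences | QM9 data: MAE of PC9 model |
|---|---|---|---|---|---|
| CF and NN bonds | Azide | 358 | 11.5 | 0 | - |
| | Azo compound | 272 | 5.3 | 10 | 4.1 |
| | Acyl fluoride | 105 | 6.4 | 0 | - |
| | Aryl fluoride | 936 | 9.4 | 1562 | 4.1 |
| | Alkyl fluoride | 4576 | 6.8 | 52 | 4.3 |
| Abundant in QM9 | Carbonitrile | 4624 | 5.4 | 10315 | 4.7 |
| | Secondary alcohol | 6282 | 6.2 | 10668 | 4.3 |
| | Trialkylamine | 3301 | 6.4 | 10687 | 4.2 |
| | Alkyne | 3906 | 6.1 | 10873 | 4.5 |
| | Tertiary amine | 3388 | 6.4 | 11057 | 4.2 |
| | Aromatic compound | 12728 | 8.2 | 15863 | 4.3 |
| | Dialkyl ether | 9275 | 6.2 | 24012 | 4.3 |
| | Heterocyclic compound | 42665 | 7.0 | 61904 | 4.3 |
| QM9 model focus | Hydroperoxide | 717 | 45.4 | 0 | - |
| | Diaryl ether | 22 | 39.1 | 0 | - |
| | Peroxide | 430 | 32.7 | 0 | - |
| | Diarylamine | 11 | 32.6 | 0 | - |
| | Carbamic acid halide | 16 | 24.4 | 0 | - |
| | Nitrite | 1 | 22.7 | 0 | - |
| | Hydroxamic acid | 46 | 19.4 | 0 | - |
| | Nitroso compound | 48 | 17.0 | 0 | - |
| | Hydroxylamine | 805 | 16.2 | 13 | 3.4 |
| | Hemiacetal | 819 | 2.8 | 0 | - |
| PC9 model focus | Enol ether | 1977 | 4.4 | 2 | 16.0 |
| | Amide acetal | 59 | 14.2 | 28 | 7.1 |
| | Carboxylic acid | 4213 | 6.6 | 106 | 6.2 |
| | Hemiaminal | 3034 | 6.5 | 207 | 6.0 |
| | Acyl cyanide | 67 | 4.0 | 281 | 6.0 |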

  1. The selection covers four sets of groups: those with CF and NN bonds, the most abundant groups in QM9, the groups with the largest MAE for the model trained on QM9, and the groups with the largest MAE for the model trained on PC9
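
The per-group occurrences and MAE values in a table like this one can be obtained by grouping absolute prediction errors by functional group. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes that a molecule counts toward every functional group it contains and that the error is taken on a single scalar property. The function name `mae_per_group` and the toy records are hypothetical and are not taken from PC9 or QM9.

```python
from collections import defaultdict


def mae_per_group(records):
    """Return {group: (occurrences, MAE)} for an iterable of records.

    Each record is (groups, y_true, y_pred), where `groups` is the set of
    functional-group names detected in one molecule (assumed input format).
    """
    abs_err_sum = defaultdict(float)
    counts = defaultdict(int)
    for groups, y_true, y_pred in records:
        err = abs(y_pred - y_true)
        # Assumption: a molecule contributes to every group it contains.
        for group in groups:
            abs_err_sum[group] += err
            counts[group] += 1
    return {g: (counts[g], abs_err_sum[g] / counts[g]) for g in counts}


# Hypothetical toy data for illustration only.
toy = [
    ({"Azide"}, 10.0, 12.0),
    ({"Azide", "Alkyne"}, 5.0, 4.0),
    ({"Alkyne"}, 3.0, 3.5),
]
result = mae_per_group(toy)
for group, (n, mae) in sorted(result.items()):
    print(group, n, mae)
```

A group that never occurs in the evaluated dataset does not appear in the result at all, which corresponds to the rows above with 0 occurrences and no MAE value.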