
Table 1 Fragmentation scheme developed in this work for the published UNIFAC groups and the respective pattern descriptors used for sorting

From: Flexible heuristic algorithm for automatic molecule fragmentation: application to the UNIFAC group contribution model

| Number | Name | SMILES | Descriptor 1 | Descriptor 2 | Descriptor 3 | Descriptor 4 | Descriptor 5 | Descriptor 6 | Descriptor 7 | Descriptor 8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CH3 | [CH3;X4] | False | False | 1 | True | 0 | False | 0 | 0 |
| 2 | CH2 | [CH2;X4] | False | False | 1 | False | 0 | False | 0 | 0 |
| 3 | CH | [CH1;X4] | False | False | 1 | False | 0 | False | 0 | 0 |
| 4 | C | [CH0;X4], [CH0;X3] | False | False | 1 | False | 0 | False | 0 | 0 |
| 5 | CH2=CH | [CH2]=[CH] | False | False | 2 | True | 0 | False | 0 | 1 |
| 6 | CH=CH | [CH]=[CH] | False | False | 2 | False | 0 | False | 0 | 1 |
| 7 | CH2=C | [CH2]=[C], [CH2]=[c] | False | False | 2 | False | 0 | False | 0 | 1 |
| 8 | CH=C | [CH]=[CH0], [CH]=[cH0] | False | False | 2 | False | 0 | False | 0 | 1 |
| 9 | ACH | [cH] | False | False | 1 | False | 0 | True | 0 | 0 |
| 10 | AC | [cH0] | False | False | 1 | False | 0 | True | 0 | 0 |
| 11 | ACCH3 | [c][CH3;X4] | False | False | 2 | False | 0 | True | 0 | 0 |
| 12 | ACCH2 | [c][CH2;X4] | False | False | 2 | False | 0 | True | 0 | 0 |
| 13 | ACCH | [c][CH;X4] | False | False | 2 | False | 0 | True | 0 | 0 |
| 14 | OH | [OH] | False | False | 1 | True | 1 | False | 0 | 0 |
| 15 | CH3OH | [CH3][OH] | True | False | 2 | False | 1 | False | 0 | 0 |
| 16 | H2O | [OH2] | True | False | 1 | False | 1 | False | 0 | 0 |
| 17 | ACOH | [c][OH] | False | False | 2 | False | 1 | True | 0 | 0 |
| 18 | CH3CO | [CH3][CH0]=O | False | False | 3 | True | 1 | False | 0 | 1 |
| 19 | CH2CO | [CH2][CH0]=O | False | False | 3 | False | 1 | False | 0 | 1 |
| 20 | CH=O | [CH]=O | False | False | 2 | True | 1 | False | 0 | 1 |
| 21 | CH3COO | [CH3]C(=O)[OH0] | False | False | 4 | True | 2 | False | 0 | 1 |
| 22 | CH2COO | [CH2]C(=O)[OH0] | False | False | 4 | False | 2 | False | 0 | 1 |
| 23 | HCOO | [CH](=O)[OH0] | False | False | 3 | True | 2 | False | 0 | 1 |
| 24 | CH3O | [CH3][OH0] | False | False | 2 | True | 1 | False | 0 | 0 |
| 25 | CH2O | [CH2][OH0] | False | False | 2 | False | 1 | False | 0 | 0 |
| 26 | CHO | [CH][OH0] | False | False | 2 | False | 1 | False | 0 | 0 |
| 27 | THF | [CH2;R][OH0] | False | False | 2 | False | 1 | True | 0 | 0 |
| 28 | CH3NH2 | [CH3][NH2] | True | False | 2 | False | 1 | False | 0 | 0 |
| 29 | CH2NH2 | [CH2][NH2] | False | False | 2 | True | 1 | False | 0 | 0 |
| 30 | CHNH2 | [CH][NH2] | False | False | 2 | False | 1 | False | 0 | 0 |
| 31 | CH3NH | [CH3][NH] | False | False | 2 | True | 1 | False | 0 | 0 |
| 32 | CH2NH | [CH2][NH] | False | False | 2 | False | 1 | False | 0 | 0 |
| 33 | CHNH | [CH][NH] | False | False | 2 | False | 1 | False | 0 | 0 |
| 34 | CH3N | [CH3][N], [CH3][n] | False | False | 2 | False | 1 | False | 0 | 0 |
| 35 | CH2N | [CH2][N] | False | False | 2 | False | 1 | False | 0 | 0 |
| 36 | ACNH2 | [c][NH2] | False | False | 2 | False | 1 | True | 0 | 0 |
| 37 | C5H5N | n1[cH][cH][cH][cH][cH]1 | True | False | 6 | False | 1 | True | 0 | 0 |
| 38 | C5H4N | n1[c][cH][cH][cH][cH]1, n1[cH][c][cH][cH][cH]1, n1[cH][cH][c][cH][cH]1 | False | False | 6 | True | 1 | True | 0 | 0 |
| 39 | C5H3N | n1[c][c][cH][cH][cH]1, n1[c][cH][c][cH][cH]1, n1[c][cH][cH][c][cH]1, n1[c][cH][cH][cH][c]1, n1[cH][c][c][cH][cH]1, n1[cH][c][cH][c][cH]1 | False | False | 6 | False | 1 | True | 0 | 0 |
| 40 | CH3CN | [CH3]C#N | True | False | 3 | False | 1 | False | 1 | 0 |
| 41 | CH2CN | [CH2]C#N | False | False | 3 | True | 1 | False | 1 | 0 |
| 42 | COOH | C(=O)[OH] | False | False | 3 | True | 2 | False | 0 | 1 |
| 43 | HCOOH | [CH](=O)[OH] | True | False | 3 | False | 2 | False | 0 | 1 |
| 44 | CH2Cl | [CH2]Cl | False | True | 2 | True | 1 | False | 0 | 0 |
| 45 | CHCl | [CH]Cl | False | True | 2 | False | 1 | False | 0 | 0 |
| 46 | CCl | [CH0]Cl | False | True | 2 | False | 1 | False | 0 | 0 |
| 47 | CH2Cl2 | [CH2](Cl)Cl | True | False | 3 | False | 2 | False | 0 | 0 |
| 48 | CHCl2 | [CH](Cl)Cl | False | True | 3 | True | 2 | False | 0 | 0 |
| 49 | CCl2 | C(Cl)Cl | False | True | 3 | False | 2 | False | 0 | 0 |
| 50 | CHCl3 | [CH](Cl)(Cl)Cl | True | False | 4 | False | 3 | False | 0 | 0 |
| 51 | CCl3 | C(Cl)(Cl)(Cl) | False | True | 4 | True | 3 | False | 0 | 0 |
| 52 | CCl4 | C(Cl)(Cl)(Cl)(Cl) | True | False | 5 | False | 4 | False | 0 | 0 |
| 53 | ACCl | [c]Cl | False | True | 2 | False | 1 | True | 0 | 0 |
| 54 | CH3NO2 | [CH3][N+](=O)[O-] | False | False | 4 | True | 3 | False | 0 | 1 |
| 55 | CH2NO2 | [CH2][N+](=O)[O-] | False | False | 4 | False | 3 | False | 0 | 1 |
| 56 | CHNO2 | [CH][N+](=O)[O-] | False | False | 4 | False | 3 | False | 0 | 1 |
| 57 | ACNO2 | [c][N+](=O)[O-] | False | False | 4 | False | 3 | True | 0 | 1 |
| 58 | CS2 | C(=S)=S | True | False | 3 | False | 2 | False | 0 | 2 |
| 59 | CH3SH | [CH3][SH] | True | False | 2 | False | 1 | False | 0 | 0 |
| 60 | CH2SH | [CH2][SH] | False | False | 2 | True | 1 | False | 0 | 0 |
| 61 | Furfural | O=[CH]c1[cH][cH][cH]o1 | True | False | 7 | False | 2 | True | 0 | 1 |
| 62 | DOH | [OH][CH2][CH2][OH] | True | False | 4 | False | 2 | False | 0 | 0 |
| 63 | I | [IH0] | False | True | 1 | True | 1 | False | 0 | 0 |
| 64 | Br | [BrH0] | False | True | 1 | True | 1 | False | 0 | 0 |
| 65 | CH#C | [CH]#C | False | False | 2 | True | 0 | False | 1 | 0 |
| 66 | C#C | C#C | False | False | 2 | False | 0 | False | 1 | 0 |
| 67 | DMSO | [CH3]S(=O)[CH3] | True | False | 4 | False | 2 | False | 0 | 1 |
| 68 | ACRY | [CH2]=[CH1][C]#N | True | False | 4 | False | 1 | False | 1 | 1 |
| 69 | Cl(C=C) | [$(Cl[C]=[C])] | False | True | 3 | True | 1 | False | 0 | 0 |
| 70 | C=C | [CH0]=[CH0] | False | False | 2 | False | 0 | False | 0 | 1 |
| 71 | ACF | [c]F | False | True | 2 | False | 1 | True | 0 | 0 |
| 72 | DMF | [CH](=O)N([CH3])[CH3] | True | False | 5 | False | 2 | False | 0 | 1 |
| 73 | HCON(CH2)2 | [CH](=O)N([CH2])[CH2], [CH](=O)N([CH2])[CH3] | False | False | 5 | False | 2 | False | 0 | 1 |
| 74 | CF3 | C(F)(F)F | False | True | 4 | True | 3 | False | 0 | 0 |
| 75 | CF2 | C(F)F | False | True | 3 | False | 2 | False | 0 | 0 |
| 76 | CF | [C]F | False | True | 2 | False | 1 | False | 0 | 0 |
| 77 | COO | [CH0](=O)[OH0], [cH0](=O)[oH0] | False | False | 3 | False | 2 | False | 0 | 1 |
| 78 | SiH3 | [SiH3] | False | False | 1 | True | 1 | False | 0 | 0 |
| 79 | SiH2 | [SiH2] | False | False | 1 | False | 1 | False | 0 | 0 |
| 80 | SiH | [SiH] | False | False | 1 | False | 1 | False | 0 | 0 |
| 81 | Si | [Si] | False | False | 1 | False | 1 | False | 0 | 0 |
| 82 | SiH2O | [SiH2][OH0] | False | False | 2 | False | 2 | False | 0 | 0 |
| 83 | SiHO | [SiH][OH0] | False | False | 2 | False | 2 | False | 0 | 0 |
| 84 | SiO | [Si][OH0] | False | False | 2 | False | 2 | False | 0 | 0 |
| 85 | NMP | [CH3]N1[CH2][CH2][CH2]C(=O)1 | True | False | 7 | False | 2 | False | 0 | 1 |
| 86 | CCl3F | C(Cl)(Cl)(Cl)F | True | False | 5 | False | 4 | False | 0 | 0 |
| 87 | CCl2F | C(Cl)(Cl)F | False | True | 4 | True | 3 | False | 0 | 0 |
| 88 | HCCl2F | [CH](Cl)(Cl)F | True | False | 4 | False | 3 | False | 0 | 0 |
| 89 | HCClF | [CH](Cl)F | False | True | 3 | True | 2 | False | 0 | 0 |
| 90 | CClF2 | C(Cl)(F)F | False | True | 4 | True | 3 | False | 0 | 0 |
| 91 | HCClF2 | [CH](Cl)(F)F | True | False | 4 | False | 3 | False | 0 | 0 |
| 92 | CClF3 | C(Cl)(F)(F)F | True | False | 5 | False | 4 | False | 0 | 0 |
| 93 | CCl2F2 | C(Cl)(Cl)(F)F | True | False | 5 | False | 4 | False | 0 | 0 |
| 94 | CONH2 | C(=O)[NH2] | False | False | 3 | True | 2 | False | 0 | 1 |
| 95 | CONHCH3 | C(=O)[NH][CH3] | False | False | 4 | True | 2 | False | 0 | 1 |
| 96 | CONHCH2 | C(=O)[NH][CH2] | False | False | 4 | False | 2 | False | 0 | 1 |
| 97 | CON(CH3)2 | C(=O)N([CH3])[CH3] | False | False | 5 | True | 2 | False | 0 | 1 |
| 98 | CONCH3CH2 | C(=O)N([CH3])[CH2] | False | False | 5 | False | 2 | False | 0 | 1 |
| 99 | CON(CH2)2 | C(=O)N([CH2])[CH2] | False | False | 5 | False | 2 | False | 0 | 1 |
| 100 | C2H5O2 | [OH0;!$(OC=O);!R][CH2;!R][CH2;!R][OH] | False | False | 4 | True | 2 | False | 0 | 0 |
| 101 | C2H4O2 | [OH0;!$(OC=O);!R][CH;!R][CH2;!R][OH], [OH0;!$(OC=O);!R][CH2;!R][CH;!R][OH] | False | False | 4 | False | 2 | False | 0 | 0 |
| 102 | CH3S | [CH3]S | False | False | 2 | True | 1 | False | 0 | 0 |
| 103 | CH2S | [CH2]S | False | False | 2 | False | 1 | False | 0 | 0 |
| 104 | CHS | [CH]S | False | False | 2 | False | 1 | False | 0 | 0 |
| 105 | MORPH | [CH2]1[CH2][NH][CH2][CH2]O1 | True | False | 6 | False | 2 | False | 0 | 0 |
| 106 | C4H4S | [cH]1[cH][s;X2][cH][cH]1 | True | False | 5 | False | 1 | True | 0 | 0 |
| 107 | C4H3S | [c]1[cH][s;X2][cH][cH]1, [cH]1[c][s;X2][cH][cH]1 | False | False | 5 | True | 1 | True | 0 | 0 |
| 108 | C4H2S | [c]1[c][s;X2][cH][cH]1, [c]1[cH][s;X2][cH][c]1, [cH]1[c][s;X2][c][cH]1, [cH]1[c][s;X2][cH][c]1 | False | False | 5 | False | 1 | True | 0 | 0 |
| 109 | NCO | N=C=O | False | False | 3 | True | 2 | False | 0 | 2 |
| 118 | (CH2)2SU | [CH2]S(=O)(=O)[CH2] | False | False | 5 | False | 3 | False | 0 | 2 |
| 119 | CH2CHSU | [CH2]S(=O)(=O)[CH] | False | False | 5 | False | 3 | False | 0 | 2 |

  1. In the group names, AC stands for an aromatic carbon atom. The group names are based on the original UNIFAC names as described on their webpage [44]. If several patterns were employed to find one group, they are shown separated by commas. The underlined patterns were added to improve the matching of the algorithm in comparison to the results of the reference database. The values of the descriptors for each group, as described in the “Simple fragmentation” section, are also shown in this table. For sorting, the boolean descriptor values can be replaced by integer values (True: 1, False: 0). Descriptors: 1: whether the pattern has zero bonds; 2: whether the pattern is simple; 3: number of atoms defining the group; 4: whether the number of available bonds is one (first the patterns with one bond, then the patterns with more bonds); 5: number of atoms in the pattern that are neither hydrogen nor carbon; 6: whether the pattern includes atoms in a ring; 7: number of triple bonds; 8: number of double bonds.
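The integer replacement of the boolean descriptors described in the note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper name `descriptor_key`, the five-row sample and, in particular, the descending sort direction are assumptions made here. The note only states that, for descriptor 4, patterns with one bond come before patterns with more bonds, which a descending order on True: 1, False: 0 satisfies.

```python
# Sample rows copied from Table 1: (group number, name, descriptors 1 to 8).
GROUPS = [
    (1, "CH3", (False, False, 1, True, 0, False, 0, 0)),
    (2, "CH2", (False, False, 1, False, 0, False, 0, 0)),
    (14, "OH", (False, False, 1, True, 1, False, 0, 0)),
    (15, "CH3OH", (True, False, 2, False, 1, False, 0, 0)),
    (44, "CH2Cl", (False, True, 2, True, 1, False, 0, 0)),
]


def descriptor_key(descriptors):
    """Map booleans to integers (True: 1, False: 0) so that the eight
    descriptors can be compared as one tuple, descriptor 1 first."""
    return tuple(int(value) for value in descriptors)


# Assumption: larger descriptor values are ranked first (descending order).
ordered = sorted(GROUPS, key=lambda group: descriptor_key(group[2]), reverse=True)
names = [name for _, name, _ in ordered]
print(names)
```

Under this assumed direction, the zero-bond pattern CH3OH is ranked first and the generic CH2 pattern last. If the algorithm ranks some descriptors in ascending order, the corresponding tuple entries would have to be negated in the key.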