Table 4 Comparison of curated and OPSIN-derived SMILES

From: Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database

 Count        %  Type
19,475    64.68  Identical
  1648     5.47  Missing description of configuration around double bonds in OPSIN
    17     0.06  Different number of explicit H
   602     2.00  Missing chirality information in OPSIN
    49     0.16  Missing chirality information in this work
  1130     3.75  Different representation of racemates
  2474     8.22  Different representation of nitro groups
    33     0.11  Different representation of other groups
   667     2.22  Different charge settings
    18     0.06  Different aromaticity settings
   302     1.00  Different bond orders
    66     0.22  Different representation of ionic compounds
    25     0.08  Missing O moieties in OPSIN
    94     0.31  Different connectivity
    74     0.25  Different number of rings
   954     3.17  Different number of moieties
   229     0.76  Different configuration around double bonds
   233     0.77  Different configurations at chiral centres
    50     0.17  Missing moieties in OPSIN
    17     0.06  Missing moieties in this work
   166     0.55  Missing C atoms in OPSIN
    87     0.29  Missing C atoms in this work
   190     0.63  Different stoichiometry
   342     1.14  Different chemical composition
  1167     3.88  Reason different from those listed above
30,109   100.00  Total

  1. The discrepancies are listed in the table in increasing order of severity. An entry showing more than one discrepancy is counted only under the most serious one found (i.e., the one closest to the bottom of the table).
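
The counting rule in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not describe how the classification was implemented, so the function name `assign_category` and the input format (a list of discrepancy labels found for one entry) are hypothetical. The labels and their order are copied from the table.

```python
# Category labels copied from Table 4, ordered from least to most severe
# (top of the table to bottom). "Total" is not a category and is omitted.
SEVERITY_ORDER = [
    "Identical",
    "Missing description of configuration around double bonds in OPSIN",
    "Different number of explicit H",
    "Missing chirality information in OPSIN",
    "Missing chirality information in this work",
    "Different representation of racemates",
    "Different representation of nitro groups",
    "Different representation of other groups",
    "Different charge settings",
    "Different aromaticity settings",
    "Different bond orders",
    "Different representation of ionic compounds",
    "Missing O moieties in OPSIN",
    "Different connectivity",
    "Different number of rings",
    "Different number of moieties",
    "Different configuration around double bonds",
    "Different configurations at chiral centres",
    "Missing moieties in OPSIN",
    "Missing moieties in this work",
    "Missing C atoms in OPSIN",
    "Missing C atoms in this work",
    "Different stoichiometry",
    "Different chemical composition",
    "Reason different from those listed above",
]

# Map each label to its position in the table; a larger rank is more severe.
RANK = {label: i for i, label in enumerate(SEVERITY_ORDER)}


def assign_category(reasons):
    """Return the single category under which an entry is counted.

    `reasons` is a hypothetical list of discrepancy labels found for one
    entry. An entry with no discrepancies is counted as "Identical";
    otherwise the label closest to the bottom of the table wins.
    """
    if not reasons:
        return "Identical"
    return max(reasons, key=RANK.__getitem__)


# Example: an entry with two discrepancies is counted once, under the
# more severe of the two.
example = assign_category([
    "Missing chirality information in OPSIN",
    "Different charge settings",
])
```

Because each entry lands in exactly one category, the per-category counts add up to the total number of compared entries, which is why the percentages in the table sum to 100.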