
Table 1 Summary of the issues that prevented COD CIFs from being successfully processed in the chemical description derivation pipeline. The three rows in bold describe the part of the dataset that was successfully processed, the part of the dataset that was excluded from further calculations, and the overall input dataset

From: A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database

| Status | Entry count | % of all entries |
|---|---:|---:|
| **Successfully processed** | **322 776** | **68.17** |
| **Excluded due to one of the following reasons:** | **150 724** | **31.83** |
| Describes polymers | 106 622 | 22.52 |
| Contains steric clashes (“atomic bumps”) | 23 668 | 5.00 |
| Contains atoms with exceeded valency | 15 438 | 3.26 |
| Fails when processed by cif_molecule | 4 313 | 0.91 |
| Exceeds MDL Molfile V2000 limitations | 680 | 0.14 |
| Exceeds allocated CPU time | 2 | 0.00 |
| Raises uncategorised errors | 1 | 0.00 |
| **Total** | **473 500** | **100.00** |
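The rows of the table are internally consistent. The seven exclusion reasons sum to the 150 724 excluded entries, and processed plus excluded entries give the 473 500 total. Each percentage is the entry count divided by that total and rounded to two decimals, which is why the two smallest categories appear as 0.00. The short Python sketch below recomputes these figures using only the counts from the table. It is an illustrative check, not part of the workflow described in the article. The names `PROCESSED`, `EXCLUDED_BY_REASON`, `TOTAL` and `share` are hypothetical.

```python
# Counts transcribed from Table 1. This is an illustrative consistency check only,
# not part of the described derivation pipeline.
PROCESSED = 322_776
EXCLUDED_BY_REASON = {
    "Describes polymers": 106_622,
    "Contains steric clashes": 23_668,
    "Contains atoms with exceeded valency": 15_438,
    "Fails when processed by cif_molecule": 4_313,
    "Exceeds MDL Molfile V2000 limitations": 680,
    "Exceeds allocated CPU time": 2,
    "Raises uncategorised errors": 1,
}
TOTAL = 473_500


def share(count: int, total: int = TOTAL) -> float:
    """Percentage of all entries, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


# The exclusion reasons should add up to the "Excluded" row,
# and processed + excluded should add up to the "Total" row.
excluded = sum(EXCLUDED_BY_REASON.values())
assert excluded == 150_724
assert PROCESSED + excluded == TOTAL

print(f"Successfully processed: {share(PROCESSED):.2f}%")
print(f"Excluded: {share(excluded):.2f}%")
for reason, count in EXCLUDED_BY_REASON.items():
    print(f"  {reason}: {share(count):.2f}%")
```

The percentages of the excluded subcategories (22.52 + 5.00 + 3.26 + 0.91 + 0.14) also add up to the 31.83 reported for the excluded set.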