Table 2. Summary of the chemical description validation results. Since the same description may manifest issues from more than one category, the total number of invalid entries is less than the sum of entries from all issue categories. The three rows in bold describe the part of the dataset that successfully passed all validation checks, the part of the dataset that failed at least one validation check, and the overall dataset used in the validation.

From: A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database

| Validation status | Entry count | % of all entries |
|---|---|---|
| **Successfully validated** | **236 328** | **73.22** |
| **Failed one or more checks related to:** | **86 448** | **26.78** |
| Chemical structure | 57 504 | 17.82 |
| Chemical formula | 46 307 | 14.35 |
| Compositional disorder | 8 314 | 2.58 |
| Data provenance | 4 653 | 1.44 |
| COD entry markup | 3 084 | 0.96 |
| Manual review | 406 | 0.13 |
| **Total** | **322 776** | **100.00** |
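
The arithmetic behind the table can be checked with a short script. The following is a minimal sketch, not part of the original workflow: the counts are copied from Table 2, while the names `CATEGORY_COUNTS`, `percent`, and `surplus_flags` are illustrative assumptions. It recomputes the percentage column and shows that the per-category counts sum to more than the number of invalid entries, as expected when one entry can fail checks in several categories.

```python
# Counts copied from Table 2; all variable names are illustrative.
TOTAL = 322_776    # all entries used in the validation
VALID = 236_328    # entries that passed every check
INVALID = 86_448   # entries that failed at least one check

# Per-category failure counts. An entry may be counted in several categories.
CATEGORY_COUNTS = {
    "Chemical structure": 57_504,
    "Chemical formula": 46_307,
    "Compositional disorder": 8_314,
    "Data provenance": 4_653,
    "COD entry markup": 3_084,
    "Manual review": 406,
}


def percent(count, total=TOTAL):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Sum over categories, counting multi-category entries more than once.
category_sum = sum(CATEGORY_COUNTS.values())

# Category memberships beyond the first one for each invalid entry.
surplus_flags = category_sum - INVALID

if __name__ == "__main__":
    print(f"Successfully validated: {percent(VALID)} %")
    print(f"Failed one or more checks: {percent(INVALID)} %")
    for name, count in CATEGORY_COUNTS.items():
        print(f"  {name}: {percent(count)} %")
    print(f"Sum over categories: {category_sum}")
    print(f"Surplus over invalid entries: {surplus_flags}")
```

The sum over the categories is 120 268, which exceeds the 86 448 invalid entries by 33 820. That surplus counts repeated category memberships, not entries, so it does not by itself give the number of entries that failed in more than one category.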