Table 1 Extraction, upload and PubChem match statistics for US20120040982

From: Extracting and connecting chemical structures from text sources using chemicalize.org

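| Extraction source | | Upload | Conv. | Fail | CIDs | SID:CID | CZ in PC | CZ-only |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full-text URL | n/a | 1414 | 1364 | 34 | 1308 | 63 | 1252 | 52 |
| Main examples (PDF) | 497 | 486 | 468 | 0 | 462 | 2.1 | 457 | 16 |
| Claims-only (PDF) | 38 | 34 | 34 | 0 | 30 | 3.1 | 28 | 0 |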

  1. The rows are derived from: 1) the FPO URL as shown in Figure 5; 2) the IUPAC names from the main example section, pasted out of “Description”, saved as a PDF and subsequently uploaded for document extraction; and 3) the IUPAC names in the “Claims” section, similarly saved as a PDF for upload. The columns are: 1) the count of SMILES downloaded from chemicalize.org and subsequently uploaded; 2) successful PubChem conversions; 3) PubChem conversion failures; 4) PubChem CID exact matches; 5) substance:compound counts (i.e. the average SID:CID ratio); 6) matches in PubChem that included chemicalize.org as one of the CID sources; and 7) matches with chemicalize.org as the only source.
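
To compare the three extraction sources on a common scale, the counts in Table 1 can be turned into simple fractions. The following is a minimal Python sketch, not part of the original analysis: the `Row` class, the `fractions` helper and the three fraction names are hypothetical, and the numbers are copied from the Upload, Conv., CIDs, CZ in PC and CZ-only columns of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of Table 1 (hypothetical container; counts copied from the table)."""
    source: str
    upload: int      # SMILES downloaded from chemicalize.org and uploaded
    converted: int   # successful PubChem conversions
    cids: int        # PubChem CID exact matches
    cz_in_pc: int    # matches with chemicalize.org among the CID sources
    cz_only: int     # matches with chemicalize.org as the only source


ROWS = [
    Row("Full-text URL", 1414, 1364, 1308, 1252, 52),
    Row("Main examples (PDF)", 486, 468, 462, 457, 16),
    Row("Claims-only (PDF)", 34, 34, 30, 28, 0),
]


def fractions(row: Row) -> dict:
    """Return illustrative ratios; these names are assumptions, not the paper's metrics."""
    return {
        "converted_of_uploaded": row.converted / row.upload,
        "cid_match_of_converted": row.cids / row.converted,
        "cz_only_of_cid_matches": row.cz_only / row.cids,
    }


if __name__ == "__main__":
    for row in ROWS:
        stats = fractions(row)
        print(row.source, {name: round(value, 3) for name, value in stats.items()})
```

The ratios are only a convenience for reading the table side by side; they do not replace the counts reported above.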