Table 1 Extraction, upload and PubChem match statistics for US20120040982

From: Extracting and connecting chemical structures from text sources using chemicalize.org

Extraction source     (unlabeled)  Upload  Conv.  Fail  CIDs  SID:CID  CZ in PC  CZ-only
Full-text URL         n/a          1414    1364   34    1308  63       1252      52
Main examples (PDF)   497          486     468    0     462   2.1      457       16
Claims-only (PDF)     38           34      34     0     30    3.1      28        0
  1. The rows are derived from: 1) the FPO URL as shown in Figure 5; 2) the IUPAC names of the main example section, copied from the “Description”, saved as a PDF and subsequently uploaded for document extraction; and 3) the IUPAC names in the “Claims” section, similarly saved as a PDF for upload. The columns are: 1) the count of SMILES downloaded from chemicalize.org and subsequently uploaded; 2) successful PubChem conversions; 3) PubChem conversion failures; 4) PubChem CID exact matches; 5) substance:compound counts (i.e. the average SID:CID ratio); 6) matches in PubChem that included chemicalize.org as one of the CID sources; and 7) matches with chemicalize.org as the only source.
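
The counts in Table 1 are easier to compare across the three extraction sources when expressed as percentages. The following is a minimal sketch, not part of the original article. It assumes that each row's first value belongs to an unlabeled column, so that the remaining values map in order onto Upload, Conv., Fail, CIDs, SID:CID, CZ in PC and CZ-only. The names `rows`, `percent` and `summarize` are hypothetical, and the derived percentages are illustrative rather than reported results.

```python
# Counts copied from Table 1, under the ASSUMPTION that the first value in
# each row is an unlabeled column and the rest follow the header order.
# Tuple layout: (Upload, Conv., CIDs, CZ in PC, CZ-only)
rows = {
    "Full-text URL": (1414, 1364, 1308, 1252, 52),
    "Main examples (PDF)": (486, 468, 462, 457, 16),
    "Claims-only (PDF)": (34, 34, 30, 28, 0),
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize(upload, converted, cids, cz_in_pc, cz_only):
    """Derive illustrative rates for one table row."""
    return {
        # Share of uploaded SMILES that PubChem converted successfully.
        "converted_pct": percent(converted, upload),
        # Share of converted structures with an exact CID match.
        "matched_pct": percent(cids, converted),
        # Share of matched CIDs that list chemicalize.org as a source.
        "cz_in_pc_pct": percent(cz_in_pc, cids),
        # Share of matched CIDs with chemicalize.org as the only source.
        "cz_only_pct": percent(cz_only, cids),
    }


summary = {name: summarize(*counts) for name, counts in rows.items()}

for name, stats in summary.items():
    print(name, stats)
```

If the column assumption is wrong, only the tuples in `rows` need to change; the rate calculations stay the same.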