Comparing manual and automated extraction of chemical entities from documents

Tyrchan, Christian; Muresan, Sorel

doi:10.1186/1758-2946-2-S1-P7

Volume 2 Supplement 1

5th German Conference on Cheminformatics: 23. CIC-Workshop

Poster presentation
Open access
Published: 04 May 2010


Journal of Cheminformatics volume 2, Article number: P7 (2010)


The chemical information landscape is changing rapidly, with a yearly increase of over 1 million new compounds and more than 700,000 chemistry-related publications [1]. Exploring the chemical space covered by relevant journals and patents is a crucial step in early-stage medicinal chemistry projects. Extracting chemical entities from unstructured text is a complex task, and several approaches are currently in use, including manual extraction by expert curators, text mining supported by chemical named entity recognition (NER), or combinations thereof [2]. The chemical information and corresponding annotations are subsequently stored in relational databases, allowing for complex chemical and text queries.

To assess the capability of chemical NER in documents and to understand the coverage and accuracy of the underlying data, we compared the chemistry extracted by manual curation (GVKBIO) and by text mining (SureChem) from a small patent corpus.

• GVKBIO databases are populated with explicit relationships between compounds, assays and sequence identifiers that have been manually extracted from journals and patents on a large scale [3].

• SureChem Portal [4] is a gateway for chemical patent search on full text collections for USPTO, EPO and WO. SureChem users can perform structure and keyword searches on more than 9 million unique compounds.

We selected a set of 250 patents covering various target classes, for each of which at least 25 records were retrieved from the GVKBIO Patent database. The analysis was done using PipelinePilot protocols [5].
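The kind of per-patent comparison described above can be illustrated with a minimal Python sketch. This is not the PipelinePilot protocol used in the study, whose details are not given here. It assumes that each source has already been reduced to a set of canonical structure identifiers (for example InChIKeys) per patent, that a "record" corresponds to one unique identifier, and that the manually curated set is treated as the reference. The names `Overlap`, `compare_extractions`, `manual_coverage` and `mined_agreement` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlap:
    """Counts of compounds per patent, split by which source reported them."""
    patent_id: str
    manual_only: int
    mined_only: int
    shared: int

    @property
    def manual_coverage(self) -> float:
        """Fraction of manually curated compounds also found by text mining."""
        total = self.manual_only + self.shared
        return self.shared / total if total else 0.0

    @property
    def mined_agreement(self) -> float:
        """Fraction of text-mined compounds also present in the manual set."""
        total = self.mined_only + self.shared
        return self.shared / total if total else 0.0


def compare_extractions(manual, mined, min_records=25):
    """Compare two {patent_id: set of identifiers} mappings patent by patent.

    Patents with fewer than `min_records` manually curated identifiers are
    skipped, mirroring the selection threshold stated in the text (assumption:
    one record equals one unique identifier).
    """
    results = []
    for patent_id, manual_set in sorted(manual.items()):
        if len(manual_set) < min_records:
            continue
        mined_set = mined.get(patent_id, set())
        shared = manual_set & mined_set
        results.append(
            Overlap(
                patent_id=patent_id,
                manual_only=len(manual_set - mined_set),
                mined_only=len(mined_set - manual_set),
                shared=len(shared),
            )
        )
    return results


if __name__ == "__main__":
    # Toy identifiers only; real input would be canonical structure keys.
    manual = {"P1": {"A", "B", "C"}, "P2": {"D"}}
    mined = {"P1": {"B", "C", "X"}}
    for row in compare_extractions(manual, mined, min_records=2):
        print(row.patent_id, row.shared, row.manual_only, row.mined_only)
```

Because manual curation can also miss or mis-draw compounds, the two ratios are best read as measures of agreement between the sources rather than as absolute accuracy.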

These initial results demonstrate the benefits and challenges of text mining for chemical information extraction from unstructured text.

References

1. Engel T: J Chem Inf Model. 2006, 46: 2267-2277. 10.1021/ci600234z.
2. Banville DL (Ed.): Chemical Information Mining: Facilitating Literature-Based Discovery. 2009, CRC Press: Boca Raton, London, New York.
3. [http://www.gvkbio.com/informatics.html]
4. [http://www.surechem.org]
5. [http://accelrys.com/products/scitegic]


Author information

Authors and Affiliations

AstraZeneca R&D, LG CVGI, Pepparedsleden 1, S-43183, Mölndal, Sweden
Christian Tyrchan & Sorel Muresan


Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Tyrchan, C., Muresan, S. Comparing manual and automated extraction of chemical entities from documents. J Cheminform 2 (Suppl 1), P7 (2010). https://doi.org/10.1186/1758-2946-2-S1-P7

