  • Oral presentation
  • Open Access

Semantics vs. statistics in chemical markup

Journal of Cheminformatics 2012, 4(Suppl 1):O16


  • Natural Language
  • Pharmaceutical Company
  • Royal Society
  • Text Analysis
  • Language Processing

Since the late 1990s, natural language processing (NLP) has seen a massive shift from high-precision, low-recall systems based on small sets of hand-written rules to methods based on the statistical analysis of large corpora. The field of chemoinformatics, likewise, is dominated by statistical and machine-learning approaches. In recent years, however, pharmaceutical companies have been engaging more and more with Semantic Web technologies, which are largely built around the sorts of hand-written systems that NLP has moved away from this century. We discuss where our current text analysis and Semantic Web efforts at the Royal Society of Chemistry are headed and how we're making use of the unreasonable effectiveness of data.

Authors’ Affiliations

Royal Society of Chemistry, Thomas Graham House, Cambridge, CB4 0WF, UK


© Batchelor; licensee BioMed Central Ltd. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.