

  • Poster presentation
  • Open Access

Get the best from substructure mining

Journal of Cheminformatics 2010, 2(Suppl 1):P51



  • Biological Activity
  • Drug Design
  • Chemical Information
  • Chemical Content
  • Simple Pattern

The chemical information present in a set of compounds is rarely fully exploited, mostly because no single descriptor set can capture every biologically important feature. As a result, valuable chemical knowledge can stay hidden from hypothesis-based drug design. The simplest form of a structure-activity relationship (SAR) is a substructure that predisposes compounds towards reduced or increased biological activity. Such simple patterns should not be missed during drug design.

The aim of substructure mining is to present the substructures that are most likely related to biological activity. It thus gives rapid access to a large repertoire of chemical descriptors that would otherwise remain hidden: the substructures themselves. In short, substructure mining is a focused but exhaustive series of substructure searches.
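As an illustration only, the following sketch mines linear (path) substructures from a toy set of active and inactive compounds and ranks every fragment by a one-sided hypergeometric (Fisher-type) enrichment p-value. The poster does not describe AweSuM's fragment types or statistics; the molecule representation (atom labels plus bonded index pairs, with no bond orders or aromaticity), the functions `linear_fragments`, `hypergeom_tail` and `mine`, and the toy data are all assumptions made for this example.

```python
from math import comb

# Hypothetical minimal molecule: atom labels plus undirected bonds (index pairs).
Molecule = tuple[list[str], list[tuple[int, int]]]


def linear_fragments(mol: Molecule, max_atoms: int = 3) -> set[str]:
    """Return every labelled simple path of 1..max_atoms atoms in the molecule."""
    labels, bonds = mol
    neighbours = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    found = set()

    def extend(path):
        words = [labels[i] for i in path]
        # A path and its reverse are the same fragment: keep the smaller string.
        found.add(min("-".join(words), "-".join(reversed(words))))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(labels)):
        extend([start])
    return found


def hypergeom_tail(a: int, n_f: int, k: int, n: int) -> float:
    """P(X >= a): chance that at least a of the n_f fragment hits are active,
    given k actives among n compounds (one-sided hypergeometric tail)."""
    hits = sum(comb(k, x) * comb(n - k, n_f - x) for x in range(a, min(n_f, k) + 1))
    return hits / comb(n, n_f)


def mine(actives, inactives, max_atoms=3):
    """Test every fragment against every compound; sort by enrichment p-value."""
    act = [linear_fragments(m, max_atoms) for m in actives]
    inact = [linear_fragments(m, max_atoms) for m in inactives]
    k, n = len(act), len(act) + len(inact)
    results = []
    for frag in set().union(*act, *inact):
        a = sum(frag in f for f in act)           # actives containing frag
        n_f = a + sum(frag in f for f in inact)   # all compounds containing frag
        results.append((hypergeom_tail(a, n_f, k, n), frag, a, n_f))
    return sorted(results)


# Toy data (hypothetical): every active carries a C-S-N path.
actives = [
    (["C", "S", "N"], [(0, 1), (1, 2)]),
    (["C", "C", "S", "N"], [(0, 1), (1, 2), (2, 3)]),
    (["O", "C", "S", "N"], [(0, 1), (1, 2), (2, 3)]),
]
inactives = [
    (["C", "C", "N"], [(0, 1), (1, 2)]),
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O", "N"], [(0, 1), (1, 2)]),
]

for p, frag, a, n_f in mine(actives, inactives)[:4]:
    print(f"{frag:8s} p={p:.3f} actives={a}/{n_f}")
```

In this sketch the search is exhaustive because every enumerated fragment is checked against every compound, and it is focused because only short paths are enumerated. A real tool would rely on a cheminformatics toolkit for proper substructure matching and would also need to correct for the large number of fragments tested.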

This poster describes how AweSuM, the new Awesome Substructure Mining tool from Curios-IT, was used to identify the most interesting substructures. It also discusses the value of enriching the data with 2D pharmacophore information before mining. An enriched, detailed SAR analysis produced a scaffold that summarises the chemical content of the datasets better than any standard substructure. The pharmacophore extracted by AweSuM shows predictive power and agrees with published chemical knowledge. These results demonstrate that useful SAR knowledge can be extracted from the vast space of substructure descriptors. In this way, AweSuM reveals key substructures (e.g., pharmacophores or toxicophores) that are often predictive of biological activity.
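One way to picture pharmacophore enrichment before mining is to relabel annotated atoms with a feature type, so that chemically different groups playing the same role yield the same fragment. The minimal sketch below assumes a hypothetical per-atom annotation (`features`, mapping atom index to a type such as "HBA" for hydrogen-bond acceptor) and uses only bonded atom pairs as fragments; the poster does not say how AweSuM encodes its 2D pharmacophore information.

```python
def pair_fragments(labels, bonds, features=None):
    """Return bonded atom-pair fragments, optionally with pharmacophore typing.

    `features` is a hypothetical annotation: atom index -> feature type.
    Unannotated atoms keep their element label.
    """
    features = features or {}
    typed = [features.get(i, label) for i, label in enumerate(labels)]
    # Sort each pair so that A-B and B-A count as the same fragment.
    return {"-".join(sorted((typed[a], typed[b]))) for a, b in bonds}


# Toy inputs (hypothetical): a carbonyl oxygen and an aromatic nitrogen,
# both annotated as hydrogen-bond acceptors.
ketone = (["C", "C", "O"], [(0, 1), (1, 2)], {2: "HBA"})
pyridine_like = (["C", "C", "N"], [(0, 1), (1, 2)], {2: "HBA"})

# Without typing, the two compounds only share the C-C fragment.
raw_shared = pair_fragments(*ketone[:2]) & pair_fragments(*pyridine_like[:2])
# With typing, they also share a carbon-acceptor fragment.
typed_shared = pair_fragments(*ketone) & pair_fragments(*pyridine_like)
print(sorted(raw_shared))
print(sorted(typed_shared))
```

Mining the typed fragments instead of (or alongside) the raw ones lets a single pattern summarise compounds that no common element-level substructure would cover.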

Authors’ Affiliations

Curios-IT, Anna van Burenhof 63, 2316GP Leiden, The Netherlands


© Jeroen; licensee BioMed Central Ltd. 2010

This article is published under license to BioMed Central Ltd.