  • Poster presentation
  • Open access

Introducing fuzziness into maximum common substructures for meaningful cluster characterisation

Arranging similar structures into clusters is a typical task in modern chemoinformatics, with high impact on HTS follow-up, the generation of structure-activity relationships (SAR) and the selection of starting points for compound optimisation. Methods for cluster generation are as diverse as the structures to which they are applied [1]; they may be, for example, similarity-based or substructure-based. Medicinal chemists typically orient themselves within structure subsets such as clusters with the help of substructures, so-called "scaffolds", which intuitively characterise the structural relationships between the molecules of the subset. For substructure-based clustering, well-established methods exist for generating Maximum Common Substructures (MCS) that are present in all members of the structure population or in a defined proportion thereof [2]. For similarity-based clusters, however, such an MCS may not exist for the required proportion of the dataset, or the common substructure may be so small that it is no longer representative and is therefore meaningless.

The approach presented here allows an MCS to be generated for similarity-based clusters with a given inherent structural diversity as well. In a first step, an MCS of reduced graphs is generated. The atom and bond indices of this reduced MCS are then mapped onto the full structures, and the atom and bond information is aggregated for each indexed atom and bond. In a final step, query features of the MDL SDF format (atom lists, query bonds) are used to map the aggregated element and bond information onto the reduced MCS. As a result, "fuzziness" in atom and bond information is added to the MCS, which remains fully database-searchable but is more meaningful for characterising clusters because it can cover larger parts of the full structures than a conventional MCS. The approach was implemented in Pipeline Pilot™ as a proof of concept but is general enough to be transferred to other technical platforms.
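The aggregation and query-mapping steps can be illustrated with a minimal Python sketch. It is not the Pipeline Pilot™ implementation; it assumes that the reduced-graph MCS has already been computed and mapped onto each cluster member, so that every molecule is represented by a hypothetical dictionary from MCS atom index to element symbol and from MCS bond (index pair) to bond type. All function names, the toy data and the bracketed atom-list notation are illustrative assumptions. The bond codes follow the MDL CTfile convention (1 to 4 for single, double, triple and aromatic; 5 to 8 for the query bonds "single or double", "single or aromatic", "double or aromatic" and "any").

```python
from collections import defaultdict

# MDL CTfile bond type codes: 1-4 are concrete bonds, 5-8 are query bonds.
QUERY_BOND_CODES = {
    frozenset({1}): 1,     # single
    frozenset({2}): 2,     # double
    frozenset({3}): 3,     # triple
    frozenset({4}): 4,     # aromatic
    frozenset({1, 2}): 5,  # single or double
    frozenset({1, 4}): 6,  # single or aromatic
    frozenset({2, 4}): 7,  # double or aromatic
}
ANY_BOND = 8               # fallback for every other combination


def aggregate(observations):
    """Collect, per MCS index, the set of values seen across all molecules."""
    merged = defaultdict(set)
    for per_molecule in observations:
        for index, value in per_molecule.items():
            merged[index].add(value)
    return dict(merged)


def atom_query(elements):
    """Return a plain symbol for one element, else an illustrative atom list."""
    ordered = sorted(elements)
    if len(ordered) == 1:
        return ordered[0]
    return "[" + ",".join(ordered) + "]"


def bond_query(bond_types):
    """Map a set of observed bond types to a single CTfile bond code."""
    return QUERY_BOND_CODES.get(frozenset(bond_types), ANY_BOND)


# Hypothetical toy cluster: three molecules mapped onto a three-atom reduced MCS.
atom_maps = [
    {0: "C", 1: "N", 2: "C"},
    {0: "C", 1: "O", 2: "C"},
    {0: "C", 1: "S", 2: "N"},
]
bond_maps = [
    {(0, 1): 1, (1, 2): 1},
    {(0, 1): 1, (1, 2): 2},
    {(0, 1): 1, (1, 2): 4},
]

fuzzy_atoms = {i: atom_query(e) for i, e in aggregate(atom_maps).items()}
fuzzy_bonds = {b: bond_query(t) for b, t in aggregate(bond_maps).items()}

print(fuzzy_atoms)
print(fuzzy_bonds)
```

In this toy case, atom 1 becomes an atom list covering N, O and S, while the bond between atoms 1 and 2, observed as single, double and aromatic, falls back to the "any" query bond. Writing these features into an actual SDF query record is left out of the sketch.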


  1. Downs GM, Barnard JM: Clustering Methods and Their Uses in Computational Chemistry. Reviews in Computational Chemistry. 2003, Chichester: Wiley and Sons, 18: 1-40.


  2. Ehrlich HC, Rarey M: Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci. 2011, 1: 68-79. 10.1002/wcms.5.



Author information



Corresponding author

Correspondence to Christian Herhaus.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Herhaus, C. Introducing fuzziness into maximum common substructures for meaningful cluster characterisation. J Cheminform 6 (Suppl 1), P17 (2014).
