
Automatic pharmacophore model generation using weighted substructure assignments

Generating a pharmacophore model is a challenging process that often requires input from medicinal chemists. Given a number of ligands for a specific target, the aim is to identify the pharmacophore patterns that are responsible for the biological activity of the compounds. A recent study of optimal assignment methods has shown that the assignment of chemical substructures can detect active compounds in a data set [1]. We therefore investigated whether this technique can be used to identify key features of a set of active compounds.

To determine the important substructures of active compounds, we introduced n weight factors, where n is the number of substructures. The substructures were defined using the pharmacophore definitions of Phase 3.0 [2]. A genetic algorithm assigns the individual weight factors to these previously defined patterns.

The experimental setup was as follows. Given a data set of active compounds, the most active compound was selected as the query structure. The remaining active compounds were inserted into a background data set of inactive compounds. The genetic algorithm evolved n weights for the pharmacophore patterns of the query structure. To evaluate the fitness of an individual, we performed a single-query screening with that individual's weights. The fitness function was the BEDROC score [3], which emphasizes early recognition performance. The result of the genetic algorithm was a weight vector that assigns to each pharmacophore feature the weight found in the best individual.
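The optimization loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighted sum used as the screening score is a stand-in for the weighted optimal assignment similarity, and the genetic operators (elitism, binary tournament selection, uniform crossover, Gaussian mutation), the population settings, and alpha = 20 are all assumptions chosen for illustration. Only the BEDROC formula follows Truchon and Bayly [3]. The names `bedroc`, `fitness`, `evolve_weights`, and `toy_library` are hypothetical.

```python
import math
import random


def bedroc(labels_ranked, alpha=20.0):
    """BEDROC for a ranked list of 0/1 activity labels (best-scored first)."""
    n_total = len(labels_ranked)
    n_active = sum(labels_ranked)
    if n_active == 0 or n_active == n_total:
        raise ValueError("need at least one active and one inactive compound")
    ratio = n_active / n_total
    # Exponentially decaying contribution of each active's 1-based rank.
    sum_exp = sum(math.exp(-alpha * rank / n_total)
                  for rank, label in enumerate(labels_ranked, start=1) if label)
    # RIE: observed average contribution relative to a random ranking.
    random_avg = (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1) / n_total
    rie = sum_exp / n_active / random_avg
    # Rescale RIE between its worst and best attainable values.
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


def fitness(weights, library):
    """Single-query screening: rank compounds by weighted score, return BEDROC.

    library is a list of (pattern_scores, is_active) pairs. The weighted sum
    is an assumed placeholder for the weighted assignment similarity.
    """
    def score(compound):
        return sum(w * s for w, s in zip(weights, compound[0]))
    ranked = sorted(library, key=score, reverse=True)
    return bedroc([is_active for _, is_active in ranked])


def evolve_weights(library, n_patterns, pop_size=30, generations=40, seed=0):
    """Evolve one weight in [0, 1] per pattern (assumed GA operators)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_patterns)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, library),
                        reverse=True)
        next_gen = ranked[:2]  # elitism: keep the two fittest unchanged
        while len(next_gen) < pop_size:
            # Binary tournament: in the sorted list a lower index is fitter.
            a = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            b = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            # Uniform crossover followed by clipped Gaussian mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < 0.2 else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda w: fitness(w, library))


# Toy data (invented): pattern 0 separates actives from inactives, pattern 1
# does the opposite, so useful weight vectors favour pattern 0.
toy_library = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1),
    ((0.3, 0.7), 0), ((0.25, 0.75), 0), ((0.2, 0.8), 0), ((0.1, 0.9), 0),
]
```

Calling `evolve_weights(toy_library, n_patterns=2)` returns a weight vector for the two toy patterns. In a real setting each `pattern_scores` tuple would come from matching a library compound against the query structure's pharmacophore patterns.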

We evaluated our approach on a subset of the Directory of Useful Decoys that is suitable for ligand-based virtual screening [1][4]. The query structure was extracted from the same complexed crystal structure used by Huang et al. [4] to determine the binding site of the protein.

The presented method can provide valuable information about the key features that are important for the biological activity of a compound. Moreover, it does not need information about the protein structure. The method can therefore also be used to derive a pharmacophore model when no protein structure is available (e.g. GPCRs).


  1. Jahn A, Hinselmann G, Fechner N, Zell A: Journal of Cheminformatics. 2009, 1: 14. 10.1186/1758-2946-1-14.

  2. Phase, version 3.0. 2008, Schrödinger, LLC, New York, NY.

  3. Truchon JF, Bayly CI: J Chem Inf Model. 2007, 47: 488-508. 10.1021/ci600426e.

  4. Huang N, Shoichet B, Irwin J: J Med Chem. 2006, 49: 6789-6801. 10.1021/jm0608356.


Author information



Corresponding author

Correspondence to Andreas Jahn.

Rights and permissions

Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Jahn, A., Planatscher, H., Hinselmann, G. et al. Automatic pharmacophore model generation using weighted substructure assignments. J Cheminform 2, P42 (2010).



  • Genetic Algorithm
  • Active Compound
  • Virtual Screening
  • Pharmacophore Model
  • Optimal Assignment