Automatic pharmacophore model generation using weighted substructure assignments

Generating a pharmacophore model is a challenging process that often requires input from medicinal chemists. Given a set of ligands for a specific target, the aim is to identify the pharmacophore patterns responsible for the biological activity of these compounds. A recent study of optimal assignment methods has shown that assigning chemical substructures between molecules can detect active compounds in a data set [1]. We therefore investigated whether this technique can be used to identify the key features of a set of active compounds.
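
The optimal assignment idea referred to above can be illustrated with a small sketch. It assumes that each molecule is reduced to a list of pharmacophore features, each a hypothetical (type, descriptor) pair, and it uses an assumed Gaussian similarity between features of the same type. The names feature_similarity, optimal_assignment and oa_similarity, as well as the normalization by the larger feature count, are illustrative choices and not the kernel from [1]. The assignment is solved by brute force over permutations, which is practical only for a handful of features.

```python
from itertools import permutations
from math import exp

# Hypothetical feature representation: (pharmacophore type, descriptor vector).
Feature = tuple[str, tuple[float, ...]]


def feature_similarity(a: Feature, b: Feature) -> float:
    """Assumed kernel: 0 for different types, Gaussian on the descriptors otherwise."""
    if a[0] != b[0]:
        return 0.0
    sq_dist = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
    return exp(-sq_dist)


def optimal_assignment(sim: list[list[float]]) -> float:
    """Maximum total similarity over one-to-one matchings, found by brute force.

    The matrix is padded with zero-similarity dummy rows/columns so that
    unequal feature counts are handled. Brute force is fine for a handful
    of features; the Hungarian algorithm scales to larger inputs.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    k = max(n, m)
    padded = [[sim[i][j] if i < n and j < m else 0.0 for j in range(k)]
              for i in range(k)]
    return max(sum(padded[i][p[i]] for i in range(k))
               for p in permutations(range(k)))


def oa_similarity(query: list[Feature], cand: list[Feature]) -> float:
    """Assignment score scaled to [0, 1] by the larger feature count (assumed)."""
    if not query or not cand:
        return 0.0
    sim = [[feature_similarity(q, c) for c in cand] for q in query]
    return optimal_assignment(sim) / max(len(query), len(cand))


# Example: the candidate matches both query features and has one extra feature.
q_feats = [("donor", (0.0, 0.0)), ("aromatic", (1.0, 0.0))]
c_feats = [("aromatic", (1.0, 0.0)), ("donor", (0.0, 0.0)), ("acceptor", (2.0, 2.0))]
print(oa_similarity(q_feats, c_feats))  # 2 of 3 slots matched -> 2/3
```

Padding with dummy features keeps the matching one-to-one, so a candidate cannot score highly by mapping several query features onto a single one of its own features.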

To determine which substructures of the active compounds are important, we introduced n weight factors, one for each of the n substructures. The substructures were defined using the pharmacophore definitions of Phase 3.0 [2]. The individual weights of the pharmacophore patterns were determined by a genetic algorithm that assigns a weight factor to each of the previously defined patterns.

The experimental setup was as follows. Given a data set of active compounds, the most active compound was selected as the query structure. The remaining active compounds were added to a background data set of inactive compounds. The genetic algorithm evolved n weights for the pharmacophore patterns of the query structure, and the fitness of an individual was evaluated by a single-query screening using that individual's weights. The optimization maximized the BEDROC score [3], a metric that emphasizes early recognition performance. The result is the weight vector of the best individual, which assigns a weight to each pharmacophore feature.
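
A minimal sketch of this weight optimization follows, under several assumptions not stated in the abstract: the query-versus-compound feature similarities are precomputed as one matrix per database compound (rows are query features), the weighted score is divided by the sum of the weights, weights lie in [0, 1], and the genetic algorithm uses tournament selection, uniform crossover, Gaussian mutation and elitism with arbitrary default parameters. The names weighted_assignment, bedroc and evolve_weights are hypothetical. Only the BEDROC formula follows Truchon and Bayly [3]; alpha = 20 is a commonly used setting, not a value given here.

```python
import random
from itertools import permutations
from math import cosh, exp, sinh


def weighted_assignment(sim: list[list[float]], weights: list[float]) -> float:
    """Best one-to-one matching score; query feature i contributes weights[i] * sim[i][j]."""
    n, m = len(sim), len(sim[0])
    k = max(n, m)
    # Zero-padded square matrix: dummy rows/columns absorb unmatched features.
    w = [[weights[i] * sim[i][j] if i < n and j < m else 0.0 for j in range(k)]
         for i in range(k)]
    best = max(sum(w[i][p[i]] for i in range(k)) for p in permutations(range(k)))
    return best / (sum(weights) or 1.0)  # assumed normalization by total weight


def bedroc(scores: list[float], is_active: list[bool], alpha: float = 20.0) -> float:
    """BEDROC after Truchon and Bayly; needs at least one active and one inactive."""
    N = len(scores)
    # Rank by descending score (rank 1 = best); ties put inactives first (pessimistic).
    order = sorted(range(N), key=lambda i: (-scores[i], is_active[i]))
    ranks = [r + 1 for r, i in enumerate(order) if is_active[i]]
    ra = len(ranks) / N
    rie = sum(exp(-alpha * r / N) for r in ranks) / (
        ra * (1 - exp(-alpha)) / (exp(alpha / N) - 1))
    factor = ra * sinh(alpha / 2) / (cosh(alpha / 2) - cosh(alpha / 2 - alpha * ra))
    return rie * factor + 1 / (1 - exp(alpha * (1 - ra)))


def evolve_weights(sims, is_active, pop_size=20, generations=30, sigma=0.2, seed=0):
    """Tiny GA: tournament selection, uniform crossover, Gaussian mutation, elitism.

    sims[c] is the (query features x compound features) similarity matrix of
    database compound c; is_active[c] marks the actives hidden among decoys.
    """
    rng = random.Random(seed)
    n_feat = len(sims[0])  # one weight per query pharmacophore feature

    def fitness(wv):
        # Single-query screening with this weight vector, scored by BEDROC.
        return bedroc([weighted_assignment(s, wv) for s in sims], is_active)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    pop = [[rng.random() for _ in range(n_feat)] for _ in range(pop_size)]
    best, best_fit = pop[0], -1.0
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        for fit, ind in scored:
            if fit > best_fit:
                best, best_fit = ind, fit
        children = [best[:]]  # elitism: carry over the best vector seen so far
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = [rng.choice(genes) for genes in zip(p1, p2)]  # uniform crossover
            # Gaussian mutation, clipped to the assumed weight range [0, 1].
            children.append([min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in child])
        pop = children
    return best, best_fit
```

For example, with three actives that match mainly query feature 0 and seven decoys that match mainly feature 1, equal weights rank the decoys first, while the evolved weights favor feature 0 and move the actives to the top of the list. Ranking ties pessimistically keeps the GA from rewarding degenerate weight vectors (such as all zeros) that give every compound the same score.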

We evaluated our approach on a subset of the Directory of Useful Decoys (DUD) that is suitable for ligand-based virtual screening [1][4]. The query structure was the ligand extracted from the same protein-ligand crystal structure that Huang et al. [4] used to define the binding site of the protein.

The presented method provides valuable information about the features that are key to the biological activity of a compound. Moreover, it requires no information about the protein structure, so it can also be used to derive a pharmacophore model when no protein structure is available (e.g., for GPCRs).


  1. Jahn A, Hinselmann G, Fechner N, Zell A: J Cheminform. 2009, 1: 14. 10.1186/1758-2946-1-14.

  2. Phase, version 3.0. 2008, Schrödinger, LLC, New York, NY.

  3. Truchon JF, Bayly CI: J Chem Inf Model. 2007, 47: 488-508. 10.1021/ci600426e.

  4. Huang N, Shoichet B, Irwin J: J Med Chem. 2006, 49: 6789-6801. 10.1021/jm0608356.

Author information

Correspondence to Andreas Jahn.

Keywords


  • Genetic Algorithm
  • Active Compound
  • Virtual Screening
  • Pharmacophore Model
  • Optimal Assignment