Text-based similarity searching for hit- and lead-candidate identification

The Pharmacophore Alignment Search Tool (PhAST) is a string-based approach to virtual screening. Each molecule is represented by a linear sequence that describes its pattern of potential interactions. The problem of molecule linearization is addressed by applying Minimum Volume Embedding in combination with a Diffusion Kernel to the molecular graph [1, 2]. The linear representations are compared using global pairwise sequence alignment [3]. PhAST exhibited enrichment capabilities comparable or superior to those of most common virtual screening approaches. Its compound rankings were shown to differ from those of other virtual screening techniques. Emphasizing key interactions by applying position-specific weights in the alignment process significantly increased enrichment.
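The comparison step can be illustrated with a minimal sketch of global pairwise alignment with position-specific weights. This is not the PhAST implementation: the symbol alphabet, the scoring values, the linear gap penalty (reference [3] describes affine gap scores) and the way the weights scale the scores are all assumptions made for illustration, and the function name `global_alignment_score` is hypothetical.

```python
from typing import Optional, Sequence


def global_alignment_score(
    query: str,
    candidate: str,
    weights: Optional[Sequence[float]] = None,
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Needleman-Wunsch global alignment score with per-position query weights.

    Assumption for this sketch: the weight of a query position scales both
    the substitution score and the gap penalty incurred at that position.
    Gaps opposite candidate symbols use the unweighted penalty.
    """
    n, m = len(query), len(candidate)
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("weights must have one entry per query position")

    # prev[j]: best score aligning the first i-1 query symbols
    # with the first j candidate symbols (row 0: only gaps).
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        w = weights[i - 1]
        curr = [prev[0] + w * gap] + [0.0] * m
        for j in range(1, m + 1):
            sub = match if query[i - 1] == candidate[j - 1] else mismatch
            diag = prev[j - 1] + w * sub   # align the two symbols
            up = prev[j] + w * gap         # query symbol opposite a gap
            left = curr[j - 1] + gap       # candidate symbol opposite a gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    # Hypothetical one-letter interaction symbols; the real alphabet may differ.
    print(global_alignment_score("ADR", "AR"))
    print(global_alignment_score("ADR", "AR", weights=[1.0, 3.0, 1.0]))
```

Raising the weight of a query position makes a mismatch or gap at that position more costly, which is one plausible way to emphasize key interactions during ranking.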

The significance of chemical similarity was determined in the form of p-values of global alignment scores. These were calculated with an approach based on Markov chain Monte Carlo simulation that was adapted from its original application to local alignments of protein sequences [4]. Bonferroni correction was used to correct the p-values with respect to the size of the screening library [5].
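As a small illustration of the correction step, the standard Bonferroni correction multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below assumes that the number of comparisons equals the number of compounds in the screening library; the abstract does not give further detail, and the function name `bonferroni_correct` is hypothetical.

```python
def bonferroni_correct(p_value: float, library_size: int) -> float:
    """Return the Bonferroni-corrected p-value for one library compound.

    Assumption for this sketch: every compound in the screening library
    counts as one comparison against the query.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if library_size < 1:
        raise ValueError("library_size must be a positive integer")
    # Multiply by the number of comparisons and cap at 1.
    return min(1.0, p_value * library_size)


if __name__ == "__main__":
    # A raw p-value of 1e-6 in a library of 1000 compounds.
    print(bonferroni_correct(1e-6, 1000))
```

Under this correction, a hit remains significant at a given threshold only if its raw p-value is smaller than that threshold divided by the library size.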

PhAST was employed in two prospective applications. A screen for non-nucleoside analogue inhibitors of bacterial thymidine kinase yielded a hit with a distinct structural framework but only weak activity. Screens for drugs outside the NSAID (non-steroidal anti-inflammatory drug) class that act as modulators of gamma secretase yielded a potent modulator that is structurally clearly distinct from the reference compound.


  1. Shaw R, Jebara T: Minimum Volume Embedding. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics: 21-24 March 2007; San Juan (Puerto Rico). Edited by: Meila M, Shen X. 2007, Omnipress, 460-467.

  2. Smola AJ, Kondor RI: Kernels and Regularization on Graphs. Proceedings of the 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop: 24-27 August 2003; Washington DC. Edited by: Schölkopf B, Warmuth M. 2003, Springer, 144-158.

  3. Durbin R, Eddy S, Krogh A, Mitchison G: Alignment with affine gap scores. Biological Sequence Analysis. 1998, Cambridge University Press, 29-31.

  4. Hartmann AK: Sampling rare events: statistics of local sequence alignments. Phys Rev E Stat Nonlin Soft Matter Phys. 2002, 65 (5 Pt 2): 056102.

  5. Bonferroni CE: Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 1936, 8: 3-62.

Author information

Corresponding author

Correspondence to Volker Hähnke.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hähnke, V. Text-based similarity searching for hit- and lead-candidate identification. J Cheminform 4 (Suppl 1), O12 (2012).
