Statistical modeling of value distributions of similarity coefficients in virtual screening and its application to predicting fingerprint search performance
Journal of Cheminformatics volume 5, Article number: O5 (2013)
Similarity searching using fingerprints is a popular ligand-based virtual screening approach. The Tanimoto coefficient (Tc) is the most widely used measure for quantifying fingerprint similarity. In general, it is very difficult to assess the significance of the similarity of two molecules based solely on their calculated Tc values. In the literature, Tc cut-off values are frequently chosen intuitively as similarity criteria for virtual screening. This is problematic because the distribution of similarity scores depends strongly on the specific type of fingerprint that is used and on the reference compound for which the fingerprint is calculated. In order to rationalize similarity value considerations, a statistical approach named the conditional correlated Bernoulli model is presented that models similarity scores based on the statistical distribution of fingerprint features in large compound databases. Fingerprint features are modeled as dependent Bernoulli variables, and conditional distributions of Tanimoto similarity values of database compounds are determined with respect to given reference compounds. The model makes it possible to estimate the position of a compound in a database ranking based only on its Tc value relative to the reference. This rank estimation enables the quantitative comparison of similarity values across different fingerprint types. Moreover, it can be utilized to rapidly assess the potential of fingerprints to identify new active molecules in a database search given a set of known reference molecules.
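To make the idea of rank estimation from a Tc value concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes fingerprints are represented as sets of on-bit indices and, as a deliberate simplification, treats fingerprint features as independent Bernoulli variables, whereas the model described above uses dependent (correlated) variables. All names (`tanimoto`, `sample_fingerprint`, `tc_survival`, `estimated_rank`) and the per-bit frequencies are hypothetical and chosen for illustration only.

```python
import random


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sample_fingerprint(bit_probs, rng):
    """Draw a random fingerprint; bit i is set with probability bit_probs[i].

    Assumption: bits are sampled independently, which ignores the feature
    correlations that the conditional correlated Bernoulli model accounts for.
    """
    return {i for i, p in enumerate(bit_probs) if rng.random() < p}


def tc_survival(reference, bit_probs, threshold, n_samples=2000, seed=0):
    """Monte Carlo estimate of P(Tc >= threshold) for random database
    compounds, conditional on the given reference fingerprint."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        candidate = sample_fingerprint(bit_probs, rng)
        if tanimoto(reference, candidate) >= threshold:
            hits += 1
    return hits / n_samples


def estimated_rank(reference, bit_probs, tc_value, database_size, **kwargs):
    """Approximate rank of a compound with similarity tc_value to the reference:
    one plus the expected number of database compounds scoring at least as high."""
    return 1 + database_size * tc_survival(reference, bit_probs, tc_value, **kwargs)


if __name__ == "__main__":
    # Hypothetical bit frequencies, standing in for statistics of a large database.
    freq_rng = random.Random(42)
    bit_probs = [freq_rng.uniform(0.01, 0.3) for _ in range(128)]
    reference = sample_fingerprint(bit_probs, random.Random(7))
    for tc in (0.1, 0.2, 0.3):
        print(tc, estimated_rank(reference, bit_probs, tc, database_size=100_000))
```

Because the estimated rank depends on the bit frequencies and on the reference fingerprint, the same Tc value can map to very different ranks for different fingerprint types or reference compounds, which is the motivation for comparing ranks rather than raw Tc cut-offs.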
Vogt M, Bajorath J: Introduction of the Conditional Correlated Bernoulli Model of Similarity Value Distributions and its Application to the Prospective Prediction of Fingerprint Search Performance. J Chem Inf Model. 2011, 51: 2496-2506. 10.1021/ci2003472.
Cite this article
Vogt, M., Bajorath, J. Statistical modeling of value distributions of similarity coefficients in virtual screening and its application to predicting fingerprint search performance. J Cheminform 5 (Suppl 1), O5 (2013). https://doi.org/10.1186/1758-2946-5-S1-O5