Probabilistic classifier: generated using randomised sub-sampling of the feature space

Tyzack, Jonathan D; Mussa, Hamse Y; Glen, Robert C

doi:10.1186/1758-2946-4-S1-P40

Volume 4 Supplement 1

7th German Conference on Chemoinformatics: 25 CIC-Workshop

Poster presentation
Open access
Published: 01 May 2012

Journal of Cheminformatics volume 4, Article number: P40 (2012)

Supervised classification, which is rooted in pattern recognition, is now an integral part of virtual screening. Its central aim in chemoinformatics is to design a classification algorithm that accurately assigns a new molecule to one of a set of predefined classes.

Probabilistic classifiers can be far more useful than hard point classifiers [1] for decision problems such as virtual screening, where assigning an instance to one class or the other carries an associated risk.

Because of their conceptual simplicity and computational efficiency, probabilistic classification methods based on the Naive Bayes concept are widely employed in chemoinformatics. The simplicity of Naive Bayes stems from the assumption that the descriptors representing the molecule to be classified are statistically independent given the class. Unfortunately, it is well documented that when the molecular descriptors are binary-valued (taking values of 0 or 1), as is often the case in chemoinformatics, the Naive Bayesian classifier can only act as a linear classifier in descriptor space.
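
The linearity noted above can be checked numerically. The following minimal sketch, which is not taken from the original work, computes the Naive Bayes log-odds for binary descriptors in two ways: directly from the product of per-descriptor probabilities, and as a weighted sum w·x + b. The probabilities `p1` and `p0`, the prior, and all function names are hypothetical values chosen for illustration.

```python
import math
from itertools import product


def bernoulli_nb_log_odds(x, p1, p0, prior1=0.5):
    """Log of P(class=1 | x) / P(class=0 | x) from the Naive Bayes product.

    p1[j] and p0[j] are P(x_j = 1 | class) for class 1 and class 0.
    """
    total = math.log(prior1 / (1.0 - prior1))
    for xj, a, b in zip(x, p1, p0):
        if xj == 1:
            total += math.log(a / b)
        else:
            total += math.log((1.0 - a) / (1.0 - b))
    return total


def linear_form(p1, p0, prior1=0.5):
    """Weights w and bias such that the log-odds equals w.x + bias."""
    w = [math.log(a * (1.0 - b) / (b * (1.0 - a))) for a, b in zip(p1, p0)]
    bias = math.log(prior1 / (1.0 - prior1)) + sum(
        math.log((1.0 - a) / (1.0 - b)) for a, b in zip(p1, p0)
    )
    return w, bias


# Hypothetical per-descriptor probabilities, for illustration only.
p1 = [0.8, 0.3, 0.6]
p0 = [0.2, 0.5, 0.6]
w, bias = linear_form(p1, p0, prior1=0.4)

# Largest disagreement between the two forms over all 2^3 binary inputs.
max_gap = max(
    abs(
        bernoulli_nb_log_odds(x, p1, p0, prior1=0.4)
        - (sum(wj * xj for wj, xj in zip(w, x)) + bias)
    )
    for x in product([0, 1], repeat=3)
)
```

Because the two forms agree on every binary input, the decision boundary (log-odds equal to zero) is a hyperplane in descriptor space, so a pattern such as XOR of two descriptors cannot be separated. A descriptor with the same probability in both classes, like the third one here, receives zero weight.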

Techniques such as the Parzen-Window approach can address this shortcoming but are computationally expensive, as they require the entire training dataset to be retained in core memory [2, 3].
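
To illustrate both the strength and the memory cost of this family of methods, here is a minimal Parzen-window sketch for binary descriptors. It is an illustration under stated assumptions, not the implementation used in [2, 3]: the kernel, which simply decays with Hamming distance, the smoothing parameter `lam`, and the class name are all hypothetical choices.

```python
def hamming(x, y):
    """Number of positions at which two binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def kernel(x, y, lam=0.9):
    """Assumed kernel: lam^(n - d) * (1 - lam)^d, with d the Hamming distance.

    lam should lie strictly between 0.5 and 1 so that closer points weigh more.
    """
    d = hamming(x, y)
    n = len(x)
    return lam ** (n - d) * (1.0 - lam) ** d


class ParzenWindowClassifier:
    def __init__(self, lam=0.9):
        self.lam = lam
        self.train = []  # every training example is kept in memory

    def fit(self, X, y):
        self.train = list(zip(X, y))
        return self

    def predict_proba(self, x):
        """P(class = 1 | x) from class-wise sums of kernel values."""
        score = {0: 0.0, 1: 0.0}
        for xi, yi in self.train:  # one kernel evaluation per stored example
            score[yi] += kernel(x, xi, self.lam)
        return score[1] / (score[0] + score[1])


# XOR of two binary descriptors: not linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
clf = ParzenWindowClassifier(lam=0.9).fit(X, y)
p_active = clf.predict_proba((0, 1))    # about 0.82
p_inactive = clf.predict_proba((0, 0))  # about 0.18
```

The classifier separates the XOR pattern, but each prediction loops over the whole stored training set, which is the cost that becomes prohibitive for large chemical datasets.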

To address these drawbacks, a new probabilistic classifier is proposed which uses randomised sub-sampling of the descriptor space. The proposed algorithm generates better class membership predictions than its Naive Bayesian counterpart when classifying molecules that are non-linearly separable in descriptor space.
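
The abstract does not specify the algorithm, so the following is only one possible reading of "randomised sub-sampling of the descriptor space" and should not be taken as the authors' method. In this hypothetical sketch, each ensemble member draws a random subset of descriptors, estimates the class-conditional joint distribution of that subset by counting bit patterns (with additive smoothing `alpha`), and the ensemble averages the resulting posteriors. The class name and all parameters are assumptions.

```python
import random
from collections import defaultdict


class RandomSubspaceClassifier:
    def __init__(self, n_members=25, subset_size=2, alpha=1.0, seed=0):
        self.n_members = n_members
        self.subset_size = subset_size
        self.alpha = alpha  # additive (Laplace) smoothing
        self.rng = random.Random(seed)
        self.members = []
        self.class_counts = {0: 0, 1: 0}

    def fit(self, X, y):
        n_features = len(X[0])
        self.class_counts = {c: sum(1 for v in y if v == c) for c in (0, 1)}
        self.members = []
        for _ in range(self.n_members):
            # Random subset of descriptor indices for this member.
            idx = tuple(sorted(self.rng.sample(range(n_features), self.subset_size)))
            counts = {0: defaultdict(int), 1: defaultdict(int)}
            for xi, yi in zip(X, y):
                counts[yi][tuple(xi[j] for j in idx)] += 1
            self.members.append((idx, counts))
        return self

    def predict_proba(self, x):
        """Average over members of P(class = 1 | x restricted to the subset)."""
        n_patterns = 2 ** self.subset_size
        total_n = self.class_counts[0] + self.class_counts[1]
        posteriors = []
        for idx, counts in self.members:
            key = tuple(x[j] for j in idx)
            joint = {}
            for c in (0, 1):
                likelihood = (counts[c][key] + self.alpha) / (
                    self.class_counts[c] + self.alpha * n_patterns
                )
                joint[c] = likelihood * self.class_counts[c] / total_n
            posteriors.append(joint[1] / (joint[0] + joint[1]))
        return sum(posteriors) / len(posteriors)


# Toy data: class = XOR of descriptors 0 and 1; descriptor 2 is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]
clf = RandomSubspaceClassifier(n_members=25, subset_size=2, seed=0).fit(X, y)
p_pos = clf.predict_proba((0, 1, 0))
p_neg = clf.predict_proba((0, 0, 0))
```

Because each member models the joint distribution of its subset rather than a product of single-descriptor terms, a member that happens to draw descriptors 0 and 1 together can capture the XOR interaction that Naive Bayes discards, while only count tables, not the training set, need to be retained.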

We present a realistic test of the new method by classifying large chemical datasets generated from the ChEMBL database [4].

References

1. Duda RO, Hart PE: Pattern Classification and Scene Analysis. 1973, John Wiley & Sons, Ltd: New York, NY
2. Parzen E: The Annals of Mathematical Statistics. 1962, 33: 1065-1076.
3. Harper G, Bradshaw J, Gittins JC, Green DVS, Leach AR: J Chem Inf Comput Sci. 2001, 41: 1295-1300. doi:10.1021/ci000397q.
4. ChEMBL. J Comput-Aided Mol Des. 2009, 4: 195-198.

Author information

Authors and Affiliations

Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
Jonathan D Tyzack, Hamse Y Mussa & Robert C Glen

Corresponding author

Correspondence to Jonathan D Tyzack.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tyzack, J.D., Mussa, H.Y. & Glen, R.C. Probabilistic classifier: generated using randomised sub-sampling of the feature space. J Cheminform 4 (Suppl 1), P40 (2012). https://doi.org/10.1186/1758-2946-4-S1-P40
