Applicability domain for classification problems

Sushko, Iurii; Novotarskyi, S; Pandey, AK; Körner, R; Tetko, Igor

doi:10.1186/1758-2946-2-S1-P41

Volume 2 Supplement 1

5th German Conference on Cheminformatics: 23. CIC-Workshop

Poster presentation
Open access
Published: 04 May 2010

Journal of Cheminformatics volume 2, Article number: P41 (2010)

Classification models are common in QSAR modeling, and it is crucial to provide a good estimate of their accuracy. The applicability domain provides additional information that identifies which compounds are classified with the best accuracy and which are expected to have poor, unreliable predictions. Selecting only the most reliable predictions can dramatically improve the performance of a method, at the cost of decreased prediction coverage [1].
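The trade-off between accuracy and coverage can be illustrated with a minimal Python sketch. The function name `accuracy_at_coverage`, the confidence scores, and the data are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def accuracy_at_coverage(correct, confidence, coverage):
    """Accuracy over the `coverage` fraction of predictions with the highest confidence."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(confidence, correct), key=lambda pair: pair[0], reverse=True)
    n_kept = max(1, round(coverage * len(ranked)))
    kept = ranked[:n_kept]
    return sum(1 for _, ok in kept if ok) / n_kept


# Hypothetical data: 1 = correct classification, 0 = misclassification.
correct = [1, 1, 1, 0, 1, 0, 1, 0]
confidence = [0.95, 0.90, 0.85, 0.40, 0.80, 0.30, 0.70, 0.20]

for cov in (1.0, 0.5):
    print(f"coverage={cov:.1f} accuracy={accuracy_at_coverage(correct, confidence, cov):.3f}")
```

In this toy data set, keeping only the more confident half of the predictions raises the accuracy, while half of the compounds are left without a prediction.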

In binary classification problems, the labels supplied to machine learning methods are discrete, {-1, 1}. Nonetheless, a model usually yields a continuous prediction. The most apparent metric for accuracy estimation is the distance between the predicted point and the edge of a class: the larger this distance, the more reliable and accurate the prediction for the given compound. This metric has already been used in several previous studies (e.g., [2]) and demonstrated good separation of reliable and non-reliable classifications. For quantitative predictions, the standard deviation of ensemble predictions was found to be the most accurate distance measure in a recent benchmarking study [3].
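The two traditional measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the class edge is assumed to lie at 0 (midway between the labels -1 and 1), the population standard deviation is used, and the function names and ensemble values are hypothetical.

```python
from statistics import mean, pstdev


def distance_to_class_edge(prediction, threshold=0.0):
    """Distance from a continuous prediction to the class edge (assumed at 0)."""
    return abs(prediction - threshold)


def ensemble_summary(member_predictions):
    """Mean and (population) standard deviation of the ensemble members' predictions."""
    return mean(member_predictions), pstdev(member_predictions)


# Hypothetical predictions of a four-member ensemble for two compounds.
confident = [0.8, 0.9, 0.7, 0.8]     # members agree, far from the edge
borderline = [0.3, -0.2, 0.1, 0.2]   # members disagree, close to the edge

for name, preds in (("confident", confident), ("borderline", borderline)):
    avg, std = ensemble_summary(preds)
    print(name, round(distance_to_class_edge(avg), 3), round(std, 3))
```

A large distance combined with a small standard deviation suggests a reliable classification; a small distance or a large standard deviation suggests the opposite.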

We propose to integrate both metrics. Rather than giving a point estimate, this approach provides a probability distribution for finding a particular compound in one of the classes. The suggested metric is the probability

$$\int_{E} N(a, v, x)\,dx$$

where E is the class domain, a is the ensemble's average prediction, v is the variance of the ensemble's predictions, and N(a, v, x) is the probability density of the Gaussian distribution. The performance of this metric and its comparison with the traditional ones are evaluated for several QSAR/QSPR classification problems. The developed approach is freely accessible at the http://qspr.eu web site, where it can be used to develop classification models and estimate their applicability domain.
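As a minimal sketch, assume the proposed metric is the integral of the Gaussian density N(a, v, x) over the class domain E, and that for labels {-1, 1} the domain of the positive class is the interval from 0 to infinity. Under these assumptions the probability has a closed form via the error function. The function name `class_probability` and the interval bounds are hypothetical, not part of the published method.

```python
import math


def class_probability(a, v, lower=0.0, upper=math.inf):
    """Integral of the Gaussian density N(a, v, x) over the interval [lower, upper].

    a: ensemble's average prediction; v: variance of the ensemble's predictions.
    The default interval [0, inf) is an assumed domain for the class labelled +1.
    """
    if v <= 0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(v)

    def cdf(x):
        # Cumulative distribution function of a Gaussian with mean a and std sigma.
        return 0.5 * (1.0 + math.erf((x - a) / (sigma * math.sqrt(2.0))))

    return cdf(upper) - cdf(lower)


# Hypothetical compounds: (average prediction, variance of ensemble predictions).
for a, v in ((0.8, 0.005), (0.1, 0.035)):
    p_pos = class_probability(a, v)
    p_neg = class_probability(a, v, lower=-math.inf, upper=0.0)
    print(f"a={a} v={v} P(+1)={p_pos:.3f} P(-1)={p_neg:.3f}")
```

The result depends on both ingredients: a prediction far from the class edge with a small variance gives a probability close to 1, whereas a prediction near the edge or with a large variance gives a probability closer to 0.5.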

References

1. Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI: Drug Discov Today. 2006, 11: 700. doi:10.1016/j.drudis.2006.06.013
2. Manallack DT, Tehan BG, Gancia E, Hudson BD, Ford MG, Livingstone DJ, Whitley DC, Pitt WR: J Chem Inf Comput Sci. 2003, 43: 674.
3. Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, Todeschini R, Fourches D, Varnek A: J Chem Inf Model. 2008, 48: 1733. doi:10.1021/ci800151m

Author information

Authors and Affiliations

Sushko, I., Helmholtz Zentrum München/IBIS, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
Iurii Sushko, S Novotarskyi, AK Pandey, R Körner & Igor Tetko

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Sushko, I., Novotarskyi, S., Pandey, A. et al. Applicability domain for classification problems. J Cheminform 2 (Suppl 1), P41 (2010). https://doi.org/10.1186/1758-2946-2-S1-P41
