Reliable estimation of externally validated prediction errors for QSAR models

Baumann, Désirée; Baumann, Knut

doi:10.1186/1758-2946-5-S1-P33

Volume 5 Supplement 1

8th German Conference on Chemoinformatics: 26 CIC-Workshop

Poster presentation
Open access
Published: 22 March 2013

Désirée Baumann¹ & Knut Baumann¹

Journal of Cheminformatics volume 5, Article number: P33 (2013)

In most cases of QSAR modelling, the final model used to make predictions is not known a priori but has to be selected in a data-driven fashion (e.g. selection of principal components, variable selection, or selection of the best mathematical modelling technique). Reliable estimation of externally validated prediction errors under this model uncertainty is still a challenge in chemoinformatics. To fulfil the standards of external validation, the test data set has to be independent not only of model building but also of model selection.

There is still controversy in the literature about how the independent test data set should be chosen and how large it should be. There are basically two options for setting aside a test data set: 1) a single test data set is set aside, or 2) the test data are generated by repeatedly partitioning the available data into training and test sets, i.e. cross-validation. Since cross-validation uses the data more efficiently, it is to be preferred, in particular for small data sets.
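The two options can be sketched in a few lines of Python. This is a minimal illustration only: the function names `holdout_split` and `kfold_splits`, the data set size of 20 and the seeds are assumptions made for this example and do not come from the poster.

```python
import random

def holdout_split(n, test_size, seed=0):
    """Option 1: one fixed test set of `test_size` indices; the rest is training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[test_size:], idx[:test_size]

def kfold_splits(n, k, seed=0):
    """Option 2: k train/test partitions; every index is a test case exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

# Hypothetical data set of 20 objects, addressed by index only.
train, test = holdout_split(20, test_size=5)
splits = kfold_splits(20, k=4)
tested = sorted(i for _, te in splits for i in te)

print(len(test), "objects are predicted with a single hold-out test set")
print(len(tested), "objects are predicted (once each) with 4-fold cross-validation")
```

With a single test set only the held-out objects ever contribute to the error estimate, whereas repeated partitioning lets every object serve as a test object once, which is the sense in which cross-validation uses the data more efficiently.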

The aforementioned cross-validation step must not be confused with a cross-validation step that might be necessary to select the model. If model selection is also done by cross-validation, two nested loops of cross-validation are necessary [1]. In the inner loop, cross-validation is employed for model selection [2] (also referred to as internal validation), while in the outer loop different test data sets are generated repeatedly and used to assess the models already selected in the inner loop (external validation).
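The nesting of the two loops can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the model (k-nearest-neighbour regression on one descriptor), the candidate values of k, the fold counts and the synthetic data are all chosen here purely for illustration, and the helper names are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Predict the response at x as the mean of the k nearest training objects."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def partitions(data, n_folds):
    """Yield (training, test) partitions; every object is a test object once."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test

def cv_error(data, k, n_folds):
    """Cross-validated mean squared error of k-NN on `data`."""
    sq = [(knn_predict(tr, x, k) - y) ** 2
          for tr, te in partitions(data, n_folds) for x, y in te]
    return sum(sq) / len(sq)

def double_cv(data, candidates, outer_folds=5, inner_folds=4):
    """Return the externally validated error and the k selected in each outer fold."""
    sq_errors, chosen = [], []
    for outer_train, outer_test in partitions(data, outer_folds):
        # Inner loop (internal validation): model selection sees only the
        # outer training part, never the outer test objects.
        best_k = min(candidates, key=lambda k: cv_error(outer_train, k, inner_folds))
        chosen.append(best_k)
        # Outer loop (external validation): assess the selected model on
        # objects that took part in neither model building nor model selection.
        sq_errors += [(knn_predict(outer_train, x, best_k) - y) ** 2
                      for x, y in outer_test]
    return sum(sq_errors) / len(sq_errors), chosen

# Synthetic toy data (an assumption of this sketch): y = x^2 plus Gaussian noise.
rng = random.Random(0)
data = [(x, x * x + rng.gauss(0.0, 0.1)) for x in (rng.random() for _ in range(50))]

error, chosen = double_cv(data, candidates=[1, 3, 5, 9])
print("externally validated MSE:", error)
print("k selected per outer fold:", chosen)
```

The selected k may differ between outer folds; the outer-loop error therefore assesses the whole procedure of selecting and building a model rather than one fixed model.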

In this contribution, double cross-validation is evaluated for its ability to estimate prediction errors under model uncertainty. Depending on how double cross-validation is parameterized (test set size, number of repetitions), it yields either biased or highly variable estimates of the prediction error. The sources of bias and variability are highlighted, and recommendations are given on how to choose the test set size in order to obtain a favourable bias-variability trade-off.
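To make the two tuning parameters named above concrete, the outer-loop estimate can be written as a formula. The notation is an assumption introduced here for illustration and is not taken from the poster: $n$ is the number of objects, $n_{\text{test}}$ the test set size, $R$ the number of outer partitions, $T_r$ the $r$-th test set, and $\hat{f}_r$ the model built and selected using only the remaining $n - n_{\text{test}}$ objects.

```latex
\[
\widehat{PE} \;=\; \frac{1}{R\, n_{\text{test}}}
\sum_{r=1}^{R} \sum_{i \in T_r} \bigl( y_i - \hat{f}_r(x_i) \bigr)^2
\]
```

Written this way, $n_{\text{test}}$ enters twice: it fixes how many objects remain for building and selecting $\hat{f}_r$, and how many squared errors are averaged per partition, while $R$ only controls how many partitions are averaged.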

References

[1] Filzmoser P, Liebmann B, Varmuza K: Repeated double cross-validation. J Chemometrics. 2009, 23: 160-171. doi:10.1002/cem.1225.
[2] Baumann K: Cross-validation as the objective function for variable selection. Trends Anal Chem. 2003, 22: 395-406. doi:10.1016/S0165-9936(03)00607-1.

Author information

Authors and Affiliations

Institut für Medizinische und Pharmazeutische Chemie, Technische Universität Braunschweig, Beethovenstraße 55, D-38106, Braunschweig, Germany
Désirée Baumann & Knut Baumann

Corresponding author

Correspondence to Désirée Baumann.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Baumann, D., Baumann, K. Reliable estimation of externally validated prediction errors for QSAR models. J Cheminform 5 (Suppl 1), P33 (2013). https://doi.org/10.1186/1758-2946-5-S1-P33
