The influence of training actives/inactives ratio on machine learning performance

Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

doi:10.1186/1758-2946-5-S1-P30

Volume 5 Supplement 1

8th German Conference on Chemoinformatics: 26 CIC-Workshop

Poster presentation
Open access
Published: 22 March 2013

Rafał Kurczab¹, Sabina Smusz¹,² & Andrzej J Bojarski¹

Journal of Cheminformatics volume 5, Article number: P30 (2013)

In drug discovery, machine learning is widely used to classify molecules as active or inactive against a particular target. The vast majority of these methods are supervised: they need a training set of objects (molecules) to develop a decision rule that can then be used to classify new entities (the test set) into one of the two classes [1].

Many studies have searched for optimal learning parameters and assessed their impact on classification effectiveness [2, 3]. However, no data are available on how the actives/inactives ratio used for model training affects the efficiency of identifying new active compounds. The main goal of this study was therefore to examine the impact of changing the number of inactives in the training set while keeping the number of actives fixed. For each ratio, the inactives were randomly selected from the ZINC database; the selection was repeated 10 times to limit overestimation error. The concept was verified on three protein targets (5-HT1A, HIV-1 protease and matrix metalloproteinase) and a set of algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) implemented in the WEKA package [4]. Compounds were represented with two types of molecular fingerprints (MACCS and a hashed fingerprint) to determine their possible impact on machine learning performance.
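The sampling scheme described above (a fixed set of actives, a varying number of randomly drawn inactives, ten draws per ratio) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function name build_training_sets, the ratio values, the molecule identifiers and the seed are all assumptions, and the abstract does not state which ratios were actually tested. Fingerprint calculation and the WEKA classifiers are left out.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One training example: (molecule identifier, label), 1 = active, 0 = inactive.
Dataset = List[Tuple[str, int]]


def build_training_sets(
    actives: Sequence[str],
    inactive_pool: Sequence[str],
    ratios: Sequence[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> Dict[float, List[Dataset]]:
    """Draw n_repeats random training sets for each inactives-per-active ratio.

    The actives are the same in every set; only the inactives are resampled.
    All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    pool = list(inactive_pool)
    training_sets: Dict[float, List[Dataset]] = {}
    for ratio in ratios:
        n_inactives = round(len(actives) * ratio)
        if n_inactives > len(pool):
            raise ValueError(f"pool too small for ratio {ratio}")
        draws: List[Dataset] = []
        for _ in range(n_repeats):
            # Sample inactives without replacement within a single draw.
            sampled = rng.sample(pool, n_inactives)
            draws.append([(m, 1) for m in actives] + [(m, 0) for m in sampled])
        training_sets[ratio] = draws
    return training_sets


if __name__ == "__main__":
    # Toy identifiers standing in for known actives and ZINC compounds.
    toy_actives = [f"active_{i}" for i in range(20)]
    toy_pool = [f"zinc_{i}" for i in range(500)]
    sets = build_training_sets(toy_actives, toy_pool, ratios=[0.5, 1, 5])
    for ratio, draws in sets.items():
        print(ratio, len(draws), len(draws[0]))
```

Each of the ten sets per ratio would then be used to train a separate model, and the results averaged over the draws, so that a single lucky or unlucky selection of inactives does not dominate the estimate.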

References

1. Melville JL, Burke EK, Hirst JD: Machine learning in virtual screening. Comb Chem High Throughput Screen. 2009, 12: 332-343.
2. Ma XH, Wang R, Yang SY, Li R, Xue Y, Wei YC, Low BC, Chen YZ: Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds. J Chem Inf Model. 2008, 48: 1227-1237. 10.1021/ci800022e.
3. Plewczynski D, Spieser SH, Koch U: Assessing different classification methods for virtual screening. J Chem Inf Model. 2006, 46: 1098-1106. 10.1021/ci050519k.
4. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explorations. 2009, 11 (1): 10-18. 10.1145/1656274.1656278.

Acknowledgements

The study was supported by the PRELUDIUM grant 2011/03/N/NZ2/02478, financed by the National Science Centre.

Author information

Authors and Affiliations

Department of Medicinal Chemistry, Institute of Pharmacology Polish Academy of Sciences, Kraków, 31-343, Poland
Rafał Kurczab, Sabina Smusz & Andrzej J Bojarski
Faculty of Chemistry, Jagiellonian University, Kraków, 30-060, Poland
Sabina Smusz

Corresponding author

Correspondence to Rafał Kurczab.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kurczab, R., Smusz, S. & Bojarski, A.J. The influence of training actives/inactives ratio on machine learning performance. J Cheminform 5 (Suppl 1), P30 (2013). https://doi.org/10.1186/1758-2946-5-S1-P30

