The influence of hashed fingerprints density on the machine learning methods performance

Smusz, Sabina; Kurczab, Rafał; Bojarski, Andrzej J

doi:10.1186/1758-2946-5-S1-P25

Volume 5 Supplement 1

8th German Conference on Chemoinformatics: 26 CIC-Workshop

Poster presentation
Open access
Published: 22 March 2013

Sabina Smusz¹,², Rafał Kurczab¹ & Andrzej J Bojarski¹

Journal of Cheminformatics volume 5, Article number: P25 (2013)

Computational techniques have become a vital part of today's drug discovery campaigns. Among the wide range of tools applied in this process are machine learning methods. They are used, for instance, in virtual screening (VS), where their role is to identify potentially active compounds in large libraries of structures [1].

To apply various learning algorithms to VS tasks, an appropriate representation of molecules is needed. One solution is the hashed fingerprint, which encodes information about the structure in the form of a bit string [2].
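As a rough illustration of what "hashed" means here, the sketch below maps a list of substructure labels onto positions in a fixed-length bit string. It is a toy under stated assumptions, not the scheme of any particular toolkit: the function name `hashed_fingerprint`, the feature labels, the use of SHA-256, and the default sizes are all hypothetical choices made for this example.

```python
import hashlib


def hashed_fingerprint(features, n_bits=64, bits_per_feature=2):
    """Set `bits_per_feature` hashed positions for each feature in an n_bits-long bit list."""
    bits = [0] * n_bits
    for feat in features:
        for k in range(bits_per_feature):
            # A cryptographic digest is used only because it is deterministic across runs;
            # Python's built-in hash() is randomized per process for strings.
            digest = hashlib.sha256(f"{k}:{feat}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % n_bits
            bits[position] = 1  # different features may collide on the same bit
    return bits


# Hypothetical path-like features of a small molecule (illustrative labels only).
features = ["C", "N", "O", "C-C", "C-N", "C=O", "C-C-N", "C-C=O"]
fp = hashed_fingerprint(features)
bit_string = "".join(str(b) for b in fp)
print(bit_string)
```

Because several features can hash to the same position, the number of bits set is at most `len(features) * bits_per_feature` and is often lower.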

Both the length and the density (the percentage of 1's) of a hashed fingerprint can be modified during its generation, and both have already been shown to influence similarity searching [3]. The aim of our study was to examine the impact of fingerprint density on the performance of machine learning methods. A series of bit strings with different density values and various lengths was generated with the RDKit software [4]. They were evaluated in classification experiments on 5-HT1A ligands using a set of algorithms (Naïve Bayes, SMO, IBk, Decorate, Hyperpipes, J48 and Random Forest) in order to determine optimal values of these variables for machine learning experiments.
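The relationship between length and density can be made concrete with a small sketch. It computes density as the fraction of bits set and shows one common way to trade length for density: folding, where the two halves of a bit string are combined with a bitwise OR. This is an illustrative assumption about how density might be raised, not a description of the procedure used in the study or of RDKit's internals; the helper names `density`, `fold`, and `fold_to_density` are hypothetical.

```python
def density(bits):
    """Fraction of positions set to 1."""
    return sum(bits) / len(bits)


def fold(bits):
    """Halve the length by OR-ing the first half with the second half."""
    half = len(bits) // 2
    return [a | b for a, b in zip(bits[:half], bits[half:])]


def fold_to_density(bits, target, min_size=8):
    """Fold repeatedly until the target density is reached or min_size would be violated."""
    while density(bits) < target and len(bits) % 2 == 0 and len(bits) // 2 >= min_size:
        bits = fold(bits)
    return bits


# A 32-bit toy fingerprint with four bits set (positions chosen by hand).
fp32 = [0] * 32
for pos in (1, 9, 18, 30):
    fp32[pos] = 1

fp16 = fold(fp32)
fp_dense = fold_to_density(fp32, target=0.3)

print(len(fp32), density(fp32))          # 32 0.125
print(len(fp16), density(fp16))          # 16 0.25
print(len(fp_dense), density(fp_dense))  # 8 0.375
```

In the second fold two set bits collide on the same position, so the density rises from 0.25 to 0.375 rather than doubling; shorter, denser fingerprints lose some information this way.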

References

1. Geppert H, Vogt M, Bajorath J: Current Trends in Ligand-Based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation. J Chem Inf Model. 2010, 50: 205-216. 10.1021/ci900419k.
2. Rijnbeek M, Steinbeck C: OrChem - An open source chemistry search engine for Oracle®. J Cheminf. 2009, 1: 17. 10.1186/1758-2946-1-17.
3. Wang Y, Bajorath J: Balancing the Influence of Molecular Complexity on Fingerprint Similarity Searching. J Chem Inf Model. 2008, 48: 75-84. 10.1021/ci700314x.
4. RDKit: Open-source cheminformatics. [http://www.rdkit.org]

Acknowledgements

The study was supported by the PRELUDIUM grant 2011/03/N/NZ2/02478, financed by the National Science Centre.

Author information

Authors and Affiliations

Department of Medicinal Chemistry, Institute of Pharmacology Polish Academy of Sciences, Kraków, 31-343, Poland
Sabina Smusz, Rafał Kurczab & Andrzej J Bojarski
Faculty of Chemistry, Jagiellonian University, Kraków, 30-060, Poland
Sabina Smusz

Corresponding author

Correspondence to Sabina Smusz.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Smusz, S., Kurczab, R. & Bojarski, A.J. The influence of hashed fingerprints density on the machine learning methods performance. J Cheminform 5 (Suppl 1), P25 (2013). https://doi.org/10.1186/1758-2946-5-S1-P25
