The FPS fingerprint format and chemfp toolkit

Dalke, Andrew

doi:10.1186/1758-2946-5-S1-P36

Volume 5 Supplement 1

8th German Conference on Chemoinformatics: 26 CIC-Workshop

Poster presentation
Open access
Published: 22 March 2013

Andrew Dalke¹

Journal of Cheminformatics volume 5, Article number: P36 (2013)

During the GCC 2010 poster session I presented a draft version of the FPS format for storing dense binary fingerprints. That format is now stable and is supported by RDKit [1], CACTVS [2], and other software. The chemfp package is a set of command-line tools and a Python library for fingerprint generation and high-speed Tanimoto search. It can extract pre-computed fingerprints from an SD tag or use OpenEye's OEChem [3], Open Babel [4], or RDKit to generate fingerprints. Search uses a combination of careful indexing [5], CPU-specific instructions (if available), and OpenMP. Nearest-100 similarity searches of a PubChem-sized data set take less than a second on a laptop, and Butina clustering [6] of 2 million compounds takes about 6 hours on a 15-CPU node. In my poster I present the FPS format and the chemfp package, and describe how the memory and performance requirements lead to the internal search architecture.
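As a rough illustration of the ideas in the abstract, the following pure-Python sketch reads fingerprints from FPS-style text and runs a Tanimoto threshold search with a popcount pre-filter. It makes several assumptions that the abstract itself does not state: the line layout (a hex-encoded fingerprint, a tab, then an identifier, with header lines starting with `#`) follows the publicly documented FPS format; the function names `read_fps`, `popcount`, `tanimoto`, and `threshold_search` are hypothetical and are not the chemfp API; and the pre-filter is only the simplest popcount bound in the spirit of [5], not chemfp's actual indexing or its CPU-specific code.

```python
def read_fps(lines):
    """Yield (identifier, fingerprint_bytes) from FPS-style text lines.

    Assumed layout: '#'-prefixed header lines, then one record per line
    as <hex fingerprint><TAB><identifier>.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines such as "#FPS1"
        hex_fp, identifier = line.split("\t")[:2]
        yield identifier, bytes.fromhex(hex_fp)


def popcount(fp):
    """Number of bits set in a fingerprint given as bytes."""
    return bin(int.from_bytes(fp, "big")).count("1")


def tanimoto(fp1, fp2):
    """Tanimoto similarity |A & B| / |A | B|; defined here as 0.0 if both are empty."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def threshold_search(query, targets, threshold):
    """Return (identifier, score) pairs with score >= threshold, best first."""
    q = popcount(query)
    hits = []
    for identifier, fp in targets:
        t = popcount(fp)
        # Popcount bound: Tanimoto can never exceed min(q, t) / max(q, t),
        # so targets outside that window are skipped without comparing bits.
        if min(q, t) < threshold * max(q, t):
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((identifier, score))
    hits.sort(key=lambda hit: -hit[1])
    return hits


# Made-up 16-bit fingerprints, for illustration only.
SAMPLE_FPS = "#FPS1\n#num_bits=16\n0f00\tmol_a\n0f01\tmol_b\nf0ff\tmol_c\n0000\tmol_d\n"

query = bytes.fromhex("0f00")
hits = threshold_search(query, read_fps(SAMPLE_FPS.splitlines()), 0.7)
print(hits)
```

In this toy data set, `mol_c` and `mol_d` are rejected by the popcount bound alone. Sorting targets by popcount ahead of time would let a search skip whole ranges of fingerprints in the same way, which is one plausible reading of the "careful indexing" mentioned above.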

References

[1] RDKit: http://rdkit.org
[2] CACTVS: http://xemistry.org/
[3] OpenEye OEChem: http://www.eyesopen.com/oechem-tk
[4] Open Babel: http://openbabel.org
[5] Swamidass SJ, Baldi P: Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time. J Chem Inf Model. 2007, 47: 302-317. 10.1021/ci600358f.
[6] Butina D: Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. J Chem Inf Model. 1999, 39: 747-750. 10.1021/ci9803381.

Author information

Authors and Affiliations

Andrew Dalke Scientific, 41134, Göteborg, Sweden
Andrew Dalke

Corresponding author

Correspondence to Andrew Dalke.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Dalke, A. The FPS fingerprint format and chemfp toolkit. J Cheminform 5 (Suppl 1), P36 (2013). https://doi.org/10.1186/1758-2946-5-S1-P36


