- Poster presentation
- Open Access
The FPS fingerprint format and chemfp toolkit
Journal of Cheminformatics volume 5, Article number: P36 (2013)
During GCC 2010 poster session I presented a draft version of the FPS format for storing dense binary fingerprints. That format is now stable, and supported by RDKit , CACTVS , and other software. The chemfp package is a set of command-line tools and a Python library for fingerprint generation and high-speed Tanimoto search. It can extract pre-computed fingerprints from an SD tag or use OpenEye's OEChem , Open Babel , or RDKit to generate fingerprints. Search uses a combination of careful indexing , CPU-specific instructions (if available), and OpenMP. Nearest-100 similarity searches of PubChem-sized take less than a second on a laptop, and Butina clustering  of 2 million compounds takes about 6 hours on a 15 CPU node. In my poster I present the FPS format and chemfp package, and describe how the memory and performance requirements lead to the internal search architecture.
Swamidass SJ, Baldi P: Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time. J Chem Inf Model. 2007, 47: 302-317. 10.1021/ci600358f.
Butina D: Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. J Chem Inf Model. 1999, 39: 747-750. 10.1021/ci9803381.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Dalke, A. The FPS fingerprint format and chemfp toolkit. J Cheminform 5 (Suppl 1), P36 (2013). https://doi.org/10.1186/1758-2946-5-S1-P36
- Similarity Search
- Performance Requirement
- Poster Session
- Draft Version
- Careful Indexing