The FPS fingerprint format and chemfp toolkit

At the GCC 2010 poster session I presented a draft version of the FPS format for storing dense binary fingerprints. That format is now stable and is supported by RDKit [1], CACTVS [2], and other software. The chemfp package is a set of command-line tools and a Python library for fingerprint generation and high-speed Tanimoto search. It can extract pre-computed fingerprints from an SD tag or use OpenEye's OEChem [3], Open Babel [4], or RDKit to generate fingerprints. Search uses a combination of careful indexing [5], CPU-specific instructions (if available), and OpenMP. Nearest-100 similarity searches of PubChem-sized data sets take less than a second on a laptop, and Butina clustering [6] of 2 million compounds takes about 6 hours on a 15-CPU node. In my poster I present the FPS format and the chemfp package, and describe how the memory and performance requirements led to the internal search architecture.
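To make the search idea concrete, here is a minimal pure-Python sketch of a Tanimoto threshold search that skips candidates using a popcount bound of the kind described in [5]: two fingerprints with a and b bits set cannot score higher than min(a, b) / max(a, b). This is not chemfp's API or implementation. The function names (`popcount`, `tanimoto`, `threshold_search`) and the in-memory list of (identifier, bytes) pairs are assumptions made for illustration, and the toy fingerprints are hex-encoded only for readability.

```python
from typing import List, Tuple


def popcount(fp: bytes) -> int:
    """Number of bits set in a fingerprint stored as raw bytes."""
    return bin(int.from_bytes(fp, "big")).count("1")


def tanimoto(a: bytes, b: bytes) -> float:
    """Tanimoto similarity: |A and B| / |A or B| (0.0 if both are empty)."""
    ia = int.from_bytes(a, "big")
    ib = int.from_bytes(b, "big")
    union = bin(ia | ib).count("1")
    if union == 0:
        return 0.0
    return bin(ia & ib).count("1") / union


def threshold_search(
    query: bytes,
    targets: List[Tuple[str, bytes]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs with score >= threshold, best first.

    Hypothetical helper, assuming 0 < threshold <= 1. Candidates whose
    popcount makes the threshold unreachable are skipped without
    computing the full similarity.
    """
    q = popcount(query)
    hits = []
    for target_id, fp in targets:
        t = popcount(fp)
        # Upper bound on the score is min(q, t) / max(q, t).
        if max(q, t) == 0 or min(q, t) < threshold * max(q, t):
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((target_id, score))
    return sorted(hits, key=lambda hit: -hit[1])


if __name__ == "__main__":
    # Toy 8-bit fingerprints, hex-encoded for readability.
    targets = [
        ("a", bytes.fromhex("0f")),
        ("b", bytes.fromhex("07")),
        ("c", bytes.fromhex("01")),  # skipped by the popcount bound
        ("d", bytes.fromhex("f0")),  # passes the bound, scores 0.0
    ]
    print(threshold_search(bytes.fromhex("0f"), targets, 0.7))
    # prints [('a', 1.0), ('b', 0.75)]
```

A production search would presumably sort the targets by popcount once, so that whole ranges can be skipped rather than tested one at a time, and would replace the Python integer arithmetic with hardware popcount instructions. Those details are outside what this sketch shows.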

References

  1. http://rdkit.org

  2. http://xemistry.org/

  3. http://www.eyesopen.com/oechem-tk

  4. http://openbabel.org

  5. Swamidass SJ, Baldi P: Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time. J Chem Inf Model. 2007, 47: 302-317. 10.1021/ci600358f.

  6. Butina D: Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets. J Chem Inf Comput Sci. 1999, 39: 747-750. 10.1021/ci9803381.

Author information

Correspondence to Andrew Dalke.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Dalke, A. The FPS fingerprint format and chemfp toolkit. J Cheminform 5, P36 (2013) doi:10.1186/1758-2946-5-S1-P36

Keywords

  • Similarity Search
  • Performance Requirement
  • Poster Session
  • Draft Version
  • Careful Indexing