  • Poster presentation
  • Open Access

chemfp - fast and portable fingerprint formats and tools

Journal of Cheminformatics 2011, 3(Suppl 1):P12

https://doi.org/10.1186/1758-2946-3-S1-P12

Keywords

  • Simple Format
  • File Format
  • Format Description
  • Text Format
  • Binary Format

Fingerprints are conceptually simple, but the abstract sequence of 0 and 1 bits is represented in an astonishing variety of forms. The diversity exists for a very practical reason: it's easier for most researchers to create a simple format than it is to search for or advocate a common standard. Incompatible formats often have no immediate or large negative consequence. The problems are more subtle. Ad hoc formats cannot easily be exchanged with other groups. They lack metadata to help track the provenance of a data set. They do not come with existing tools for creating and manipulating records, and the tools that do get written are often an order of magnitude slower than what an optimized program can achieve.

I have developed two portable file formats for storing the short, dense fingerprints (on the order of 16 K bits or fewer, with density > 1%) often seen in cheminformatics. The FPS format is a line-based text format using hex fingerprint encoding. It is designed to be readable and easy to generate and parse. The FPB format is a block-based binary format designed for high-performance operations, including optimized ordering for sublinear Tanimoto searches [1]. The format descriptions are freely available at [2], along with the chemfp Python package to generate, convert, and work with the formats. The package includes a C library and extension for fast parsing and fingerprint operations.
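As a rough illustration of the two ideas above (hex-encoded, line-based records and popcount ordering for Tanimoto search), here is a minimal pure-Python sketch. It does not use the chemfp API and does not follow the actual FPS or FPB specifications: the "hex, tab, identifier" record layout and every function name below are assumptions made for the example. The pruning step relies on the bound from [1]: a fingerprint with B bits set can only reach a Tanimoto score of t against a query with A bits set if A·t ≤ B ≤ A/t.

```python
from bisect import bisect_left, bisect_right


def parse_line(line):
    """Parse one hypothetical 'hex<TAB>id' record into (bytes, id).

    The layout is an assumption for this sketch, not the FPS spec.
    """
    hex_fp, record_id = line.rstrip("\n").split("\t")
    return bytes.fromhex(hex_fp), record_id


def popcount(fp):
    """Number of 1 bits in a fingerprint stored as bytes."""
    return sum(bin(byte).count("1") for byte in fp)


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length byte fingerprints."""
    both = sum(bin(x & y).count("1") for x, y in zip(a, b))
    either = popcount(a) + popcount(b) - both
    return both / either if either else 0.0


def build_index(lines):
    """Return records as (popcount, fingerprint, id), sorted by popcount."""
    records = []
    for line in lines:
        fp, record_id = parse_line(line)
        records.append((popcount(fp), fp, record_id))
    records.sort(key=lambda record: record[0])
    return records


def search(index, query, threshold):
    """Find all ids with Tanimoto >= threshold, best matches first.

    Only records whose popcount lies in [A*t, A/t] are scored; the
    small epsilon widens the window so rounding never drops a hit.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    a = popcount(query)
    counts = [record[0] for record in index]
    lo = bisect_left(counts, a * threshold - 1e-9)
    hi = bisect_right(counts, a / threshold + 1e-9)
    hits = []
    for _, fp, record_id in index[lo:hi]:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((record_id, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Four made-up 8-bit fingerprints, one record per line.
demo_lines = ["0f\ta\n", "ff\tb\n", "01\tc\n", "0e\td\n"]
demo_index = build_index(demo_lines)
demo_hits = search(demo_index, bytes.fromhex("0f"), 0.7)
# demo_hits == [("a", 1.0), ("d", 0.75)]
```

With the query `0f` (4 bits set) and a threshold of 0.7, only records with 3 to 5 bits set are scored, so `b` and `c` are skipped without computing their similarity. A production implementation would do the bit counting in C, as the chemfp package is described as doing, rather than in a Python loop.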

Authors’ Affiliations

(1)
Andrew Dalke Scientific AB, Göteborg, 413 10, Sweden

References

  1. Swamidass S, Baldi P: Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sublinear Time. J Chem Inf Model. 2007, 47: 302-317. 10.1021/ci600358f.
  2. chem-fingerprints project at Google Code. http://code.google.com/p/chem-fingerprints/
