Fig. 2 | Journal of Cheminformatics

Fig. 2

From: Sachem: a chemical cartridge for high-performance substructure search

Algorithm to select the fingerprint bits most relevant to a given query. As input, it receives the set q of fingerprint bits produced by the first step of the fingerprint reduction algorithm, the set A of atoms in the query, and the mapping M from each query fingerprint bit to the atoms it covers. The algorithm is parameterized by two positive integers, MaxBits and MinCover. MaxBits is a hard limit on the number of bits in the reduced fingerprint, and MinCover sets the minimum number of distinct bits in the reduced fingerprint that cover each atom of the query molecule. The algorithm assigns a covering counter, initially zero, to each atom of the query molecule. The query fingerprint bits are then traversed in descending order of filtering power. For each bit, the algorithm checks whether the bit covers at least one query atom whose counter is still less than MinCover. If so, the counters of all atoms covered by the bit are incremented and the bit is added to the reduced query fingerprint; otherwise, the bit is discarded. During development, we experimentally determined that 2 and 32 are suitable values for MinCover and MaxBits, respectively. The filtering power of the individual bits (function F) is obtained by counting the relative occurrences of the bits in the dataset. The resulting F is portable to other datasets; re-computation is needed only after substantial statistical changes in the data.
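
The selection procedure described in the caption can be sketched in Python as follows. This is only an illustrative reading of the caption, not the Sachem implementation: the function name `select_bits`, the representation of M as a dictionary of sets, and the convention that F returns a larger number for a bit with greater filtering power are all assumptions made for this example. The sketch also assumes that traversal stops once MaxBits bits have been selected, which is one way to enforce the hard limit.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def select_bits(
    q: Iterable[int],
    A: Set[Hashable],
    M: Dict[int, Set[Hashable]],
    F: Callable[[int], float],
    max_bits: int = 32,
    min_cover: int = 2,
) -> List[int]:
    """Greedy selection of query fingerprint bits (hypothetical sketch).

    q: candidate fingerprint bits of the query
    A: atoms of the query molecule
    M: mapping from each bit to the query atoms it covers
    F: filtering power of a bit (assumed: higher means more selective)
    """
    # One covering counter per query atom, initially zero.
    cover = {atom: 0 for atom in A}
    selected: List[int] = []

    # Traverse bits in descending order of filtering power.
    for bit in sorted(q, key=F, reverse=True):
        # Assumed enforcement of the hard limit: stop when it is reached.
        if len(selected) >= max_bits:
            break
        covered = [atom for atom in M.get(bit, ()) if atom in cover]
        # Keep the bit only if it covers an atom that is still under-covered.
        if any(cover[atom] < min_cover for atom in covered):
            for atom in covered:
                cover[atom] += 1
            selected.append(bit)
        # Otherwise the bit is discarded.
    return selected


# Toy input: bits 1-3 cover atoms 0 and 1, bit 4 covers atom 2.
power = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.1}
M = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}, 4: {2}}
reduced = select_bits([1, 2, 3, 4], {0, 1, 2}, M, power.get)
```

In this toy input, bit 3 is discarded because both atoms it covers already have MinCover = 2 covering bits, while the weakly filtering bit 4 is kept because it is the only bit covering atom 2.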