
Table 1 Relative performance of different popcount implementations

From: The chemfp project

Popcount method    Performance relative to 8-bit lookup table
                   166 bits    881 bits    1024 bits    2048 bits
8-bit lookup table
16-bit lookup table  2.
Gillies-Miller [18]
Lauradoux [19]
SSSE3 [15]  5.4  6.1
POPCNT (8 bytes/loop)
POPCNT (fully unrolled)
AVX2 [20] (fully unrolled)
 Dispatch  8.6  9.2
 Dispatch, prefetch  8.7  9.3
 Inline  9.8  9.9
 Inline, prefetch  11.0  10.6
  1. Times are scaled relative to an 8-bit lookup table, as measured by the threshold searches from the chemfp benchmark suite. In most cases the search algorithm uses a function pointer to dispatch to the appropriate popcount function and does not use memory prefetching. The “fully unrolled” variants compute the fingerprint popcount without a loop. The “inline” variants inline the popcount calculation into the search code, and the “prefetch” variants use memory prefetching. Timings were made with chemfp 3.3. Chemfp 1.5 does not support inlining, AVX2, or prefetching.
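
To make the baseline and the dispatch idea in the note concrete, here is a minimal Python sketch of an 8-bit and a 16-bit lookup-table popcount, selected once through a function reference. It is not chemfp's code: chemfp implements its popcounts in C, and every name below (`POPCOUNT_8`, `popcount_lut8`, `popcount_lut16`, `select_popcount`, `relative_performance`) is hypothetical, as is the selection rule. The `relative_performance` helper assumes that the table values are the baseline time divided by the method time, so that larger means faster; the note does not state the formula.

```python
# Illustrative sketch only; these names are hypothetical and not part of chemfp.

# 8-bit lookup table: number of set bits for every byte value 0..255.
POPCOUNT_8 = [bin(i).count("1") for i in range(256)]

# 16-bit lookup table, built from the 8-bit one (65536 entries).
POPCOUNT_16 = [POPCOUNT_8[i & 0xFF] + POPCOUNT_8[i >> 8] for i in range(1 << 16)]


def popcount_lut8(fp: bytes) -> int:
    """Count set bits one byte at a time using the 8-bit table."""
    return sum(POPCOUNT_8[b] for b in fp)


def popcount_lut16(fp: bytes) -> int:
    """Count set bits two bytes at a time using the 16-bit table.

    A trailing odd byte falls back to the 8-bit table.
    """
    total = 0
    even_len = len(fp) - (len(fp) % 2)
    for i in range(0, even_len, 2):
        total += POPCOUNT_16[fp[i] | (fp[i + 1] << 8)]
    if len(fp) % 2:
        total += POPCOUNT_8[fp[-1]]
    return total


def select_popcount(num_bits: int):
    """Choose an implementation once and return it as a callable.

    This mirrors function-pointer dispatch: the search loop calls whatever
    was returned instead of testing the fingerprint size on every call.
    The selection rule here is an assumption made for illustration.
    """
    return popcount_lut16 if num_bits % 16 == 0 else popcount_lut8


def relative_performance(baseline_seconds: float, method_seconds: float) -> float:
    """Assumed scaling: baseline time divided by method time (larger is faster)."""
    return baseline_seconds / method_seconds
```

In this sketch a search over 1024-bit fingerprints would call `select_popcount(1024)` once and reuse the returned function for every fingerprint. The "inline" variants in the table avoid that indirect call by placing the popcount calculation directly in the search code.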