Table 1 Relative performance of different popcount implementations

From: The chemfp project

Popcount method               Performance relative to 8-bit lookup table
                              166 bits    881 bits    1024 bits    2048 bits
8-bit lookup table
16-bit lookup table           2.0         2.8         2.9          2.4
Gillies-Miller [18]           1.6         2.9         3.1          3.4
Lauradoux [19]                            3.1         3.3          3.7
SSSE3 [15]                                            5.4          6.1
POPCNT (8 bytes/loop)
  Dispatch                    3.6         6.0         6.3          6.4
  Inline                      4.9         6.6         6.9          6.6
POPCNT (fully unrolled)
  Dispatch                    5.3         7.9         8.2          7.8
  Inline                      6.7         8.2         8.4          8.0
AVX2 [20] (fully unrolled)
  Dispatch                                            8.6          9.2
  Dispatch, prefetch                                  8.7          9.3
  Inline                                              9.8          9.9
  Inline, prefetch                                    11.0         10.6
  1. Times are scaled relative to an 8-bit lookup table, as measured by the threshold searches from the chemfp benchmark suite. In most cases the search algorithm uses a function pointer to dispatch to the appropriate popcount function, without memory prefetching. The “fully unrolled” variants implement the fingerprint popcount without using a loop. The “inline” variants inline the popcount calculation into the search code instead of dispatching through a function pointer, and the “prefetch” variants use memory prefetching. Timings were made with chemfp 3.3. Chemfp 1.5 does not support inlining, AVX2, or prefetching.
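
For readers unfamiliar with the baseline method, the following is a minimal Python sketch of an 8-bit lookup table popcount and of a threshold search that receives its popcount routine as a parameter, which is roughly what the "dispatch" rows refer to. It is illustrative only and is not chemfp's code: the function names, the use of Tanimoto similarity as the search score, and the choice to return 0.0 when both fingerprints are empty are all assumptions made for this example.

```python
# Hypothetical illustration; none of these names come from chemfp.

# 8-bit lookup table: POPCOUNT_8[b] is the number of set bits in byte value b.
POPCOUNT_8 = [bin(b).count("1") for b in range(256)]


def popcount_lookup8(fingerprint: bytes) -> int:
    """Count set bits with one table lookup per byte."""
    return sum(POPCOUNT_8[byte] for byte in fingerprint)


def popcount_bigint(fingerprint: bytes) -> int:
    """Alternative routine: count the bits of the fingerprint viewed as one integer."""
    return bin(int.from_bytes(fingerprint, "little")).count("1")


def tanimoto(a: bytes, b: bytes, popcount) -> float:
    """Tanimoto similarity: popcount(a AND b) / popcount(a OR b)."""
    both = popcount(bytes(x & y for x, y in zip(a, b)))
    either = popcount(bytes(x | y for x, y in zip(a, b)))
    # Assumption for this sketch: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


def threshold_search(query, targets, threshold, popcount=popcount_lookup8):
    """Return indices of targets scoring at least `threshold`.

    The popcount routine is passed in as an argument, the Python analogue
    of dispatching through a function pointer.
    """
    return [
        i for i, target in enumerate(targets)
        if tanimoto(query, target, popcount) >= threshold
    ]
```

Swapping `popcount=popcount_bigint` into `threshold_search` changes only the bit-counting routine and leaves the search logic untouched. That is the kind of substitution the table compares, although the measured implementations are native code rather than Python.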