Table 1 Relative performance of different popcount implementations

From: The chemfp project

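Performance relative to 8-bit lookup table, by fingerprint length:

| Popcount method | 166 bits | 881 bits | 1024 bits | 2048 bits |
|---|---|---|---|---|
| 8-bit lookup table | | | | |
| 16-bit lookup table | 2.0 | 2.8 | 2.9 | 2.4 |
| Gillies-Miller [18] | 1.6 | 2.9 | 3.1 | 3.4 |
| Lauradoux [19] | | 3.1 | 3.3 | 3.7 |
| SSSE3 [15] | | | 5.4 | 6.1 |
| POPCNT (8 bytes/loop), dispatch | 3.6 | 6.0 | 6.3 | 6.4 |
| POPCNT (8 bytes/loop), inline | 4.9 | 6.6 | 6.9 | 6.6 |
| POPCNT (fully unrolled), dispatch | 5.3 | 7.9 | 8.2 | 7.8 |
| POPCNT (fully unrolled), inline | 6.7 | 8.2 | 8.4 | 8.0 |
| AVX2 [20] (fully unrolled), dispatch | | | 8.6 | 9.2 |
| AVX2 [20] (fully unrolled), dispatch, prefetch | | | 8.7 | 9.3 |
| AVX2 [20] (fully unrolled), inline | | | 9.8 | 9.9 |
| AVX2 [20] (fully unrolled), inline, prefetch | | | 11.0 | 10.6 |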

  1. Performance values are scaled relative to an 8-bit lookup table (higher is faster), as measured by the threshold searches from the chemfp benchmark suite. Unless marked otherwise, the search algorithm uses a function pointer to dispatch to the appropriate popcount function and does not use memory prefetching. The “fully unrolled” variants compute the fingerprint popcount without a loop. The “inline” variants inline the popcount calculation into the search code, and the “prefetch” variants add memory prefetching. Timings were made with chemfp 3.3. Chemfp 1.5 does not support inlining, AVX2, or prefetching.
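
To make the terms in the note concrete, the following is a minimal C sketch of an 8-bit lookup table popcount, a portable variant that handles 8 bytes per loop iteration, and a search-like routine that calls whichever one is selected through a function pointer (the "dispatch" case). It is not chemfp's code: every name here (`popcount_fn`, `popcount_lut8`, `popcount_word64`, `select_popcount`, `count_dense`) is hypothetical, the 64-bit routine uses a generic bit-parallel formula rather than the POPCNT instruction, and the selector ignores CPU feature detection, which a real library would presumably need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared signature so variants can be swapped at run time (hypothetical). */
typedef int (*popcount_fn)(const unsigned char *fp, size_t nbytes);

/* 8-bit lookup table: number of set bits for each byte value. */
static unsigned char byte_counts[256];
static int table_ready = 0;

static void init_byte_counts(void) {
    /* byte_counts[0] is already 0; count(i) = count(i / 2) + lowest bit. */
    for (int i = 1; i < 256; i++) {
        byte_counts[i] = (unsigned char)(byte_counts[i >> 1] + (i & 1));
    }
    table_ready = 1;
}

/* Baseline method: one table lookup per fingerprint byte. */
int popcount_lut8(const unsigned char *fp, size_t nbytes) {
    assert(fp != NULL || nbytes == 0);
    if (!table_ready) init_byte_counts();
    int total = 0;
    for (size_t i = 0; i < nbytes; i++) total += byte_counts[fp[i]];
    return total;
}

/* Bit-parallel count of a single 64-bit word (portable stand-in for POPCNT). */
static int popcount_word(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Looping variant: 8 bytes per iteration, trailing bytes via the table. */
int popcount_word64(const unsigned char *fp, size_t nbytes) {
    assert(fp != NULL || nbytes == 0);
    int total = 0;
    size_t i = 0;
    for (; i + 8 <= nbytes; i += 8) {
        uint64_t w;
        memcpy(&w, fp + i, sizeof w); /* safe for unaligned input */
        total += popcount_word(w);
    }
    if (i < nbytes) total += popcount_lut8(fp + i, nbytes - i);
    return total;
}

/* Hypothetical selector; chooses by fingerprint size only. */
popcount_fn select_popcount(size_t nbytes) {
    return nbytes >= 8 ? popcount_word64 : popcount_lut8;
}

/* "Dispatch" style: the inner loop calls through a function pointer.
 * Counts fingerprints in a contiguous arena with at least min_bits set. */
size_t count_dense(popcount_fn popcount, const unsigned char *arena,
                   size_t nfps, size_t nbytes, int min_bits) {
    size_t hits = 0;
    for (size_t k = 0; k < nfps; k++) {
        if (popcount(arena + k * nbytes, nbytes) >= min_bits) hits++;
    }
    return hits;
}
```

A caller would pick the routine once, for example `popcount_fn f = select_popcount(128);` for 1024-bit fingerprints, and pass `f` to `count_dense`. An "inline" variant in the sense of the note would instead place the popcount loop directly inside the search loop, removing the indirect call; a "fully unrolled" variant would replace the inner loop with a fixed sequence of word counts for one known fingerprint length.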