From: The chemfp project
Table: Performance relative to 8-bit lookup table

Popcount method | 166 bits | 881 bits | 1024 bits | 2048 bits |
---|---|---|---|---|
8-bit lookup table | 1× | 1× | 1× | 1× |
16-bit lookup table | 2.0 | 2.8 | 2.9 | 2.4 |
Gillies-Miller [18] | 1.6 | 2.9 | 3.1 | 3.4 |
Lauradoux [19] | 3.1 | 3.3 | 3.7 | |
SSSE3 [15] | 5.4 | 6.1 | ||
POPCNT (8 bytes/loop) | ||||
Dispatch | 3.6 | 6.0 | 6.3 | 6.4 |
Inline | 4.9 | 6.6 | 6.9 | 6.6 |
POPCNT (fully unrolled) | ||||
Dispatch | 5.3 | 7.9 | 8.2 | 7.8 |
Inline | 6.7 | 8.2 | 8.4 | 8.0 |
AVX2 [20] (fully unrolled) | ||||
Dispatch | 8.6 | 9.2 | ||
Dispatch, prefetch | 8.7 | 9.3 | ||
Inline | 9.8 | 9.9 | ||
Inline, prefetch | 11.0 | 10.6 |
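
The table names the popcount methods without showing them. As a rough illustration of the first two rows only, here is a minimal sketch of an 8-bit and a 16-bit lookup-table popcount. The names `POPCOUNT_8`, `POPCOUNT_16`, `popcount_lut8`, and `popcount_lut16` are hypothetical and this is not chemfp's implementation. It is written in Python for readability, so its timings will not reproduce the ratios in the table.

```python
# Illustrative sketch only; all names here are hypothetical, not chemfp's API.

# 8-bit table: entry i holds the number of set bits in the byte value i.
POPCOUNT_8 = [bin(i).count("1") for i in range(256)]

# 16-bit table built from the 8-bit one: 65536 entries,
# so one lookup covers two bytes of the fingerprint.
POPCOUNT_16 = [POPCOUNT_8[i & 0xFF] + POPCOUNT_8[i >> 8] for i in range(1 << 16)]


def popcount_lut8(fp: bytes) -> int:
    """Count set bits with one table lookup per byte."""
    return sum(POPCOUNT_8[b] for b in fp)


def popcount_lut16(fp: bytes) -> int:
    """Count set bits with one table lookup per 16-bit word."""
    total = 0
    even_len = len(fp) - (len(fp) % 2)
    for i in range(0, even_len, 2):
        # Combine two bytes into one 16-bit index.
        total += POPCOUNT_16[fp[i] | (fp[i + 1] << 8)]
    if len(fp) % 2:
        # Odd-length input: handle the final byte with the 8-bit table.
        total += POPCOUNT_8[fp[-1]]
    return total


if __name__ == "__main__":
    fp = bytes([0xFF, 0x0F, 0x01])  # 8 + 4 + 1 set bits
    print(popcount_lut8(fp), popcount_lut16(fp))
```

The 16-bit table halves the number of lookups at the cost of a much larger table (65536 entries instead of 256), which is the trade-off between the first two rows.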