https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153
--- Comment #12 from ncm at cantrip dot org ---
As regards hot spots, the program has two (marked /**/):

    int score[7] = { 0, };
    for (Letters word : words)
    /**/    if (!(word & ~seven))
                for_each_in_seven([&](Letters letter, int place) {
                    if (word & letter)
    /**/                score[place] += (word == seven) ? 3 : 1;
                });

The first is executed 300M times, the second 3.3M times. Inserting a counter bump before the second eliminates the slowdown:

    if (word & letter) {
        ++count;
    /**/    score[place] += (word == seven) ? 3 : 1;
    }

This fact seems consequential. The surrounding for_each_in_seven loop isn't doing popcounts, but is doing "while (v &= -v)".

I have repeated the tests using -m[no-]bmi[2], with identical results (i.e., no effect).