https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
--- Comment #5 from finis at in dot tum.de ---
There may be many more instructions with such a false dependency; popcnt may be only the tip of the iceberg. I doubt that Intel got only this one operation wrong while all other SSE/AVX/... instructions are correct. More likely, a whole group of operations is implemented the same way as popcnt. The source code in the linked SO question makes a good testbed for other operations as well: simply replace popcount with another intrinsic and check whether the performance deviations still occur.
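To make the "swap in another intrinsic" idea concrete, here is a minimal sketch of such a testbed. It is not the code from the SO question; all names (PopcountOp, CtzOp, run_kernel, time_kernel, compare_index_types, make_buffer) are hypothetical and chosen for illustration. It assumes GCC or Clang, since it relies on the __builtin_popcountll and __builtin_ctzll builtins, and it assumes that comparing a 32-bit against a 64-bit loop counter is a useful way to provoke different register allocations for the same kernel.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Candidate operations under test. Whether these builtins compile to a single
// popcnt/tzcnt/bsf instruction depends on the target flags (an assumption:
// something like -O2 -mpopcnt -mbmi, or -march=native).
struct PopcountOp {
    static std::uint64_t apply(std::uint64_t x) {
        return static_cast<std::uint64_t>(__builtin_popcountll(x));
    }
};
struct CtzOp {
    static std::uint64_t apply(std::uint64_t x) {
        // __builtin_ctzll(0) is undefined, so set the top bit to keep x nonzero.
        return static_cast<std::uint64_t>(__builtin_ctzll(x | (1ULL << 63)));
    }
};

// Four-way unrolled kernel. Op is the operation to probe; Index is the loop
// counter type, so the same source can be compiled in two variants.
template <typename Op, typename Index>
std::uint64_t run_kernel(const std::vector<std::uint64_t>& buf, int repeats) {
    std::uint64_t sum = 0;
    const Index n = static_cast<Index>(buf.size());
    for (int r = 0; r < repeats; ++r) {
        for (Index i = 0; i + 3 < n; i += 4) {
            sum += Op::apply(buf[i]);
            sum += Op::apply(buf[i + 1]);
            sum += Op::apply(buf[i + 2]);
            sum += Op::apply(buf[i + 3]);
        }
    }
    return sum;
}

// Wall-clock time of one kernel run in seconds; the sum is returned through
// sum_out so the compiler cannot discard the computation.
template <typename Op, typename Index>
double time_kernel(const std::vector<std::uint64_t>& buf, int repeats,
                   std::uint64_t& sum_out) {
    const auto start = std::chrono::steady_clock::now();
    sum_out = run_kernel<Op, Index>(buf, repeats);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

struct Timing {
    double narrow_index_s;  // loop counter is unsigned
    double wide_index_s;    // loop counter is std::uint64_t
};

// Runs both loop-counter variants of the same operation and checks that they
// compute the same result before reporting the two timings.
template <typename Op>
Timing compare_index_types(const std::vector<std::uint64_t>& buf, int repeats) {
    std::uint64_t sum_narrow = 0, sum_wide = 0;
    Timing t{};
    t.narrow_index_s = time_kernel<Op, unsigned>(buf, repeats, sum_narrow);
    t.wide_index_s = time_kernel<Op, std::uint64_t>(buf, repeats, sum_wide);
    assert(sum_narrow == sum_wide);
    return t;
}

// Deterministic test data so that runs are comparable.
inline std::vector<std::uint64_t> make_buffer(std::size_t words) {
    std::vector<std::uint64_t> buf(words);
    for (std::size_t i = 0; i < words; ++i) {
        buf[i] = static_cast<std::uint64_t>(i) * 0x9E3779B97F4A7C15ULL;
    }
    return buf;
}
```

To use it, one would call compare_index_types<PopcountOp>(make_buffer(1 << 20), 1000) and compare_index_types<CtzOp>(...) with the same arguments, then compare the two timings for each operation. A large gap between the narrow and wide variant for one operation but not for another would hint that the first operation is affected; confirming a false dependency would still require looking at the generated assembly.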