https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011

--- Comment #5 from finis at in dot tum.de ---
There may be many more instructions with such a false dependency; popcnt
may only be the tip of the iceberg. I doubt that Intel got only this one
operation wrong while all other SSE/AVX/... instructions are correct. I rather
think a whole group of operations is implemented like popcnt. The source code
in the linked SO question provides a good testbed for other operations as well:
simply replace popcount with another intrinsic and check whether the
performance deviations still occur.
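
A minimal sketch of such a testbed follows. It is not the code from the SO
question; the names OP, run_op and time_op are made up for illustration, and
it assumes GCC or Clang builtins. The operation under test is isolated in one
macro so it can be swapped without touching the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Operation under test (hypothetical macro name). Override at compile time,
 * e.g. -DOP=__builtin_ctzll. Note that ctz/clz are undefined for 0 inputs. */
#ifndef OP
#define OP(x) __builtin_popcountll(x)
#endif

/* Sum OP over the buffer using four independent accumulators, so the loop
 * itself carries no true dependency between the four operations.
 * Trailing elements (n % 4) are ignored for brevity. */
uint64_t run_op(const uint64_t *buf, size_t n)
{
    uint64_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
        c0 += (uint64_t)OP(buf[i + 0]);
        c1 += (uint64_t)OP(buf[i + 1]);
        c2 += (uint64_t)OP(buf[i + 2]);
        c3 += (uint64_t)OP(buf[i + 3]);
    }
    return c0 + c1 + c2 + c3;
}

/* Time one pass with clock(); returns CPU seconds and stores the checksum
 * so the result is observable and the work is not discarded. */
double time_op(const uint64_t *buf, size_t n, uint64_t *checksum)
{
    assert(buf != NULL && checksum != NULL);
    clock_t start = clock();
    *checksum = run_op(buf, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To compare operations, build once per candidate (for example with
-O2 -mpopcnt, then again with -DOP=__builtin_ctzll), run time_op on a large
buffer, and check the generated assembly with -S to confirm that the intended
instruction is actually emitted.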
