https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220
--- Comment #3 from Michael_S <already5chosen at yahoo dot com> ---
-march=haswell is not very important. I added it only because, in the absence of the BMI2 extension, the issue is somewhat obscured by the need to keep the shift count in the CL register.
-O2 is also not important; -O3 behaves the same. -O1, due to the absence of if-conversion, demonstrates the same issue in a different form. In practice, I'd guess the -O1 code would perform quite well, unlike the -O2 and -O3 code, but that does not make it any less ugly.