https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791
Marcus Kool <marcus.kool at urlfilterdb dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---

--- Comment #6 from Marcus Kool <marcus.kool at urlfilterdb dot com> ---
(In reply to H.J. Lu from comment #5)
> (In reply to Marcus Kool from comment #2)
> >
> > To resume, gcc 4.8.4 and gcc 4.9.2 produce code that can be optimised
> > further, and gcc 5.1.0 produces even slower code which means that the
> > implementation of *_set1_epi8() is slower/much-slower than that it can be.
>
> That is done on purpose. Add -mtune=intel will get what you want.

I do not understand why, with the flags "-O3 -mavx2", gcc "on purpose" produces 4 instructions, and why the compiler flag "-mtune=intel" is required to improve this. The -mavx2 flag says that the platform supports AVX2 (i.e. Haswell or newer).

I read the gcc 5.1.0 man page and must say that I am frustrated with the explanation given for "-mtune=intel". I always thought that -O3 was the way to tell the compiler "do your best to make the code run fast", and up to gcc 4.9.2, "-mavx2" was enough to tell it to generate code for AVX2.

To summarize: before gcc 5.x, "-O3 -mavx2" did the trick; starting with 5.x, "-O3 -mavx2 -mtune=intel" is needed. The additional caveat is that the behaviour of "-mtune=intel" may change in the future.

What is the reasoning behind this?
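A minimal test case for comparing the code gcc emits for _mm256_set1_epi8() might look like the sketch below. The names fill32 and fill32_avx2 are hypothetical. The target("avx2") attribute and the runtime CPU check are assumptions added so the file also builds and runs without -mavx2 on the command line; they are not part of the original report. Compiling it with "gcc -O3 -mavx2 -S" and again with "gcc -O3 -mavx2 -mtune=intel -S", then comparing the assembly for fill32_avx2, is one way to see whether the broadcast becomes a single AVX2 instruction (presumably vpbroadcastb) or a longer sequence.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: fill 32 bytes with c via the AVX2 byte broadcast.
 * The target attribute enables AVX2 for this function only, so the file
 * compiles even when -mavx2 is not passed. Inspect its assembly (-S) to see
 * what the current -mtune setting makes of _mm256_set1_epi8(). */
__attribute__((target("avx2")))
static void fill32_avx2(unsigned char out[32], char c)
{
    __m256i v = _mm256_set1_epi8(c);           /* the intrinsic under discussion */
    _mm256_storeu_si256((__m256i *)out, v);    /* unaligned 32-byte store */
}

/* Hypothetical wrapper: take the AVX2 path only if the CPU supports it,
 * otherwise fall back to a plain memset. */
void fill32(unsigned char out[32], char c)
{
    if (__builtin_cpu_supports("avx2"))
        fill32_avx2(out, c);
    else
        memset(out, (unsigned char)c, 32);
}
```

Both paths store the same 32 bytes, so any difference between the two -mtune settings shows up only in the generated instructions, not in the result.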