AVX2 platforms

marcus.kool at urlfilterdb dot com Fri, 01 May 2015 15:43:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63791


Marcus Kool <marcus.kool at urlfilterdb dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---

--- Comment #6 from Marcus Kool <marcus.kool at urlfilterdb dot com> ---
(In reply to H.J. Lu from comment #5)
> (In reply to Marcus Kool from comment #2)
> >
> > To resume, gcc 4.8.4 and gcc 4.9.2 produce code that can be optimised
> > further, and gcc 5.1.0 produces even slower code which means that the
> > implementation of *_set1_epi8() is slower/much-slower than that it can be.
> 
> That is done on purpose.  Add -mtune=intel will get what you want.

I do not understand why, with the "-O3 -mavx2" flags, gcc produces 4
instructions "on purpose", nor why the compiler flag "-mtune=intel" is required
to improve this.

The -mavx2 flag says that the platform supports AVX2 (i.e. Haswell or later).
I read the man page of gcc 5.1.0 and must say that I am frustrated with the
explanation given for the flag "-mtune=intel".  I always thought that -O3 was
the way to tell the compiler "do your best to make the code run fast", and up to
gcc 4.9.2, "-mavx2" was enough to tell gcc to generate code for AVX2.

To summarize: for gcc versions before 5.x, "-O3 -mavx2" did the trick, and
starting with 5.x, "-O3 -mavx2 -mtune=intel" does the trick.  The additional
caveat is that the behaviour of "-mtune=intel" can change in the future.
What is the reasoning behind this?
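
For anyone who wants to compare the generated code, a minimal test case could
look like the sketch below.  The names fill32_avx2, broadcast_ok and
check_broadcast are made up for this illustration, and the target attribute and
the run-time AVX2 check are assumptions added so that the file also builds and
runs on machines without AVX2; they are not part of the original report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* The function whose generated code is of interest: broadcast one byte
 * into all 32 lanes of a 256-bit register and store the result.
 * The target attribute (an assumption of this sketch) lets the file
 * build even when -mavx2 is not on the command line. */
__attribute__((target("avx2")))
void fill32_avx2(uint8_t *dst, char c)
{
    __m256i v = _mm256_set1_epi8(c);
    _mm256_storeu_si256((__m256i *)dst, v);
}
#endif

/* Returns 1 if every one of the 32 output bytes equals c.
 * Falls back to memset when AVX2 is not available at run time. */
int broadcast_ok(char c)
{
    uint8_t buf[32];
    size_t i;

    /* Pre-fill with a value that differs from c so the check is meaningful. */
    memset(buf, (uint8_t)c ^ 0xff, sizeof buf);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        fill32_avx2(buf, c);
    else
#endif
        memset(buf, (unsigned char)c, sizeof buf);

    for (i = 0; i < sizeof buf; i++)
        if (buf[i] != (uint8_t)c)
            return 0;
    return 1;
}

/* Sanity check over a few byte values, including one with the high bit set. */
void check_broadcast(void)
{
    assert(broadcast_ok(0));
    assert(broadcast_ok('x'));
    assert(broadcast_ok((char)0xff));
}
```

Compiling this once with "-O3 -mavx2 -S" and once with "-O3 -mavx2 -mtune=intel
-S", then comparing the assembly emitted for fill32_avx2, should show the
difference discussed in this bug.  The exact instruction sequences depend on
the gcc version.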

[Bug target/63791] use 32-byte version of vpbroadcastb (and register to populate) on AVX/AVX2 platforms
