[Bug target/65456] powerpc64le autovectorized copy loop missed optimization

wschmidt at gcc dot gnu.org Sun, 22 Mar 2015 09:17:33 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65456


Bill Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |wschmidt at gcc dot gnu.org

--- Comment #9 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
So, this can be viewed either as a phase-ordering problem or an expand problem;
probably the latter is more correct.

The swap optimization runs early in the RTL phases, shortly after expand.  When
it sees this computation, the RTL looks roughly like this:

        lxvd2x 0,9,4
        xxpermdi 12,0,0,2
        [r18=high half of vs12]
        [r19=low half of vs12]
        std 18,0,28
        std 19,8,28

(This is well before RA so I am making up register numbers for illustration
purposes.)  The swap optimization doesn't know what to do when a vector is
split into pieces, so it punts here.
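The punting decision can be pictured with a toy model.  This is a hypothetical illustration, not GCC's swap-optimization code: the enum names and the swaps_removable() helper are invented here.  The idea it assumes is that all doubleword swaps in a computation can be deleted only when nothing in that computation can observe lane order, and splitting a vector into halves does observe it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the instructions touching one vector
   value.  Illustration only; not GCC's data structures. */
typedef enum {
    OP_LXVD2X,      /* load; doublewords arrive swapped on LE */
    OP_XXSWAPD,     /* xxpermdi x,y,y,2: swap the two doublewords */
    OP_STXVD2X,     /* store; doublewords leave swapped on LE */
    OP_LANE_BLIND,  /* whole-vector op that ignores lane order,
                       e.g. a register copy or a bitwise AND */
    OP_SPLIT_HALF   /* move one doubleword of the vector into a GPR */
} vec_op;

/* Swaps are removable only if no instruction depends on which doubleword
   is "first".  Moving one half into a GPR does depend on that, so the
   toy model gives up, mirroring the punt described above. */
static bool swaps_removable(const vec_op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ops[i] == OP_SPLIT_HALF)
            return false;
    return true;
}
```

In this model a plain load/swap/swap/store sequence passes, while the load/swap/split/split sequence from the RTL above is rejected.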

Later, in the split2 phase that runs after RA, the last four lines above are
recognized as a pattern that can be replaced by an stxvd2x in BE mode, or by an
xxswapd/stxvd2x in LE mode.  This is how we end up with the code you see in the
final output.
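To see why the LE rewrite needs the extra xxswapd, here is a simplified C model.  It assumes a VSX register is two doublewords in register (big-endian) numbering.  It also assumes lxvd2x/stxvd2x move doubleword 0 to/from ea and doubleword 1 to/from ea+8, and that LE element 0 is the rightmost doubleword.  The helper names mirror the mnemonics but are invented for this sketch.  The model tracks only where each doubleword lands, not the byte order inside it.  The comment's "high/low half" register labels are illustrative, so the sketch just calls the pieces element 0 and element 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified VSX register: dw[0] is the leftmost doubleword. */
typedef struct { uint64_t dw[2]; } vsr;

/* lxvd2x model: doubleword 0 from ea, doubleword 1 from ea+8. */
static vsr lxvd2x(const unsigned char *ea)
{
    vsr v;
    memcpy(&v.dw[0], ea, 8);
    memcpy(&v.dw[1], ea + 8, 8);
    return v;
}

/* stxvd2x model: doubleword 0 to ea, doubleword 1 to ea+8. */
static void stxvd2x(vsr v, unsigned char *ea)
{
    memcpy(ea, &v.dw[0], 8);
    memcpy(ea + 8, &v.dw[1], 8);
}

/* xxpermdi xt,xa,xa,2 (xxswapd): swap the two doublewords. */
static vsr xxswapd(vsr v)
{
    vsr r = { { v.dw[1], v.dw[0] } };
    return r;
}

/* LE element numbering: element 0 is the rightmost doubleword. */
static uint64_t le_elem(vsr v, int i)
{
    assert(i == 0 || i == 1);
    return v.dw[1 - i];
}

/* LE load as expanded: lxvd2x then a swap, so element i holds the
   doubleword from offset 8*i. */
static vsr le_load(const unsigned char *src) { return xxswapd(lxvd2x(src)); }

/* Before split2: move each element to a GPR, then two std's. */
static void store_via_gprs(vsr v, unsigned char *dst)
{
    uint64_t e0 = le_elem(v, 0), e1 = le_elem(v, 1);
    memcpy(dst, &e0, 8);      /* std at offset 0 */
    memcpy(dst + 8, &e1, 8);  /* std at offset 8 */
}

/* After split2 in LE mode: xxswapd followed by stxvd2x. */
static void store_via_swap(vsr v, unsigned char *dst)
{
    stxvd2x(xxswapd(v), dst);
}

/* If the swap optimization saw both swaps, they would cancel,
   leaving a bare lxvd2x/stxvd2x pair. */
static void copy_swaps_removed(const unsigned char *src, unsigned char *dst)
{
    stxvd2x(lxvd2x(src), dst);
}
```

All three store paths reproduce the source bytes.  The point of the model is that the xxswapd pair left in the final LE code is redundant, but it only becomes visible as a pair after split2, which runs well after the swap optimization.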

The question is why the expander generates the two doubleword stores.
Probably it believes that unaligned doubleword stores are acceptable but
unaligned quadword stores are not.  In other words, something in the expander
probably needs to be taught that on POWER8 an unaligned vector store is
preferable to moving the pieces into GPRs and storing them separately.
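The suspected decision can be written as a tiny, purely hypothetical function.  The names store_plan, plan_16_byte_store and unaligned_vector_ok are invented for illustration and do not correspond to any GCC interface.  Under this framing, the fix anticipated above amounts to treating unaligned vector stores as acceptable on POWER8.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical plans for storing a 16-byte vector value. */
typedef enum { TWO_DOUBLEWORD_STORES, ONE_VECTOR_STORE } store_plan;

/* align_bytes: known alignment of the destination (a power of two).
   unaligned_vector_ok: whether the target is considered to handle
   unaligned 16-byte vector stores well (illustrative flag only). */
static store_plan plan_16_byte_store(unsigned align_bytes,
                                     bool unaligned_vector_ok)
{
    assert(align_bytes > 0 && (align_bytes & (align_bytes - 1)) == 0);
    if (align_bytes >= 16 || unaligned_vector_ok)
        return ONE_VECTOR_STORE;
    /* Fallback suspected in the comment: unaligned 8-byte stores are
       assumed fine, so the vector is split through GPRs. */
    return TWO_DOUBLEWORD_STORES;
}
```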

I will investigate this, but a few more urgent things need attention first.  I
expect it to be a fairly easy fix that we'll be able to backport.
