https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118076

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The behavior of #c6 at -O3 changed with
r12-1564-g967b46530234b4e6ad3983057705aea6c20a03c4.
That said, even the code GCC 11 emits is not good: we could construct the
object directly in the outgoing argument area instead of copying it through a
temporary that nothing else uses.
That can't easily be optimized during the GIMPLE optimizations though, because
the outgoing argument area isn't represented in the IL at that point.
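
For illustration only, here is a minimal sketch of the kind of source that
leads to this pattern. It is not the testcase from #c6; the struct and
function names are hypothetical, and it assumes an ABI that passes aggregates
of this size in memory (e.g. x86-64 SysV), so the caller has to materialize
the argument in the outgoing argument area.

```c
#include <stdint.h>

/* Hypothetical aggregate: 32 bytes, so it is assumed to be passed in
   memory rather than in registers (true for x86-64 SysV).  */
struct big {
    uint64_t a, b, c, d;
};

/* noinline so the call and its argument setup are not optimized away.  */
__attribute__((noinline))
uint64_t consume(struct big v)
{
    return v.a + v.b + v.c + v.d;
}

uint64_t caller(uint64_t x, uint64_t y)
{
    /* The temporary is only ever used as the call argument, so ideally its
       fields would be stored straight into the outgoing argument area.  */
    struct big tmp = { x, y, x + y, x ^ y };
    return consume(tmp);
}
```

Compiling something like this with -O3 -S and looking at how the argument
is set up in caller shows whether the temporary is still copied.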

Note, clang trunk emits the same thing as gcc trunk except for different
scheduling.

I think RTL DSE should be able to find the rhs of the stores to the MEM
locations which are later read, but in this case it likely doesn't trigger
because in both GCC 11 and later GCCs the reads and the pushes/stores into the
argument area are TImode, so each read loads the values of two registers at
once.  So we'd need to be able to track that there are two separate stores,
try to optimize those away, and replace the TImode store by 2 DImode stores.
This is dse.cc (replace_read) or so.
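
A rough source-level model of that transformation, as a sketch only: the real
pass works on RTL insns, not on C, and the function names here are made up.
It assumes a GCC-compatible compiler with unsigned __int128 standing in for
TImode and uint64_t for DImode.

```c
#include <stdint.h>
#include <string.h>

/* Before: two 64-bit stores into a temporary, then one 128-bit read of the
   temporary and one 128-bit store into the argument slot.  */
void copy_through_temp(uint64_t lo, uint64_t hi, unsigned char *arg_slot)
{
    uint64_t tmp[2];
    unsigned __int128 wide;

    tmp[0] = lo;                           /* DImode store */
    tmp[1] = hi;                           /* DImode store */
    memcpy(&wide, tmp, sizeof wide);       /* TImode read  */
    memcpy(arg_slot, &wide, sizeof wide);  /* TImode store */
}

/* After: the wide read is forwarded from the two narrow stores, the
   temporary disappears, and the wide store becomes two 64-bit stores.  */
void store_direct(uint64_t lo, uint64_t hi, unsigned char *arg_slot)
{
    memcpy(arg_slot, &lo, sizeof lo);              /* DImode store */
    memcpy(arg_slot + sizeof lo, &hi, sizeof hi);  /* DImode store */
}
```

Both functions leave the same 16 bytes in arg_slot; the second just never
needs the temporary.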
