[Bug tree-optimization/78821] GCC7: Copying whole 32 bits structure field by field not optimised into copying whole 32 bits at once

rguenther at suse dot de Thu, 09 Nov 2017 01:55:39 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821

--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 9 Nov 2017, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78821
> 
> --- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> All the store merging changes so far were for the same operations on all the
> loads/constant values.
> In order to handle something like this, we'd need to best hook in the bswap
> machinery, probably start with moving over the bswap pass from
> tree-ssa-math-opts.c to gimple-ssa-store-merging.c.

Moving the pass was on my list of thoughts as well.

>  Then for stores that are
> 8/16/32 bits wide, try/remember find_bswap_or_nop_1 (stmt as well as
> symbolic_number).
> Then, if the stores are really all adjacent and form a power of two bitsize
> and their symbolic numbers combined are cmpnop or cmpxchg consider that as
> identity or bswap operation and use bswap_replace to prepare the argument for
> the group store.
> Now, it would be somewhat different in the way it needs to be handled, the
> alignment needs to be taken into account already at coalesce_immediate_stores
> time and split_group would for such a group need to result in store of
> everything together.

I think bswap doesn't currently track operations like ~ on top of the
individual bytes so that would need to be added as well.  It would
become more and more a "mini vectorization" pass thus even operations
like + constant would be interesting (but more difficult if the
individual pieces are not bytes).
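
To make the idea concrete, here is a minimal sketch in C. It is not GCC
code: it only models the "symbolic number" bookkeeping from the quoted
proposal for a group of four byte stores. The names byte_store,
classify_group, CMPNOP32 and CMPXCHG32 are hypothetical, and so is the
"inverted" flag, which stands in for the ~ tracking that I believe is
missing today. Each byte of the 32-bit marker holds the 1-based index of
the source byte that lands at that position.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit markers: byte i (counted from the low end)
   holds i + 1, the 1-based index of the source byte found there.  */
#define CMPNOP32  UINT32_C(0x04030201)  /* bytes kept in order */
#define CMPXCHG32 UINT32_C(0x01020304)  /* bytes fully reversed */

/* Hypothetical record of one byte-sized store in a candidate group.  */
struct byte_store {
    unsigned dest_offset;  /* byte offset in the destination, 0..3 */
    unsigned src_byte;     /* 1-based index of the source byte stored */
    bool inverted;         /* true if the stored value is ~source byte */
};

enum group_kind { GROUP_OTHER, GROUP_NOP, GROUP_BSWAP };

/* Combine the per-store information into one symbolic number and
   compare it against the two markers.  *all_inverted is written only
   when the result is GROUP_NOP or GROUP_BSWAP.  */
static enum group_kind
classify_group(const struct byte_store *stores, unsigned n,
               bool *all_inverted)
{
    uint32_t sym = 0;
    unsigned seen = 0;

    if (n != 4)
        return GROUP_OTHER;

    for (unsigned i = 0; i < n; i++) {
        const struct byte_store *s = &stores[i];

        if (s->dest_offset > 3 || s->src_byte < 1 || s->src_byte > 4)
            return GROUP_OTHER;
        if (seen & (1u << s->dest_offset))
            return GROUP_OTHER;      /* two stores to the same byte */
        if (s->inverted != stores[0].inverted)
            return GROUP_OTHER;      /* mixed ~ and plain copies */

        seen |= 1u << s->dest_offset;
        sym |= (uint32_t)s->src_byte << (8 * s->dest_offset);
    }

    if (sym != CMPNOP32 && sym != CMPXCHG32)
        return GROUP_OTHER;

    *all_inverted = stores[0].inverted;
    return sym == CMPNOP32 ? GROUP_NOP : GROUP_BSWAP;
}
```

In this model a field-by-field copy of four bytes classifies as GROUP_NOP,
so the group could become a single 32-bit load and store. The same copy
with every byte complemented also classifies as GROUP_NOP, with
*all_inverted set, meaning one wide ~ would be needed on top.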

Note that to avoid exponential issues we should remember the
bswap state for each SSA def we ever processed (still starting only
from stores for the purpose of store merging).  I think the current
bswap pass doesn't do that (but restricts itself to single-uses).
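
A minimal sketch in C of what remembering the state per def buys us.
This is not GCC code: struct def is a hypothetical stand-in for an SSA
definition, and the "result" is a dummy value rather than a real
symbolic number. The point is only that once a def can have several
uses, a recursive walk without a cache revisits shared operands over and
over, while a cached walk processes each def once.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA definition with up to two operands.  */
struct def {
    struct def *op0;   /* first operand, NULL for a leaf */
    struct def *op1;   /* second operand, NULL for a leaf */
    bool cached;       /* true once 'result' is valid */
    unsigned result;   /* dummy analysis result for this def */
    unsigned visits;   /* how many times this def was actually analyzed */
};

/* Recursive analysis that remembers the result for every def it has
   processed.  The dummy result is the size of the fully expanded
   expression tree, which grows exponentially when operands are shared,
   yet each def is only analyzed once.  */
static unsigned
analyze(struct def *d)
{
    if (d == NULL)
        return 0;
    if (d->cached)
        return d->result;   /* reuse the remembered state */

    d->visits++;
    d->result = 1 + analyze(d->op0) + analyze(d->op1);
    d->cached = true;
    return d->result;
}
```

With a chain of 21 defs where each one uses the previous def for both
operands, the expanded tree has 2^21 - 1 nodes, but every def is visited
exactly once. Without the cached check, the number of visits would match
the expanded tree size instead.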
