https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117395

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #4)
> (In reply to Andrew Pinski from comment #1)
> > It is a slight regression from GCC 14 though.
> > 
> > Which produced:
> > ```
> > foo:
> >         ldr     q31, [x0, 32]
> >         sub     sp, sp, #128
> >         add     sp, sp, 128
> >         dup     d0, v31.d[1]
> >         add     v0.4h, v0.4h, v31.4h
> >         ret
> > ```
> > 
> > But that is only because vget_low_s16/vget_high_s16 didn't expand to using
> > BIT_FIELD_REF before.
> > 
> 
> That's a good point, lowering it in RTL as we did before prevented the
> subreg inlining so reload didn't have to spill.
> 
> I wonder if instead of using BIT_FIELD_REF we should instead use
> VEC_PERM_EXPR + VIEW_CONVERT.  This would get us the right rotate again and
> recover the regression.
> 
> We'd still need SRA for optimal codegen though without the stack allocations.
> 
> 
> (In reply to Richard Biener from comment #3)
> > Having memcpy in the IL is preventing SRA.  There's probably no type
> > suitable for the single load/store memcpy inlining done by
> > gimple_fold_builtin_memory_op.
> > 
> 
> Yeah the original loop doesn't have memcpy but it's being idiom recognized.
> 
> > We could try to fold all memcpy to aggregate char[] array assignments,
> > at least when a decl is involved on either side with the idea to
> > eventually elide TREE_ADDRESSABLE.  But we need to make sure this
> > doesn't pessimize RTL expansion or other code dealing with memcpy but
> > not aggregate array copy.
> > 
> > SRA could handle memcpy and friends transparently iff it were to locally
> > compute its own idea of TREE_ADDRESSABLE.
> 
> I suppose the second option is better in general? does SRA have the same
> issue with memset? Would it be possible to get a rough sketch of what this
> would entail?

Yes, same with memset.  memset can be replaced with aggregate init from {}.
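
As a source-level illustration only (a sketch, not what the GIMPLE looks
like): the struct and function names below are made up for this example,
and `{0}` is used as the portable C spelling of the empty `{}` initializer,
which is only valid C since C23 or as a GNU extension.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate, only for illustration. */
struct vec4 { short lane[4]; };

/* Zeroing via memset: the address of 'v' is passed to a call. */
int sum_after_memset(void)
{
    struct vec4 v;
    memset(&v, 0, sizeof v);
    v.lane[1] = 7;
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

/* Same zeroing as an aggregate initialization: the address of 'v'
   is never taken, so every access to it is direct. */
int sum_after_init(void)
{
    struct vec4 v = {0};
    v.lane[1] = 7;
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
}

/* Both forms leave the aggregate in the same state. */
void check_zeroing_equivalence(void)
{
    assert(sum_after_memset() == sum_after_init());
}
```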

SRA really wants to see whether all accesses to a decl are "direct" and its
address does not escape.  TREE_ADDRESSABLE is a conservative approximation
of that and works for early disqualification of candidates.

SRA already walks all stmts; it could instead update the disqualified
bitmap using gimple_ior_addresses_taken, like execute_update_addresses_taken
does, and explicitly handle memset and memcpy.  It would then also need to
handle those calls explicitly during access analysis and stmt rewriting,
of course.
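
To make the "address taken only by memcpy" case concrete, here is a
source-level sketch.  The struct and function names are hypothetical, and
it says nothing about what any given GCC version generates; it only shows
a local whose address is used solely as a memcpy operand next to the
equivalent aggregate assignment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate, only for illustration. */
struct pair { int lo; int hi; };

/* 'tmp' has its address taken, but only as a memcpy operand; the
   address does not escape anywhere else. */
int sum_via_memcpy(const struct pair *src)
{
    struct pair tmp;
    memcpy(&tmp, src, sizeof tmp);
    return tmp.lo + tmp.hi;
}

/* The same copy as a plain aggregate assignment: the address of
   'tmp' is never taken. */
int sum_via_assignment(const struct pair *src)
{
    struct pair tmp = *src;
    return tmp.lo + tmp.hi;
}

/* Both forms compute the same result. */
void check_copy_equivalence(void)
{
    struct pair p = { 3, 4 };
    assert(sum_via_memcpy(&p) == sum_via_assignment(&p));
}
```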
