https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117395
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #4)
> (In reply to Andrew Pinski from comment #1)
> > It is a slight regression from GCC 14 though.
> >
> > Which produced:
> > ```
> > foo:
> >         ldr q31, [x0, 32]
> >         sub sp, sp, #128
> >         add sp, sp, 128
> >         dup d0, v31.d[1]
> >         add v0.4h, v0.4h, v31.4h
> >         ret
> > ```
> >
> > But that is only because vget_low_s16/vget_high_s16 didn't expand to using
> > BIT_FIELD_REF before.
> >
>
> That's a good point, lowering it in RTL as we did before prevented the
> subreg inlining so reload didn't have to spill.
>
> I wonder if instead of using BIT_FIELD_REF we should instead use
> VEC_PERM_EXPR + VIEW_CONVERT. This would get us the right rotate again and
> recover the regression.
>
> We'd still need SRA for optimal codegen though without the stack allocations.
>
>
> (In reply to Richard Biener from comment #3)
> > Having memcpy in the IL is preventing SRA. There's probably no type
> > suitable for the single load/store memcpy inlining done by
> > gimple_fold_builtin_memory_op.
> >
>
> Yeah the original loop doesn't have memcpy but it's being idiom recognized.
>
> > We could try to fold all memcpy to aggregate char[] array assignments,
> > at least when a decl is involved on either side with the idea to
> > eventually elide TREE_ADDRESSABLE. But we need to make sure this
> > doesn't pessimize RTL expansion or other code dealing with memcpy but
> > not aggregate array copy.
> >
> > SRA could handle memcpy and friends transparently iff it were to locally
> > compute its own idea of TREE_ADDRESSABLE.
>
> I suppose the second option is better in general? does SRA have the same
> issue with memset? Would it be possible to get a rough sketch of what this
> would entail?

Yes, same with memset. memset can be replaced with aggregate init from {}.
SRA really wants to see whether all accesses to a decl are "direct" and its
address does not escape.
TREE_ADDRESSABLE is a conservative approximation here and works for early
disqualification of candidates. SRA already walks all stmts, so it could
instead update the disqualified bitmap using gimple_ior_addresses_taken, like
execute_update_addresses_taken does, and explicitly handle memset and memcpy.
It would then of course also need to handle those calls explicitly during
access analysis and stmt rewriting.
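
To illustrate the distinction at the source level, here is a minimal C sketch. It is not the testcase from this PR; `struct pair`, `sum_with_calls` and `sum_direct` are hypothetical names made up for the example. The first function passes the addresses of its locals to memset/memcpy, the second expresses the same thing as aggregate init and aggregate copy, so every access to the locals is "direct". The sketch uses `{0}` rather than `{}` because the empty initializer is only standard C as of C23. Whether either form is actually scalarized depends on the GCC version and flags.

```c
#include <string.h>

/* Hypothetical aggregate, only for illustration. */
struct pair { short lo[4]; short hi[4]; };

/* Call-based form: &acc and &tmp are passed to memset/memcpy, so in
   the source both locals have their address taken. */
int sum_with_calls(struct pair in)
{
    struct pair acc, tmp;
    memset(&acc, 0, sizeof acc);
    memcpy(&tmp, &in, sizeof tmp);
    for (int i = 0; i < 4; i++)
        acc.lo[i] = (short)(tmp.lo[i] + tmp.hi[i]);
    return acc.lo[0] + acc.lo[1] + acc.lo[2] + acc.lo[3];
}

/* Direct form: aggregate init replaces the memset and aggregate
   assignment replaces the memcpy; no address of acc or tmp is taken. */
int sum_direct(struct pair in)
{
    struct pair acc = {0};
    struct pair tmp = in;
    for (int i = 0; i < 4; i++)
        acc.lo[i] = (short)(tmp.lo[i] + tmp.hi[i]);
    return acc.lo[0] + acc.lo[1] + acc.lo[2] + acc.lo[3];
}
```

Both functions compute the same value. The only difference is whether the locals' addresses are handed to a library call.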