https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Matthias Kretz (Vir) from comment #12)
> (In reply to rguent...@suse.de from comment #11)
> > On Wed, 17 Jul 2024, mkretz at gcc dot gnu.org wrote:
> > > Unless the target has a masked load instruction (e.g. AVX512) or ptr is
> > > known to be aligned to at least 16 Bytes (in which case we know there
> > > cannot be a page boundary at ptr + 24 Bytes). No? In this specific
> > > example, ptr is pointing to a 32-Byte vector object.
> >
> > Sure but here we have no alignment info available (at most 8 byte
> > alignment from the pointer type).
>
> Excuse my ignorance :) but the back end does emit an aligned load
> instruction on memcpy. Why does the back end have more information than
> the middle end?

It possibly knows after inlining or performs a re-alignment somehow.

> > > The library can do this and it makes a difference:
> > >
> > > if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
> > >   __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
> > > else
> > >   __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
> >
> > I see, but that's then of course after inlining.
>
> Right, but since the SIMD library depends on inlining all over the place
> to produce efficient code, I can make use of it. If you think it is
> worthwhile to make the optimizer's life easier by introducing the above
> pattern in the std::simd implementation, let me know!

If it helps code generation it's probably worth it. I don't see any
low-hanging fruit to pick in the optimizer at least.
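[Editorial note, not part of the report] For readers following along, a minimal,
self-contained sketch of the __builtin_object_size pattern quoted above might
look like the following partial-load helper; the names load_first3 and vec4 are
illustrative assumptions and not the actual std::simd implementation:

// Sketch only: load 3 elements of T into a 4-element return value, widening
// the copy to a full 4 elements only when the compiler can see that the
// source object is large enough.
template <typename T>
struct vec4 { T data[4]; };   // stand-in for a 4-element SIMD vector

template <typename T>
inline vec4<T> load_first3(const T* ptr)
{
    vec4<T> ret{};
    // __builtin_object_size(ptr, 0) yields the maximum number of bytes the
    // compiler can determine to remain in the pointed-to object (or
    // (size_t)-1 when it cannot tell). After inlining into code where ptr
    // refers to a 32-byte vector object it becomes a compile-time constant,
    // and the 4-element branch can be lowered to a single full-width
    // (aligned) vector load with no risk of crossing a page boundary.
    if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
        __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
    else
        __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
    return ret;
}

Whether this actually improves the generated code depends on the surrounding
inlining, as discussed in the comments above.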