[Bug tree-optimization/114908] fails to optimize avx2 in-register permute written with std::experimental::simd

mkretz at gcc dot gnu.org via Gcc-bugs Sun, 18 Aug 2024 23:49:41 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908

--- Comment #12 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #11)
> On Wed, 17 Jul 2024, mkretz at gcc dot gnu.org wrote:
> > Unless the target has a masked load instruction (e.g. AVX512) or ptr is
> > known to be aligned to at least 16 Bytes (in which case we know there
> > cannot be a page boundary at ptr + 24 Bytes). No? In this specific
> > example, ptr is pointing to a 32-Byte vector object.
> 
> Sure but here we have no alignment info available (at most 8 byte 
> alignment from the pointer type).

Excuse my ignorance :), but the back end does emit an aligned load instruction
for the memcpy. Why does the back end have more information than the middle end?
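The page-boundary argument in the quoted comment can be checked mechanically. A minimal sketch, assuming T = double (so the valid data is 24 bytes and the widened load is 32 bytes) and 4 KiB pages (any page size that is a multiple of 16 gives the same answer); `last_page` and `widening_is_safe_for_aligned` are hypothetical names for illustration only. With 16-byte alignment, bytes ptr+16 through ptr+31 form one aligned 16-byte block, so the 8 extra bytes never reach a page the 24 valid bytes do not already touch. With only the 8-byte alignment implied by the pointer type, an object ending exactly at a page boundary (e.g. ptr = 4096 - 24) breaks it.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages. Any multiple of 16 yields the same results below.
constexpr std::uintptr_t kPage = 4096;

// Hypothetical helper: page index of the last byte touched by a load of
// `len` bytes starting at address `addr`.
constexpr std::uintptr_t last_page(std::uintptr_t addr, std::size_t len) {
  return (addr + len - 1) / kPage;
}

// Hypothetical helper: true if, for every `align`-aligned address below
// `limit`, widening a 24-byte load (3 x double) to 32 bytes (4 x double)
// never touches a page beyond the one holding the last valid byte.
constexpr bool widening_is_safe_for_aligned(std::uintptr_t limit,
                                            std::uintptr_t align) {
  for (std::uintptr_t addr = 0; addr < limit; addr += align)
    if (last_page(addr, 32) != last_page(addr, 24))
      return false;
  return true;
}

// Checked at compile time: 16-byte alignment suffices, 8-byte does not.
static_assert(widening_is_safe_for_aligned(4 * kPage, 16),
              "16-byte alignment keeps the over-read on the same page");
static_assert(!widening_is_safe_for_aligned(4 * kPage, 8),
              "8-byte alignment can put the over-read on the next page");
```

Since the checks are `static_assert`s, the translation unit compiles only if both claims hold.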

> > The library can do this and it makes a difference:
> > 
> >     if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
> >       __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
> >     else
> >       __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
> 
> I see, but that's then of course after inlining.

Right, but since the SIMD library depends on inlining all over the place to
produce efficient code, I can make use of it. If you think it is worthwhile to
make the optimizers' life easier by introducing the above pattern in the
std::simd implementation, let me know!
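A self-contained sketch of the quoted load pattern, under stated assumptions: T = double, a GCC vector-extension type `V4d` standing in for the std::experimental::simd type, and a hypothetical helper `load3` that is not part of any library API. It deviates from the quoted snippet in one place. With type 0, `__builtin_object_size` returns `(size_t)-1` when the object size cannot be determined, so a `>=` test would pick the 32-byte copy exactly when nothing is known. Type 2 returns 0 in that case, which keeps the fallback conservative. Compiled with `-O2 -mavx2`, the wide branch would be expected to become a single 32-byte load once the caller's array is visible after inlining; this is an expectation, not a measured result.

```cpp
// GCC vector extension: four doubles in one 32-byte (AVX ymm-sized) vector.
typedef double V4d __attribute__((vector_size(32)));

// Hypothetical helper: copies ptr[0..2] into lanes 0..2. Lane 3 holds ptr[3]
// if the wide copy is taken, 0 otherwise. Forced inline so that
// __builtin_object_size can see the caller's object.
static inline __attribute__((always_inline)) V4d load3(const double* ptr) {
  V4d ret = {};  // zero-initialize all four lanes
  // Type 2 = minimum remaining size, and 0 when unknown, so the 32-byte copy
  // is used only when 4 doubles are provably readable from ptr.
  if (__builtin_object_size(ptr, 2) >= 4 * sizeof(double))
    __builtin_memcpy(&ret, ptr, 4 * sizeof(double));  // full 32-byte copy
  else
    __builtin_memcpy(&ret, ptr, 3 * sizeof(double));  // exact 24-byte copy
  return ret;
}
```

Without `-mavx`, GCC may emit a `-Wpsabi` note about returning a 32-byte vector, but the code still compiles and behaves the same.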
