https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Matthias Kretz (Vir) from comment #12)
> (In reply to rguent...@suse.de from comment #11)
> > On Wed, 17 Jul 2024, mkretz at gcc dot gnu.org wrote:
> > > Unless the target has a masked load instruction (e.g. AVX512) or ptr is
> > > known to be aligned to at least 16 Bytes (in which case we know there
> > > cannot be a page boundary at ptr + 24 Bytes). No? In this specific
> > > example, ptr is pointing to a 32-Byte vector object.
> >
> > Sure but here we have no alignment info available (at most 8 byte
> > alignment from the pointer type).
>
> Excuse my ignorance :) but the back end does emit an aligned load
> instruction on memcpy. Why does the back end have more information than
> the middle end?

It possibly knows after inlining or performs a re-alignment somehow.

> > > The library can do this and it makes a difference:
> > >
> > > if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
> > >   __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
> > > else
> > >   __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
> >
> > I see, but that's then of course after inlining.
>
> Right, but since the SIMD library depends on inlining all over the place
> to produce efficient code, I can make use of it. If you think it is
> worthwhile to make the optimizer's life easier by introducing the above
> pattern in the std::simd implementation, let me know!

If it helps code generation it's probably worth it. I don't see any
low-hanging fruit to pick in the optimizer at least.
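[Editorial note, not part of the report] For readers following along, a minimal,
self-contained sketch of the __builtin_object_size pattern quoted above might
look like the following partial-load helper; the names load_first3 and vec4 are
illustrative assumptions and not the actual std::simd implementation:

// Sketch only: load 3 elements of T into a 4-element return value, widening
// the copy to a full 4 elements only when the compiler can see that the
// source object is large enough.
template <typename T>
struct vec4 { T data[4]; };   // stand-in for a 4-element SIMD vector

template <typename T>
inline vec4<T> load_first3(const T* ptr)
{
    vec4<T> ret{};
    // __builtin_object_size(ptr, 0) yields the maximum number of bytes the
    // compiler can determine to remain in the pointed-to object (or
    // (size_t)-1 when it cannot tell). After inlining into code where ptr
    // refers to a 32-byte vector object it becomes a compile-time constant,
    // and the 4-element branch can be lowered to a single full-width
    // (aligned) vector load with no risk of crossing a page boundary.
    if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
        __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
    else
        __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
    return ret;
}

Whether this actually improves the generated code depends on the surrounding
inlining, as discussed in the comments above.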