https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908
--- Comment #12 from Matthias Kretz (Vir) <mkretz at gcc dot gnu.org> ---
(In reply to rguent...@suse.de from comment #11)
> On Wed, 17 Jul 2024, mkretz at gcc dot gnu.org wrote:
> > Unless the target has a masked load instruction (e.g. AVX512) or ptr is known
> > to be aligned to at least 16 Bytes (in which case we know there cannot be a
> > page boundary at ptr + 24 Bytes). No? In this specific example, ptr is pointing
> > to a 32-Byte vector object.
>
> Sure but here we have no alignment info available (at most 8 byte
> alignment from the pointer type).

Excuse my ignorance :) but the back end does emit an aligned load instruction
on memcpy. Why does the back end have more information than the middle end?

> > The library can do this and it makes a difference:
> >
> > if (__builtin_object_size(ptr, 0) >= 4 * sizeof(T))
> >   __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
> > else
> >   __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
>
> I see, but that's then of course after inlining.

Right, but since the SIMD library depends on inlining all over the place to
produce efficient code, I can make use of it. If you think it is worthwhile to
make the optimizer's life easier by introducing the above pattern in the
std::simd implementation, let me know!
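For readers following along, here is a minimal, self-contained sketch of the pattern quoted above. It is not the std::simd implementation: `Vec4` and `load3` are hypothetical names made up for this illustration. One deliberate difference from the quoted snippet: the sketch passes mode 2 instead of 0 to `__builtin_object_size`, because GCC documents that modes 0 and 1 return `(size_t)-1` when the size is unknown, while modes 2 and 3 return 0. With mode 2, an unknown size falls back to the 3-element copy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane storage type, standing in for a library-internal vector.
template <typename T>
struct Vec4 {
    T lane[4];
};

// Hypothetical helper: load three elements from ptr into a 4-lane vector.
// Lanes 0..2 hold the loaded values. Lane 3 is padding that callers must
// ignore: it is zero on the 3-element path and whatever follows the third
// element in memory on the 4-element path.
template <typename T>
[[gnu::always_inline]] inline Vec4<T> load3(const T* ptr) {
    assert(ptr != nullptr);
    Vec4<T> ret{};
    // Mode 2 asks for the minimum number of bytes known to remain in the
    // object and yields 0 when nothing is known (assumption of this sketch;
    // the quoted snippet uses mode 0).
    if (__builtin_object_size(ptr, 2) >= 4 * sizeof(T)) {
        // The object provably has room for four elements, so a full-width
        // copy cannot read past its end and can become a single vector load.
        __builtin_memcpy(&ret, ptr, 4 * sizeof(T));
    } else {
        // Size unknown or too small: copy exactly three elements.
        __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
    }
    return ret;
}
```

The object size is only visible once `load3` has been inlined into a caller where the pointed-to object is known, which is the "after inlining" caveat raised in comment #11. Calling `load3` on a `float[4]` array and on a `float[3]` array yields the same first three lanes in both cases; whether the compiler actually picks the wide path depends on optimization level and inlining.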