https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025
--- Comment #5 from juki at gcc dot mail.kapsi.fi ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to Xi Ruoyao from comment #3)
> >
> > This won't work if ptr is a __m128i *. It is allowed to optimize
> > (uintptr_t)(__m128i *)foo % 15 to 0 because the standard says (__m128i *)foo
>
> I mean % 16, not % 15.
>
> > invokes undefined behavior when foo is a pointer not aligned to 16-byte
> > boundary (C23 section 6.3.2.3p6).

ptr in this case is one of the parameter types defined for the various memory
load intrinsics of the NEON instruction set, such as
vld1q_u8(const uint8_t *ptr) or vld1q_u16(const uint16_t *ptr). These loads
expect only the natural alignment of that type, which is why an unaligned load
is needed for the SSE operation to work with them in the general case. The same
macro is used to implement all of those different loads, which just need to
read 128 bits of memory into a vector. The alignment of ptr is therefore
smaller than 16 and can be as low as 1 for const uint8_t *, which it has been
in the cases that have been crashing for me.
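
To make the situation concrete, here is a minimal sketch of the kind of shared 128-bit load described above. It is not the actual macro from the affected code: the names vec128, load128_unaligned, sketch_vld1q_u8 and sketch_vld1q_u16 are hypothetical and exist only for this illustration. It uses memcpy so that no pointer is ever converted to a 16-byte-aligned vector type; compilers commonly lower such a fixed-size memcpy to a single unaligned vector load, but that is an expectation, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value type for this sketch (not from the bug report). */
typedef struct { uint8_t b[16]; } vec128;
static_assert(sizeof(vec128) == 16, "vec128 must hold exactly 128 bits");

/* Hypothetical shared helper: read 128 bits starting at ptr.
 * Nothing is assumed about the alignment of ptr, and ptr is never cast
 * to a vector pointer type, so no 16-byte alignment is implied. */
static inline vec128 load128_unaligned(const void *ptr)
{
    vec128 out;
    memcpy(out.b, ptr, sizeof out.b);
    return out;
}

/* NEON-style entry points (hypothetical names). Each parameter type only
 * guarantees the natural alignment of its element: 1 byte for uint8_t,
 * 2 bytes for uint16_t. Both forward to the same 128-bit load. */
static inline vec128 sketch_vld1q_u8(const uint8_t *ptr)
{
    return load128_unaligned(ptr);
}

static inline vec128 sketch_vld1q_u16(const uint16_t *ptr)
{
    return load128_unaligned(ptr);
}
```

Calling sketch_vld1q_u8(buf + 1) on a uint8_t buffer is then well defined regardless of where buf happens to be placed, which is the case that has been crashing for me with the real macro.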