https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
--- Comment #27 from Peter Cordes <pcordes at gmail dot com> ---
(In reply to Hongtao Liu from comment #26)
> (In reply to Hongtao Liu from comment #25)
> > (In reply to Peter Cordes from comment #22)
> > > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> > > for MOVLPD?
> >
> > From Intel intrinsic guide[1], there's explicit "mem_addr does not need to be
> > aligned on any particular boundary" for mm_store_sd, but not for
> > _mm_storel_pd.
> > [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
> >
>
> But for mm_loadl_pd, it also says no need for alignment, I need to confirm
> with my peers if there's any specific purpose on that.
> And yes, for <16-byte memory access, there's no alignment requirement
> functionally.

Interesting. Yes, some entries explicitly say the memory can be unaligned and some don't. But I don't think we should read that as meaning alignment is required by default when not stated. Every intrinsic that does require alignment explicitly says so (like _mm_load_si128). We could just as well make the argument in the other direction: if an alignment requirement isn't mentioned, we should assume there isn't one.

And I already posted earlier about why we shouldn't assume C semantics based on the pointer type, as Andrew Pinski had thought. Intel's intrinsic docs were originally written for ICC (classic), which takes intrinsics very literally: an intrinsic in the C source will (almost?) always compile to the corresponding asm instruction. ICC presumably also doesn't optimize based on pointer-alignment UB, even on a deref, and it definitely doesn't optimize based on strict-aliasing UB. So the C defaults for dereferencing a double* or __m64* shouldn't be assumed even when the docs don't say anything about alignment. The docs don't mention aliasing either, but we know from Intel's examples of how to use intrinsics (I think) that the load/store intrinsics are all may_alias accesses.
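To make the asm-level view concrete, here is a minimal sketch assuming x86-64 (where SSE2 is baseline) and GCC or Clang. It round-trips a double through `_mm_storel_pd` / `_mm_loadl_pd` at an address misaligned by one byte, which MOVLPD itself handles without complaint. It also shows one hypothetical way a header could express the store in GNU C without inheriting the alignment and aliasing assumptions of a plain `double*` deref: a typedef carrying `aligned(1)` and `may_alias`. The typedef and function names are made up for illustration and are not GCC's internal names. Note that merely forming the misaligned `double*` is already UB in ISO C; the example relies on the compiler treating the intrinsics the asm-like way described above.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_storel_pd, _mm_loadl_pd, _mm_cvtsd_f64 */
#include <string.h>     /* memcpy */

/* Hypothetical typedef (illustration only): a double that may live at any
   byte address and may alias objects of any other type (GNU C attributes). */
typedef double unaligned_alias_double __attribute__((aligned(1), may_alias));

/* Hypothetical helper: store the low element of v at p without assuming
   p is 8-byte aligned or that it points to an actual double object. */
static void store_low_unaligned(double *p, __m128d v)
{
    *(unaligned_alias_double *)p = _mm_cvtsd_f64(v);
}

/* Round-trip x through an address misaligned by one byte using the real
   intrinsics (MOVLPD store, then MOVLPD load into the low element). */
static double roundtrip_storel_loadl(double x)
{
    unsigned char buf[16] = {0};
    double *p = (double *)(buf + 1);                /* deliberately misaligned */
    _mm_storel_pd(p, _mm_set_pd(0.0, x));           /* low element is x */
    __m128d r = _mm_loadl_pd(_mm_setzero_pd(), p);  /* high element stays 0.0 */
    return _mm_cvtsd_f64(r);
}

/* Portable ISO C way to read 8 bytes as a double from any address. */
static double read_double_bytes(const unsigned char *b)
{
    double d;
    memcpy(&d, b, sizeof d);
    return d;
}
```

The `memcpy` helper is the fully portable C idiom; the attribute-based typedef is the GNU C way to get the same "any address, any type" semantics through a pointer deref.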
Intel's current ICX compiler is based on LLVM, which does care about aliasing and alignment UB when optimizing, but their intrinsic docs still read as though they're thinking in terms of asm more than the C abstract machine. They probably haven't been rewritten with that in mind, since Intel implements the intrinsics (in its own compilers) so they Just Work even when aliasing other types or when the address isn't aligned.
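For the aliasing side, a minimal GNU C sketch (assuming GCC or Clang; the typedef `alias_double` and the function name are hypothetical) shows what treating a store as a may_alias access buys when the optimizer does exploit strict-aliasing UB. The function overwrites a `uint64_t` object with a double's bits and reads it back as `uint64_t`. With a plain `double*` store, the compiler may assume the store cannot modify `*slot` and forward the earlier `0` under `-fstrict-aliasing`; the may_alias store forces it to reload.

```c
#include <stdint.h>

/* Hypothetical typedef (illustration only): a double exempt from
   strict-aliasing assumptions, like a char-typed access. */
typedef double alias_double __attribute__((may_alias));

/* Store x's bits into a uint64_t object, then read the object back.
   Because the store is may_alias, the compiler must assume it can modify
   *slot and cannot fold the return value to the earlier 0. */
static uint64_t bits_after_double_store(uint64_t *slot, double x)
{
    *slot = 0;
    *(alias_double *)slot = x;
    return *slot;
}
```

For example, `bits_after_double_store(&s, 1.0)` yields the IEEE-754 bit pattern `0x3FF0000000000000`.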