https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with

typedef __uint128_t aligned_type __attribute__((aligned(16)));
_Static_assert(__alignof(aligned_type) == 16);
__uint128_t foo(aligned_type *p)
{
  p = __builtin_assume_aligned (p, 16);
  return __atomic_load_n (p, 0);
}

I see

foo:
.LFB0:
        .cfi_startproc
        lpq     %r4,0(%r3)
        stmg    %r4,%r5,0(%r2)
        br      %r14

at -O2, but without the __builtin_assume_aligned hint optimization doesn't help
much.  And with __builtin_assume_aligned in place but without optimization we
simply leave the call around - we probably should have elided it in
fold-all-builtins and set alignment on the destination SSA name even
when not optimizing (we already do that there when optimizing), or do the same
during RTL expansion.
