https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107389
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 53784
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53784&action=edit
prototype

This implements an -O0 fold-builtins pass. I've disabled some but not all
"optimizations", and instead of just throwing away __builtin_assume_aligned
I'm processing it at -O0 (the machinery from CCP relies on a lattice; with
optimization we should at least merge with the alignment info on the LHS).
On s390 I then see:

foo:
.LFB0:
	.cfi_startproc
	stmg	%r11,%r15,88(%r15)
	.cfi_offset 11, -72
	.cfi_offset 12, -64
	.cfi_offset 13, -56
	.cfi_offset 14, -48
	.cfi_offset 15, -40
	aghi	%r15,-176
	.cfi_def_cfa_offset 336
	lgr	%r11,%r15
	.cfi_def_cfa_register 11
	stg	%r2,168(%r11)
	stg	%r3,160(%r11)
	lg	%r1,160(%r11)
	lpq	%r2,0(%r1)
	lg	%r1,168(%r11)
	stmg	%r2,%r3,0(%r1)
	lg	%r2,168(%r11)
	lmg	%r11,%r15,264(%r11)
	.cfi_restore 15
	.cfi_restore 14
	.cfi_restore 13
	.cfi_restore 12
	.cfi_restore 11
	.cfi_def_cfa 15, 160
	br	%r14
	.cfi_endproc

Specifically, I did not disable the optimization of __atomic_add_fetch_* to
.ATOMIC_ADD_FETCH_CMP_0 and friends, and I also kept optimizing
stack_save/restore.