https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90582
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> > I assume EOR / CBNZ is as at least as efficient as SUBS / BNE on
> > all/most AArch64 microarchitectures, but someone should check.
>
> It is similar as x86 with that respect on some cores (Marvell's cores
> mostly).
> That is ThunderX, ThunderX 2 and OcteonTX and OcteonTX2 all have the ability
> to do macro-combining of the two instructions into one micro-op.

Even on most non-Marvell cores now, subs/bne is better than eor/cbnz.

Anyway, starting with GCC 10.3/9.4 we get:
        ldr     x2, [x0]
        subs    x1, x1, x2
        mov     x2, 0
        bne     .L5

Which we can't fuse anyway, because the mov sits between the subs and the bne.
I wonder if we should clobber x1 too.

As for the -fomit-frame-pointer issue, it is not really an issue: only
-momit-leaf-frame-pointer is turned on by default, and the function is now NOT
a leaf function because of the call to __stack_chk_fail.

>        mov     x1,0                   # and destroy the reg
>        mov     w1, 3                  # right before it's already destroyed

This is by design. GCC does not go back and figure out whether the zeroing
could be removed, because deleting it by accident might introduce a "security
hole". Always emitting it prevents that from happening.

As for the other issue, the one dealing with address formation, it is a small
missed optimization, and fixing it might not help in general or at all.
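
For anyone wanting to look at the epilogue themselves, here is a minimal sketch of the kind of function that gets a stack-protector check. It is not the testcase from this report; the name copy_len and the buffer size are made up for illustration. The assumption is that a local char array is enough to make -fstack-protector-strong instrument the function, so the epilogue compares the canary and branches to a __stack_chk_fail call, which is what makes the function non-leaf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not taken from the bug report.
 * The local array should make -fstack-protector-strong add a canary
 * check before the return, including a call to __stack_chk_fail on
 * the failure path. */
size_t copy_len(const char *src)
{
    char buf[64];

    /* Copy at most 63 bytes and terminate explicitly. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    return strlen(buf);
}
```

Compiling this for AArch64 with something like -O2 -fstack-protector-strong -S should show the canary load, the compare, and the zeroing of the scratch register before the conditional branch. The exact instruction sequence depends on the GCC version.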