https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89859
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to John Boyer from comment #4)
> I see. Would you recommend using ARM for comparisons between different
> assembly outputs to gauge which does more work?

It depends. Note that the output for x86_64 is fine once you understand that an add instruction with a memory operand is going to be "cracked" into two or three micro-ops. Note also that x86_64 processors sometimes merge load/store micro-ops, but that does cost bandwidth. Basically, what I am saying is that x86 processors are complex beasts, and understanding the code differences is not as simple as counting instructions.
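
A minimal sketch of the kind of code the point about instruction counts applies to. The function name `add_to_memory` is hypothetical and does not come from the bug report, and the assembly in the comments is only the typical shape of optimized output; the exact instructions depend on the compiler version and flags.

```c
/* Hypothetical example: a read-modify-write on a memory location. */
void add_to_memory(int *p, int v)
{
    /*
     * On x86_64 at -O2 this is typically one instruction with a memory
     * operand, roughly:
     *     addl %esi, (%rdi)
     * The processor still has to load, add, and store, so it splits the
     * instruction into several micro-ops internally.
     *
     * On AArch64 the same work is typically spelled out as three
     * instructions, roughly:
     *     ldr w2, [x0]
     *     add w2, w2, w1
     *     str w2, [x0]
     */
    *p += v;
}
```

Comparing the output of `gcc -O2 -S` for both targets would show one instruction against three for the same amount of work, which is why the raw instruction count is a poor measure here.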