https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71768
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I see this on aarch64-linux-gnu also.
t:
        stp     x29, x30, [sp, -32]!
        add     x29, sp, 0
        movi    v0.4s, 0xa
#APP
// 6 "t.c" 1
        #v0
// 0 "" 2
#NO_APP
        str     q0, [x29, 16]
        bl      e
        ldr     q0, [x29, 16]
#APP
// 8 "t.c" 1
        #v0
// 0 "" 2
#NO_APP
        ldp     x29, x30, [sp], 32
        ret

So it is even worse for AARCH64 since it is not even a load, it is a move.
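
The listing is consistent with a test case roughly like the sketch below. This is a hypothetical reconstruction, not the t.c from the bug: the vector type name `v4si`, the `VREG` macro, the `calls` counter, and the body of `e` are all assumptions made so the sketch compiles on its own. The point it illustrates is that a vector constant is used before and after a call, so the compiler can either spill it across the call (the `str`/`ldr` pair) or simply rematerialize it with a second `movi`.

```c
/* Hypothetical reconstruction: the real t.c is not shown in the comment. */
typedef int v4si __attribute__((vector_size(16)));

/* Pick a vector-register constraint for the inline asm operand. */
#if defined(__aarch64__)
#define VREG "w"   /* AArch64 FP/SIMD register */
#elif defined(__x86_64__)
#define VREG "x"   /* x86 SSE register */
#else
#define VREG "m"   /* fallback: memory operand, only so the sketch compiles */
#endif

int calls;  /* counts calls to e(); used only to check that the sketch runs */

/* Stand-in callee (assumption). For a real reproduction e() should live in
   another translation unit so the compiler must assume it clobbers v0. */
__attribute__((noinline)) void e(void) { calls++; }

void t(void)
{
  v4si x = {10, 10, 10, 10};            /* all-tens vector constant */
  __asm__ volatile("#%0" : : VREG(x));  /* first use of the constant */
  e();                                  /* call that x must survive */
  __asm__ volatile("#%0" : : VREG(x));  /* second use, after the call */
}
```

To inspect the generated code, something like `gcc -O2 -S` on an aarch64 target should be enough. Whether the spill appears will depend on the compiler version and on whether the callee is visible to it.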