https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82106
--- Comment #7 from Jim Wilson <wilson at gcc dot gnu.org> ---
I have an initial attempt to fix this in the patch I just added as an
attachment. It needs more work and more testing to be useful, and agreement
from other gcc hackers that this makes sense.

On the original testcase, without the patch we get

  sw      a7,12(sp)
  fld     fa0,12(sp)

and with the patch we get

  lw      a5,16(sp)
  sw      a7,8(sp)
  sw      a5,12(sp)
  fld     fa0,8(sp)

which is larger, but avoids the unaligned load, and hence may be faster if
unaligned loads trap.

On the alternate testcase, without the patch we get

  sw      a7,12(sp)
  lw      a0,12(sp)
  lw      a1,16(sp)

and with the patch we get

  lw      a1,0(sp)
  mv      a0,a7

which is smaller and faster.
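(The testcases themselves aren't quoted in this comment. As a rough sketch of
the kind of code that can produce this pattern, assuming RV32 with the ilp32d
ABI: seven named int arguments consume a0-a6, so a double passed through "..."
is split, low word in a7 and high word on the caller's stack. The prologue
spills a7 into the varargs save area adjacent to the stack word, leaving an
8-byte value at a 4-byte-aligned slot, which va_arg then reads with fld. This
is a hypothetical reconstruction, not necessarily the PR's actual testcase.

  #include <stdarg.h>

  /* Hypothetical example: with seven named int args in a0-a6, a double
     vararg arrives split between a7 (low word) and the stack (high
     word), so reassembling it at a 4-byte-aligned save-area slot leads
     to an unaligned 8-byte load.  */
  double
  f (int a, int b, int c, int d, int e, int g, int h, ...)
  {
    va_list ap;
    double x;

    va_start (ap, h);
    x = va_arg (ap, double);
    va_end (ap);
    return x;
  }

The alternate testcase would then be a variant that reads the value as two
XLEN-sized words, e.g. a 64-bit integer va_arg, where the patch lets one word
come straight from a7 instead of round-tripping both words through memory.)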