https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882
--- Comment #5 from emilie.feral at numworks dot com ---
When compiling without LTO, using the command:

  arm-none-eabi-gcc main.c -Os -mfloat-abi=hard -mthumb -mcpu=cortex-m4 -ffreestanding -nostdlib -lgcc -save-temps -o a.elf

I get the following instructions for CalledFunction:

Dump of assembler code for function CalledFunction:
   0x00008000 <+0>:     push    {r4, r5, lr}
   0x00008002 <+2>:     ldr     r5, [pc, #52]   ; (0x8038 <CalledFunction+56>)
   0x00008004 <+4>:     ldmia   r5!, {r0, r1, r2, r3}
   0x00008006 <+6>:     sub     sp, #100        ; 0x64
   0x00008008 <+8>:     add     r4, sp, #32
   0x0000800a <+10>:    stmia   r4!, {r0, r1, r2, r3}
   0x0000800c <+12>:    ldmia.w r5, {r0, r1, r2, r3}
   0x00008010 <+16>:    add     r5, sp, #32
   0x00008012 <+18>:    stmia.w r4, {r0, r1, r2, r3}
   0x00008016 <+22>:    ldmia   r5!, {r0, r1, r2, r3}
   0x00008018 <+24>:    add     r4, sp, #64     ; 0x40
   0x0000801a <+26>:    stmia   r4!, {r0, r1, r2, r3}
   0x0000801c <+28>:    ldmia.w r5, {r0, r1, r2, r3}
   0x00008020 <+32>:    stmia.w r4, {r0, r1, r2, r3}
   0x00008024 <+36>:    vldr    d0, [sp, #64]   ; 0x40
   0x00008028 <+40>:    vldr    d1, [sp, #72]   ; 0x48
   0x0000802c <+44>:    vldr    d2, [sp, #80]   ; 0x50
   0x00008030 <+48>:    vldr    d3, [sp, #88]   ; 0x58
   0x00008034 <+52>:    add     sp, #100        ; 0x64
   0x00008036 <+54>:    pop     {r4, r5, pc}
   0x00008038 <+56>:    strh    r0, [r3, #2]
   0x0000803a <+58>:    movs    r0, r0
End of assembler dump.

This seems correct to me: the result is returned through registers d0-d3.

Interestingly, if I keep LTO but remove the -mfloat-abi=hard option:

  arm-none-eabi-gcc main.c -Os -flto -mthumb -mcpu=cortex-m4 -ffreestanding -nostdlib -lgcc -save-temps -o a.elf

The compilation also seems correct: the result is written to the address passed in r0, and that address is returned through r0.

Dump of assembler code for function CalledFunction:
   0x00008000 <+0>:     push    {r4, r5, r6, lr}
   0x00008002 <+2>:     ldr     r5, [pc, #20]   ; (0x8018 <CalledFunction+24>)
   0x00008004 <+4>:     mov     r6, r0
   0x00008006 <+6>:     mov     r4, r0
   0x00008008 <+8>:     ldmia   r5!, {r0, r1, r2, r3}
   0x0000800a <+10>:    stmia   r4!, {r0, r1, r2, r3}
   0x0000800c <+12>:    ldmia.w r5, {r0, r1, r2, r3}
   0x00008010 <+16>:    stmia.w r4, {r0, r1, r2, r3}
   0x00008014 <+20>:    mov     r0, r6
   0x00008016 <+22>:    pop     {r4, r5, r6, pc}
   0x00008018 <+24>:    strh    r0, [r5, #0]
   0x0000801a <+26>:    movs    r0, r0
End of assembler dump.
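
For readers without the attachment at hand, here is a minimal sketch of the kind of function that would produce the two return conventions seen above. It is an assumption, not the actual main.c from this report: the type name FourDoubles, the field names, the constant values and the helper SumOfResult are all hypothetical. The only things taken from the dumps are the function name CalledFunction and the fact that 32 bytes are copied from a literal pool and then returned either in d0-d3 or through the pointer in r0.

```c
#include <assert.h>

/* Hypothetical stand-in for the type returned in the bug report.
 * Four doubles = 32 bytes, matching the two 16-byte ldmia/stmia
 * copies visible in both dumps. */
typedef struct {
    double a;
    double b;
    double c;
    double d;
} FourDoubles;

/* The size assumption behind the comment above, checked at compile time. */
static_assert(sizeof(FourDoubles) == 32, "expected 4 x 8-byte doubles");

/* With -mfloat-abi=hard, a struct made only of up to four doubles is
 * expected to come back in d0-d3. Without it, the caller passes the
 * address of a result buffer in r0 and gets that address back in r0.
 * The values below are placeholders, not the ones from the report. */
FourDoubles CalledFunction(void) {
    FourDoubles r = { 1.0, 2.0, 3.0, 4.0 };
    return r;
}

/* Hypothetical caller: consumes every field, so a mismatch between the
 * caller's and the callee's return convention would show up in the sum. */
double SumOfResult(void) {
    FourDoubles r = CalledFunction();
    return r.a + r.b + r.c + r.d;
}
```

Compiling this file with the flags from the two commands above (adding -c or -S instead of linking) should make it possible to compare the return sequence of CalledFunction with the dumps, although the exact instructions may differ between GCC versions.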