https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101995
Bug ID: 101995
Summary: regression built-in memset missed-optimization arm -Os
Product: gcc
Version: 10.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: dumoulin.thibaut at gmail dot com
Target Milestone: ---

For cortex-m4 with -Os, GCC 10 produces larger assembly code than GCC 7 when memset is called.

Here is a C example that triggers the regression:

```C
#include <stdio.h>
#include <string.h>

struct foo_t {
    int a;
    int b;
    int c;
    int d;
};

/* Arbitrary function that sets a field of foo to a value other than 0 */
void doStuff(struct foo_t *foo) {
    foo->b = foo->a + foo->c;
}

void twoLinesFunction(struct foo_t *foo) {
    /* R0 is saved in GCC10 but not in GCC7 */
    memset(foo, 0x00, sizeof(struct foo_t));
    doStuff(foo);
}

int main(void) {
    struct foo_t foo;
    twoLinesFunction(&foo);
    return 0;
}
```

Compile command: `gcc -Os -mcpu=cortex-m4`

GCC 7.3.1 produces:

```asm
<twoLinesFunction>:
    push    {r3, lr}
    movs    r2, #16
    movs    r1, #0
    bl      8168 <memset>
    ldmia.w sp!, {r3, lr}
    b.w     8104 <doStuff>
```

While GCC 10.3.0 produces:

```asm
<twoLinesFunction>:
    push    {r4, lr}
    movs    r2, #16
    mov     r4, r0          @ back up r0
    movs    r1, #0
    bl      8174 <memset>
    mov     r0, r4          @ restore r0
    ldmia.w sp!, {r4, lr}
    b.w     810c <doStuff>
```

The main function remains the same.

The builtin memset function returns its first argument, so R0 is unchanged after the call and there is no need to save it and restore it later. GCC 7 is more efficient here. GCC 10 should not back up R0 for this builtin in this case; the two extra `mov` instructions make the code larger and slower.

PR 61241 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241) describes the same behavior and includes a patch implementing the optimization, but I'm not sure when this optimization was lost.
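
For reference, the argument that R0 need not be saved rests on the C library contract that memset returns its destination pointer. The sketch below shows that contract and a possible source-level variant that passes the returned pointer straight to `doStuff`. The names `twoLinesFunctionReuse` and `checkMemsetReturnsDest` are hypothetical and not part of the original reproducer. Whether GCC 10 actually drops the `r4` save for this variant is an assumption that has not been verified here.

```c
#include <assert.h>
#include <string.h>

struct foo_t {
    int a;
    int b;
    int c;
    int d;
};

/* Same helper as in the report. */
void doStuff(struct foo_t *foo)
{
    foo->b = foo->a + foo->c;
}

/* Hypothetical variant of twoLinesFunction: hand memset's return value
   (defined by the C standard to be the destination pointer) to doStuff,
   so the source no longer requires `foo` to stay live across the call. */
void twoLinesFunctionReuse(struct foo_t *foo)
{
    doStuff(memset(foo, 0x00, sizeof(struct foo_t)));
}

/* Sanity check of the library contract the argument relies on. */
void checkMemsetReturnsDest(void)
{
    struct foo_t foo = { 1, 2, 3, 4 };
    void *ret = memset(&foo, 0x00, sizeof foo);

    assert(ret == (void *)&foo);      /* returned pointer is the destination */
    assert(foo.a == 0 && foo.d == 0); /* buffer was cleared */
}
```

Comparing the `-Os -mcpu=cortex-m4` output of `twoLinesFunction` and `twoLinesFunctionReuse` on both compiler versions would show whether the extra moves come only from keeping `foo` live across the call.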