https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101995

            Bug ID: 101995
           Summary: regression built-in memset missed-optimization arm -Os
           Product: gcc
           Version: 10.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dumoulin.thibaut at gmail dot com
  Target Milestone: ---

For cortex-m4 at -Os, GCC10 generates larger code than GCC7 around a call to
memset.

Here is the C code example to trigger the regression:

```C
#include <stdio.h>
#include <string.h>

struct foo_t {
  int a;
  int b;
  int c;
  int d;
};

/* Random function modifying foo with another value than 0 */
void doStuff(struct foo_t *foo) {
  foo->b = foo->a + foo->c;
}

void twoLinesFunction(struct foo_t *foo) {
  /* R0 is saved in GCC10 but not in GCC7 */
  memset(foo, 0x00, sizeof(struct foo_t));
  doStuff(foo);
}

int main(void) {
  struct foo_t foo;
  twoLinesFunction(&foo);
  return 0;
}
```

compile command: `gcc -Os -mcpu=cortex-m4`

GCC7.3.1 produces:
```asm
<twoLinesFunction>:
    push    {r3, lr}
    movs    r2, #16
    movs    r1, #0
    bl      8168 <memset>
    ldmia.w sp!, {r3, lr}
    b.w     8104 <doStuff>
```

While GCC10.3.0 produces:
```asm
<twoLinesFunction>:
    push    {r4, lr}
    movs    r2, #16
    mov     r4, r0        @ back up r0
    movs    r1, #0
    bl      8174 <memset>
    mov     r0, r4        @ restore r0
    ldmia.w sp!, {r4, lr}
    b.w     810c <doStuff>
```

The main function is unchanged between the two versions.

memset returns its first argument, so R0 still holds `foo` after the call and
there is no need to save it beforehand and restore it afterwards. GCC7 takes
advantage of this. GCC10 does not, and the two extra `mov` instructions make
the code both larger and slower.
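
To make the expected equivalence explicit, here is a minimal sketch that reuses
memset's return value at the source level. `twoLinesFunctionReuse` and
`selfCheck` are hypothetical names added for illustration and are not part of
the original test case. I have not checked whether this form changes the code
GCC10 generates; it only shows the guarantee the optimization relies on.

```c
#include <assert.h>
#include <string.h>

struct foo_t {
  int a;
  int b;
  int c;
  int d;
};

/* Same helper as in the test case above */
void doStuff(struct foo_t *foo) {
  foo->b = foo->a + foo->c;
}

/* Hypothetical variant: the C standard guarantees that memset returns its
   first argument, so the returned pointer can be passed on directly. */
void twoLinesFunctionReuse(struct foo_t *foo) {
  struct foo_t *same = memset(foo, 0x00, sizeof(struct foo_t));
  doStuff(same);
}

/* Hypothetical self-check: after zeroing, doStuff computes b = 0 + 0 */
void selfCheck(void) {
  struct foo_t foo = {1, 2, 3, 4};
  twoLinesFunctionReuse(&foo);
  assert(foo.a == 0 && foo.b == 0 && foo.c == 0 && foo.d == 0);
}
```

Both `twoLinesFunction` and this variant have the same observable behavior, so
the compiler is free to treat R0 after the call as still holding `foo`.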

PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 describes the same
behavior and includes a patch implementing the optimization, but I'm not sure
when the optimization was lost.
