https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115834

            Bug ID: 115834
           Summary: two loads from same address for min/max
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#define min(a,b) ((a)>(b) ? (b) : (a))
#define max(a,b) ((a)<(b) ? (b) : (a))

void hi_smaxv(signed short *t)
{
  t[0] = max(t[1], t[0]);
}
void hi_sminv(signed short *t)
{
  t[0] = min(t[1], t[0]);
}
```

GCC produces at -O2:
```
hi_smaxv:
        ldrh    w2, [x0]
        ldrsh   w3, [x0, 2]
        ldrh    w1, [x0, 2]
        cmp     w3, w2, sxth
        csel    w1, w1, w2, ge
        strh    w1, [x0]
        ret
hi_sminv:
        ldrh    w2, [x0]
        ldrsh   w3, [x0, 2]
        ldrh    w1, [x0, 2]
        cmp     w3, w2, sxth
        csel    w1, w1, w2, le
        strh    w1, [x0]
        ret
```


Why are there two loads from `[x0, 2]`, one sign-extending (`ldrsh`) and the
other not (`ldrh`)?

The tree level looks good:
```
  _1 = MEM[(short int *)t_4(D) + 2B];
  _2 = *t_4(D);
  _3 = MAX_EXPR <_1, _2>;
  *t_4(D) = _3;
```

The RTL expansion of the MAX_EXPR does not:
```
(insn 14 5 15 (set (reg:HI 117)
        (mem:HI (reg/v/f:DI 103 [ tD.4414 ]) [1 *t_4(D)+0 S2 A16])) "/app/example.cpp":7:10 -1
     (nil))

(insn 15 14 16 (set (reg:SI 118)
        (sign_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ tD.4414 ])
                    (const_int 2 [0x2])) [1 MEM[(short intD.18 *)t_4(D) + 2B]+0 S2 A16]))) "/app/example.cpp":7:10 -1
     (nil))

(insn 16 15 17 (set (reg:SI 119)
        (sign_extend:SI (reg:HI 117))) "/app/example.cpp":7:10 -1
     (nil))

(insn 17 16 18 (set (reg:HI 120)
        (mem:HI (plus:DI (reg/v/f:DI 103 [ tD.4414 ])
                (const_int 2 [0x2])) [1 MEM[(short intD.18 *)t_4(D) + 2B]+0 S2 A16])) "/app/example.cpp":7:10 -1
     (nil))
```

Reg 120 could be the lowpart of reg 118, which would avoid the second load from
the same address.
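
The lowpart point can be sketched in source terms. This is only an illustration, not GCC's expansion code. The helper names (`load_sext`, `load_plain`, `lowpart16`, `hi_smaxv_single_load`) are hypothetical and do not appear in the report. It shows that the low 16 bits of the sign-extended value are the same bits a separate halfword load would return, so one load per element is enough:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring insn 15: one sign-extending load of t[1]. */
static int32_t load_sext(const int16_t *t)
{
  return (int32_t)t[1];
}

/* Hypothetical helper mirroring insn 17: a plain 16-bit load of t[1]. */
static uint16_t load_plain(const int16_t *t)
{
  uint16_t v;
  memcpy(&v, &t[1], sizeof v);
  return v;
}

/* The "lowpart": the low 16 bits of the 32-bit sign-extended value. */
static uint16_t lowpart16(int32_t x)
{
  return (uint16_t)((uint32_t)x & 0xFFFFu);
}

/* Sanity check: the lowpart of the extended load equals the plain load. */
static void check_lowpart(const int16_t *t)
{
  assert(lowpart16(load_sext(t)) == load_plain(t));
}

/* Same semantics as hi_smaxv from the report, written so that each
   element is read exactly once and the stored value is the truncation
   of the value already used in the comparison. */
void hi_smaxv_single_load(int16_t *t)
{
  int32_t a = t[1];   /* sign-extended once */
  int32_t b = t[0];   /* sign-extended once */
  t[0] = (int16_t)(a < b ? b : a);
}
```

Calling `check_lowpart` on any two-element array should pass, since sign extension never changes the low 16 bits. Whether the compiler emits a single load for `hi_smaxv_single_load` still depends on the expander, so this shows the intended equivalence rather than a workaround.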

`-fno-tree-ter` gives better code.

This seems to happen only on aarch64, for some reason.
