https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115834
Bug ID: 115834
Summary: two loads from same address for min/max
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64

Take:

```
#define min(a,b) ((a)>(b) ? (b) : (a))
#define max(a,b) ((a)<(b) ? (b) : (a))

void hi_smaxv(signed short *t) {
  t[0] = max(t[1], t[0]);
}

void hi_sminv(signed short *t) {
  t[0] = min(t[1], t[0]);
}
```

GCC produces at -O2:

```
hi_smaxv:
        ldrh    w2, [x0]
        ldrsh   w3, [x0, 2]
        ldrh    w1, [x0, 2]
        cmp     w3, w2, sxth
        csel    w1, w1, w2, ge
        strh    w1, [x0]
        ret
hi_sminv:
        ldrh    w2, [x0]
        ldrsh   w3, [x0, 2]
        ldrh    w1, [x0, 2]
        cmp     w3, w2, sxth
        csel    w1, w1, w2, le
        strh    w1, [x0]
        ret
```

Why are there two loads from `[x0, 2]`? One is sign-extended (`ldrsh`) and the other is not (`ldrh`).

The tree level looks good:

```
  _1 = MEM[(short int *)t_4(D) + 2B];
  _2 = *t_4(D);
  _3 = MAX_EXPR <_1, _2>;
  *t_4(D) = _3;
```

The expansion of the MAX is not:

```
(insn 14 5 15 (set (reg:HI 117)
        (mem:HI (reg/v/f:DI 103 [ tD.4414 ]) [1 *t_4(D)+0 S2 A16])) "/app/example.cpp":7:10 -1
     (nil))
(insn 15 14 16 (set (reg:SI 118)
        (sign_extend:SI (mem:HI (plus:DI (reg/v/f:DI 103 [ tD.4414 ])
                    (const_int 2 [0x2])) [1 MEM[(short intD.18 *)t_4(D) + 2B]+0 S2 A16]))) "/app/example.cpp":7:10 -1
     (nil))
(insn 16 15 17 (set (reg:SI 119)
        (sign_extend:SI (reg:HI 117))) "/app/example.cpp":7:10 -1
     (nil))
(insn 17 16 18 (set (reg:HI 120)
        (mem:HI (plus:DI (reg/v/f:DI 103 [ tD.4414 ])
                (const_int 2 [0x2])) [1 MEM[(short intD.18 *)t_4(D) + 2B]+0 S2 A16])) "/app/example.cpp":7:10 -1
     (nil))
```

(reg:HI 120) could be the lowpart of (reg:SI 118) instead of a second load from the same address.

`-fno-tree-ter` gives better code.

For some reason, this seems to happen only on aarch64.
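To make the "lowpart" remark concrete, here is a small C sketch, assuming 16-bit `short` and 32-bit `int` as on aarch64. `hi_smaxv_one_load` is a hypothetical C model (not GCC output) of the code one would hope for: each element is read once with a sign-extending load, and the store writes the low 16 bits of the selected 32-bit value. `lowpart_of_sext_matches` (also a hypothetical helper) checks, for every 16-bit value, that the low half of its 32-bit sign extension equals the raw halfword, which is why the `ldrh` in insn 17 carries no information beyond the `ldrsh` in insn 15. This only models values at the C level; it says nothing about what GCC actually emits.

```c
#include <assert.h>
#include <limits.h>

/* Same macro as in the report. */
#define max(a,b) ((a)<(b) ? (b) : (a))

/* Reference: the function from the report. */
void hi_smaxv(signed short *t) { t[0] = max(t[1], t[0]); }

/* Hypothetical single-load model of hi_smaxv (assumes 16-bit short, 32-bit int). */
void hi_smaxv_one_load(signed short *t)
{
    int a = t[1];               /* one sign-extending load, like ldrsh w3, [x0, 2] */
    int b = t[0];               /* sign-extended t[0], like the sxth operand of cmp */
    int m = (a >= b) ? a : b;   /* cmp + csel ..., ge */
    assert(m >= SHRT_MIN && m <= SHRT_MAX); /* always true: m is a or b */
    t[0] = (signed short)m;     /* strh stores the low 16 bits of m */
}

/* Hypothetical helper: returns 1 if, for every 16-bit value, the low 16 bits
   of its 32-bit sign extension equal the original halfword. */
int lowpart_of_sext_matches(void)
{
    for (int v = SHRT_MIN; v <= SHRT_MAX; v++) {
        signed short hw = (signed short)v;      /* what ldrh loads (insn 17) */
        int ext = hw;                           /* sign_extend:SI (insn 15) */
        unsigned low = (unsigned)ext & 0xFFFFu; /* lowpart of the SI value */
        if (low != (unsigned short)hw)
            return 0;
    }
    return 1;
}
```

Calling `hi_smaxv` and `hi_smaxv_one_load` on the same input pair should leave identical arrays, since both select the larger of `t[1]` and `t[0]` and write it back as a halfword.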