https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68894
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, TREE
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> I think this is fixed on the trunk now.
Or rather, improved enough that PHI_OPT could do something about it if needed:
  if (_6 < _7)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  _22 = MIN_EXPR <_6, pretmp_25>;
  goto <bb 8>;

  <bb 7>:
  _28 = MIN_EXPR <_7, pretmp_25>;

  <bb 8>:
  # d_2 = PHI <_22(6), _28(7)>
Though we get:
.L12:
        ldr     w4, [x2, x8]
        ldr     w0, [x2, x6]
        ldr     w3, [x2, x7]
        cmp     w0, w4
        csel    w0, w0, w4, le
        cmp     w0, w3
        csel    w0, w0, w3, le
        str     w0, [x1, x2]
        add     x2, x2, 4
        cmp     x5, x2
        bne     .L12
Judging from the assembly code, that looks correct.
For aarch64 at -O3 we get:
.L18:
        ldr     q0, [x2, x7]
        add     w3, w3, 1
        ldr     q2, [x2, x5]
        ldr     q1, [x2, x6]
        smin    v0.4s, v0.4s, v2.4s
        smin    v0.4s, v0.4s, v1.4s
        str     q0, [x1, x2]
        add     x2, x2, 16
        cmp     w4, w3
        bhi     .L18
-O3 tree level:
  <bb 14>:
  # ivtmp.35_79 = PHI <0(13), ivtmp.35_78(14)>
  _42 = MEM[symbol: a1, index: ivtmp.35_79, offset: 0B];
  _43 = MEM[symbol: a2, index: ivtmp.35_79, offset: 0B];
  pretmp_44 = MEM[symbol: a3, index: ivtmp.35_79, offset: 0B];
  _45 = MIN_EXPR <_42, pretmp_44>;
  _46 = MIN_EXPR <_43, pretmp_44>;
  d_47 = _42 < _43 ? _45 : _46;
  MEM[base: c_12(D), index: ivtmp.35_79, offset: 0B] = d_47;
  ivtmp.35_78 = ivtmp.35_79 + 4;
  if (ivtmp.35_78 == _111)
    goto <bb 12>;
  else
    goto <bb 14>;
As you can see, it does the right thing for the vectorized code, though in
both cases the simplification is done only at the RTL level, not at the tree level.
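For reference, the tree-level simplification under discussion amounts to the identity (a < b ? MIN (a, p) : MIN (b, p)) == MIN (MIN (a, b), p), which matches the shape of d_47 = _42 < _43 ? _45 : _46 in the dump. A minimal C sketch of that identity follows; the function names are hypothetical, it is not the testcase from this bug, and it only checks the equivalence over a small range rather than showing what any GCC pass does.

```c
#include <assert.h>

/* Hypothetical helper, not from the bug report: plain signed minimum. */
static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Shape seen in the GIMPLE: a select between two MIN_EXPRs that
   share one operand (p plays the role of pretmp_44). */
static int select_of_mins(int a, int b, int p)
{
    return a < b ? min_int(a, p) : min_int(b, p);
}

/* Folded form: a chain of two minimums, which is what the two
   csel/smin instructions in the assembly compute. */
static int min_of_three(int a, int b, int p)
{
    return min_int(min_int(a, b), p);
}

/* Exhaustively compare both forms for all values in [lo, hi]. */
static void check_equivalence(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            for (int p = lo; p <= hi; p++)
                assert(select_of_mins(a, b, p) == min_of_three(a, b, p));
}
```

Calling check_equivalence (-4, 4) exercises every ordering of the three operands, including ties, which is the case where the a < b test picks the second arm.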