https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106688
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #0)
> It looks as if going out of SSA places in the loop a register copy
> corresponding to a phi node which is outside of the loop. Strangely, RTL
> optimizations do not clean it up either.
No, it is IVOPTs that places the copy inside the loop:
<bb 5> [local count: 1006632961]:
# buf_25 = PHI <buf_21(5), buf_22(4)>
# vs1_28 = PHI <vs1_20(5), { 0, 0, 0, 0, 0, 0, 0, 0 }(4)>
__asm__("pmovzxbw %1, %0" : "=x" b_17 : "m" MEM[(i8v8 *)buf_25]);
vs1_18 = b_17 + vs1_28;
_15 = (unsigned long) buf_25;
_14 = _15 + 8;
_2 = (const unsigned char *) _14;
__asm__("pmovzxbw %1, %0" : "=x" b_19 : "m" MEM[(i8v8 *)_2]);
vs1_20 = vs1_18 + b_19;
buf_21 = buf_25 + 16;
_33 = (const unsigned char *) ivtmp.18_7;
if (buf_21 != _33)
goto <bb 5>; [93.75%]
else
goto <bb 6>; [6.25%]
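For readers less used to GIMPLE, the loop in bb 5 can be approximated by the scalar C sketch below. It is a hypothetical model, not the testcase from comment #0. Each pmovzxbw is modelled as zero-extending 8 bytes into eight 16-bit lanes, vs1 is an array of eight uint16_t lanes that wrap on overflow like the vector add, and the `end` parameter stands in for the value of ivtmp.18_7. The function name `sum_lanes` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the bb 5 loop (not the original testcase).
   Each 8-byte chunk is zero-extended into eight 16-bit lanes (the job of
   pmovzxbw) and added lane-wise into vs1; uint16_t arithmetic wraps like
   the vector add. `end` stands in for (const unsigned char *) ivtmp.18_7. */
static void sum_lanes(const unsigned char *buf, const unsigned char *end,
                      uint16_t vs1[8])
{
    /* The GIMPLE loop is a do-while, so (end - buf) must be a non-zero
       multiple of 16 for buf to hit end exactly. */
    assert(end > buf && (end - buf) % 16 == 0);

    for (int i = 0; i < 8; i++)
        vs1[i] = 0;                          /* { 0, 0, 0, 0, 0, 0, 0, 0 } */
    do {
        for (int i = 0; i < 8; i++)          /* vs1_18 = b_17 + vs1_28 */
            vs1[i] = (uint16_t)(vs1[i] + buf[i]);
        for (int i = 0; i < 8; i++)          /* vs1_20 = vs1_18 + b_19 */
            vs1[i] = (uint16_t)(vs1[i] + buf[8 + i]);
        buf += 16;                           /* buf_21 = buf_25 + 16 */
    } while (buf != end);                    /* if (buf_21 != _33) */
}
```

With 32 input bytes holding the values 0..31, lane i ends up as i + (8 + i) + (16 + i) + (24 + i) = 48 + 4*i.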
Notice the cast of ivtmp.18_7 assigned to _33 inside bb 5: it is loop-invariant.
I don't know why LIM4 did not hoist this invariant out of the loop.
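To make "hoisting the invariant" concrete, here is a source-level sketch of the transformation LIM would be expected to perform: the integer-to-pointer conversion that recomputes the loop bound on every iteration is moved to before the loop. The function names and the `end_iv` parameter (playing the role of ivtmp.18_7) are hypothetical. This is not a reproducer for this PR: in the report the cast is introduced by IVOPTs, so an optimizing compiler given a source loop like the first one will normally handle it on its own.

```c
#include <assert.h>
#include <stdint.h>

/* "Before": the conversion of end_iv runs on every iteration, like
   _33 = (const unsigned char *) ivtmp.18_7 sitting in bb 5. */
static unsigned sum_unhoisted(const unsigned char *buf, uintptr_t end_iv)
{
    assert(end_iv > (uintptr_t)buf && (end_iv - (uintptr_t)buf) % 16 == 0);
    unsigned s = 0;
    const unsigned char *end;
    do {
        for (int i = 0; i < 16; i++)
            s += buf[i];
        buf += 16;
        end = (const unsigned char *)end_iv;   /* same value every iteration */
    } while (buf != end);
    return s;
}

/* "After": the invariant conversion is computed once, before the loop,
   which is the kind of code motion LIM is expected to perform. */
static unsigned sum_hoisted(const unsigned char *buf, uintptr_t end_iv)
{
    assert(end_iv > (uintptr_t)buf && (end_iv - (uintptr_t)buf) % 16 == 0);
    const unsigned char *end = (const unsigned char *)end_iv;  /* hoisted */
    unsigned s = 0;
    do {
        for (int i = 0; i < 16; i++)
            s += buf[i];
        buf += 16;
    } while (buf != end);
    return s;
}
```

Both versions compute the same result; the second one simply does not repeat the loop-invariant conversion inside the loop body.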