https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848
Bug ID: 77848
Summary: Gimple if-conversion results in redundant comparisons
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wschmidt at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-unknown-linux-gnu
Gimple if-conversion is aggressive about converting PHIs to conditional
expressions. When these expressions are not vectorized, they remain in
conditional form throughout the middle end phases. Sometimes such conditionals
do not correspond to any target instructions, so they must be re-expanded to
branching logic. When this happens, and several conditionals have the same
condition, GCC does not always manage to combine the redundant comparisons.
I suspect that if such unusable conditionals were converted back to branching
logic after failed vectorization, jump threading would be able to pick up the
pieces and generate good code again, but I'm not certain.
As an example, on powerpc64le-linux, consider the following Fortran code
(d138.f), compiled with:
$ gfortran -S -O3 -mcpu=power8 -mtune=power8 -funroll-loops -ffast-math -mrecip=all d138.f
      subroutine sub(x,a,n,m)
      implicit none
      real*8 x(*),a(*),atemp
      integer i,j,k,m,n
      real*8 s,t,u,v
      do j=1,m
         atemp=0.d0
         do i=1,n
            if (abs(a(i)).gt.atemp) then
               atemp=a(i)
               k = i
            end if
         enddo
         call dummy(atemp,k)
      enddo
      return
      end
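For readers who do not read Fortran, the test case can be approximated in C as below. This is only a sketch: it omits the unused argument x, the body of `dummy` and the globals `last_atemp` and `last_k` are hypothetical stand-ins for the external routine, and k is initialized to 0 although the Fortran leaves it undefined. Whether the C front end produces the same GIMPLE has not been checked here.

```c
#include <math.h>

/* Hypothetical stand-in for the external routine `dummy`:
   it only records the values it was last called with. */
double last_atemp;
int last_k;

void dummy(double *atemp, int *k)
{
    last_atemp = *atemp;
    last_k = *k;
}

/* Rough C analogue of the Fortran subroutine `sub` (unused argument x dropped). */
void sub(const double *a, int n, int m)
{
    int k = 0; /* assumption: the Fortran original leaves k undefined */

    for (int j = 1; j <= m; j++) {
        double atemp = 0.0;
        for (int i = 1; i <= n; i++) {
            /* One comparison guards both updates, as in the Fortran source. */
            if (fabs(a[i - 1]) > atemp) {
                atemp = a[i - 1]; /* stores a(i), not abs(a(i)) */
                k = i;            /* 1-based index, as in Fortran */
            }
        }
        dummy(&atemp, &k);
    }
}
```

Calling `sub(a, 3, 1)` on the array {1.0, 3.0, 2.0} leaves 3.0 and index 2 in the recorded values.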
Prior to if-conversion, we have:
<bb 7>:
  # i_29 = PHI <i_20(10), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(10), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(10), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(10), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  if (_4 > atemp_lsm.3_7)
    goto <bb 8>;
  else
    goto <bb 9>;
<bb 8>:
<bb 9>:
  # atemp_lsm.3_9 = PHI <atemp_lsm.3_7(7), _3(8)>
  # atemp_lsm.4_28 = PHI <atemp_lsm.4_6(7), 1(8)>
  # k_lsm.5_26 = PHI <k_lsm.5_27(7), i_29(8)>
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 11>;
  else
    goto <bb 10>;
Following if-conversion, the PHIs in <bb 9> have been converted into
conditional expressions in <bb 7>:
<bb 7>:
  # i_29 = PHI <i_20(8), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(8), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(8), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(8), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  atemp_lsm.3_9 = _4 > atemp_lsm.3_7 ? _3 : atemp_lsm.3_7;
  atemp_lsm.4_28 = _4 > atemp_lsm.3_7 ? 1 : atemp_lsm.4_6;
  k_lsm.5_26 = _4 > atemp_lsm.3_7 ? i_29 : k_lsm.5_27;
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 9>;
  else
    goto <bb 8>;
Types of the vars in the converted expressions are:
integer(kind=4) k_lsm.5;
logical(kind=4) atemp_lsm.4;
real(kind=8) atemp_lsm.3;
The vectorizer is unable to vectorize the loop (unsupported pattern), so these
conditionals stay in place until expand time. The first of these corresponds
to a floating-point select instruction (fsel), so it is fine. But the other two
use a floating-point comparison to select between integer or logical values,
and POWER has no instruction for that, so each is expanded into its own compare
and branch.
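The two shapes of the loop body can be modelled at source level with the C sketch below. It is an illustration only: `struct state`, `step_select` and `step_branch` are hypothetical names, with fields named after the GIMPLE temporaries in the dumps above, and a C compiler is free to lower either function either way. The point is that both compute the same values, while the select form states the comparison three times and the branching form states it once.

```c
#include <math.h>

/* Loop-carried values, named after the GIMPLE temporaries. */
struct state {
    double atemp; /* atemp_lsm.3 */
    int flag;     /* atemp_lsm.4 (a logical in the dump) */
    int k;        /* k_lsm.5 */
};

/* If-converted shape: three selects, each repeating the same comparison. */
struct state step_select(struct state s, double v, int i)
{
    double av = fabs(v);
    struct state r;

    r.atemp = av > s.atemp ? v : s.atemp;
    r.flag  = av > s.atemp ? 1 : s.flag;
    r.k     = av > s.atemp ? i : s.k;
    return r;
}

/* Branching shape: a single comparison guards all three updates. */
struct state step_branch(struct state s, double v, int i)
{
    if (fabs(v) > s.atemp) {
        s.atemp = v;
        s.flag = 1;
        s.k = i;
    }
    return s;
}
```

Feeding the same sequence of array elements through both functions yields identical states at every step, which is why one comparison per iteration would be enough.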
The resulting code (one iteration of an unrolled loop) issues the same fcmpu
on registers 3 and 6 twice, once before each branch:
.L20:
        addi 8,3,1
        extsw 10,10
        extsw 3,8
        addi 4,4,8
.L42:
        lfd 2,0(4)
        fabs 3,2
        fcmpu 7,3,6
        fsub 4,6,3
        fsel 5,4,6,2
        ble 7,.L23
        li 9,1
.L23:
        fcmpu 0,3,6
        rldicl 9,9,0,32
        ble 0,.L24
        mr 10,3
.L24:
We did not if-convert these prior to r235436
(https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/tree-if-conv.c?r1=235436&r2=235435&pathrev=235436).
Using GCC 6.2, we see the following preferable code, with a single compare and
branch:
.L46:
        addi 10,10,1
        addi 8,8,8
        extsw 10,10
.L37:
        lfd 3,0(8)
        fabs 4,3
        fcmpu 5,4,0
        ble 5,.L47
        fmr 12,3
        fmr 0,3
        mr 3,10
        li 4,1
        li 6,1
.L47:
The added if-conversion causes approximately 30% degradation in performance.
(I am not specifically blaming r235436; this just exposed the problem for this
particular case.)