https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105715
Bug ID: 105715
Summary: [13 Regression] missed RTL if-conversion with COND_EXPR change
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
gcc.target/i386/pr45685.c with -march=cascadelake shows a missed RTL
if-conversion. The crucial GIMPLE difference is
_36 = _3 > 0;
iftmp.0_13 = _36 ? 1 : -1;
prephitmp_31 = ABS_EXPR <_3>;
prephitmp_32 = _36 ? -1 : 1;
prephitmp_33 = _36 ? 4294967295 : 1;
prephitmp_35 = _36 ? 1 : 4294967295;
...
_29 = prephitmp_31 != _42;
val_12 = _29 ? prephitmp_32 : iftmp.0_13;
prephitmp_37 = _29 ? prephitmp_33 : prephitmp_35;
vs.
iftmp.0_13 = _3 > 0 ? 1 : -1;
prephitmp_31 = ABS_EXPR <_3>;
prephitmp_32 = _3 > 0 ? -1 : 1;
prephitmp_33 = _3 > 0 ? 4294967295 : 1;
prephitmp_35 = _3 > 0 ? 1 : 4294967295;
...
val_12 = i.1_6 == prephitmp_31 ? iftmp.0_13 : prephitmp_32;
prephitmp_37 = i.1_6 != prephitmp_31 ? prephitmp_33 : prephitmp_35;
where the split-out condition is now CSEd and its multiple uses keep us
from TERing the comparison. Previously we got two compare & jump
sequences, while now we get the compare computing a QImode value and
then two compare & jump sequences.
While without -march=cascadelake we do get the desired number of cmovs,
the generated code is still worse.
The testcase is unfortunately a bit obfuscated due to the many
if-conversions taking place. Smaller GIMPLE testcases do not exhibit
jumpy RTL expansion.
void __GIMPLE(ssa, startwith("optimized"))
foo (long *p, long a, long b, long c, long d, long e, long f)
{
_Bool _2;
long _3;
long _8;
__BB(2):
_2 = a_1(D) < b_10(D);
_3 = _2 ? c_4(D) : d_5(D);
_8 = _2 ? f_6(D) : e_7(D);
__MEM <long> (p_9(D)) = _3;
__MEM <long> (p_9(D) + 4) = _8;
return;
}
#if __GNUC__ < 13
void __GIMPLE(ssa, startwith("optimized"))
bar (long *p, long a, long b, long c, long d, long e, long f)
{
long _3;
long _8;
__BB(2):
_3 = a_1(D) < b_10(D) ? c_4(D) : d_5(D);
_8 = a_1(D) >= b_10(D) ? e_7(D) : f_6(D);
__MEM <long> (p_9(D)) = _3;
__MEM <long> (p_9(D) + 4) = _8;
return;
}
#endif
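
For readers less used to the GIMPLE frontend syntax, the two shapes above can be sketched in plain C. This is only an illustration of the semantics, not a reproducer: the function names select_shared and select_embedded are made up here, the stores go to two distinct array elements rather than the byte offsets used in the GIMPLE, and the compiler is free to canonicalize both functions to the same GIMPLE.

```c
#include <stdbool.h>

/* Shape of foo: the comparison is computed once into a boolean that
   feeds both selects (the form with the condition split out).  */
void
select_shared (long out[2], long a, long b, long c, long d, long e, long f)
{
  bool lt = a < b;
  out[0] = lt ? c : d;
  out[1] = lt ? f : e;
}

/* Shape of bar: each select embeds its own comparison, and the second
   one uses the inverted condition with swapped arms.  */
void
select_embedded (long out[2], long a, long b, long c, long d, long e, long f)
{
  out[0] = a < b ? c : d;
  out[1] = a >= b ? e : f;
}
```

Both functions store the same pair of values for any inputs, since a >= b ? e : f is just a < b ? f : e with the condition inverted.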