https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94274
Bug ID: 94274
Summary: fold phi whose incoming args are defined from binary
operations
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: z.zhanghaijian at huawei dot com
Target Milestone: ---
For if/else structure,
Example 1:
int test(int cond, int a, int b, int c)
{
int result = 0;
if (cond)
result = a + b;
else
result = a + c;
return result;
}
The expressions is binary operation and have a common subexpression "a", and
the opcode is the same.
E.g. on aarch64, gcc will do the binary operation first, and then do csel:
cmp w0, 0
add w0, w1, w2
add w1, w1, w3
csel w0, w1, w0, eq
In fact, it can be optimized to do csel first and then do binary operations:
cmp w0, 0
csel w2, w2, w3, ne
add w0, w2, w1
This can eliminate one instruction. This scenario is very common, and the
switch/case structure is the same.
Example 2:
int test(int cond, int a, int b, int c, int d)
{
int result = 0;
switch (cond) {
case 1:
result = a + b;
break;
case 8:
result = a + c;
break;
default:
result = a + d;
break;
}
return result;
}
gcc will do the binary operation first, and then do csel :
mov w5, w0
add w0, w1, w2
cmp w5, 1
beq .L1
add w4, w1, w4
cmp w5, 8
add w1, w1, w3
csel w0, w1, w4, eq
.L1:
ret
Which can further optimized into :
cmp w0, 1
beq .L3
cmp w0, 8
csel w4, w4, w3, ne
add w0, w1, w4
ret
.L3:
mov w4, w2
add w0, w1, w4
ret
My proposal: fold the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt) :
For example 1:
replaces
bb0:
if (cond) goto bb1; else goto bb2;
bb1:
x1 = a + b;
goto <bb3>
bb2:
x2 = a + c;
bb3:
x = PHI <x1 (bb1), x2 (bb2), ...>;
with
bb0:
if (cond) goto bb1; else goto bb2;
bb1:
bb2:
bb3:
x3 = PHI <b (bb1), c (bb2), ...>;
x = a + x3;
For example 2:
replaces
bb0:
if (cond == 1) goto bb2; else goto bb1;
bb1:
if (cond == 8) goto bb3; else goto bb4;
bb2:
x2 = a + b;
goto <bb5>
bb3:
x3 = a + c;
goto <bb5>
bb4:
x4 = a + d;
bb5:
x5 = PHI <x2 (bb2), x3 (bb3), x4 (bb4), ...>;
with
bb0:
if (cond == 1) goto bb2; else goto bb1;
bb1:
if (cond == 8) goto bb3; else goto bb4;
bb2:
bb3:
bb4:
bb5:
x5 = PHI <b (bb2), c (bb3), c (bb4), ...>;
x = a + x5;
I have an initial implementation that is under testing. In part, it based on
the LLVM InstCombinePass(InstCombinePHI.cpp).
Any suggestions?