https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80724
Bug ID: 80724
Summary: gcc.target/aarch64/pr62178.c failed because of r247885
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: amker at gcc dot gnu.org
Target Milestone: ---
After r247885, the test gcc.target/aarch64/pr62178.c fails as below:
gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.
First, the innermost loop after ivopts is:
<bb 12> [26.32%]:
# vectp_b.12_66 = PHI <vectp_b.12_67(13), vectp_b.12_64(11)>
# vect__5.16_70 = PHI <vect__5.16_71(13), { 0, 0, 0, 0 }(11)>
# ivtmp.56_96 = PHI <ivtmp.56_97(13), ivtmp.56_98(11)>
_102 = (void *) ivtmp.56_96;
_2 = MEM[base: _102, offset: 4B];
vect_cst__62 = {_2, _2, _2, _2};
vect__3.14_68 = MEM[base: vectp_b.12_66, offset: 0B];
vect__4.15_69 = vect_cst__62 * vect__3.14_68;
vect__5.16_71 = vect__4.15_69 + vect__5.16_70;
vectp_b.12_67 = vectp_b.12_66 + 124;
ivtmp.56_97 = ivtmp.56_96 + 4;
_112 = (vector(4) int *) ivtmp.68_106;
if (vectp_b.12_67 != _112)
goto <bb 13>; [96.66%]
else
goto <bb 14>; [3.34%]
<bb 13> [25.44%]:
goto <bb 12>; [100.00%]
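Note that the chosen candidate ivtmp.56_96 is shifted by 4 bytes relative to
the address use, thus MEM[base: _102, offset: 4B] is generated rather than:
_2 = MEM[base: _102, offset: 0B];
With a zero offset, this load combined with vect_cst__62 = {_2, _2, _2, _2};
could be done by a single ld1r, which only accepts a plain base register
(optionally post-indexed), not base plus immediate offset.
IVOPTs has no knowledge that MEM[base + 4] leads to different code than
MEM[base] in this case.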
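To illustrate the two ways of addressing the same element, here is a minimal C sketch. It is not taken from the test case: the array shape (31 x 31 elements of 4 bytes, giving the 124-byte row stride seen in the dump), the names fill_a, mem_load, sum_cand13 and sum_cand14, and the choice to sum the loaded values are all assumptions made for illustration. It only shows that an IV whose base includes the +4 loads at offset 0, while an IV whose base omits it must load at offset 4, and that both read the same data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape: 31 x 31 elements, 4 bytes each, so one row is 124 bytes. */
enum { N = 31, ROW_BYTES = 124, NITER = 30 };

int32_t a[N][N];

/* Fill a[][] with distinct, easily checked values. */
void fill_a(void)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            a[r][c] = r * 100 + c;
}

/* Model of MEM[base: base, offset: off]: load 4 bytes at base + off. */
int32_t mem_load(uintptr_t base, ptrdiff_t off)
{
    int32_t v;
    memcpy(&v, (const char *)base + off, sizeof v);
    return v;
}

/* cand_13 style: the IV base already contains the +4, so the load
   uses offset 0 (the form that a base-register-only load can match). */
int64_t sum_cand13(size_t i)
{
    assert(i < N && sizeof a[0] == ROW_BYTES);
    uintptr_t iv = (uintptr_t)a + i * ROW_BYTES + 4;
    int64_t sum = 0;
    for (int k = 0; k < NITER; k++) {
        sum += mem_load(iv, 0);   /* MEM[base: iv, offset: 0B] */
        iv += 4;
    }
    return sum;
}

/* cand_14 style: the IV base omits the +4, so every load needs offset 4. */
int64_t sum_cand14(size_t i)
{
    assert(i < N && sizeof a[0] == ROW_BYTES);
    uintptr_t iv = (uintptr_t)a + i * ROW_BYTES;
    int64_t sum = 0;
    for (int k = 0; k < NITER; k++) {
        sum += mem_load(iv, 4);   /* MEM[base: iv, offset: 4B] */
        iv += 4;
    }
    return sum;
}
```

Both functions visit a[i][1] through a[i][30]; the only difference is where the constant 4 lives, which is exactly the distinction ivopts does not price here.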
For this iv_use:
Group 0:
Type: ADDRESS
Use 0.0:
At stmt: _2 = a[i_27][k_29];
At pos: a[i_27][k_29]
IV struct:
Type: int *
Base: (int *) (&a + ((sizetype) i_27 * 124 + 4))
Step: 4
Object: (void *) &a
Biv: N
Overflowness wrto loop niter: Overflow
There are two candidates of interest:
Candidate 13:
Var befor: ivtmp.55
Var after: ivtmp.55
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) (&a + ((sizetype) i_27 * 124 + 4))
Step: 4
Object: (void *) &a
Biv: N
Overflowness wrto loop niter: Overflow
Applying pattern match.pd:1902, generic-match.c:9693
Candidate 14:
Var befor: ivtmp.56
Var after: ivtmp.56
Incr POS: before exit test
IV struct:
Type: unsigned long
Base: (unsigned long) (&a + (sizetype) i_27 * 124)
Step: 4
Object: (void *) &a
Biv: N
Overflowness wrto loop niter: Overflow
The costs are as below:
<Candidate Costs>:
cand cost
0 5
1 5
2 5
3 5
4 4
5 5
6 5
7 5
8 5
9 5
10 5
11 5
12 5
13 6
14 5
<Group-candidate Costs>:
Group 0:
cand cost compl. inv.expr. inv.vars
1 2 2 1; NIL;
2 2 2 2; NIL;
3 1 2 3; NIL;
13 0 0 NIL; NIL;
14 0 1 NIL; NIL;
Note that we choose cand_14 only because the cost of cand_13 itself is higher
than that of cand_14.
This is because the loop iterates 30 times, and we have:
cand_13
base: (unsigned long) (&a + ((sizetype) i_27 * 124 + 4))
cost: 33 (before amortize against loop niter) / 30 = 1
cand_14
base: (unsigned long) (&a + (sizetype) i_27 * 124)
cost: 29 (before amortize against loop niter) / 30 = 0
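Note that we are right on the boundary set by the loop niter: the two setup
costs (29 and 33) straddle the iteration count of 30, so the truncating
division yields 0 for one candidate and 1 for the other.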
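The arithmetic above can be sketched in C as follows. The helper name amortized_setup_cost is hypothetical and this is not the ivopts source; it only assumes, as the numbers above suggest, that the setup cost is divided by the iteration count with truncating integer division.

```c
#include <assert.h>

/* Hypothetical helper: amortize a one-time setup cost over the expected
   number of loop iterations using truncating integer division. */
unsigned amortized_setup_cost(unsigned setup_cost, unsigned niter)
{
    assert(niter != 0);
    return setup_cost / niter;
}
```

With niter = 30, a setup cost of 33 amortizes to 1 and a setup cost of 29 amortizes to 0, which is consistent with the one-unit gap between candidates 13 and 14 in the candidate cost table (6 versus 5).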
With this ivopts issue alone, the innermost loop should have only one extra
instruction. Unfortunately, before RTL combine we have:
74: r74:SI=[++r99:DI]
REG_INC r99:DI
75: r123:V4SI=[post r90:DI+=0x7c]
REG_INC r90:DI
77: r124:V4SI=vec_duplicate(r74:SI)
REG_DEAD r74:SI
78: r126:V4SI=r123:V4SI*r124:V4SI
REG_DEAD r124:V4SI
REG_DEAD r123:V4SI
79: r93:V4SI=r93:V4SI+r126:V4SI
REG_DEAD r126:V4SI
The combine pass combines insns 77 and 78, rather than 78 and 79, giving:
74: r74:SI=[++r99:DI]
REG_INC r99:DI
75: r123:V4SI=[post r90:DI+=0x7c]
REG_INC r90:DI
77: NOTE_INSN_DELETED
78: r126:V4SI=vec_duplicate(r74:SI)*r123:V4SI
REG_DEAD r74:SI
REG_DEAD r123:V4SI
79: r93:V4SI=r93:V4SI+r126:V4SI
REG_DEAD r126:V4SI
So it misses the mul+add combination, and instead forms a pattern that ends up
generating two instructions:
fmov s3, w0 // 157 *movsi_aarch64/12 [length = 4]
mul v0.4s, v0.4s, v3.s[0] // 78 *aarch64_mul3_elt_from_dupv4si [length = 4]
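For reference, the data flow of insns 77, 78 and 79 can be modelled in portable C as below. This is only an illustrative sketch: the v4si struct, the helper names and check_step are assumptions, not GCC internals or NEON intrinsics, and it says nothing about which instructions the backend should pick. It just makes explicit that 77 feeds 78 (broadcast into multiply) and 78 feeds 79 (multiply into accumulate), which are the two pairings combine could choose between.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Plain-C stand-in for a V4SI register. */
typedef struct { int32_t lane[LANES]; } v4si;

/* Insn 77: broadcast one scalar into every lane. */
v4si vec_duplicate(int32_t x)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = x;
    return r;
}

/* Insn 78: lane-wise multiply. */
v4si vec_mul(v4si x, v4si y)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = x.lane[i] * y.lane[i];
    return r;
}

/* Insn 79: lane-wise add into the accumulator. */
v4si vec_add(v4si x, v4si y)
{
    v4si r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = x.lane[i] + y.lane[i];
    return r;
}

/* One inner-loop step: acc + dup(scalar) * vec. */
v4si step(v4si acc, int32_t scalar, v4si vec)
{
    v4si dup  = vec_duplicate(scalar);  /* insn 77 */
    v4si prod = vec_mul(vec, dup);      /* insn 78 */
    return vec_add(acc, prod);          /* insn 79 */
}

/* Usage check: every lane should equal acc + scalar * vec. */
void check_step(void)
{
    v4si acc = {{1, 2, 3, 4}};
    v4si vec = {{10, 20, 30, 40}};
    v4si out = step(acc, 3, vec);
    for (int i = 0; i < LANES; i++)
        assert(out.lane[i] == acc.lane[i] + 3 * vec.lane[i]);
}
```

Merging the first two helpers corresponds to the 77/78 combination shown above; merging the last two would correspond to the mul+add combination that is missed.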