https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80724
            Bug ID: 80724
           Summary: gcc.target/aarch64/pr62178.c failed because of r247885
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amker at gcc dot gnu.org
  Target Milestone: ---

After r247885, test gcc.target/aarch64/pr62178.c fails as below:

  gcc.target/aarch64/pr62178.c scan-assembler ld1r\\t{v[0-9]+.

Firstly, the innermost loop after ivopts is:

  <bb 12> [26.32%]:
  # vectp_b.12_66 = PHI <vectp_b.12_67(13), vectp_b.12_64(11)>
  # vect__5.16_70 = PHI <vect__5.16_71(13), { 0, 0, 0, 0 }(11)>
  # ivtmp.56_96 = PHI <ivtmp.56_97(13), ivtmp.56_98(11)>
  _102 = (void *) ivtmp.56_96;
  _2 = MEM[base: _102, offset: 4B];
  vect_cst__62 = {_2, _2, _2, _2};
  vect__3.14_68 = MEM[base: vectp_b.12_66, offset: 0B];
  vect__4.15_69 = vect_cst__62 * vect__3.14_68;
  vect__5.16_71 = vect__4.15_69 + vect__5.16_70;
  vectp_b.12_67 = vectp_b.12_66 + 124;
  ivtmp.56_97 = ivtmp.56_96 + 4;
  _112 = (vector(4) int *) ivtmp.68_106;
  if (vectp_b.12_67 != _112)
    goto <bb 13>; [96.66%]
  else
    goto <bb 14>; [3.34%]

  <bb 13> [25.44%]:
  goto <bb 12>; [100.00%]

Note that candidate ivtmp.56_96 is shifted by 4 bytes relative to the address
being loaded, so MEM[base: _102, offset: 4B] is generated rather than:

  _2 = MEM[base: _102, offset: 0B];

With a zero offset, this load combined with vect_cst__62 = {_2, _2, _2, _2}
could be emitted as a single ld1r. IVOPTs has no knowledge that MEM[base + 4]
leads to a different outcome than MEM[base] in this case.
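For orientation, the loop shape implied by the dump can be sketched in C. This
is a reconstruction from the 124-byte stride and the 4-byte offset seen in the
dump, not the source of pr62178.c itself; the names N, r and matmul_like, and
the array dimensions, are assumptions made for illustration.

```c
#include <assert.h>

#define N 30                       /* assumed inner-loop trip count */

int a[N + 1][N + 1];
int b[N + 1][N + 1];
int r[N + 1][N + 1];               /* hypothetical result array */

/* With 4-byte int, one row is 31 * 4 = 124 bytes, matching the
   "+ 124" stride in the dump.  */
static_assert(sizeof a[0] == 124, "sketch assumes 4-byte int");

/* Hypothetical reconstruction of the loop nest; not the real testcase.  */
void matmul_like(void)
{
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++) {
            r[i][j] = 0;
            /* k starts at 1, so the first address read from a is
               &a + i * 124 + 4: the use is 4 bytes past the row start.  */
            for (int k = 1; k <= N; k++)
                r[i][j] += a[i][k] * b[k][j];
        }
}
```

In this shape, a[i][k] is the scalar that gets broadcast into vect_cst__62,
while the vector load walks down a column block of b one row (124 bytes) at a
time.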

For this iv_use:

  Group 0:
    Type: ADDRESS
    Use 0.0:
      At stmt:  _2 = a[i_27][k_29];
      At pos:   a[i_27][k_29]
      IV struct:
        Type:   int *
        Base:   (int *) (&a + ((sizetype) i_27 * 124 + 4))
        Step:   4
        Object: (void *) &a
        Biv:    N
        Overflowness wrto loop niter:   Overflow

There are two candidates:

  Candidate 13:
    Var befor: ivtmp.55
    Var after: ivtmp.55
    Incr POS: before exit test
    IV struct:
      Type:     unsigned long
      Base:     (unsigned long) (&a + ((sizetype) i_27 * 124 + 4))
      Step:     4
      Object:   (void *) &a
      Biv:      N
      Overflowness wrto loop niter:     Overflow
  Applying pattern match.pd:1902, generic-match.c:9693
  Candidate 14:
    Var befor: ivtmp.56
    Var after: ivtmp.56
    Incr POS: before exit test
    IV struct:
      Type:     unsigned long
      Base:     (unsigned long) (&a + (sizetype) i_27 * 124)
      Step:     4
      Object:   (void *) &a
      Biv:      N
      Overflowness wrto loop niter:     Overflow

The costs are as below:

  <Candidate Costs>:
    cand  cost
    0     5
    1     5
    2     5
    3     5
    4     4
    5     5
    6     5
    7     5
    8     5
    9     5
    10    5
    11    5
    12    5
    13    6
    14    5

  <Group-candidate Costs>:
  Group 0:
    cand  cost  compl.  inv.expr.  inv.vars
    1     2     2       1;         NIL;
    2     2     2       2;         NIL;
    3     1     2       3;         NIL;
    13    0     0       NIL;       NIL;
    14    0     1       NIL;       NIL;

Note that we choose cand_14 only because the cost of cand_13 itself is higher
than that of cand_14. This is because the loop iterates 30 times, and we have:

  cand_13
    base: (unsigned long) (&a + ((sizetype) i_27 * 124 + 4))
    cost: 33 (before amortizing against loop niter) / 30 = 1
  cand_14
    base: (unsigned long) (&a + (sizetype) i_27 * 124)
    cost: 29 (before amortizing against loop niter) / 30 = 0

Note that we are right on the verge with respect to loop niters: 33 / 30
truncates to 1 while 29 / 30 truncates to 0. With this ivopts issue alone, the
innermost loop should have only one more instruction.
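
The amortization arithmetic can be checked with a small sketch. It assumes the
setup cost is amortized by a plain truncating integer division by the
estimated iteration count, which matches the numbers quoted here;
amortized_setup_cost is a hypothetical name, not a GCC function.

```c
#include <assert.h>

/* Hypothetical helper: spread a one-time setup cost over the expected
   number of iterations, using truncating integer division (assumption).  */
static unsigned amortized_setup_cost(unsigned setup_cost, unsigned niter)
{
    assert(niter > 0);
    return setup_cost / niter;
}
```

Under that assumption, amortized_setup_cost(33, 30) is 1 and
amortized_setup_cost(29, 30) is 0. The two candidates amortize to different
values only for iteration counts from 30 to 33: at 34 or more both become 0,
and at 29 both are 1.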

Unfortunately, before RTL combine, we have:

   74: r74:SI=[++r99:DI]
      REG_INC r99:DI
   75: r123:V4SI=[post r90:DI+=0x7c]
      REG_INC r90:DI
   77: r124:V4SI=vec_duplicate(r74:SI)
      REG_DEAD r74:SI
   78: r126:V4SI=r123:V4SI*r124:V4SI
      REG_DEAD r124:V4SI
      REG_DEAD r123:V4SI
   79: r93:V4SI=r93:V4SI+r126:V4SI
      REG_DEAD r126:V4SI

The combine pass combines insns 77 and 78, rather than 78 and 79, giving:

   74: r74:SI=[++r99:DI]
      REG_INC r99:DI
   75: r123:V4SI=[post r90:DI+=0x7c]
      REG_INC r90:DI
   77: NOTE_INSN_DELETED
   78: r126:V4SI=vec_duplicate(r74:SI)*r123:V4SI
      REG_DEAD r74:SI
      REG_DEAD r123:V4SI
   79: r93:V4SI=r93:V4SI+r126:V4SI
      REG_DEAD r126:V4SI

So it misses the mul+add combination, and instead forms a pattern that
generates two instructions:

        fmov    s3, w0                  // 157  *movsi_aarch64/12  [length = 4]
        mul     v0.4s, v0.4s, v3.s[0]   // 78   *aarch64_mul3_elt_from_dupv4si  [length = 4]
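
The two ways of grouping insns 77, 78 and 79 can be modelled in plain C. This
is only an illustrative sketch: v4si is a stand-in struct for the V4SI mode,
and all function names are hypothetical. It shows that both groupings compute
the same value, so the difference lies purely in which instructions each
grouping maps onto.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t lane[4]; } v4si;   /* stand-in for V4SI */

/* Models insn 77: broadcast a scalar to all four lanes.  */
static v4si v4si_dup(int32_t s)
{
    v4si v = { { s, s, s, s } };
    return v;
}

/* Models insn 78: lane-wise multiply.  */
static v4si v4si_mul(v4si x, v4si y)
{
    v4si v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = x.lane[i] * y.lane[i];
    return v;
}

/* Models insn 79: lane-wise add.  */
static v4si v4si_add(v4si x, v4si y)
{
    v4si v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = x.lane[i] + y.lane[i];
    return v;
}

/* Grouping that combine chose: 77+78 as one unit, 79 left alone.  */
static v4si dup_mul_then_add(v4si acc, v4si b, int32_t s)
{
    v4si prod = v4si_mul(v4si_dup(s), b);   /* fused dup * b */
    return v4si_add(acc, prod);             /* separate add */
}

/* Grouping the report hoped for: 77 left alone, 78+79 as one unit.  */
static v4si dup_then_mul_add(v4si acc, v4si b, int32_t s)
{
    v4si splat = v4si_dup(s);               /* separate broadcast */
    return v4si_add(acc, v4si_mul(b, splat)); /* fused mul + add */
}
```

In the second grouping the broadcast stays a separate operation next to the
scalar load, which is the form the ld1r scan in the testcase is looking for.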