http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50955
--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> --- "amker.cheng at gmail dot com" <gcc-bugzi...@gcc.gnu.org> wrote: >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50955 > >bin.cheng <amker.cheng at gmail dot com> changed: > > What |Removed |Added >---------------------------------------------------------------------------- > CC| |amker.cheng at gmail dot com > >--- Comment #17 from bin.cheng <amker.cheng at gmail dot com> --- >Hi Richard, >I am having difficulty in understanding cases if this PR. >For the reported case with two loops: > > for( y=0; y<4; y++, pDst += dstStep ) { > for( x=y+1; x<4; x++ ) { > s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2; > pDst[x] = (unsigned char)s; > } > > pDst[y] = p3; > } > >The dump for statement 'pDst[y] = p3;' before IVOPT is like: > ><bb 4>: >Invalid sum of incoming frequencies 1667, should be 278 > y.2_64 = (sizetype) y_89; > D.6421_65 = pDst_88 + y.2_64; > *D.6421_65 = p3_37; > pDst_69 = pDst_88 + pretmp.21_118; > ivtmp.35_116 = ivtmp.35_87 - 1; > if (ivtmp.35_116 != 0) > goto <bb 18>; > else > goto <bb 19>; > > >IVOPT chooses candidate 15: >candidate 15 > depends on 3 > var_before ivtmp.154 > var_after ivtmp.154 > incremented before exit test > type unsigned int > base (unsigned int) pDst_39(D) - (unsigned int) &p1 > step (unsigned int) (pretmp.21_118 + 1) >for use 1: >use 1 > address > in statement *D.6421_65 = p3_37; > > at position *D.6421_65 > type unsigned char * > base pDst_39(D) > step pretmp.21_118 + 1 > base object (void *) pDst_39(D) > related candidates > >After rewriting, the dump is like: > ><bb 4>: >Invalid sum of incoming frequencies 1667, should be 278 > MEM[symbol: p1, index: ivtmp.154_200, offset: 0B] = p3_37; > pDst_69 = pDst_88 + pretmp.21_118; > ivtmp.149_218 = ivtmp.149_249 - 1; > ivtmp.154_190 = ivtmp.154_200 + D.6617_250; > if (x_40 != 4) > goto <bb 18>; > else > goto <bb 19>; > >Eventually, the storing to TMR[p1,ivtmp,0] is considered local and >deleted. > >BUT, for your reduced case: > > p3 = (unsigned char)(((signed int)p1[1] + (signed int)p2[1] > + (signed int)p1[0] +(signed int)p1[0] + 2 ) >> 2 ); > > for( x=y+1; x<4; x++ ) { > s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2; > pDst[x] = (unsigned char)s; > } > > pDst[y] = p3; > >It is about the the TMR in below dump (before IVOPT): > ><bb 6>: > # vect_pp1.30_166 = PHI <vect_pp1.30_167(16), vect_pp1.33_165(5)> > # vect_pp1.37_176 = PHI <vect_pp1.37_177(16), vect_pp1.40_175(5)> > # vect_pp1.46_194 = PHI <vect_pp1.46_195(16), vect_pp1.49_193(5)> > # vect_p.60_223 = PHI <vect_p.60_224(16), vect_p.63_222(5)> > # ivtmp.64_225 = PHI <ivtmp.64_226(16), 0(5)> > ... > MEM[(unsigned char *)vect_p.60_223] = vect_var_.58_219; > vect_pp1.30_167 = vect_pp1.30_166 + 8; > vect_pp1.37_177 = vect_pp1.37_176 + 8; > vect_pp1.46_195 = vect_pp1.46_194 + 8; > vect_p.60_224 = vect_p.60_223 + 8; > ivtmp.64_226 = ivtmp.64_225 + 1; > if (ivtmp.64_226 < bnd.27_128) > goto <bb 16>; > else > goto <bb 7>; > >Your patch prevents IVOPT from choosing cand 4: >candidate 4 (important) > var_before ivtmp.110 > var_after ivtmp.110 > incremented before exit test > type unsigned int > base (unsigned int) (&p1 + 8) > step 8 > base object (void *) &p1 >for use 3: >use 3 > generic > in statement vect_p.60_223 = PHI <vect_p.60_224(16), vect_p.63_222(5)> > > at position > type vector(8) unsigned char * > base batmp.61_221 + 1 > step 8 > base object (void *) batmp.61_221 > is a biv > related candidates > >To prevent IVOPT from rewriting into: > ><bb 6>: > # ivtmp.107_150 = PHI <ivtmp.107_256(16), 0(5)> > # ivtmp.110_241 = PHI <ivtmp.110_146(16), ivtmp.110_132(5)> > D.6585_133 = (unsigned int) batmp.61_221; > p1.131_277 = (unsigned int) &p1; > D.6587_278 = D.6585_133 - p1.131_277; > D.6588_279 = D.6587_278 + ivtmp.110_241; > D.6589_280 = D.6588_279 + 4294967289; > D.6590_281 = (vector(8) unsigned char *) D.6589_280; > vect_p.60_223 = D.6590_281; > ... > MEM[(unsigned char *)vect_p.60_223] = vect_var_.58_219; > ivtmp.107_256 = ivtmp.107_150 + 1; > ivtmp.110_146 = ivtmp.110_241 + 8; > if (ivtmp.107_256 < bnd.27_128) > goto <bb 16>; > else > goto <bb 7>; > >Thus prevents IVOPT from generating candidate 15 in outer loop. >(Expressing >use 3 by cand 4 itself is good, right?) > > >------------------------------- >But, >It seems because the check: > > if (address_p) > { > /* Do not try to express address of an object with computation based > on address of a different object. This may cause problems in rtl > level alias analysis (that does not expect this to be happening, > as this is illegal in C), and would be unlikely to be useful > anyway. */ > if (use->iv->base_object > && cand->iv->base_object > && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0)) > return infinite_cost; > >failed because cand(15)->iv->base_object == NULL. For the reported >case, it's >not about an iv use appearing in memory reference while not marked as >address_p, and can be fixed by revise the existing check condition, is >it true? No, even expressing an address this way is broken as for example dependence analysis via scev can get confused about the actual base object. IIRC previously we already avoided the mem-use case and I had to generalize it to also avoid addresses. >PS, sorry for replying to a fixed PR, I found it's kind of impossible >to fix >PR52272 without fully understanding this one.