------- Comment #4 from changpeng dot fang at amd dot com 2010-07-28 18:22 -------
Andrew's example is exactly what the prefetch pass sees for the test case (in
the bug description). Unfortunately, the prefetch pass cannot recognize that
vect_pa.6_24 and vect_pa.20_38 are exactly the same address. Both are
initialized from &a in bb 2 and both advance by 16 per iteration in bb 3:

<bb 2>:
  pretmp.2_18 = (float) beta_4(D);
  vect_pa.9_22 = (vector(4) float *) &a;
  vect_pa.6_23 = vect_pa.9_22;
  vect_cst_.12_27 = {pretmp.2_18, pretmp.2_18, pretmp.2_18, pretmp.2_18};
  vect_pb.16_29 = (vector(4) float *) &b;
  vect_pb.13_30 = vect_pb.16_29;
  vect_pa.23_36 = (vector(4) float *) &a;
  vect_pa.20_37 = vect_pa.23_36;

<bb 3>:
  # vect_pa.6_24 = PHI <vect_pa.6_25(4), vect_pa.6_23(2)>
  # vect_pb.13_31 = PHI <vect_pb.13_32(4), vect_pb.13_30(2)>
  # vect_pa.20_38 = PHI <vect_pa.20_39(4), vect_pa.20_37(2)>
  # ivtmp.24_40 = PHI <ivtmp.24_41(4), 0(2)>
  vect_var_.10_26 = *vect_pa.6_24;
  vect_var_.11_28 = vect_cst_.12_27;
  vect_var_.17_33 = *vect_pb.13_31;
  vect_var_.18_34 = vect_var_.11_28 * vect_var_.17_33;
  vect_var_.19_35 = vect_var_.10_26 + vect_var_.18_34;
  *vect_pa.20_38 = vect_var_.19_35;
  vect_pa.6_25 = vect_pa.6_24 + 16;
  vect_pb.13_32 = vect_pb.13_31 + 16;
  vect_pa.20_39 = vect_pa.20_38 + 16;
  ivtmp.24_41 = ivtmp.24_40 + 1;
  if (ivtmp.24_41 < 256)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021
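
For illustration only, here is a minimal C sketch of the equivalence the dump exhibits: two pointer induction variables that start at the same base and advance by the same step refer to the same address in every iteration. This is not the prefetch pass's code. The struct ptr_iv type, the helper names, and the array sizes (1024 floats, guessed from 256 iterations of 4-float vectors) are all assumptions made up for this example.

```c
#include <stddef.h>

/* Hypothetical model of a pointer induction variable (not a GCC type):
   the value on loop entry plus the number of bytes added per iteration. */
struct ptr_iv {
    const void *base;  /* initial value, e.g. (vector(4) float *) &a */
    ptrdiff_t step;    /* per-iteration increment in bytes */
};

/* Two IVs yield the same address in every iteration when they enter the
   loop with the same value and advance by the same step. */
static int same_address_stream(struct ptr_iv x, struct ptr_iv y)
{
    return x.base == y.base && x.step == y.step;
}

/* Assumed array sizes: 256 iterations * 4 floats per vector. */
static float a[1024], b[1024];

/* Mirrors the three pointer IVs in the dump. Returns 1 when the load and
   the store through 'a' match and the load through 'b' does not. */
static int example_from_dump(void)
{
    struct ptr_iv load_a  = { a, 16 };  /* vect_pa.6_24  */
    struct ptr_iv store_a = { a, 16 };  /* vect_pa.20_38 */
    struct ptr_iv load_b  = { b, 16 };  /* vect_pb.13_31 */

    return same_address_stream(load_a, store_a)
        && !same_address_stream(load_a, load_b);
}
```

In the dump the two initial values reach the PHI nodes through separate copies (vect_pa.9_22 and vect_pa.23_36), so a comparison like this would only succeed after those copies are traced back to &a.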