https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Guo Jie <guojie at loongson dot cn> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojie at loongson dot cn

--- Comment #2 from Guo Jie <guojie at loongson dot cn> ---
It seems that “omp simd reduction” does not interact well with “loop peeling”,
which makes this test case fail intermittently.

LoongArch tree vect pass dump:

  # “omp simd” temporary arrays.
  struct S D.3833[8];
  struct S D.3832[8];
  ...


  # prologue loop.
  <bb 20> [local count: 723433550]:
  MEM <struct S[32]> [(struct S *)&D.3832][0].s = 0;
  _44 = D.3832[0].s;
  _41 = (long unsigned int) i_1;
  _58 = _41 * 4;
  _59 = a_18(D) + _58;
  _60 = _59->s;
  _61 = _44 + _60;
  D.3832[0].s = _61;
  _64 = D.3833[0].s;
  _65 = D.3832[0].s;
  _66 = _64 + _65;
  D.3833[0].s = _66;  # Save temporary reduction results.
  MEM <struct S[32]> [(struct S *)&D.3832][0].s = _66;
  _69 = b_28(D) + _58;
  _70 = MEM <struct S[32]> [(const struct S &)&D.3832][0].s;
  _69->s = _70;
  i_72 = i_1 + 1;
  ivtmp_73 = ivtmp_2 - 1;
  ivtmp_78 = ivtmp_77 + 1;
  if (ivtmp_78 < prolog_loop_niters.42_7)
    goto <bb 21>; [85.71%]
  else
    goto <bb 18>; [14.29%]
  <bb 21> [local count: 620085901]:
  goto <bb 20>; [100.00%]


  # vector body loop.
  <bb 5> [local count: 118111599]:
  # i_48 = PHI <i_30(12), i_79(22)>
  # ivtmp_55 = PHI <ivtmp_45(12), ivtmp_81(22)>
  # vectp_a.50_126 = PHI <vectp_a.50_127(12), vectp_a.51_123(22)>
  # vectp_b.58_158 = PHI <vectp_b.58_159(12), vectp_b.59_155(22)>
  # ivtmp_161 = PHI <ivtmp_162(12), 0(22)>
  MEM <vector(8) int> [(struct S *)&D.3832] = { 0, 0, 0, 0, 0, 0, 0, 0 };
  _16 = (long unsigned int) i_48;
  _17 = _16 * 4;
  _19 = a_18(D) + _17;
  vect__20.52_128 = MEM <vector(8) int> [(int *)vectp_a.50_126];
  _20 = _19->s;
  MEM <vector(8) int> [(int *)&D.3832] = vect__20.52_128;
  vect__24.54_131 = MEM <vector(8) int> [(int *)&D.3833]; # Wrong value.
  ...
  vect__26.56_133 = vect__20.52_128 + vect__24.54_131;
  ...
  if (ivtmp_162 < bnd.44_109)
    goto <bb 12>; [0.00%]
  else
    goto <bb 25>; [100.00%]
  ...

The “prologue loop” stores its temporary reduction result only in D.3833[0];
all other elements of D.3833 remain 0. Therefore, only the first element of
vect__26.56_133 includes the scalar reduction result of the “prologue loop”.

I think a reasonable fix would be to broadcast the scalar reduction result of
the “prologue loop” to all elements of D.3833 before entering the vector body.
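
To make the effect concrete, here is a small standalone C model of the
situation. It is only a sketch and not GCC code: the names scan_scalar,
scan_peeled, VF and carry are hypothetical, and it assumes the loop behaves
like an inclusive scan (b[i] receives the running sum of a[0..i]), which is
what the dump suggests. The carry array stands in for D.3833.

```c
#include <stddef.h>

#define VF 8 /* lanes per vector, matching "vector(8) int" in the dump */

/* Reference: scalar inclusive scan, b[i] = a[0] + ... + a[i]. */
void scan_scalar(const int *a, int *b, size_t n)
{
    int r = 0;
    for (size_t i = 0; i < n; i++) {
        r += a[i];
        b[i] = r;
    }
}

/* Model of a peeled prologue followed by a vector body and an epilogue.
   carry[] plays the role of D.3833 (an assumption of this sketch).
   broadcast_carry == 0: the prologue result stays in lane 0 only.
   broadcast_carry != 0: the prologue result is copied to every lane. */
void scan_peeled(const int *a, int *b, size_t n, size_t peel,
                 int broadcast_carry)
{
    int carry[VF] = { 0 };
    size_t i = 0;

    /* Prologue: scalar iterations touch only element 0. */
    for (; i < peel && i < n; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }

    if (broadcast_carry)
        for (int l = 1; l < VF; l++)
            carry[l] = carry[0];

    /* Vector body: per-vector prefix sum plus a lane-wise carry add. */
    for (; i + VF <= n; i += VF) {
        int run = 0;
        for (int l = 0; l < VF; l++) {
            run += a[i + l];
            b[i + l] = run + carry[l];
        }
        /* The last lane becomes the carry for every lane of the next vector. */
        for (int l = 0; l < VF; l++)
            carry[l] = b[i + VF - 1];
    }

    /* Epilogue: remaining scalar iterations. */
    for (; i < n; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }
}
```

With a[i] = i + 1, n = 19 and peel = 3, the non-broadcast variant still gets
b[3] right (lane 0 sees the prologue sum) but b[4] loses it, while the
broadcast variant matches scan_scalar on every element.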
