https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111403

Guo Jie <guojie at loongson dot cn> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojie at loongson dot cn

--- Comment #2 from Guo Jie <guojie at loongson dot cn> ---
It seems that “omp simd reduction” does not interact well with “loop peeling”,
which results in a probabilistic error in this test case.

LoongArch tree vect pass dump:

  # “omp simd” temporary arrays.
  struct S D.3833[8];
  struct S D.3832[8];
  ...

  # prologue loop.
  <bb 20> [local count: 723433550]:
  MEM <struct S[32]> [(struct S *)&D.3832][0].s = 0;
  _44 = D.3832[0].s;
  _41 = (long unsigned int) i_1;
  _58 = _41 * 4;
  _59 = a_18(D) + _58;
  _60 = _59->s;
  _61 = _44 + _60;
  D.3832[0].s = _61;
  _64 = D.3833[0].s;
  _65 = D.3832[0].s;
  _66 = _64 + _65;
  D.3833[0].s = _66;    # Save temporary reduction results.
  MEM <struct S[32]> [(struct S *)&D.3832][0].s = _66;
  _69 = b_28(D) + _58;
  _70 = MEM <struct S[32]> [(const struct S &)&D.3832][0].s;
  _69->s = _70;
  i_72 = i_1 + 1;
  ivtmp_73 = ivtmp_2 - 1;
  ivtmp_78 = ivtmp_77 + 1;
  if (ivtmp_78 < prolog_loop_niters.42_7)
    goto <bb 21>; [85.71%]
  else
    goto <bb 18>; [14.29%]

  <bb 21> [local count: 620085901]:
  goto <bb 20>; [100.00%]

  # vector body loop.
  <bb 5> [local count: 118111599]:
  # i_48 = PHI <i_30(12), i_79(22)>
  # ivtmp_55 = PHI <ivtmp_45(12), ivtmp_81(22)>
  # vectp_a.50_126 = PHI <vectp_a.50_127(12), vectp_a.51_123(22)>
  # vectp_b.58_158 = PHI <vectp_b.58_159(12), vectp_b.59_155(22)>
  # ivtmp_161 = PHI <ivtmp_162(12), 0(22)>
  MEM <vector(8) int> [(struct S *)&D.3832] = { 0, 0, 0, 0, 0, 0, 0, 0 };
  _16 = (long unsigned int) i_48;
  _17 = _16 * 4;
  _19 = a_18(D) + _17;
  vect__20.52_128 = MEM <vector(8) int> [(int *)vectp_a.50_126];
  _20 = _19->s;
  MEM <vector(8) int> [(int *)&D.3832] = vect__20.52_128;
  vect__24.54_131 = MEM <vector(8) int> [(int *)&D.3833];    # Wrong value.
  ...
  vect__26.56_133 = vect__20.52_128 + vect__24.54_131;
  ...
  if (ivtmp_162 < bnd.44_109)
    goto <bb 12>; [0.00%]
  else
    goto <bb 25>; [100.00%]
  ...

The temporary reduction result of the “prologue loop” is stored only in
D.3833[0]; all other elements of D.3833 are 0. As a result, only the first
element of vect__26.56_133 accumulates the scalar reduction result of the
“prologue loop”.

I think a reasonable solution would be to broadcast the scalar reduction
result of the “prologue loop” to all elements of D.3833.
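
The effect described above can be reproduced with a small scalar model in C. This is only a sketch under assumptions, not GCC's generated code: it assumes the test case is an inclusive scan (the per-iteration store to b in the dump suggests this), and the names scan_scalar, scan_peeled, carry, peel and broadcast are hypothetical. The carry array stands in for D.3833, and VF is 8 to match vector(8) int.

```c
#include <assert.h>
#include <stddef.h>

#define VF 8  /* vectorization factor, matching vector(8) int in the dump */

/* Reference: plain scalar inclusive scan, b[i] = a[0] + ... + a[i]. */
static void scan_scalar(const int *a, int *b, size_t n)
{
    int r = 0;
    for (size_t i = 0; i < n; i++) {
        r += a[i];
        b[i] = r;
    }
}

/* Simplified model of a peeled and vectorized scan.
   `carry` plays the role of D.3833 (hypothetical name).
   If `broadcast` is nonzero, the prologue result is copied to every
   lane before the vector body starts, as proposed in the comment. */
static void scan_peeled(const int *a, int *b, size_t n, size_t peel,
                        int broadcast)
{
    int carry[VF] = {0};
    size_t i = 0;

    assert(peel <= n);

    /* Scalar prologue: only lane 0 of carry is ever updated. */
    for (; i < peel; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }

    if (broadcast)
        for (int l = 1; l < VF; l++)
            carry[l] = carry[0];

    /* Vector body: one chunk of VF elements per iteration. */
    for (; i + VF <= n; i += VF) {
        int lane[VF];
        int run = 0;
        for (int l = 0; l < VF; l++) {   /* prefix sum inside the chunk */
            run += a[i + l];
            lane[l] = run;
        }
        for (int l = 0; l < VF; l++)     /* lane-wise add of the carry */
            b[i + l] = lane[l] + carry[l];
        for (int l = 0; l < VF; l++)     /* next carry: last lane, all lanes */
            carry[l] = b[i + VF - 1];
    }

    /* Scalar epilogue for the remaining elements. */
    for (; i < n; i++) {
        carry[0] += a[i];
        b[i] = carry[0];
    }
}
```

With a[i] = i + 1, n = 19 and peel = 3, calling scan_peeled with broadcast = 0 matches scan_scalar only up to index 3 (lane 0 of the first vector iteration) and diverges from index 4 on, while broadcast = 1 matches the scalar result everywhere.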