https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 01.07.2024 at 12:10, tnfchris at gcc dot gnu.org
> <[email protected]> wrote:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
>
> --- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #3)
>> So we now tail-merge the two b[i] loading blocks.  Can you check SVE
>> code-gen with this?  If that fixes the PR consider adding a SVE testcase.
>
> Thanks, the codegen is much better now, but it shows some other missing
> mask tracking in the vectorizer.
>
> At the moment we generate:
>
> .L3:
>         ld1w    z31.s, p6/z, [x0, x6, lsl 2]  <-- load a
>         cmpeq   p7.s, p6/z, z31.s, #0         <-- a == 0, i.e. !a
>         ld1w    z0.s, p7/z, [x2, x6, lsl 2]   <-- load c conditionally on !a
>         cmpeq   p7.s, p7/z, z0.s, #0          <-- !a && !c
>         orr     z0.d, z31.d, z0.d             <-- a || c
>         ld1w    z29.s, p7/z, [x3, x6, lsl 2]  <-- load d where !a && !c
>         cmpne   p5.s, p6/z, z0.s, #0          <-- (a || c) & loop_mask
>         and     p7.b, p6/z, p7.b, p7.b        <-- ((!a && !c) && (!a && !c))
>                                                   & loop_mask
>         ld1w    z30.s, p5/z, [x1, x6, lsl 2]  <-- load b conditionally on
>                                                   (a || c)
>         sel     z30.s, p7, z29.s, z30.s       <-- select (!a && !c, d, b)
>         st1w    z30.s, p6, [x4, x6, lsl 2]
>         add     x6, x6, x7
>         whilelo p6.s, w6, w5
>         b.any   .L3
>
> which corresponds to:
>
>   # loop_mask_63 = PHI <next_mask_95(10), max_mask_94(20)>
>   vect__4.10_64 = .MASK_LOAD (vectp_a.8_53, 32B, loop_mask_63);
>   mask__31.11_66 = vect__4.10_64 != { 0, ... };
>   mask__56.12_67 = ~mask__31.11_66;
>   vec_mask_and_70 = mask__56.12_67 & loop_mask_63;
>   vect__7.15_71 = .MASK_LOAD (vectp_c.13_68, 32B, vec_mask_and_70);
>   mask__22.16_73 = vect__7.15_71 == { 0, ... };
>   mask__34.17_75 = vec_mask_and_70 & mask__22.16_73;
>   vect_iftmp.20_78 = .MASK_LOAD (vectp_d.18_76, 32B, mask__34.17_75);
>   vect__61.21_79 = vect__4.10_64 | vect__7.15_71;
>   mask__35.22_81 = vect__61.21_79 != { 0, ... };
>   vec_mask_and_84 = mask__35.22_81 & loop_mask_63;
>   vect_iftmp.25_85 = .MASK_LOAD (vectp_b.23_82, 32B, vec_mask_and_84);
>   _86 = mask__34.17_75 & loop_mask_63;
>   vect_iftmp.26_87 = VEC_COND_EXPR <_86, vect_iftmp.20_78, vect_iftmp.25_85>;
>   .MASK_STORE (vectp_res.27_88, 32B, loop_mask_63, vect_iftmp.26_87);
>
> It looks like what's missing is that the mask tracking doesn't know that
> other masked operations naturally perform an AND when combined.  We do some
> of this in the backend, but I feel it may be better to do it in the
> vectorizer.
>
> In this case, the second load is conditional on the first load's mask,
> which means the AND has already been done.  And crucially, inverting that
> mask means both conditions were inverted as well.
>
> So there are some superfluous masking operations happening.  But I guess
> that's a separate bug.  Shall I just add some tests here, close this, and
> open a new PR?

Not sure if that helps - do we fully understand this is a separate issue and
not related to how we if-convert?  Adding a testcase is nevertheless OK of
course.
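For reference, a loop of roughly the following shape produces the masked-load pattern discussed above: b[i] is selected whenever a[i] or c[i] is nonzero (the two b[i] loads that tail merging combines), and d[i] only when both are zero. This is a sketch reconstructed from the code-gen, not necessarily the exact testcase attached to the PR.

```c
/* Sketch of a loop matching the code-gen above (reconstructed from the
   dump, not copied from the PR).  With -O3 -march=armv8-a+sve the two
   b[i] arms should be tail-merged into a single masked load predicated
   on (a || c), as in the .L3 loop shown in the comment.  */
void
f (int *restrict a, int *restrict b, int *restrict c, int *restrict d,
   int *restrict res, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = a[i] ? b[i] : (c[i] ? b[i] : d[i]);
}
```

Scalar semantics: d[i] is used exactly when !a[i] && !c[i], matching the `sel z30.s, p7, z29.s, z30.s` select in the generated loop.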
