Richard Biener <richard.guent...@gmail.com> writes: > On Tue, Aug 27, 2019 at 11:58 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> ifcvt produces: >> >> <bb 3> [local count: 1063004407]: >> # i_34 = PHI <i_30(10), 0(21)> >> # ivtmp_5 = PHI <ivtmp_6(10), 100(21)> >> _1 = (long unsigned int) i_34; >> _2 = _1 * 2; >> _3 = a_23(D) + _2; >> _4 = *_3; >> _7 = b_24(D) + _2; >> _49 = _4 > 0; >> _8 = .MASK_LOAD (_7, 16B, _49); >> _12 = _4 > 0; >> _13 = _8 > 0; >> _9 = _12 & _13; >> _10 = _4 > 0; >> _11 = _8 > 0; >> _27 = ~_11; >> _15 = _10 & _27; >> _14 = c_25(D) + _2; >> iftmp.0_26 = .MASK_LOAD (_14, 16B, _15); >> iftmp.0_19 = _9 ? _4 : iftmp.0_26; >> _17 = x_28(D) + _2; >> _50 = _4 > 0; >> .MASK_STORE (_17, 16B, _50, iftmp.0_19); >> i_30 = i_34 + 1; >> ivtmp_6 = ivtmp_5 - 1; >> if (ivtmp_6 != 0) >> goto <bb 10>; [98.99%] >> else >> goto <bb 9>; [1.01%] >> >> <bb 10> [local count: 1052266994]: >> goto <bb 3>; [100.00%] >> >> which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0). > > Huh. if-conversion does > > /* Now all statements are if-convertible. Combine all the basic > blocks into one huge basic block doing the if-conversion > on-the-fly. */ > combine_blocks (loop); > > /* Delete dead predicate computations. */ > ifcvt_local_dce (loop->header); > > /* Perform local CSE, this esp. helps the vectorizer analysis if loads > and stores are involved. CSE only the loop body, not the entry > PHIs, those are to be kept in sync with the non-if-converted copy. > ??? We'll still keep dead stores though. */ > exit_bbs = BITMAP_ALLOC (NULL); > bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); > bitmap_set_bit (exit_bbs, loop->latch->index); > todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs); > > which should remove those redundant _4 > 0 checks. In fact when I > run this on x86_64 with -mavx512bw I see > > <bb 3> [local count: 1063004407]: > # i_25 = PHI <i_20(9), 0(21)> > # ivtmp_24 = PHI <ivtmp_12(9), 100(21)> > _1 = (long unsigned int) i_25; > _2 = _1 * 2; > _3 = a_14(D) + _2; > _4 = *_3; > _5 = b_15(D) + _2; > _49 = _4 > 0; > _6 = .MASK_LOAD (_5, 16B, _49); > _22 = _6 > 0; > _28 = ~_22; > _29 = _28 & _49; > _7 = c_16(D) + _2; > iftmp.0_17 = .MASK_LOAD (_7, 16B, _29); > iftmp.0_10 = _29 ? iftmp.0_17 : _4; > _8 = x_18(D) + _2; > .MASK_STORE (_8, 16B, _49, iftmp.0_10); > i_20 = i_25 + 1; > ivtmp_12 = ivtmp_24 - 1; > if (ivtmp_12 != 0) > > after if-conversion (that should be the case already on the GCC 9 branch).
Gah, sorry for the noise. Turns out I still had a local change that was trying to poke the patch into doing something wrong. Will try to check my facts more carefully next time. The redundant pattern statements I was thinking of come from vect_recog_mask_conversion_pattern, but I guess that isn't so interesting here. So yeah, let's drop this whole vn thing for now... Thanks, Richard