Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Aug 27, 2019 at 11:58 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>> ifcvt produces:
>>
>>   <bb 3> [local count: 1063004407]:
>>   # i_34 = PHI <i_30(10), 0(21)>
>>   # ivtmp_5 = PHI <ivtmp_6(10), 100(21)>
>>   _1 = (long unsigned int) i_34;
>>   _2 = _1 * 2;
>>   _3 = a_23(D) + _2;
>>   _4 = *_3;
>>   _7 = b_24(D) + _2;
>>   _49 = _4 > 0;
>>   _8 = .MASK_LOAD (_7, 16B, _49);
>>   _12 = _4 > 0;
>>   _13 = _8 > 0;
>>   _9 = _12 & _13;
>>   _10 = _4 > 0;
>>   _11 = _8 > 0;
>>   _27 = ~_11;
>>   _15 = _10 & _27;
>>   _14 = c_25(D) + _2;
>>   iftmp.0_26 = .MASK_LOAD (_14, 16B, _15);
>>   iftmp.0_19 = _9 ? _4 : iftmp.0_26;
>>   _17 = x_28(D) + _2;
>>   _50 = _4 > 0;
>>   .MASK_STORE (_17, 16B, _50, iftmp.0_19);
>>   i_30 = i_34 + 1;
>>   ivtmp_6 = ivtmp_5 - 1;
>>   if (ivtmp_6 != 0)
>>     goto <bb 10>; [98.99%]
>>   else
>>     goto <bb 9>; [1.01%]
>>
>>   <bb 10> [local count: 1052266994]:
>>   goto <bb 3>; [100.00%]
>>
>> which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0).
>
> Huh.  if-conversion does
>
>   /* Now all statements are if-convertible.  Combine all the basic
>      blocks into one huge basic block doing the if-conversion
>      on-the-fly.  */
>   combine_blocks (loop);
>
>   /* Delete dead predicate computations.  */
>   ifcvt_local_dce (loop->header);
>
>   /* Perform local CSE, this esp. helps the vectorizer analysis if loads
>      and stores are involved.  CSE only the loop body, not the entry
>      PHIs, those are to be kept in sync with the non-if-converted copy.
>      ???  We'll still keep dead stores though.  */
>   exit_bbs = BITMAP_ALLOC (NULL);
>   bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
>   bitmap_set_bit (exit_bbs, loop->latch->index);
>   todo |= do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
>
> which should remove those redundant _4 > 0 checks.  In fact when I
> run this on x86_64 with -mavx512bw I see
>
>   <bb 3> [local count: 1063004407]:
>   # i_25 = PHI <i_20(9), 0(21)>
>   # ivtmp_24 = PHI <ivtmp_12(9), 100(21)>
>   _1 = (long unsigned int) i_25;
>   _2 = _1 * 2;
>   _3 = a_14(D) + _2;
>   _4 = *_3;
>   _5 = b_15(D) + _2;
>   _49 = _4 > 0;
>   _6 = .MASK_LOAD (_5, 16B, _49);
>   _22 = _6 > 0;
>   _28 = ~_22;
>   _29 = _28 & _49;
>   _7 = c_16(D) + _2;
>   iftmp.0_17 = .MASK_LOAD (_7, 16B, _29);
>   iftmp.0_10 = _29 ? iftmp.0_17 : _4;
>   _8 = x_18(D) + _2;
>   .MASK_STORE (_8, 16B, _49, iftmp.0_10);
>   i_20 = i_25 + 1;
>   ivtmp_12 = ivtmp_24 - 1;
>   if (ivtmp_12 != 0)
>
> after if-conversion (that should be the case already on the GCC 9 branch).

Gah, sorry for the noise.  Turns out I still had a local change that was
trying to poke the patch into doing something wrong.  Will try to check
my facts more carefully next time.

The redundant pattern statements I was thinking of come from
vect_recog_mask_conversion_pattern, but I guess that isn't so
interesting here.

So yeah, let's drop this whole vn thing for now...

Thanks,
Richard

Reply via email to