https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103126

            Bug ID: 103126
           Summary: Miss vectorization for bit_and/bit_ior/bit_xor
                    reduction
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu

cat test.c

#include <stdint.h>

void xor_bit_arr_nolcd (uint64_t *__restrict mat, uint64_t *a, uint64_t *b,
                        uint64_t *__restrict ans, int64_t n)
{
  int64_t i;
  uint64_t vec1, sum1;
  uint64_t vec2, sum2;

  while (n > 0) {
    sum1 = 0;
    vec1 = a[0];
    sum2 = 0;
    vec2 = b[0];

    for (i = 0; i < 64; i++) {
      uint64_t tmp = mat[i]; // always safe to load
      uint64_t vec1_i = (vec1 >> i);
      uint64_t vec2_i = (vec2 >> i);
      sum1 ^= (vec1_i & 1) ? tmp : 0;
      if (vec2_i&1) sum2 ^= tmp;
    }
    *ans++ ^= sum1;  n--;
    *ans++ ^= sum2;  n--;
  }
}


The vectorizer fails for exactly the same reason as in PR98365 comment #3:

(In reply to Richard Biener from comment #3)
> The issue is that we hit
> 
>   /* If this isn't a nested cycle or if the nested cycle reduction value
>      is used ouside of the inner loop we cannot handle uses of the reduction
>      value.  */
>   if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "reduction used in loop.\n");
>       return NULL;
>     }
> 
> because cnt_21 is used in both the update and the COND_EXPR.  The reduction
> doesn't fit the cond reductions we support but is a blend of a cond and
> regular reduction.  Making the COND-reduction support handle this case
> should be possible though.
> 
> Using 'int' we arrive at handled IL:
> 
>   # cnt_19 = PHI <cnt_8(7), 0(15)>
>   _ifc__32 = _4 == _7 ? 1 : 0;
>   cnt_8 = cnt_19 + _ifc__32;
> 
> so adjusting if-conversion can indeed help.

I'm working on a patch to extend ifcvt (is_cond_scalar_reduction) to handle
bit_and/bit_ior/bit_xor operations.
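
For illustration only, here is a source-level sketch of the rewrite that
if-conversion would need to perform, by analogy with the 'int' PLUS case
quoted above. The function names (xor_cond, xor_ifcvt, and_cond, and_ifcvt)
are hypothetical and chosen just for this example; the real transformation
happens on GIMPLE inside the pass, not on C source. The idea is that a
conditional update is turned into an unconditional one by selecting the
operation's neutral element when the condition is false: 0 for
bit_xor and bit_ior, all-ones for bit_and.

```c
#include <stdint.h>

/* Conditional reduction: sum is updated only when bit i of vec is set.  */
static uint64_t xor_cond (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = 0;
  for (int i = 0; i < 64; i++)
    if ((vec >> i) & 1)
      sum ^= mat[i];
  return sum;
}

/* If-converted shape: unconditional update, 0 is the neutral element
   of XOR (and of IOR).  */
static uint64_t xor_ifcvt (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = 0;
  for (int i = 0; i < 64; i++)
    {
      uint64_t ifc = ((vec >> i) & 1) ? mat[i] : 0;
      sum ^= ifc;
    }
  return sum;
}

/* Same pattern for AND; the neutral element is all-ones.  */
static uint64_t and_cond (const uint64_t *mat, uint64_t vec)
{
  uint64_t acc = ~(uint64_t) 0;
  for (int i = 0; i < 64; i++)
    if ((vec >> i) & 1)
      acc &= mat[i];
  return acc;
}

static uint64_t and_ifcvt (const uint64_t *mat, uint64_t vec)
{
  uint64_t acc = ~(uint64_t) 0;
  for (int i = 0; i < 64; i++)
    {
      uint64_t ifc = ((vec >> i) & 1) ? mat[i] : ~(uint64_t) 0;
      acc &= ifc;
    }
  return acc;
}
```

In the second form of each pair the reduction variable has a single use in
the loop body, which matches the shape of the handled IL shown for the
'int' case.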
