https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103126
            Bug ID: 103126
           Summary: Miss vectorization for bit_and/bit_ior/bit_xor reduction
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu

cat test.c

#include <stdint.h>

void
xor_bit_arr_nolcd (uint64_t *__restrict mat, uint64_t* a, uint64_t* b,
                   uint64_t *__restrict ans, int64_t n)
{
  int64_t i;
  uint64_t vec1, sum1;
  uint64_t vec2, sum2;

  while (n > 0)
    {
      sum1 = 0;
      vec1 = a[0];
      sum2 = 0;
      vec2 = b[0];

      for (i = 0; i < 64; i++)
        {
          uint64_t tmp = mat[i]; // always safe to load
          uint64_t vec1_i = (vec1 >> i);
          uint64_t vec2_i = (vec2 >> i);
          sum1 ^= (vec1_i & 1) ? tmp : 0;
          if (vec2_i & 1)
            sum2 ^= tmp;
        }
      *ans++ ^= sum1;
      n--;
      *ans++ ^= sum2;
      n--;
    }
}

The vectorizer fails for exactly the same reason as in PR98365 comment #3:

(In reply to Richard Biener from comment #3)
> The issue is that we hit
>
>   /* If this isn't a nested cycle or if the nested cycle reduction value
>      is used ouside of the inner loop we cannot handle uses of the reduction
>      value.  */
>   if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                          "reduction used in loop.\n");
>       return NULL;
>     }
>
> because cnt_21 is used in both the update and the COND_EXPR.  The reduction
> doesn't fit the cond reductions we support but is a blend of a cond and
> regular reduction.  Making the COND-reduction support handle this case
> should be possible though.
>
> Using 'int' we arrive at handled IL:
>
>   # cnt_19 = PHI <cnt_8(7), 0(15)>
>   _ifc__32 = _4 == _7 ? 1 : 0;
>   cnt_8 = cnt_19 + _ifc__32;
>
> so adjusting if-conversion can indeed help.

I'm working on a patch to extend ifcvt (is_cond_scalar_reduction) to handle
the bit_and/bit_ior/bit_xor operations.
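
For illustration only, here is a source-level sketch of the rewrite that such an if-conversion extension would presumably aim for: a conditional update of the reduction variable becomes an unconditional update whose operand is either the loaded value or the operation's identity element (0 for xor and ior, all-ones for and). The function names are hypothetical and this is plain C, not the GIMPLE that ifcvt actually operates on, and not the patch itself.

```c
#include <stdint.h>

/* Branchy form: the reduction variable is only updated under a condition,
   as with sum2 in the test case.  */
uint64_t
xor_reduce_branchy (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = 0;
  for (int i = 0; i < 64; i++)
    if ((vec >> i) & 1)
      sum ^= mat[i];
  return sum;
}

/* Select form: the update is unconditional and the condition only picks
   the operand.  0 is the identity element of xor, so the result is
   unchanged when the bit is clear.  */
uint64_t
xor_reduce_select (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = 0;
  for (int i = 0; i < 64; i++)
    {
      uint64_t operand = ((vec >> i) & 1) ? mat[i] : 0;
      sum ^= operand;
    }
  return sum;
}

/* Same shape for ior: the identity element is again 0.  */
uint64_t
ior_reduce_select (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = 0;
  for (int i = 0; i < 64; i++)
    {
      uint64_t operand = ((vec >> i) & 1) ? mat[i] : 0;
      sum |= operand;
    }
  return sum;
}

/* Same shape for and: the identity element is all-ones, and the
   accumulator starts at all-ones as well.  */
uint64_t
and_reduce_select (const uint64_t *mat, uint64_t vec)
{
  uint64_t sum = ~(uint64_t) 0;
  for (int i = 0; i < 64; i++)
    {
      uint64_t operand = ((vec >> i) & 1) ? mat[i] : ~(uint64_t) 0;
      sum &= operand;
    }
  return sum;
}
```

In the select form the reduction variable has a single use per iteration, which is the shape the quoted 'int' IL shows for the plus case.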