https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91204

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The vectorizer vectorizes the loop using 8-byte vectors, but then match.pd
during forwprop4 optimizes:
-  vect__6.17_46 = _52 ^ vect__1.12_39;
-  vect__7.18_47 = vect__6.17_46 & vect__1.12_39;
+  _27 = ~_52;
+  vect__7.18_47 = _27 & vect__1.12_39;
using the:
/* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y.  */
(for opo (bit_and bit_xor)
     opi (bit_xor bit_and)
 (simplify
  (opo:c (opi:cs @0 @1) @1)
  (bit_and (bit_not @0) @1)))
rule, but it does so without checking whether the one_cmpl optab is available
for the vector mode, and the fold happens after vector lowering, so the
unsupported vector operation is no longer lowered.
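
As a sanity check on the fold itself, here is a minimal scalar sketch in C that
compares both source forms named in the match.pd comment against the folded
form, for all 8-bit inputs.  The function names are made up for illustration
and are not GCC internals.  The point is only that the rewrite is valid as
arithmetic, and that what changes is which operation (a bitwise NOT instead of
an XOR) the target has to provide.

```c
#include <stdint.h>

/* First source form: (x ^ y) & y. */
uint8_t xor_then_and(uint8_t x, uint8_t y) { return (uint8_t)((x ^ y) & y); }

/* Second source form: (x & y) ^ y. */
uint8_t and_then_xor(uint8_t x, uint8_t y) { return (uint8_t)((x & y) ^ y); }

/* Folded form: ~x & y.  This is the form that needs a bitwise NOT. */
uint8_t folded(uint8_t x, uint8_t y) { return (uint8_t)(~x & y); }

/* Returns 1 if both source forms agree with the folded form for every
   pair of 8-bit inputs, 0 otherwise. */
int fold_identity_holds(void)
{
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++) {
            uint8_t a = (uint8_t)x, b = (uint8_t)y;
            if (xor_then_and(a, b) != folded(a, b)
                || and_then_xor(a, b) != folded(a, b))
                return 0;
        }
    return 1;
}
```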
The options I see are: add some optab checking to the match.pd patterns when
working on vector types, at least after vector lowering; ensure that all
backends have one_cmpl expanders for the modes for which they have xor
instructions; or teach the expander to fall back to xor when it needs the
one_cmpl optab and that optab isn't available but xor is.  The last one seems
like the best option to me.
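
The last option relies on the identity ~x == x ^ all-ones.  A minimal sketch in
C, assuming a 64-bit integer as a stand-in for the bits of an 8-byte vector;
the type and function names are hypothetical, and this is not the expander
code, only the arithmetic it would rely on.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bits of an 8-byte vector
   (eight byte lanes packed into one 64-bit word). */
typedef uint64_t vec8_bits;

/* Bitwise NOT expressed with XOR only: x ^ all-ones. */
vec8_bits not_via_xor(vec8_bits x)
{
    return x ^ UINT64_C(0xFFFFFFFFFFFFFFFF);
}

/* The folded form ~x & y, computed without a NOT operation. */
vec8_bits andn_via_xor(vec8_bits x, vec8_bits y)
{
    return not_via_xor(x) & y;
}
```

Because XOR with all-ones flips every bit independently, the result does not
depend on how the 64 bits are divided into lanes.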
