https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91204
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The vectorizer vectorizes the loop using 8-byte vectors, but then match.pd
during forwprop4 optimizes:
-  vect__6.17_46 = _52 ^ vect__1.12_39;
-  vect__7.18_47 = vect__6.17_46 & vect__1.12_39;
+  _27 = ~_52;
+  vect__7.18_47 = _27 & vect__1.12_39;
using the:
/* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y.  */
(for opo (bit_and bit_xor)
     opi (bit_xor bit_and)
 (simplify
  (opo:c (opi:cs @0 @1) @1)
  (bit_and (bit_not @0) @1)))
rule, but it does that without checking whether the one_cmpl optab is
available for the vector mode, and it does so after vector lowering.
The options I see are: either add some optab checking to match.pd patterns
when working on vector types, at least when post vector lowering, or simply
ensure that all backends have one_cmpl expanders for modes for which they have
xor instructions, or teach the expander to fall back to using xor when the
one_cmpl optab it needs isn't available but xor is.  The last one seems like
the best option to me.
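
For illustration only, here is a minimal scalar C sketch of the two facts the comment relies on: both unfolded forms equal ~X & Y, and a one's complement can be written as an XOR with an all-ones constant, which is the idea behind the proposed expander fallback. This is not GCC code; all function names are hypothetical, and 64-bit scalars stand in for the 8-byte vectors.

```c
#include <assert.h>
#include <stdint.h>

/* Unfolded form seen before forwprop4: (X ^ Y) & Y. */
uint64_t xor_then_and(uint64_t x, uint64_t y) { return (x ^ y) & y; }

/* The other unfolded form matched by the same rule: (X & Y) ^ Y. */
uint64_t and_then_xor(uint64_t x, uint64_t y) { return (x & y) ^ y; }

/* Folded form produced by the rule: ~X & Y. */
uint64_t not_then_and(uint64_t x, uint64_t y) { return ~x & y; }

/* One's complement written as XOR with all-ones, i.e. what a fallback
   could emit when only an xor operation is available (assumption: this
   mirrors the idea, not the actual expander code). */
uint64_t not_via_xor(uint64_t x) { return x ^ UINT64_MAX; }

/* Folded form built on the XOR-based complement. */
uint64_t not_via_xor_then_and(uint64_t x, uint64_t y)
{
  return not_via_xor(x) & y;
}

/* Assert that every form agrees for one pair of inputs. */
void check_fold(uint64_t x, uint64_t y)
{
  uint64_t expected = not_then_and(x, y);
  assert(xor_then_and(x, y) == expected);
  assert(and_then_xor(x, y) == expected);
  assert(not_via_xor_then_and(x, y) == expected);
}
```

Calling check_fold on a handful of bit patterns exercises the identity. It says nothing about whether a given target provides one_cmpl for a vector mode, which is the actual problem described above.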