https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Guess the optimization is perfectly valid when it is just the ZF flag that is
tested, i.e. in bar:

#include <immintrin.h>

int
foo (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testc_si256 (a, a);
}

int
bar (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testz_si256 (a, a);
}

_mm256_testc_si256 (a, a) is dumb (always returns non-zero because a & ~a is
0); perhaps we could fold it to 1 in gimple folding.  Still, I'm afraid we
can't rely on that folding at the RTL level.
One option could be to use CCZmode instead of CCmode for the _mm*_testz* cases
and perform this optimization solely for CCZmode, not for the CCmode that
would be used for _mm*_testc*.  The disadvantage is that we'd likely no longer
be able to merge _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b) (or
vice versa) into a single vptest.
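For the record, a small standalone example (not part of the PR testcase, the
operand values are picked arbitrarily) illustrating why only the ZF-based
_mm256_testz_si256 result survives folding the vpand into the vptest operands:
testz (x & y, x & y) and testz (x, y) both compute ((x & y) == 0), while
testc (x & y, x & y) is the constant 1 and testc (x, y) tests (~x & y) == 0:

/* Compile with e.g. gcc -mavx2; prints
   testz (a, a) = 1, testz (x, y) = 1
   testc (a, a) = 1, testc (x, y) = 0  */
#include <immintrin.h>
#include <stdio.h>

int
main (void)
{
  __m256i x = _mm256_set1_epi32 (1);
  __m256i y = _mm256_set1_epi32 (2);
  __m256i a = _mm256_and_si256 (x, y);  /* a == 0 for these values.  */

  /* ZF-based result: the same whether or not the AND is folded in.  */
  printf ("testz (a, a) = %d, testz (x, y) = %d\n",
	  _mm256_testz_si256 (a, a), _mm256_testz_si256 (x, y));

  /* CF-based result: testc (a, a) is always 1, testc (x, y) is not.  */
  printf ("testc (a, a) = %d, testc (x, y) = %d\n",
	  _mm256_testc_si256 (a, a), _mm256_testc_si256 (x, y));
  return 0;
}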