https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess the optimization is perfectly valid when it is just the ZF flag that is
tested, i.e. in bar below:

#include <immintrin.h>

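/* _mm256_testc_si256 (a, a) returns VPTEST's CF flag, i.e. 1 iff
   (~a & a) == 0, which holds for any a.  */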
int
foo (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testc_si256 (a, a);
}

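/* _mm256_testz_si256 (a, a) returns VPTEST's ZF flag, i.e. 1 iff
   (a & a) == 0, i.e. iff a is all zeros.  */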
int
bar (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testz_si256 (a, a);
}

_mm256_testc_si256 (a, a) is dumb (it always returns non-zero, because a & ~a
is 0), so perhaps we could fold it to 1 in gimple folding.  Still, I'm afraid
we can't rely on that folding at the RTL level.  One option could be to use
CCZmode instead of CCmode for the _mm*_testz* cases and perform this
optimization solely for CCZmode, not for the CCmode that would be used for
_mm*_testc*.  That has the disadvantage that we'd likely no longer be able to
merge _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b) (or vice versa)
into a single vptest.
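
For reference, a minimal standalone sketch of VPTEST's flag semantics on a
simplified 64-bit operand (not GCC code; the helper names ptest_zf and
ptest_cf are made up here), showing why the testc form degenerates to a
constant while the testz form remains a genuine zero test:

#include <stdint.h>
#include <stdio.h>

/* ZF of vptest a, b: set iff (a & b) == 0 -- what _mm*_testz* returns.  */
static int ptest_zf (uint64_t a, uint64_t b) { return (a & b) == 0; }

/* CF of vptest a, b: set iff (~a & b) == 0 -- what _mm*_testc* returns.  */
static int ptest_cf (uint64_t a, uint64_t b) { return (~a & b) == 0; }

int
main (void)
{
  uint64_t a = 0x00ff00ff00ff00ffULL;
  /* CF with both operands equal: ~a & a is always 0, so the result is
     always 1, independent of a.  */
  printf ("testc (a, a) = %d\n", ptest_cf (a, a));
  /* ZF with both operands equal: 1 only when a itself is all zeros.  */
  printf ("testz (a, a) = %d\n", ptest_zf (a, a));
  printf ("testz (0, 0) = %d\n", ptest_zf (0, 0));
  return 0;
}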
