https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109973
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Guess the optimization is perfectly valid when it is just the ZF flag that is
tested, i.e. in bar:

#include <immintrin.h>

int
foo (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testc_si256 (a, a);
}

int
bar (__m256i x, __m256i y)
{
  __m256i a = _mm256_and_si256 (x, y);
  return _mm256_testz_si256 (a, a);
}

_mm256_testc_si256 (a, a) is dumb (always returns non-zero because a & ~a is
0); perhaps we could fold it to 1 in gimple folding.  Still, I'm afraid we
can't rely on that folding at the RTL level.
One option could be to use CCZmode instead of CCmode for the _mm*_testz* cases
and perform this optimization solely for CCZmode, not for the CCmode that
would be used for _mm*_testc*.  The disadvantage is that we'd likely no longer
be able to merge _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b) (or
vice versa) into a single vptest.
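For the record, a small standalone example (not part of the PR testcase, the
operand values are picked arbitrarily) illustrating why only the ZF-based
_mm256_testz_si256 result survives folding the vpand into the vptest operands:
testz (x & y, x & y) and testz (x, y) both compute ((x & y) == 0), while
testc (x & y, x & y) is the constant 1 and testc (x, y) tests (~x & y) == 0:

/* Compile with e.g. gcc -mavx2; prints
   testz (a, a) = 1, testz (x, y) = 1
   testc (a, a) = 1, testc (x, y) = 0  */
#include <immintrin.h>
#include <stdio.h>

int
main (void)
{
  __m256i x = _mm256_set1_epi32 (1);
  __m256i y = _mm256_set1_epi32 (2);
  __m256i a = _mm256_and_si256 (x, y);  /* a == 0 for these values.  */

  /* ZF-based result: the same whether or not the AND is folded in.  */
  printf ("testz (a, a) = %d, testz (x, y) = %d\n",
	  _mm256_testz_si256 (a, a), _mm256_testz_si256 (x, y));

  /* CF-based result: testc (a, a) is always 1, testc (x, y) is not.  */
  printf ("testc (a, a) = %d, testc (x, y) = %d\n",
	  _mm256_testc_si256 (a, a), _mm256_testc_si256 (x, y));
  return 0;
}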