[Bug target/99698] New: [aarch64] Impossible to accurately detect extensions using preprocessor

2021-03-21 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99698

Bug ID: 99698
   Summary: [aarch64] Impossible to accurately detect extensions
using preprocessor
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Using SHA3 as an example, but this problem exists for other extensions as well.

If you pass something like -march=armv8-a+sha3, GCC will define
__ARM_FEATURE_SHA3.  According to the ACLE documentation:

  __ARM_FEATURE_SHA3 is defined to 1 if the SHA1 & SHA2 Crypto instructions
  from Armv8-A and the SHA2 and SHA3 instructions from Armv8.2-A and newer
  are supported and intrinsics targeting them are available. These
  instructions include AES{E, D}, SHA1{C, P, M}, RAX, and others.

So, I should be able to use the preprocessor to call a function like vbcaxq_u32
if __ARM_FEATURE_SHA3 is defined to 1.

However, if I call vbcaxq_u32 I get an error about a target specific option
mismatch.  It turns out that you need -march=armv8.2-a+sha3 to convince GCC to
accept the function call.

AFAICT there is no way to reliably detect armv8.2-a.  The only differences in
the preprocessor macros defined for armv8-a and armv8.2-a are a few extra
__ARM_FEATURE_* macros.  There is *no* difference in pre-defined macros between
armv8.1-a and armv8.2-a.

FWIW, clang accepts -march=armv8-a+sha3, defines __ARM_FEATURE_SHA3 to 1, and
allows you to call the SHA3 functions, which seems pretty reasonable to me.

[Bug target/99698] [aarch64] Impossible to accurately detect extensions using preprocessor

2021-03-21 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99698

--- Comment #2 from Evan Nemerson  ---
Nice, thanks!  That would at least make testing possible, though I still think
that just checking __ARM_FEATURE_SHA3 should be sufficient, and ktkachov's
comment reinforces that.

[Bug target/99754] New: [sse2] new _mm_loadu_si16 and _mm_loadu_si32 implemented incorrectly

2021-03-24 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99754

Bug ID: 99754
   Summary: [sse2] new _mm_loadu_si16 and _mm_loadu_si32
implemented incorrectly
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 50470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50470&action=edit
Trivial patch

_mm_loadu_si16 and _mm_loadu_si32 were implemented in GCC 11, but incorrectly.
The value pointed to by the argument is supposed to go in the first element,
but _mm_set_epi16 / _mm_set_epi32 reverse the argument order so in GCC they go
in the *last* element.

The most straightforward solution would be to change the _mm_set_* calls so the
input is used for the last argument instead of the first (patch attached).

FWIW, here is LLVM's implementation:
.
I've verified that LLVM's implementation matches ICC's.

[Bug target/97248] New: [mips] unrecognizable insn when left shifting uint64 vector by scalar with MSA

2020-09-29 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97248

Bug ID: 97248
   Summary: [mips] unrecognizable insn when left shifting uint64
vector by scalar with MSA
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

On mips64el with -mmsa, attempting to left shift a vector of unsigned 64-bit
integers by a scalar results in an ICE.  The same code works without -mmsa.

Test case:

  typedef struct {
    long unsigned int c __attribute__((__vector_size__(16)));
  } d;
  int i() {
    int e;
    d f, g;
    f.c = g.c << e;
  }

`mips64el-linux-gnuabi64-gcc-10 -mmsa -c -o foo.o foo.c`:

  foo.c: In function ‘i’:
  foo.c:8:1: error: unrecognizable insn:
  8 | }
| ^
  (insn 9 8 10 2 (set (reg:DI 198)
  (subreg:DI (mem/c:SI (reg/f:DI 189 virtual-stack-vars) [1 e+0 S4
A32]) 0)) "foo.c":7:13 -1
  (nil))
  during RTL pass: vregs
  foo.c:8:1: internal compiler error: in extract_insn, at recog.c:2294
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See  for instructions.

[Bug target/98428] New: [11 regression] ICE with omp simd loop + optimization

2020-12-23 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98428

Bug ID: 98428
   Summary: [11 regression] ICE with omp simd loop + optimization
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

This is based on the gcc-11 currently in Debian experimental (gcc-11 (Debian
11-20201216-2) 11.0.0 20201216 (experimental) [master revision
4e42f6ebf48:da5e758223c:134afa38f0befc49f51747f3afe931fa4707c2f8]), but as
you'll see the build on Compiler Explorer also fails.

I'm getting an ICE when attempting to compile SIMDe on GCC 11 on x86_64 with
-O1.  Here is a minimized test case (or, on Compiler Explorer if you prefer:
):

#include <xmmintrin.h>
typedef struct {
  float d[4];
} f;
typedef __m128 e;
f k;
float ak;
e j;
f fn1(e g) {
  __builtin_memcpy(&k, &g, sizeof(k));
  return k;
}
typedef struct {
  float d[8];
} ag;
void l() {
  ag ao;
  f ap = fn1(j);

#pragma omp simd
  for (size_t i = 0; i < sizeof(ao.d[0]); i += 2)
    ao.d[i] = ao.d[i + 1] = ap.d[1];
  ag g = ao;
  __builtin_memcpy(&ak, &g, ak);
}
int main() {}

It generates an ICE with `-O1 -fopenmp-simd`, but removing either -O1 or
-fopenmp-simd makes compilation succeed.  This only happens on gcc-11; gcc-10
works as expected.

The original code which I fed into C-Reduce is at
,
and the `simde_mm512_maskz_broadcast_f64x2` function it calls is at


[Bug target/98521] New: [x86] Missing/incorrect XOP functions

2021-01-04 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98521

Bug ID: 98521
   Summary: [x86] Missing/incorrect XOP functions
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

_mm256_cmov_si256 is missing from xopintrin.h.  It is present on clang and
Visual Studio, and mentioned in AMD's documentation at
https://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

Additionally, _mm_frcz_ss and _mm_frcz_sd take two arguments in GCC's header
but should only take one (as they do in clang and VS).

[Bug target/98521] [x86] Missing/incorrect XOP functions

2021-01-04 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98521

--- Comment #1 from Evan Nemerson  ---
Sorry, VS has two parameters for _mm_frcz_ss and _mm_frcz_sd; clang is the
outlier.

So just the missing _mm256_cmov_si256.

[Bug target/98734] New: ABI diagnostics emitted despite always_inline attribute

2021-01-18 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98734

Bug ID: 98734
   Summary: ABI diagnostics emitted despite always_inline
attribute
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

On POWER, I get a lot of "note: the ABI of passing aggregates with 16-byte
alignment has changed in GCC 5" diagnostics unless I pass -Wno-psabi to the
compiler.  This happens even if all the functions which accept the offending
type are also annotated with __attribute__((__always_inline__)) static inline. 
Not even a '#pragma GCC diagnostic ignored "-Wpsabi"' can disable it.

Here is a quick example:

  #include <stdint.h>

  typedef struct {
    int32_t i32 __attribute__((__aligned__(16)));
  } vec;

  #pragma GCC diagnostic ignored "-Wpsabi"

  __attribute__((__always_inline__))
  static inline vec
  add_i32x4 (vec a, vec b) {
    vec r;
    r.i32 = a.i32 + b.i32;
    return r;
  }

  void
  foo (int32_t r[4], int32_t a[4], int32_t b[4]) {
    vec va, vb, vr;
    __builtin_memcpy(&va, a, sizeof(va));
    __builtin_memcpy(&vb, b, sizeof(vb));
    vr = add_i32x4(va, vb);
    __builtin_memcpy(r, &vr, sizeof(vr));
  }

$ gcc -mcpu=power8 -c -o psabi.o psabi.c
psabi.c: In function ‘add_i32x4’:
psabi.c:11:1: note: the ABI of passing aggregates with 16-byte alignment has
changed in GCC 5
   11 | add_i32x4 (vec a, vec b) {
  | ^

Ideally, I don't think this should be emitted if the ABI isn't publicly exposed
(i.e., if the function is static or always_inline).  It would also be good if
it could be disabled with a pragma.  Finally, the diagnostic message itself
should really mention that it is triggered by -Wpsabi so people don't have to
use Google to figure out what to do to work around the issue.

[Bug target/95782] [ppc64le] ICE in _cpp_pop_context

2021-05-24 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95782

--- Comment #1 from Evan Nemerson  ---
This seems to also happen on s390x with -mzvector:


# s390x-linux-gnu-gcc-10 -march=z14 -mzvector -o test test.c
test.c:4:1: internal compiler error: in _cpp_pop_context, at
libcpp/macro.c:2644
4 | b(vector double)
  | ^
0x11da337 _cpp_pop_context
../../src/libcpp/macro.c:2644
0x11dcd67 expand_arg
../../src/libcpp/macro.c:2601
0x11dc41f replace_args
../../src/libcpp/macro.c:1879
0x11dc41f enter_macro_context
../../src/libcpp/macro.c:1421
0x11dc7e8 cpp_get_token_1
../../src/libcpp/macro.c:2891
0x6393fe c_lex_with_flags(tree_node**, unsigned int*, unsigned char*, int)
../../src/gcc/c-family/c-lex.c:458
0x5c96a5 c_lex_one_token
../../src/gcc/c/c-parser.c:270
0x5f6517 c_parser_peek_token(c_parser*)
../../src/gcc/c/c-parser.c:474
0x5f6517 c_parse_file()
../../src/gcc/c/c-parser.c:21742
0x64054b c_common_parse_file()
../../src/gcc/c-family/c-opts.c:1190
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/100760] New: [mips + msa] ICE: maximum number of generated reload insns per insn achieved

2021-05-25 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100760

Bug ID: 100760
   Summary: [mips + msa] ICE: maximum number of generated reload
insns per insn achieved
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Created attachment 50869
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50869&action=edit
preprocessed, un-reduced reproducer

I'm getting an ICE when attempting to compile some code on MIPS with MSA which
works on other architectures, and on MIPS without MSA.  Here is a reduced test
case courtesy of C-Reduce:


typedef int a;
void b() { a __attribute__((__vector_size__(8))) c{1, 1}; }


Compile with `mips64el-linux-gnuabi64-g++-10 -march=loongson3a -mmsa -o test
test.c`

I've also attached a pre-processed copy of the original, non-reduced code.  The
original is at
https://github.com/simd-everywhere/simde/blob/7d0e2aca9458f760d7196b94bfdcf83b2178ea24/simde/arm/neon/cmla_rot90.h#L50-L52
(SIMDE_SHUFFLE_VECTOR_ is defined at
https://github.com/simd-everywhere/simde/blob/7d0e2aca9458f760d7196b94bfdcf83b2178ea24/simde/simde-common.h#L278-L281)

This is with 10.2.1-6 from Debian:

/usr/bin/mips64el-linux-gnuabi64-gcc-10 --version
mips64el-linux-gnuabi64-gcc-10 (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/97248] [mips] unrecognizable insn when left shifting uint64 vector by scalar with MSA

2021-05-25 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97248

--- Comment #1 from Evan Nemerson  ---
Similar issue with right shift:


typedef long a;
typedef struct {
  a b __attribute__((__vector_size__(64)));
} c;
void d() {
  int e;
  c f, g;
  f.b = g.b >> e;
}


$ mips64el-linux-gnuabi64-g++-10 -march=loongson3a -mmsa -o test test.cpp
test.cpp: In function 'void d()':
test.cpp:9:1: error: unrecognizable insn:
9 | }
  | ^
(insn 30 29 31 2 (set (reg:DI 216)
(subreg:DI (mem/c:SI (reg/f:DI 189 virtual-stack-vars) [1 e+0 S4 A32])
0)) "test.cpp":8:13 -1
 (nil))
during RTL pass: vregs
test.cpp:9:1: internal compiler error: in extract_insn, at recog.c:2294
0x7f567e543d09 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/100761] New: [mips+msa] ICE when using __builtin_convertvector to convert from u8x8 to u8x16

2021-05-25 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100761

Bug ID: 100761
   Summary: [mips+msa] ICE when using __builtin_convertvector to
convert from u8x8 to u8x16
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

I'm seeing an ICE from MIPS with MSA enabled when attempting to use
__builtin_convertvector to convert a vector of 8 unsigned 8-bit integers to 8
unsigned 16-bit integers.

Here is a reduced test case, courtesy of C-Reduce:


typedef char a;
typedef short b;
typedef struct {
  a c __attribute__((__vector_size__(8)));
} d;
d e;
void f() {
  b g __attribute__((__vector_size__(16))) (
  __builtin_convertvector(e.c, __typeof__(g)));
}


$ mips64el-linux-gnuabi64-g++-10 -march=loongson3a -mmsa -c -o test.o test.c
during RTL pass: expand
test.c: In function 'void f()':
test.c:8:5: internal compiler error: in mips_expand_vec_unpack, at
config/mips/mips.c:21757
8 |   b g __attribute__((__vector_size__(16))) (
  | ^
0x7fed0d1f9d09 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


This is with GCC 10.2.1-6 from Debian:

$ mips64el-linux-gnuabi64-g++-10 --version  
mips64el-linux-gnuabi64-g++-10 (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Original code is at
https://github.com/simd-everywhere/simde/blob/7d0e2aca9458f760d7196b94bfdcf83b2178ea24/simde/x86/sse.h#L1046,
SIMDE_CONVERT_VECTOR_ is defined at
https://github.com/simd-everywhere/simde/blob/7d0e2aca9458f760d7196b94bfdcf83b2178ea24/simde/simde-common.h#L292-L296

[Bug target/100762] New: [mips+msa] ICE when comparing 64 bit vectors

2021-05-25 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100762

Bug ID: 100762
   Summary: [mips+msa] ICE when comparing 64 bit vectors
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

In MIPS with MSA enabled on GCC 10.2.1, any comparison operation on 64-bit
vectors results in an ICE.

Test case:


typedef int i32x2 __attribute__((__vector_size__(8)));

i32x2 cmp(i32x2 a, i32x2 b) {
  return a >= b;
}


$ mips64el-linux-gnuabi64-gcc-10 -march=loongson3a -mmsa -c -o test.o test.c
mips64el-linux-gnuabi64-gcc-10 -march=loongson3a -mmsa -c -o test.o test2.c
during RTL pass: expand
test.c: In function 'cmp':
test.c:4:12: internal compiler error: in mips_expand_vector_init, at
config/mips/mips.c:22076
4 |   return a >= b;
  |  ~~^~~~
0x7f420202dd09 __libc_start_main
../csu/libc-start.c:308
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


This is with GCC 10.2.1-6 from Debian:


$ mips64el-linux-gnuabi64-gcc-10 --version
mips64el-linux-gnuabi64-gcc-10 (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[Bug target/100762] [mips+msa] ICE when comparing 64 bit vectors

2021-05-26 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100762

--- Comment #1 from Evan Nemerson  ---
It's not just comparisons.  <<, >>, /, * also don't work.  AFAICT only bitwise
operations and +/- work, as well as everything with a 64-bit element type
(i.e., a vector of one element)… 8/16/32-bit elements all fail.

[Bug target/100927] New: [sse2] floating point to integer conversion functions incorrect results w/ NaN constants + optimization

2021-06-05 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100927

Bug ID: 100927
   Summary: [sse2] floating point to integer conversion functions
incorrect results w/ NaN constants + optimization
   Product: gcc
   Version: 11.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

_mm_cvttpd_epi32, _mm_cvttpd_pi32, _mm_cvttps_epi32, and _mm_cvttsd_si32 are
supposed to return INT32_MIN for NaN inputs.  However, when compiled with
optimization on GCC, if the values are known at compile time NaN inputs result
in 0 in the output.

Here is a quick test case, using _mm_cvttpd_epi32:


#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>

int main(void) {
  static const double values[] = {
    __builtin_nan(""), -__builtin_nan("")
  };
  int32_t res[4];

  _mm_storeu_si128((__m128i*) res, _mm_cvttpd_epi32(_mm_loadu_pd(values)));

  for (int i = 0 ; i < 4 ; i++) {
    printf("%d\n", res[i]);
  }

  return 0;
}


Compile with `gcc -O1 -o test test.c` and you get all zeros; with `gcc -O0 -o
test test.c` the first two elements of the result are INT32_MIN, as they should
be.  Changing the const to volatile (and adding -Wno-discarded-qualifiers)
"fixes" the issue.

[Bug target/105339] New: [x86] missing AVX-512F scalef functions when optimization is disabled

2022-04-21 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105339

Bug ID: 105339
   Summary: [x86] missing AVX-512F scalef functions when
optimization is disabled
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Several AVX-512F scalef functions are only declared when __OPTIMIZE__ is
defined:

 * _mm_maskz_scalef_ss
 * _mm_mask_scalef_sd
 * _mm_maskz_scalef_sd

There may be others; I haven't done an exhaustive check.

[Bug target/101614] New: [s390] vec_signed requires z15, docs say z13

2021-07-24 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101614

Bug ID: 101614
   Summary: [s390] vec_signed requires z15, docs say z13
   Product: gcc
   Version: 11.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

Documentation:
https://www.ibm.com/docs/en/zos/2.4.0?topic=conversion-vec-signed-vector-convert-floating-point-signed

Quick test case:


$ cat signed.c
#include <vecintrin.h>

__vector signed int
foo(__vector float a) {
  #if __ARCH__ >= 11 /* 11 = z13 */
    return vec_signed(a);
  #else
    return __builtin_convertvector(a, __vector signed int);
  #endif
}
$ s390x-linux-gnu-gcc -march=z13 -mzvector -c -o signed.o signed.c
In file included from signed.c:1:
signed.c: In function ‘foo’:
signed.c:6:12: error: Builtin ‘__builtin_s390_vcfeb’ requires z15 or higher.
6 | return vec_signed(a);
  |

[Bug target/101614] [s390] vec_signed requires z15, docs say z13

2021-07-25 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101614

Evan Nemerson  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Evan Nemerson  ---
Never mind; the ARCH in the documentation refers to the same value as __ARCH__,
not -march=zN

[Bug target/101714] New: [POWER] vec_min / vec_max handles NaN incorrectly when evaluated at compile time

2021-08-01 Thread evan--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101714

Bug ID: 101714
   Summary: [POWER] vec_min / vec_max handles NaN incorrectly when
evaluated at compile time
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: e...@coeus-group.com
  Target Milestone: ---

When a call to vec_min or vec_max with one numeric value and one NaN is
evaluated at compile time, the result is NaN instead of the numeric value.

Here is a quick test case:


#include <altivec.h>
#include <stdio.h>

#if defined(NO_INLINE)
__attribute__((__noinline__))
#endif
__vector float
foo(__vector float a, __vector float b) {
  return vec_min(a, b);
}

int main(void) {
  __vector float a = { 1.0f, __builtin_nanf(""), __builtin_nanf(""), 1.0f };
  __vector float b = { __builtin_nanf(""), 1.0f, __builtin_nanf(""), 1.0f };
  __vector float r = foo(a, b);
  for (int i = 0 ; i < 4 ; i++) {
    printf("%f\n", r[i]);
  }
}


$ gcc -O3 -o minmax minmax.c && ./minmax
nan
nan
nan
1.00
$ gcc -DNO_INLINE -O3 -o minmax minmax.c && ./minmax
1.00
1.00
nan
1.00
$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.