Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-05 Thread Steven Munroe
Looking at the latest version of the Power Vector Intrinsic Programming
Reference (Revision 2.0.0_prd; Bill slipped this to me for review), I see
that
vec_test_lsbb_all_ones
vec_test_lsbb_all_zeros
both specify vector unsigned char only.
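
For reference, the documented behaviour can be checked against a plain-C model:
each built-in inspects only the least-significant bit of each byte (LSBB) of a
16-byte vector. The function names below are illustrative, not the built-ins
themselves:

```c
/* Plain-C model of the documented semantics (names are illustrative):
   all_ones: the least-significant bit of every byte is 1.
   all_zeros: the least-significant bit of every byte is 0.  */
static int
lsbb_all_ones (const unsigned char v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) == 0)
      return 0;
  return 1;
}

static int
lsbb_all_zeros (const unsigned char v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that the bit test is identical for signed and unsigned char elements,
which is why supporting both signatures is largely a matter of overload
plumbing.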

On Mon, Aug 5, 2024 at 1:15 AM Kewen.Lin  wrote:

> on 2024/8/3 05:48, Peter Bergner wrote:
> > On 7/31/24 10:21 PM, Kewen.Lin wrote:
> >> on 2024/8/1 01:52, Carl Love wrote:
> >>> Yes, I noticed that the built-ins were defined as overloaded but only
> had one definition.   Did seem odd to me.
> >>>
>  either is with "vector unsigned char" as argument type, but the
> corresponding instance
>  prototype in builtin table is with "vector signed char".  It's
> inconsistent and weird,
>  I think we can just update the prototype in builtin table with
> "vector unsigned char"
>  and remove the entries in overload table.  It can be a follow up
> patch.
> >>>
> >>> I didn't notice that it was signed in the instance prototype but
> unsigned in the overloaded definition.  That is definitely inconsistent.
> >>>
> >>> That said, should we just go ahead and support both signed and
> unsigned argument versions of the all ones and all zeros built-ins?
> >>
> >> Good question, I thought about that but found openxl only supports the
> unsigned version
> >> so I felt it's probably better to keep consistent with it.  But I'm
> fine for either, if
> >> we decide to extend it to cover both signed and unsigned, we should
> notify openxl team
> >> to extend it as well.
> >>
> >> openxl doc links:
> >>
> >>
> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
> >>
> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros
> >
> > If it makes sense to support vector signed char rather than only the
> vector unsigned char,
> > then I'm fine adding support for it.  It almost seems since we tried
> adding an overload
> > for it, that that was our intention (to support both signed and
> unsigned) and we just
> > had a bug so only unsigned was supported?
>
> Good question but I'm not sure, it could be an oversight without adding
> one more instance
> for overloading, or adopting some useless code (only for overloading) for
> a single instance.
> I found it's introduced by r11-2437-gcf5d0fc2d1adcd, CC'ed Will as he
> contributed this.
>
> BR,
> Kewen
>
> >
> > CC'ing Steve since he noticed the missing documentation when he was
> trying to
> > use the built-ins.  Steve, do you see a need to also support vector
> signed char
> > with these built-ins?
> >
> > Peter
> >
> >
>
>


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target

2017-06-20 Thread Steven Munroe
On Tue, 2017-06-20 at 09:04 +, Hurugalawadi, Naveen wrote:
> Hi Joseph,
> 
> Thanks for your review and valuable comments on this issue.
> 
> Please find attached the patch that merges x86-intrinsics for AArch64 and PPC
> architectures.
> 
> >> it would seem to me to be a bad idea to duplicate the 
> >> implementation for more and more architectures.
> Merged the implementation for AArch64 and PPC architectures.
> 
> The testcases have not been merged yet. Will do it after checking out
> the comments on the current idea of implementation.
> 
> Please check the patch and let me know the comments.
> 
> Bootstrapped and Regression tested on aarch64-thunder-linux and PPC.
> 
I am not sure this works or is even a good idea.

By accident, bmiintrin.h can be implemented as C code or common
built-ins. But bmi2intrin.h depends on __builtin_bpermd, which to my
knowledge is PowerISA only.
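
To make the plain-C point concrete, here is the kind of target-independent
fallback that bmiintrin.h-style operations reduce to (function names are
illustrative, not the header's actual contents):

```c
/* Bit-manipulation operations expressible in portable C, in the spirit
   of the common-code discussion above.  Names are illustrative.  */
static unsigned long long
blsr_u64 (unsigned long long x)  /* Reset lowest set bit.  */
{
  return x & (x - 1);
}

static unsigned long long
blsi_u64 (unsigned long long x)  /* Isolate lowest set bit.  */
{
  return x & (0ULL - x);
}

static unsigned long long
andn_u64 (unsigned long long a, unsigned long long b)  /* ~a & b.  */
{
  return ~a & b;
}
```

By contrast, bmi2intrin.h operations such as pdep/pext have no short C form,
which is where target-specific built-ins like __builtin_bpermd come in.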

As I work on mmx, sse, sse2, etc. it gets more complicated. There are
many X86 intrinsic instances that require altivec.h-unique intrinsics,
and some inline __asm, to implement efficiently for the powerpc64le
target.

Net: the current sample size so far is too small to make a reasonable
assessment.

And as you see below, the gcc.target tests have to be duplicated
anyway. Even if the C code is common there will be many differences in
dg-options and dg-require-effective-target. Trying to common up these
implementations only creates more small files to manage.

> Thanks,
> Naveen
> 
> 2017-06-20  Naveen H.S  
> 
> [gcc]
>   * config.gcc (aarch64*-*-*): Add bmi2intrin.h, bmiintrin.h,
>   adxintrin.h and x86intrin.h in Config folder.
>   (powerpc*-*-*): Move bmi2intrin.h, bmiintrin.h and x86intrin.h into
>   Config folder.
>   * config/adxintrin.h: New file.
>   * config/bmi2intrin.h: New file.
>   * config/bmiintrin.h: New file.
>   * config/x86intrin.h: New file.
>   * config/rs6000/bmi2intrin.h: Delete file.
>   * config/rs6000/bmiintrin.h: Likewise.
>   * config/rs6000/x86intrin.h: Likewise.
> 
> [gcc/testsuite]
> 
>   * gcc.target/aarch64/adx-addcarryx32-1.c: New file.
>   * gcc.target/aarch64/adx-addcarryx32-2.c: New file.
>   * gcc.target/aarch64/adx-addcarryx32-3.c: New file.
>   * gcc.target/aarch64/adx-addcarryx64-1.c: New file.
>   * gcc.target/aarch64/adx-addcarryx64-2.c: New file.
>   * gcc.target/aarch64/adx-addcarryx64-3.c: New file.
>   * gcc.target/aarch64/adx-check.h: New file.
>   * gcc.target/aarch64/bmi-andn-1.c: New file.
>   * gcc.target/aarch64/bmi-andn-2.c: New file.
>   * gcc.target/aarch64/bmi-bextr-1.c: New file.
>   * gcc.target/aarch64/bmi-bextr-2.c: New file.
>   * gcc.target/aarch64/bmi-bextr-4.c: New file.
>   * gcc.target/aarch64/bmi-bextr-5.c: New file.
>   * gcc.target/aarch64/bmi-blsi-1.c: New file.
>   * gcc.target/aarch64/bmi-blsi-2.c: New file.
>   * gcc.target/aarch64/bmi-blsmsk-1.c: New file.
>   * gcc.target/aarch64/bmi-blsmsk-2.c: New file.
>   * gcc.target/aarch64/bmi-blsr-1.c: New file.
>   * gcc.target/aarch64/bmi-blsr-2.c: New file.
>   * gcc.target/aarch64/bmi-check.h: New file.
>   * gcc.target/aarch64/bmi-tzcnt-1.c: New file.
>   * gcc.target/aarch64/bmi-tzcnt-2.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi32-1.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi64-1.c: New file.
>   * gcc.target/aarch64/bmi2-bzhi64-1a.c: New file.
>   * gcc.target/aarch64/bmi2-check.h: New file.
>   * gcc.target/aarch64/bmi2-mulx32-1.c: New file.
>   * gcc.target/aarch64/bmi2-mulx32-2.c: New file.
>   * gcc.target/aarch64/bmi2-mulx64-1.c: New file.
>   * gcc.target/aarch64/bmi2-mulx64-2.c: New file.
>   * gcc.target/aarch64/bmi2-pdep32-1.c: New file.
>   * gcc.target/aarch64/bmi2-pdep64-1.c: New file.
>   * gcc.target/aarch64/bmi2-pext32-1.c: New file.
>   * gcc.target/aarch64/bmi2-pext64-1.c: New file.
>   * gcc.target/aarch64/bmi2-pext64-1a.c: New file.




Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target

2017-06-21 Thread Steven Munroe
On Tue, 2017-06-20 at 17:16 -0500, Segher Boessenkool wrote:
> On Tue, Jun 20, 2017 at 09:34:25PM +, Joseph Myers wrote:
> > On Tue, 20 Jun 2017, Segher Boessenkool wrote:
> > 
> > > > And as you see below the gcc.target tests have to be duplicated
> > > > anyway. Even if the C code is common there will be many differences in
> > > > dg-options and dg-require-effective-target. Trying to common these
> > > > implementations only creates more small files to manage.
> > > 
> > > So somewhere in the near future we'll have to pull things apart again,
> > > if we go with merging things now.
> > 
> > The common part in the intrinsics implementation should be exactly the 
> > parts that can be implemented in GNU C without target-specific intrinsics 
> > being needed.  There should be nothing to pull apart if you start with the 
> > right things in the common header.  If a particular header has some 
> > functions that can be implemented in GNU C and some that need 
> > target-specific code, the generic GNU C functions should be in a common 
> > header, #included by the target-specific header.  The common header should 
> > have no conditionals on target architectures whatever (it might have 
> > conditionals on things like endianness).
> 
> I don't think there is much that will end up in the common header
> eventually.  If it was possible to describe most of this in plain C,
> and in such a way that it would optimise well, there would not *be*
> these intrinsics.
> 
> > I don't expect many different effective-target / dg-add-options keywords 
> > to be needed for common tests (obviously, duplicating tests for each 
> > architecture wanting these intrinsics is generally a bad idea).
> 
> Yeah, I think it should be possible to share the tests, perhaps with
> some added dg things (so that we don't have to repeat the same things
> over and over).
> 
I don't see how we can share the tests, as this requires platform-unique
dg-options and dg-require-effective-target values to enforce the
platform restrictions you mentioned earlier.
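
To illustrate, the per-target copies of a shared test body end up diverging in
exactly these header directives. A hypothetical pair (the directive values here
are illustrative, not taken from actual committed tests):

```
/* The powerpc copy of a shared intrinsic test might start with:  */
/* { dg-do run } */
/* { dg-options "-O2 -mvsx" } */
/* { dg-require-effective-target p8vector_hw } */

/* while the aarch64 copy of the same body might need:  */
/* { dg-do run } */
/* { dg-options "-O2" } */
/* { dg-require-effective-target aarch64_little_endian } */
```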





[PATCH, rs6000] correct implementation of _mm_add_pi32

2017-11-15 Thread Steven Munroe
A small thinko in the implementation of _mm_add_pi32 that only shows
up when compiling for power9.
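
For context, what _mm_add_pi32 must compute is two independent 32-bit adds
inside one 64-bit __m64 value, with no carry between lanes. A hypothetical
scalar reference (not the header's code), useful for sanity-checking a fix
like this one, might look like:

```c
typedef unsigned long long m64_t;

/* Scalar reference for _mm_add_pi32 semantics: element-wise 32-bit
   adds within a 64-bit value, each lane wrapping independently.  */
static m64_t
add_pi32_ref (m64_t a, m64_t b)
{
  m64_t lo = (unsigned int) ((unsigned int) a + (unsigned int) b);
  m64_t hi = (m64_t) (unsigned int) ((unsigned int) (a >> 32)
                                     + (unsigned int) (b >> 32)) << 32;
  return hi | lo;
}
```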

./gcc/ChangeLog:

2017-11-15  Steven Munroe  

* config/rs6000/mmintrin.h (_mm_add_pi32[_ARCH_PWR]): Correct
parameter list for vec_splats.

Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 254714)
+++ gcc/config/rs6000/mmintrin.h(working copy)
@@ -463,8 +463,8 @@ _mm_add_pi32 (__m64 __m1, __m64 __m2)
 #if _ARCH_PWR9
   __vector signed int a, b, c;
 
-  a = (__vector signed int)vec_splats (__m1, __m1);
-  b = (__vector signed int)vec_splats (__m2, __m2);
+  a = (__vector signed int)vec_splats (__m1);
+  b = (__vector signed int)vec_splats (__m2);
   c = vec_add (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else




[Fwd: [PATCH][Bug target/84266] mmintrin.h intrinsic headers for PowerPC code fails on power9]

2018-02-09 Thread Steven Munroe
--- Begin Message ---
This has a simple fix that I have tested on power8 and Seurer has
tested on power9.

While there may be a more elegant coding for the required casts, this is
the simplest change, considering the current stage.

2018-02-09  Steven Munroe  

* config/rs6000/mmintrin.h (_mm_cmpeq_pi32 [_ARCH_PWR9]):
Cast vec_cmpeq result to correct type.
* config/rs6000/mmintrin.h (_mm_cmpgt_pi32 [_ARCH_PWR9]):
Cast vec_cmpgt result to correct type.

Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 257533)
+++ gcc/config/rs6000/mmintrin.h(working copy)
@@ -854,7 +854,7 @@
 
   a = (__vector signed int)vec_splats (__m1);
   b = (__vector signed int)vec_splats (__m2);
-  c = (__vector signed short)vec_cmpeq (a, b);
+  c = (__vector signed int)vec_cmpeq (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else
   __m64_union m1, m2, res;
@@ -883,7 +883,7 @@
 
   a = (__vector signed int)vec_splats (__m1);
   b = (__vector signed int)vec_splats (__m2);
-  c = (__vector signed short)vec_cmpgt (a, b);
+  c = (__vector signed int)vec_cmpgt (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else
   __m64_union m1, m2, res;

Ready to commit?
--- End Message ---


Re: [PATCH] PR target/66224 _GLIBC_READ_MEM_BARRIER

2015-05-21 Thread Steven Munroe
On Wed, 2015-05-20 at 14:40 -0400, David Edelsohn wrote:
> The current definition of _GLIBC_READ_MEM_BARRIER in libstdc++ is too
> weak for an ACQUIRE FENCE, which is what it is intended to be. The
> original code emitted an "isync" instead of "lwsync".
> 
> All of the guard acquire and set code needs to be cleaned up to use
> GCC atomic intrinsics, but this is necessary for correctness.
> 
> Steve, any comment about the Linux part?
> 
This is correct for the PowerISA V2 (POWER4 and later) processors.

I assume the #ifdef __NO_LWSYNC guard is only set for older (ISA V1)
processors.
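
As a sketch of the cleanup direction mentioned above (moving the guard code to
GCC atomic intrinsics), the acquire side can be expressed with the __atomic
built-ins and left to the target to expand, e.g. an lwsync-based sequence on
PowerISA V2 and a stronger barrier where lwsync is unavailable:

```c
/* Sketch only: an acquire load via GCC's __atomic built-ins.  The
   compiler emits the appropriate barrier for the target, so no
   hand-written isync/lwsync is needed in libstdc++ itself.  */
static int
guard_acquire_load (const int *guard)
{
  return __atomic_load_n (guard, __ATOMIC_ACQUIRE);
}
```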

Thanks




[PATCH, rs6000] 1/2 Add x86 SSE2 intrinsics to GCC PPC64LE target

2017-10-17 Thread Steven Munroe
This is the fourth major contribution of X86 intrinsic equivalent
headers for PPC64LE.

X86 SSE2 technology adds double float (__m128d) support, fills in a
number of 128-bit vector integer (__m128i) operations, and adds some MMX
conversions to and from 128-bit vector (XMM) operations.

In general the SSE2 (__m128d) intrinsics are a good match to the
PowerISA VSX 128-bit vector double facilities. This allows direct
mapping of the __m128d type to the PowerPC __vector double type and
allows natural handling of parameter passing, return values, and SIMD
double operations.

However, while both ISAs support double and float scalars in vector
registers, X86_64 and PowerPC64LE use different formats (and bit
positions within the vector register) for floating-point scalars. This
requires extra PowerISA operations to exactly match the X86 SSE scalar
(intrinsic functions ending in *_sd) semantics. The intent is to provide
a functionally correct implementation at some reduction in performance.
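
To illustrate the *_sd contract that must be preserved regardless of the
internal register layout, here is a plain-C reference (illustrative, not the
header implementation):

```c
/* Reference semantics for an SSE2 scalar-double operation such as
   _mm_add_sd: operate on element 0 only; element 1 is copied from the
   first operand unchanged.  */
static void
add_sd_ref (double dst[2], const double a[2], const double b[2])
{
  dst[0] = a[0] + b[0];  /* low (scalar) element is computed */
  dst[1] = a[1];         /* high element passes through from A */
}
```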

Some inline assembler is required. There are several cases where we
need to generate Data Cache Block instructions; there are no existing
built-ins for flush and touch for store transient.  Also, some of the
double to and from 32-bit float and int conversions required assembler
to get the correct semantics at reasonable cost. Perhaps these can be
revisited when the team completes the built-ins for vec_double* and
vec_float*.

Part 2 adds the associated 131 DG test cases.

./gcc/ChangeLog:

2017-10-17  Steven Munroe  

* config.gcc (powerpc*-*-*): Add emmintrin.h.
* config/rs6000/emmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include emmintrin.h.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 253786)
+++ gcc/config.gcc  (working copy)
@@ -459,7 +459,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
-   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h"
+   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} paired.h"
Index: gcc/config/rs6000/x86intrin.h
===
--- gcc/config/rs6000/x86intrin.h   (revision 253786)
+++ gcc/config/rs6000/x86intrin.h   (working copy)
@@ -39,6 +39,8 @@
 #include 
 
 #include 
+
+#include 
 #endif /* __ALTIVEC__ */
 
 #include 
Index: gcc/config/rs6000/emmintrin.h
===
--- gcc/config/rs6000/emmintrin.h   (revision 0)
+++ gcc/config/rs6000/emmintrin.h   (revision 0)
@@ -0,0 +1,2413 @@
+/* Copyright (C) 2003-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 SSE2 (__m128i, __m128d) intrinsics,
+   the PowerPC VMX/VSX ISA is a good match for vector double SIMD
+   operations.  However scalar double operations in vector (XMM)
+   registers require the POWER8 VSX ISA (2.07) level. Also there are
+   important differences for data format and placement of doub

Re: [PATCH, rs6000] 1/2 Add x86 SSE2 intrinsics to GCC PPC64LE target

2017-10-24 Thread Steven Munroe
On Mon, 2017-10-23 at 16:21 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Oct 17, 2017 at 01:24:45PM -0500, Steven Munroe wrote:
> > Some inline assembler is required. There are several cases where we need
> > to generate Data Cache Block instructions. There are no existing built-ins
> > for flush and touch for store transient.
> 
> Would builtins for those help?  Would anything else want to use such
> builtins, I mean?
> 
Yes, I think NVMe and in-memory DB workloads in general will want easy
access to these instructions.

Intel provides intrinsic functions for these.

A built-in or intrinsic will be easier than finding and reading the
PowerISA and trying to write your own inline asm.

> > +   For PowerISA Scalar double in FPRs (left most 64-bits of the
> > +   low 32 VSRs), while X86_64 SSE2 uses the right most 64-bits of
> > +   the XMM. These differences require extra steps on POWER to match
> > +   the SSE2 scalar double semantics.
> 
> Maybe say "is in FPRs"?  (And two space after a full stop, here and
> elsewhere).
> 
Ok

> > +/* We need definitions from the SSE header files*/
> 
> Dot space space.
>
Ok

> > +/* Sets the low DPFP value of A from the low value of B.  */
> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_move_sd (__m128d __A, __m128d __B)
> > +{
> > +#if 1
> > +  __v2df result = (__v2df) __A;
> > +  result [0] = ((__v2df) __B)[0];
> > +  return (__m128d) result;
> > +#else
> > +  return (vec_xxpermdi(__A, __B, 1));
> > +#endif
> > +}
> 
I meant to check what trunk generated and then pick one.

Done.

> You probably forgot to finish this?  Or, what are the two versions,
> and why are they both here?  Same question later a few times.
> 
> > +/* Add the lower double-precision (64-bit) floating-point element in
> > + * a and b, store the result in the lower element of dst, and copy
> > + * the upper element from a to the upper element of dst. */
> 
> No leading stars on block comments please.
> 
> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_cmpnge_pd (__m128d __A, __m128d __B)
> > +{
> > +  return ((__m128d)vec_cmplt ((__v2df ) __A, (__v2df ) __B));
> > +}
> 
> You have some spaces before closing parentheses here (and elsewhere --
> please check).
> 
Ok

> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_cvtpd_epi32 (__m128d __A)
> > +{
> > +  __v2df rounded = vec_rint (__A);
> > +  __v4si result, temp;
> > +  const __v4si vzero =
> > +{ 0, 0, 0, 0 };
> > +
> > +  /* VSX Vector truncate Double-Precision to integer and Convert to
> > +   Signed Integer Word format with Saturate.  */
> > +  __asm__(
> > +  "xvcvdpsxws %x0,%x1;\n"
> > +  : "=wa" (temp)
> > +  : "wa" (rounded)
> > +  : );
> 
> Why the ";\n"?  And no empty clobber list please.
> 
Ok

> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_cvtps_pd (__m128 __A)
> > +{
> > +  /* Check if vec_doubleh is defined by . If so use that. */
> > +#ifdef vec_doubleh
> > +  return (__m128d) vec_doubleh ((__v4sf)__A);
> > +#else
> > +  /* Otherwise the compiler is not current and so need to generate the
> > + equivalent code.  */
> 
> Do we need this?  The compiler will always be current.
> 
The vec_double* and vec_float* built-ins were in flux at the time, and
this deferred the problem.

Not sure what their status is now.

We would still need this if we want to backport to GCC7 (AT11), and
there are more places where we currently have only asm.


> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_loadl_pd (__m128d __A, double const *__B)
> > +{
> > +  __v2df result = (__v2df)__A;
> > +  result [0] = *__B;
> > +  return (__m128d)result;
> > +}
> > +#ifdef _ARCH_PWR8
> > +/* Intrinsic functions that require PowerISA 2.07 minimum.  */
> 
> You want an empty line before that #ifdef.
> 
Ok fixed

> > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_movemask_pd (__m128d  __A)
> > +{
> > +  __vector __m64 result;
> > +  static const __vector unsigned int perm_mask =
> > +{
> > +#ifdef __LITTLE_ENDIAN__
> > +   0x80800040, 0x80808080, 0x80808080, 0x80808080
> > +#elif __BIG_ENDIAN__
> > +  0x80808080, 0

Re: [PATCH, rs6000] 2/2 Add x86 SSE2 intrinsics to GCC PPC64LE target

2017-10-26 Thread Steven Munroe
On Wed, 2017-10-25 at 18:37 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Oct 17, 2017 at 01:27:16PM -0500, Steven Munroe wrote:
> > This it part 2/2 for contributing PPC64LE support for X86 SSE2
> > instrisics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by emmintrin.h. 
> 
> > --- gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > @@ -0,0 +1,83 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -mdirect-move" } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p8vector_hw } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> > "-mcpu=power8" } } */
> 
> Why this dg-skip-if?  Also, why -mdirect-move?
> 
This is a weird test because it covers what are effectively MMX-style
operations, but ones added to IA under the SSE2 technology.

Normally mmintrin.h compare operations require a transfer to/from the
vector unit, with direct moves, for efficient execution on POWER.

The one exception to that is _mm_cmpeq_pi8 which can be implemented
directly in GPRs using cmpb.

The cmpb instruction is from power6, but I do not want to use
-mcpu=power6 here. -mdirect-move is a compromise.

I suspect that the dg-skip-if is an artifact of the early struggles to
make this stuff work across various --with-cpu= settings.

I think the key is dg-require-effective-target p8vector_hw which should
allow dropping both the -mdirect-move and the whole dg-skip-if clause.

Will need to try this change and retest.

> 
> Okay for trunk with that taken care of.  Sorry it took a while.
> 
> Have you tested this on big endian btw?
> 
Yes.

I have tested on P8 BE using --with-cpu=[power6 | power7 | power8]

> 
> Segher
> 




[PATCH, rs6000] 1/2 Add x86 MMX intrinsics to GCC PPC64LE target

2017-07-06 Thread Steven Munroe
This is the second major contribution of X86 intrinsic equivalent
headers for PPC64LE.

X86 MMX technology was the earliest integer SIMD and 64-bit scalar
extension for IA32. MMX should have largely been replaced by now with
X86_64 64-bit scalars and SSE 128-bit SIMD operations in modern
applications.  However, it is still part of the X86 API and is supported
via the mmintrin.h header and numerous GCC built-ins. mmintrin.h is
included from the SSE instruction headers and x86intrin.h, so it needs
to be there to simplify porting of existing X86 applications to PPC64LE.

In the specific case of X86 MMX (__m64) intrinsics, the PowerPC target
does not support a native __vector_size__ (8) type.  Instead we typedef
__m64 to a 64-bit unsigned long long, which is natively supported in
64-bit mode.  This works well for the _si64 and some _pi32 operations,
but starts to generate long sequences for _pi16 and _pi8 operations.
For those cases it is better (faster and smaller code) to transfer __m64
data to the PowerPC (VMX/VSX) vector 128-bit unit, perform the
operation, and then transfer the result back to the __m64 type. This
implies that the direct register move instructions, introduced with
power8, are available for efficient implementation of these transfers.
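
For the cases where the scalar form is adequate, a union over the 64-bit
scalar gives access to the packed elements. The sketch below shows that style
for a _pi16 add (illustrative; the real header may instead move the data to a
VMX/VSX register via direct moves when that is cheaper):

```c
/* Scalar fallback in the style described above: __m64 held as a
   64-bit unsigned scalar, with a union to reach the 16-bit lanes.  */
typedef unsigned long long m64_t;

typedef union
{
  m64_t as_m64;
  short as_short[4];
} m64_union;

static m64_t
add_pi16_ref (m64_t a, m64_t b)
{
  m64_union ua, ub, res;
  ua.as_m64 = a;
  ub.as_m64 = b;
  /* Four independent 16-bit adds; no carry crosses a lane boundary.  */
  for (int i = 0; i < 4; i++)
    res.as_short[i] = (short) (ua.as_short[i] + ub.as_short[i]);
  return res.as_m64;
}
```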

This patch submission includes just the config.gcc and associated MMX
headers changes to make the review more manageable. A separate patch for
the DG test cases will follow.

./gcc/ChangeLog:

2017-07-06  Steven Munroe  

* config.gcc (powerpc*-*-*): Add mmintrin.h.
* config/rs6000/mmintrin.h: New file.
* config/rs6000/x86intrin.h: Include mmintrin.h.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 249663)
+++ gcc/config.gcc  (working copy)
@@ -456,7 +456,8 @@ powerpc*-*-*)
cpu_type=rs6000
extra_objs="rs6000-string.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
-   extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h x86intrin.h"
+   extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
+   extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h
si2vmx.h"
extra_headers="${extra_headers} paired.h"
case x$with_cpu in
Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 0)
+++ gcc/config/rs6000/mmintrin.h(revision 0)
@@ -0,0 +1,1444 @@
+/* Copyright (C) 2002-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License
and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not,
see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++
Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 MMX (__m64) intrinsics, the PowerPC
+   target does not support a native __vector_size__ (8) type.  Instead
+   we typedef __m64 to a 64-bit unsigned long long, which is natively
+   supported in 64-bit mode.  This works well for the _si64 and some
+   _pi32 operations, but starts to generate long sequences for _pi16
+   and _pi8 operations.  For those cases it better (faster and
+   smaller code) to transfer __m64 data to the PowerPC vector 128-bit
+   unit, perform the operation, and then transfer the result back to
+   the __m64 type. This implies that the direct register move
+   instructions, introduced with power8, are available for efficient
+   implementat

[PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests

2017-07-17 Thread Steven Munroe
After a recent GCC change, the previously submitted BMI/BMI2 intrinsic
tests started to fail with the following warning/error.

ppc_cpu_supports_hw_available122373.c: In function 'main':
ppc_cpu_supports_hw_available122373.c:9:10: warning:
__builtin_cpu_supports needs GLIBC (2.23 and newer) that exports
hardware capability bits

This does not occur on systems with the newer (2.23) GLIBC but is common
on older (stable) distros.

As this is coming from the bmi-check.h and bmi2-check.h includes (and
not from the tests directly), it seems simplest to skip the test unless
__BUILTIN_CPU_SUPPORTS__ is defined.


[gcc/testsuite]

2017-07-17  Steven Munroe  

* gcc.target/powerpc/bmi-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.
* gcc.target/powerpc/bmi2-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.

Index: gcc/testsuite/gcc.target/powerpc/bmi-check.h
===
--- gcc/testsuite/gcc.target/powerpc/bmi-check.h(revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi-check.h(working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* Need 64-bit for 64-bit longs as single instruction.  */
   if ( __builtin_cpu_supports ("ppc64") )
 {
@@ -25,6 +26,6 @@ main ()
   else
 printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }
Index: gcc/testsuite/gcc.target/powerpc/bmi2-check.h
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-check.h   (revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-check.h   (working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* The BMI2 test for pext test requires the Bit Permute doubleword
  (bpermd) instruction added in PowerISA 2.06 along with the VSX
  facility.  So we can test for arch_2_06.  */
@@ -27,7 +28,7 @@ main ()
   else
 printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }
 




[PATCH, rs6000] Rev 2, 1/2 Add x86 MMX intrinsics to GCC PPC64LE target

2017-07-17 Thread Steven Munroe
This corrects the problems Segher found in review and adds changes to
deal with the fallout from the __builtin_cpu_supports warning for older
distros.

Tested on P8 LE and P6/P7/P8 BE. No new tests failures.

./gcc/ChangeLog:

2017-07-17  Steven Munroe  

* config.gcc (powerpc*-*-*): Add mmintrin.h.
* config/rs6000/mmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include mmintrin.h.

Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 0)
+++ gcc/config/rs6000/mmintrin.h(working copy)
@@ -0,0 +1,1456 @@
+/* Copyright (C) 2002-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 MMX (__m64) intrinsics, the PowerPC
+   target does not support a native __vector_size__ (8) type.  Instead
+   we typedef __m64 to a 64-bit unsigned long long, which is natively
+   supported in 64-bit mode.  This works well for the _si64 and some
+   _pi32 operations, but starts to generate long sequences for _pi16
+   and _pi8 operations.  For those cases it is better (faster and
+   smaller code) to transfer __m64 data to the PowerPC vector 128-bit
+   unit, perform the operation, and then transfer the result back to
+   the __m64 type. This implies that the direct register move
+   instructions, introduced with power8, are available for efficient
+   implementation of these transfers.
+
+   Most MMX intrinsic operations can be performed efficiently as
+   C language 64-bit scalar operations or optimized to use the newer
+   128-bit SSE/Altivec operations.  We recommend this for new
+   applications.  */
+#warning "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable this warning."
+#endif
+
+#ifndef _MMINTRIN_H_INCLUDED
+#define _MMINTRIN_H_INCLUDED
+
+#include 
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.  */
+typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64;
+
+typedef __attribute__ ((__aligned__ (8)))
+union
+  {
+__m64 as_m64;
+char as_char[8];
+signed char as_signed_char [8];
+short as_short[4];
+int as_int[2];
+long long as_long_long;
+float as_float[2];
+double as_double;
+  } __m64_union;
+
+/* Empty the multimedia state.  */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_empty (void)
+{
+  /* nothing to do on PowerPC.  */
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_m_empty (void)
+{
+  /* nothing to do on PowerPC.  */
+}
+
+/* Convert I to a __m64 object.  The integer is zero-extended to 64-bits.  */
+extern __inline __m64  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsi32_si64 (int __i)
+{
+  return (__m64) (unsigned int) __i;
+}
+
+extern __inline __m64  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_m_from_int (int __i)
+{
+  return _mm_cvtsi32_si64 (__i);
+}
+
+/* Convert the lower 32 bits of the __m64 object into an integer.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsi64_si32 (__m64 __i)
+{
+  return ((int) __i);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_m_to_int (__m64 __i)
+{
+  return _mm_cvtsi64_si3

Re: [PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests

2017-07-18 Thread Steven Munroe
On Tue, 2017-07-18 at 16:54 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jul 17, 2017 at 01:28:20PM -0500, Steven Munroe wrote:
> > After a recent GCC change the previously submitted BMI/BMI2 intrinsic
> > test started to fail with the following warning/error.
> > 
> > ppc_cpu_supports_hw_available122373.c: In function 'main':
> > ppc_cpu_supports_hw_available122373.c:9:10: warning:
> > __builtin_cpu_supports needs GLIBC (2.23 and newer) that exports
> > hardware capability bits
> > 
> > This does not occur on systems with the newer (2.23) GLIBC but is common
> > on older (stable) distros.
> > 
> > As this is coming from the bmi-check.h and bmi2-check.h includes (and
> > not the tests directly) it seems simpler to simply skip the test unless
> > __BUILTIN_CPU_SUPPORTS__ is defined.
> 
> So this will skip on most current systems; is there no reasonable
> way around that?
> 
The workaround would be to add an #else leg where we obtain the address
of the auxv, then scan for the AT_PLATFORM, AT_HWCAP, and AT_HWCAP2
entries, and then perform the required string compares and/or bit tests.

> Okay otherwise.  One typo thing:
> 
> > 2017-07-17  Steven Munroe  
> > 
> > *gcc.target/powerpc/bmi-check.h (main): Skip unless
> > __BUILTIN_CPU_SUPPORTS__ defined.
> > *gcc.target/powerpc/bmi2-check.h (main): Skip unless
> > __BUILTIN_CPU_SUPPORTS__ defined.
> 
> There should be a space after the asterisks.
> 
> 
> Segher
> 




[PATCH, rs6000] 2/2 Add x86 MMX intrinsics DG tests to GCC PPC64LE target

2017-07-19 Thread Steven Munroe
This is part 2/2 of contributing PPC64LE support for X86 MMX
intrinsics. This patch adds the DG tests to verify the header contents.
Oddly, there are very few MMX-specific tests included in i386, so I had
to adapt some of the SSE tests to the smaller vector size.

[gcc/testsuite]

2017-07-18  Steven Munroe  

* gcc.target/powerpc/mmx-check.h: New file.
* gcc.target/powerpc/mmx-packs.c: New file.
* gcc.target/powerpc/mmx-packssdw-1.c: New file.
* gcc.target/powerpc/mmx-packsswb-1.c: New file.
* gcc.target/powerpc/mmx-packuswb-1.c: New file.
* gcc.target/powerpc/mmx-paddb-1.c: New file.
* gcc.target/powerpc/mmx-paddd-1.c: New file.
* gcc.target/powerpc/mmx-paddsb-1.c: New file.
* gcc.target/powerpc/mmx-paddsw-1.c: New file.
* gcc.target/powerpc/mmx-paddusb-1.c: New file.
* gcc.target/powerpc/mmx-paddusw-1.c: New file.
* gcc.target/powerpc/mmx-paddw-1.c: New file.
* gcc.target/powerpc/mmx-pcmpeqb-1.c: New file.
* gcc.target/powerpc/mmx-pcmpeqd-1.c: New file.
* gcc.target/powerpc/mmx-pcmpeqw-1.c: New file.
* gcc.target/powerpc/mmx-pcmpgtb-1.c: New file.
* gcc.target/powerpc/mmx-pcmpgtd-1.c: New file.
* gcc.target/powerpc/mmx-pcmpgtw-1.c: New file.
* gcc.target/powerpc/mmx-pmaddwd-1.c: New file.
* gcc.target/powerpc/mmx-pmulhw-1.c: New file.
* gcc.target/powerpc/mmx-pmullw-1.c: New file.
* gcc.target/powerpc/mmx-pslld-1.c: New file.
* gcc.target/powerpc/mmx-psllw-1.c: New file.
* gcc.target/powerpc/mmx-psrad-1.c: New file.
* gcc.target/powerpc/mmx-psraw-1.c: New file.
* gcc.target/powerpc/mmx-psrld-1.c: New file.
* gcc.target/powerpc/mmx-psrlw-1.c: New file.
* gcc.target/powerpc/mmx-psubb-2.c: New file.
* gcc.target/powerpc/mmx-psubd-2.c: New file.
* gcc.target/powerpc/mmx-psubsb-1.c: New file.
* gcc.target/powerpc/mmx-psubsw-1.c: New file.
* gcc.target/powerpc/mmx-psubusb-1.c: New file.
* gcc.target/powerpc/mmx-psubusw-1.c: New file.
* gcc.target/powerpc/mmx-psubw-2.c: New file.
* gcc.target/powerpc/mmx-punpckhbw-1.c: New file.
* gcc.target/powerpc/mmx-punpckhdq-1.c: New file.
* gcc.target/powerpc/mmx-punpckhwd-1.c: New file.
* gcc.target/powerpc/mmx-punpcklbw-1.c: New file.
* gcc.target/powerpc/mmx-punpckldq-1.c: New file.
* gcc.target/powerpc/mmx-punpcklwd-1.c: New file.

Index: gcc/testsuite/gcc.target/powerpc/mmx-check.h
===
--- gcc/testsuite/gcc.target/powerpc/mmx-check.h(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/mmx-check.h(working copy)
@@ -0,0 +1,35 @@
+#include 
+#include 
+
+static void mmx_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  mmx_test ();
+}
+
+int
+main ()
+  {
+#ifdef __BUILTIN_CPU_SUPPORTS__
+/* Many MMX intrinsics are simpler / faster to implement by
+ * transferring the __m64 (long int) to vector registers for SIMD
+ * operations.  To be efficient we also need the direct register
+ * transfer instructions from POWER8.  So we can test for
+ * arch_2_07.  */
+if ( __builtin_cpu_supports ("arch_2_07") )
+  {
+   do_test ();
+#ifdef DEBUG
+   printf ("PASSED\n");
+#endif
+  }
+#ifdef DEBUG
+else
+  printf ("SKIPPED\n");
+#endif
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
+return 0;
+  }
Index: gcc/testsuite/gcc.target/powerpc/mmx-packs.c
===
--- gcc/testsuite/gcc.target/powerpc/mmx-packs.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/mmx-packs.c(working copy)
@@ -0,0 +1,91 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw { target powerpc*-*-* } } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+#include "mmx-check.h"
+
+#ifndef TEST
+#define TEST mmx_test
+#endif
+
+static void
+__attribute__ ((noinline))
+check_packs_pu16 (unsigned long long int src1, unsigned long long int src2,
+  unsigned long long int res_ref)
+{
+  unsigned long long int res;
+
+  res = (unsigned long long int) _mm_packs_pu16 ((__m64) src1, (__m64) src2);
+
+  if (res != res_ref)
+abort ();
+}
+
+static void
+__attribute__ ((noinline))
+check_packs_pi16 (unsigned long long int src1, unsigned long long int src2,
+  unsigned long long int res_ref)
+{
+  unsigned long long int res;
+
+  res = (unsigned long long int) _mm_packs_pi16 ((__m64) src1, (__m64) src2);
+
+
+  if (res != res_ref)
+abort ();
+}
+
+static void
+__attribute__ ((noinline))
+check_packs_pi32 (unsigned long long int src1, unsigned long long int src2,
+ unsigned long long int r

Re: [PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests

2017-07-19 Thread Steven Munroe
On Wed, 2017-07-19 at 12:45 -0500, Segher Boessenkool wrote:
> On Tue, Jul 18, 2017 at 05:10:42PM -0500, Steven Munroe wrote:
> > On Tue, 2017-07-18 at 16:54 -0500, Segher Boessenkool wrote:
> > > On Mon, Jul 17, 2017 at 01:28:20PM -0500, Steven Munroe wrote:
> > > > After a recent GCC change the previously submitted BMI/BMI2 intrinsic
> > > > test started to fail with the following warning/error.
> > > > 
> > > > ppc_cpu_supports_hw_available122373.c: In function 'main':
> > > > ppc_cpu_supports_hw_available122373.c:9:10: warning:
> > > > __builtin_cpu_supports needs GLIBC (2.23 and newer) that exports
> > > > hardware capability bits
> > > > 
> > > > This does not occur on systems with the newer (2.23) GLIBC but is common
> > > > on older (stable) distros.
> > > > 
> > > > As this is coming from the bmi-check.h and bmi2-check.h includes (and
> > > > not the tests directly) it seems simpler to simply skip the test unless
> > > > __BUILTIN_CPU_SUPPORTS__ is defined.
> > > 
> > > So this will skip on most current systems; is there no reasonable
> > > way around that?
> > > 
> > The workaround would be to add an #else leg where we obtain the address
> > of the auxv, then scan for the AT_PLATFORM, AT_HWCAP, and AT_HWCAP2
> > entries, and then perform the required string compares and/or bit tests.
> 
> Yeah let's not do that.  We'll just have to live with less test
> coverage by random testers, for now.  It's no different from any other
> new feature in that regard.
> 
So, proceed with check-in?





Re: [PATCH, rs6000] 2/2 Add x86 MMX intrinsics DG tests to GCC PPC64LE target

2017-07-20 Thread Steven Munroe
On Wed, 2017-07-19 at 16:42 -0500, Segher Boessenkool wrote:
> Hi Steve,
> 
> On Wed, Jul 19, 2017 at 10:14:01AM -0500, Steven Munroe wrote:
> > This is part 2/2 of contributing PPC64LE support for X86 MMX
> > intrinsics. This patch adds the DG tests to verify the header contents.
> > Oddly, there are very few MMX-specific tests included in i386, so I had
> > to adapt some of the SSE tests to the smaller vector size.
> 
> Just two comments...
> 
> > +/* Many MMX intrinsics are simpler / faster to implement by
> > + * transferring the __m64 (long int) to vector registers for SIMD
> > + * operations.  To be efficient we also need the direct register
> > + * transfer instructions from POWER8.  So we can test for
> > + * arch_2_07.  */
> 
> We don't use leading * in block comments.  Not that I care in test
> cases, but you seem to be following the coding standards otherwise :-)
> 
This is the Eclipse CDT GNU formatter. It seems acceptable, most of the
time.

I will try to convince it not to add the leading * in the future. For
now I'll fix the comment manually before I commit.


> > --- gcc/testsuite/gcc.target/powerpc/mmx-packs.c(nonexistent)
> > +++ gcc/testsuite/gcc.target/powerpc/mmx-packs.c(working copy)
> > @@ -0,0 +1,91 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -mpower8-vector" } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p8vector_hw { target powerpc*-*-* } }
> > */
> 
> Why have the target selector here, and not on the dg-options line as
> well?  Don't we need it in both places, or neither?  (I think you don't
> need it, same for all other files here).
> 
I was backed into this because we don't have the
/* { dg-require-effective-target p8vector_min } */

yet. And we don't want to use -mcpu=power8 if we mean power8 or power9
and later.

The { target powerpc*-*-* } bit is there to enable possible future
sharing of DG tests across platforms. So should I either remove the
target selector, or add it to any line that is platform specific? For
example:

/* { dg-options "-O3 -mpower8-vector" { target powerpc*-*-* } } */

If you agree with the above, I will correct and commit.




[PATCH, rs6000] 1/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is the third major contribution of X86 intrinsic equivalent
headers for PPC64LE.

X86 SSE technology was the second SIMD extension which added wider
128-bit vector (XMM) registers and single precision float capability.
They also addressed missing MMX capabilities and provided transfer (move,
pack, unpack) operations between MMX and XMM registers. This was
embodied in the xmmintrin.h header (in part 2/3). The implementation
also provided the mm_malloc.h API to allow for correct 16-byte alignment
where the system malloc may only provide 8-byte alignment. PowerPC64LE
can assume the PowerPC quadword (16-byte) alignment but we provide this
header and API to ease the application porting process. The mm_malloc.h
header is implicitly included by xmmintrin.h.

In general the SSE (__m128) intrinsics are a better match to the
PowerISA VMX/VSX 128-bit vector facilities. This allows direct mapping
of the __m128 type to PowerPC __vector float and allows natural handling
of parameter passing, return values, and SIMD float operations.

However, while both ISAs support float scalars in vector registers, the
X86_64 and PowerPC64LE use different formats (and bits within the vector
register) for float scalars. This requires extra PowerISA operations to
exactly match the X86 scalar float (intrinsics ending in *_ss)
semantics. The intent is to provide a functionally correct
implementation at some reduction in performance.

This patch just adds the mm_malloc.h header, which will be needed by
xmmintrin.h, and cleans up some noisy warnings from the previous MMX
commit.

Part 2 adds the xmmintrin.h include and associated config.gcc and
x86intrin.h changes.

Part 3 adds the associated DG test cases.


./gcc/ChangeLog:

2017-08-16  Steven Munroe  

* config/rs6000/mm_malloc.h: New file.


[gcc/testsuite]

2017-07-21  Steven Munroe  

* gcc.target/powerpc/mmx-packuswb-1.c [NO_WARN_X86_INTRINSICS]:
Define. Suppress warning during tests.


Index: gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c
===
--- gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c   (revision 250986)
+++ gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c   (working copy)
@@ -3,6 +3,8 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target p8vector_hw } */
 
+#define NO_WARN_X86_INTRINSICS 1
+
 #ifndef CHECK_H
 #define CHECK_H "mmx-check.h"
 #endif
Index: gcc/config/rs6000/mm_malloc.h
===
--- gcc/config/rs6000/mm_malloc.h   (revision 0)
+++ gcc/config/rs6000/mm_malloc.h   (revision 0)
@@ -0,0 +1,62 @@
+/* Copyright (C) 2004-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MM_MALLOC_H_INCLUDED
+#define _MM_MALLOC_H_INCLUDED
+
+#include 
+
+/* We can't depend on  since the prototype of posix_memalign
+   may not be visible.  */
+#ifndef __cplusplus
+extern int posix_memalign (void **, size_t, size_t);
+#else
+extern "C" int posix_memalign (void **, size_t, size_t) throw ();
+#endif
+
+static __inline void *
+_mm_malloc (size_t size, size_t alignment)
+{
+  /* PowerPC64 ELF V2 ABI requires quadword alignment. */
+  size_t vec_align = sizeof (__vector float);
+  /* Linux GLIBC malloc alignment is at least 2 X ptr size.  */
+  size_t malloc_align = (sizeof (void *) + sizeof (void *));
+  void *ptr;
+
+  if (alignment == malloc_align && alignment == vec_align)
+return malloc (size);
+  if (alignment < vec_align)
+alignment = vec_align;
+  if (posix_memalign (&ptr, alignment, size) == 0)
+return ptr;
+  else
+return NULL;
+}
+
+static __inline void
+_mm_free (void * ptr)
+{
+  free (ptr);
+}
+
+#endif /* _MM_MALLOC_H_INCLUDED */




[PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is part 2/3 of contributing PPC64LE support for X86 SSE
intrinsics. This patch includes the new (for PPC) xmmintrin.h and
associated config.gcc changes.

This submission implements all the SSE Technology intrinsic functions
except those associated with directly accessing and updating the x86
Control and Status Register (MXCSR).

1) The features and layout of the MXCSR are specific to the Intel
Architecture.
2) Not all the control and status bits of the MXCSR have equivalents in
the PowerISA's FPSCR.
3) Using the POSIX Floating Point Environment API is a
better cross-platform solution.


./gcc/ChangeLog:

2017-08-16  Steven Munroe  

* config.gcc (powerpc*-*-*): Add xmmintrin.h and mm_malloc.h.
* config/rs6000/xmmintrin.h: New file.
* config/rs6000/x86intrin.h [__ALTIVEC__]: Include xmmintrin.h.



Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 250986)
+++ gcc/config.gcc  (working copy)
@@ -457,6 +457,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
+   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} paired.h"
Index: gcc/config/rs6000/x86intrin.h
===
--- gcc/config/rs6000/x86intrin.h   (revision 250986)
+++ gcc/config/rs6000/x86intrin.h   (working copy)
@@ -37,6 +37,8 @@
 
 #ifdef __ALTIVEC__
 #include 
+
+#include 
 #endif /* __ALTIVEC__ */
 
 #include 
Index: gcc/config/rs6000/xmmintrin.h
===
--- gcc/config/rs6000/xmmintrin.h   (revision 0)
+++ gcc/config/rs6000/xmmintrin.h   (revision 0)
@@ -0,0 +1,1815 @@
+/* Copyright (C) 2002-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.
+
+   In the specific case of X86 SSE (__m128) intrinsics, the PowerPC
+   VMX/VSX ISA is a good match for vector float SIMD operations.
+   However scalar float operations in vector (XMM) registers require
+   the POWER8 VSX ISA (2.07) level. Also there are important
+   differences for data format and placement of float scalars in the
+   vector register. For PowerISA, scalar floats in FPRs (the leftmost
+   64 bits of the low 32 VSRs) are in double format, while X86_64 SSE
+   uses the rightmost 32 bits of the XMM. These differences require
+   extra steps on POWER to match the SSE scalar float semantics.
+
+   Most SSE scalar float intrinsic operations can be performed more
+   efficiently as C language float scalar operations or optimized to
+   use vector SIMD operations.  We recommend this for new applications.
+
+   Another difference is the format and details of the X86_64 MXCSR vs
+   the PowerISA FPSCR / VSCR registers. We recommend applications
+   replace direct access to the MXCSR with the more portable
+   POSIX APIs. */
+#warning "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable this warning."
+#endif
+
+#ifndef _XMMINTRIN_H_INCLUDED
+#define _XMMINTRIN_H_INCLUDED
+
+#inclu

[PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-16 Thread Steven Munroe
This is part 3/3 of contributing PPC64LE support for X86 SSE
intrinsics. This patch includes testsuite/gcc.target tests for the
intrinsics included by xmmintrin.h.

For these tests I added -Wno-psabi to dg-options to suppress warnings
associated with the vector ABI change in GCC 5. These warnings are
associated with unions defined in m128-check.h (ported with minimal
change from i386). This removes some noise from make check.


[gcc/testsuite]

2017-08-16  Steven Munroe  

* gcc.target/powerpc/m128-check.h: New file.
* gcc.target/powerpc/sse-check.h: New file.
* gcc.target/powerpc/sse-movmskps-1.c: New file.
* gcc.target/powerpc/sse-movlps-2.c: New file.
* gcc.target/powerpc/sse-pavgw-1.c: New file.
* gcc.target/powerpc/sse-cvttss2si-1.c: New file.
* gcc.target/powerpc/sse-cvtpi32x2ps-1.c: New file.
* gcc.target/powerpc/sse-cvtss2si-1.c: New file.
* gcc.target/powerpc/sse-divss-1.c: New file.
* gcc.target/powerpc/sse-movhps-1.c: New file.
* gcc.target/powerpc/sse-cvtsi2ss-2.c: New file.
* gcc.target/powerpc/sse-subps-1.c: New file.
* gcc.target/powerpc/sse-minps-1.c: New file.
* gcc.target/powerpc/sse-pminub-1.c: New file.
* gcc.target/powerpc/sse-cvtpu16ps-1.c: New file.
* gcc.target/powerpc/sse-shufps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-2.c: New file.
* gcc.target/powerpc/sse-maxps-1.c: New file.
* gcc.target/powerpc/sse-pmaxub-1.c: New file.
* gcc.target/powerpc/sse-movmskb-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-4.c: New file.
* gcc.target/powerpc/sse-unpcklps-1.c: New file.
* gcc.target/powerpc/sse-mulps-1.c: New file.
* gcc.target/powerpc/sse-rcpps-1.c: New file.
* gcc.target/powerpc/sse-pminsw-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-6.c: New file.
* gcc.target/powerpc/sse-subss-1.c: New file.
* gcc.target/powerpc/sse-movss-2.c: New file.
* gcc.target/powerpc/sse-pmaxsw-1.c: New file.
* gcc.target/powerpc/sse-minss-1.c: New file.
* gcc.target/powerpc/sse-movaps-2.c: New file.
* gcc.target/powerpc/sse-movlps-1.c: New file.
* gcc.target/powerpc/sse-maxss-1.c: New file.
* gcc.target/powerpc/sse-movhlps-1.c: New file.
* gcc.target/powerpc/sse-cvttss2si-2.c: New file.
* gcc.target/powerpc/sse-cvtpi8ps-1.c: New file.
* gcc.target/powerpc/sse-cvtpi32ps-1.c: New file.
* gcc.target/powerpc/sse-mulss-1.c: New file.
* gcc.target/powerpc/sse-cvtsi2ss-1.c: New file.
* gcc.target/powerpc/sse-cvtss2si-2.c: New file.
* gcc.target/powerpc/sse-movlhps-1.c: New file.
* gcc.target/powerpc/sse-movhps-2.c: New file.
* gcc.target/powerpc/sse-rsqrtps-1.c: New file.
* gcc.target/powerpc/sse-xorps-1.c: New file.
* gcc.target/powerpc/sse-cvtpspi8-1.c: New file.
* gcc.target/powerpc/sse-orps-1.c: New file.
* gcc.target/powerpc/sse-addps-1.c: New file.
* gcc.target/powerpc/sse-cvtpi16ps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-3.c: New file.
* gcc.target/powerpc/sse-pmulhuw-1.c: New file.
* gcc.target/powerpc/sse-andps-1.c: New file.
* gcc.target/powerpc/sse-cmpss-1.c: New file.
* gcc.target/powerpc/sse-divps-1.c: New file.
* gcc.target/powerpc/sse-andnps-1.c: New file.
* gcc.target/powerpc/sse-ucomiss-5.c: New file.
* gcc.target/powerpc/sse-movss-1.c: New file.
* gcc.target/powerpc/sse-sqrtps-1.c: New file.
* gcc.target/powerpc/sse-cvtpu8ps-1.c: New file.
* gcc.target/powerpc/sse-cvtpspi16-1.c: New file.
* gcc.target/powerpc/sse-movaps-1.c: New file.
* gcc.target/powerpc/sse-movss-3.c: New file.
* gcc.target/powerpc/sse-unpckhps-1.c: New file.
* gcc.target/powerpc/sse-addss-1.c: New file.
* gcc.target/powerpc/sse-psadbw-1.c: New file.


Index: gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c
===
--- gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c   (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c   (revision 0)
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+
+#ifndef CHECK_H
+#define CHECK_H "sse-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse_test_movmskps_1
+#endif
+
+#include 
+
+static int
+__attribute__((noinline, unused))
+test (__m128 a)
+{
+  return _mm_movemask_ps (a); 
+}
+
+static void
+TEST (void)
+{
+  union128 u;
+  float s[4] = {-2134.3343, 1234.635654, 1.2234, -876.8976};
+  int d;
+  int e = 0;
+  int i;
+
+  u.x = _mm_loadu_ps

Re: [PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-17 Thread Steven Munroe
On Thu, 2017-08-17 at 00:28 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 16, 2017 at 03:35:40PM -0500, Steven Munroe wrote:
> > +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_add_ss (__m128 __A, __m128 __B)
> > +{
> > +#ifdef _ARCH_PWR7
> > +  __m128 a, b, c;
> > +  static const __vector unsigned int mask = {0x, 0, 0, 0};
> > +  /* PowerISA VSX does not allow partial (for just lower double)
> > +   * results. So to ensure we don't generate spurious exceptions
> > +   * (from the upper double values) we splat the lower double
> > +   * before we do the operation. */
> 
> No leading stars in comments please.
Fixed

> 
> > +  a = vec_splat (__A, 0);
> > +  b = vec_splat (__B, 0);
> > +  c = a + b;
> > +  /* Then we merge the lower float result with the original upper
> > +   * float elements from __A.  */
> > +  return (vec_sel (__A, c, mask));
> > +#else
> > +  __A[0] = __A[0] + __B[0];
> > +  return (__A);
> > +#endif
> > +}
> 
> It would be nice if we could just write the #else version and get the
> more optimised code, but I guess we get something horrible going through
> memory, instead?
> 
No, even with GCC8-trunk this field access is going through storage.

The generated code for splat, op, select is shorter even when you
include loading the constant.

vector <-> scalar float is just nasty!

> > +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_rcp_ps (__m128 __A)
> > +{
> > +  __v4sf result;
> > +
> > +  __asm__(
> > +  "xvresp %x0,%x1;\n"
> > +  : "=v" (result)
> > +  : "v" (__A)
> > +  : );
> > +
> > +  return (result);
> > +}
> 
> There is a builtin for this (__builtin_vec_re).

Yes, not sure how I missed that. Fixed.

> 
> > +/* Convert the lower SPFP value to a 32-bit integer according to the 
> > current
> > +   rounding mode.  */
> > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_mm_cvtss_si32 (__m128 __A)
> > +{
> > +  __m64 res = 0;
> > +#ifdef _ARCH_PWR8
> > +  __m128 vtmp;
> > +  __asm__(
> > +  "xxsldwi %x1,%x2,%x2,3;\n"
> > +  "xscvspdp %x1,%x1;\n"
> > +  "fctiw  %1,%1;\n"
> > +  "mfvsrd  %0,%x1;\n"
> > +  : "=r" (res),
> > +   "=&wi" (vtmp)
> > +  : "wa" (__A)
> > +  : );
> > +#endif
> > +  return (res);
> > +}
> 
> Maybe it could do something better than return the wrong answer for non-p8?

OK, this gets tricky. Before _ARCH_PWR8 the vector-to-scalar transfer
would go through storage. But that is not the worst of it.

The semantics of cvtss require rint or llrint. But __builtin_rint will
generate a call to libm unless we assert -ffast-math. And we don't have
builtins to generate fctiw/fctid directly.

So I will add the #else using __builtin_rint if that libm dependency is
ok (this will pop in the DG tests for older machines).

> 
> > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> 
> > +#ifdef __LITTLE_ENDIAN__
> > +  return result[1];
> > +#elif __BIG_ENDIAN__
> > +  return result [0];
> 
> Remove the extra space here?
> 
> > +_mm_max_pi16 (__m64 __A, __m64 __B)
> 
> > +  res.as_short[0] = (m1.as_short[0] > m2.as_short[0])? m1.as_short[0]: 
> > m2.as_short[0];
> > +  res.as_short[1] = (m1.as_short[1] > m2.as_short[1])? m1.as_short[1]: 
> > m2.as_short[1];
> > +  res.as_short[2] = (m1.as_short[2] > m2.as_short[2])? m1.as_short[2]: 
> > m2.as_short[2];
> > +  res.as_short[3] = (m1.as_short[3] > m2.as_short[3])? m1.as_short[3]: 
> > m2.as_short[3];
> 
> Space before ? and : .
done

> 
> > +_mm_min_pi16 (__m64 __A, __m64 __B)
> 
> In this function, too.
> 
> > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> > +_m_pmovmskb (__m64 __A)
> > +{
> > +  return _mm_movemask_pi8 (__A);
> > +}
> > +/* Multiply four unsigned 16-bit values in A by four unsigned 16-bit values
> > +   in B and produce the high 16 bits of the 32-bit results.  */
> > +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, 
> > __artificial__))
> 
> Newline before the comment?
done

> 
> > +_mm_sad_pu8 (__m64  __A, __m64  __B)
> 
> > +  /* Sum four group

Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target

2017-08-18 Thread Steven Munroe
On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote:
> On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote:
> > This it part 3/3 for contributing PPC64LE support for X86 SSE
> > instrisics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by xmmintrin.h. 
> 
> > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \
> 
> Should that be UNION_TYPE?

It is spelled 'UINON_TYPE' in
./gcc/testsuite/gcc.target/i386/m128-check.h, which is the source for
the powerpc version.

There is no obvious reason why it could not be spelled UNION_TYPE,
unless there is some symbol collision further up the SSE/AVX stack.

Bingo:

avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME)

I propose not to change this.




[PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target

2017-05-08 Thread Steven Munroe
A common issue in porting applications and packages is that someone may
have forgotten that there is more than one hardware platform. 

A specific example is applications using Intel x86 intrinsic functions
without appropriate conditional compile guards. Another example is a
developer tasked to port a large volume of code containing important
functions "optimized" with Intel x86 intrinsics, but without the skill
or time to perform the same optimization for another platform. Often the
developer who wrote the original optimization has moved on and those
left to maintain the application / package lack understanding of the
original x86 intrinsic code or design.

For PowerPC this can be acute especially for HPC vector SIMD codes. The
PowerISA (as implemented for POWER and OpenPOWER servers) has extensive
vector hardware facilities and GCC provides a large set of vector
intrinsics. Thus I would like to restrict this support to PowerPC
targets that support VMX/VSX and PowerISA-2.07 (power8) and later.

But the difference in (intrinsic) spelling alone is enough to stop many
application developers in their tracks.

So I propose to submit a series of patches to implement the PowerPC64LE
equivalent of a useful subset of the x86 intrinsics. The final size and
usefulness of this effort is to be determined. The proposal is to
incrementally port intrinsic header files from the ./config/i386 tree to
the ./config/rs6000 tree. This naturally provides the same header
structure and intrinsic names which will simplify code porting.

It seems natural to work from the bottom (oldest) up. For example
starting with mmintrin.h and working our way up the following headers:

smmintrin.h (SSE4.1)  includes tmmintrin.h
tmmintrin.h (SSSE3)   includes pmmintrin.h
pmmintrin.h (SSE3)    includes emmintrin.h
emmintrin.h (SSE2)    includes xmmintrin.h
xmmintrin.h (SSE)     includes mmintrin.h and mm_malloc.h
mmintrin.h  (MMX)

There is a smattering of non-vector intrinsics in common use,
like the Bit Manipulation Instructions (BMI & BMI2).

bmiintrin.h
bmi2intrin.h
x86intrin.h (collector includes BMI headers and many others)

The older intrinsic (BMI/MMX/SSE) instructions have been integrated into
GCC and many of the intrinsic implementations are simple C code or GCC
built-ins. The remaining intrinsic functions are implemented as platform
specific builtins (__builtin_ia32_*) and need to be mapped to an
equivalent PowerPC built-in or a vector intrinsic from altivec.h.

Of course as part of this process we will port as many of the
corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
tests only require minor source changes, mostly to the platform specific
dg-* directives. A few dg-do compile tests are needed to ensure we are
getting the expected folding/Common subexpression elimination (CSE) to
generate the optimum sequence for PowerPC.

To get the ball rolling I include the BMI intrinsics ported to PowerPC
for review as they are a reasonable size (31 intrinsic implementations).

[gcc]

2017-05-04  Steven Munroe  

* config.gcc (powerpc*-*-*): Add bmi2intrin.h, bmiintrin.h,
and x86intrin.h.
* config/rs6000/bmiintrin.h: New file.
* config/rs6000/bmi2intrin.h: New file.
* config/rs6000/x86intrin.h: New file.

[gcc/testsuite]

2017-05-04  Steven Munroe  

* gcc.target/powerpc/bmi-andn-1.c: New file.
* gcc.target/powerpc/bmi-andn-2.c: New file.
* gcc.target/powerpc/bmi-bextr-1.c: New file.
* gcc.target/powerpc/bmi-bextr-2.c: New file.
* gcc.target/powerpc/bmi-bextr-4.c: New file.
* gcc.target/powerpc/bmi-bextr-5.c: New file.
* gcc.target/powerpc/bmi-blsi-1.c: New file.
* gcc.target/powerpc/bmi-blsi-2.c: New file.
* gcc.target/powerpc/bmi-blsmsk-1.c: New file.
* gcc.target/powerpc/bmi-blsmsk-2.c: New file.
* gcc.target/powerpc/bmi-blsr-1.c: New file.
* gcc.target/powerpc/bmi-blsr-2.c: New file.
* gcc.target/powerpc/bmi-check.h: New file.
* gcc.target/powerpc/bmi-tzcnt-1.c: New file.
* gcc.target/powerpc/bmi-tzcnt-2.c: New file.
* gcc.target/powerpc/bmi2-bzhi32-1.c: New file.
* gcc.target/powerpc/bmi2-bzhi64-1.c: New file.
* gcc.target/powerpc/bmi2-bzhi64-1a.c: New file.
* gcc.target/powerpc/bmi2-check.h: New file.
* gcc.target/powerpc/bmi2-mulx32-1.c: New file.
* gcc.target/powerpc/bmi2-mulx32-2.c: New file.
* gcc.target/powerpc/bmi2-mulx64-1.c: New file.
* gcc.target/powerpc/bmi2-mulx64-2.c: New file.
* gcc.target/powerpc/bmi2-pdep32-1.c: New file.
* gcc.target/powerpc/bmi2-pdep64-1.c: New file.
* gcc.target/powerpc/bmi2-pext32-1.c: New file.
* gcc.target/powerpc/bmi2-pext64-1.c: New file.
* gcc.target/powerpc/bmi2-pext64-1a.c: New file.

Index: gcc/testsuite/gcc.target/po

Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target

2017-05-09 Thread Steven Munroe
On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> > Thus I would like to restrict this support to PowerPC
> > targets that support VMX/VSX and PowerISA-2.07 (power8) and later.
> 
> What happens if you run it on an older machine, or as BE or 32-bit,
> or with vectors disabled?
> 
Well I hope that I set the dg-require-effective-target correctly because
while some of these intrinsics might work on the BE or 32-bit machine,
most will not.

For example; many of the BMI intrinsic implementations depend on 64-bit
instructions and so I use { dg-require-effective-target lp64 }.  The
BMI2 intrinsic _pext exploits the Bit Permute Doubleword instruction.
There is no Bit Permute Word instruction. So for BMI2 I use
{ dg-require-effective-target powerpc_vsx_ok } as bpermd was introduced
in PowerISA 2.06 along with the Vector Scalar Extension facility.

The situation gets more complicated when we start looking at the
SSE/SSE2. These headers define many variants of load and store
instructions that are decidedly LE, plus many unaligned forms. While
powerpc64le handles this with ease, implementing LE semantics in BE mode
gets seriously tricky. I think it is better to avoid this and only
support these headers for LE.

And while some SSE intrinsics can be implemented with VMX instructions,
all the SSE2 double-float intrinsics require VSX. And some PowerISA 2.07
instructions simplify implementation if available. As power8 is also the
first supported powerpc64le system it seems the logical starting point
for most of this work. 

I don't plan to spend effort on supporting Intel intrinsic functions on
older PowerPC machines (before power8) or BE.

> > So I propose to submit a series of patches to implement the PowerPC64LE
> > equivalent of a useful subset of the x86 intrinsics. The final size and
> > usefulness of this effort is to be determined. The proposal is to
> > incrementally port intrinsic header files from the ./config/i386 tree to
> > the ./config/rs6000 tree. This naturally provides the same header
> > structure and intrinsic names which will simplify code porting.
> 
> Yeah.
> 
> I'd still like to see these headers moved into some subdir (both in
> the source tree and in the installed headers tree), to reduce clutter,
> but I understand it's not trivial to do.
> 
> > To get the ball rolling I include the BMI intrinsics ported to PowerPC
> > for review as they are reasonable size (31 intrinsic implementations).
> 
> This is okay for trunk.  Thanks!
> 
Thank you

> > --- gcc/config.gcc  (revision 247616)
> > +++ gcc/config.gcc  (working copy)
> > @@ -444,7 +444,7 @@ nvptx-*-*)
> > ;;
> >  powerpc*-*-*)
> > cpu_type=rs6000
> > -   extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
> > +   extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h bmi2intrin.h
> > bmiintrin.h x86intrin.h"
> 
> (Your mail client wrapped this).
> 
> Write this on a separate line?  Like
>   extra_headers="${extra_headers} htmintrin.h htmxlintrin.h bmi2intrin.h"
> (You cannot use += here, pity).
> 
> 
> Segher
> 




Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target

2017-05-10 Thread Steven Munroe
On Tue, 2017-05-09 at 16:03 -0500, Segher Boessenkool wrote:
> On Tue, May 09, 2017 at 02:33:00PM -0500, Steven Munroe wrote:
> > On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote:
> > > On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> > > > Thus I would like to restrict this support to PowerPC
> > > > targets that support VMX/VSX and PowerISA-2.07 (power8) and later.
> > > 
> > > What happens if you run it on an older machine, or as BE or 32-bit,
> > > or with vectors disabled?
> > > 
> > Well I hope that I set the dg-require-effective-target correctly because
> > while some of these intrinsics might work on the BE or 32-bit machine,
> > most will not.
> 
> That is just for the testsuite; I meant what happens if a user tries
> to use it with an older target (or BE, or 32-bit)?  Is there a useful,
> obvious error message?
> 
So looking at the x86 headers, their current practice falls into two
areas.

1) guard 64-bit dependent intrinsic functions with:

#ifdef __x86_64__
#endif

But they do not provide any warnings. I assume that attempting to use an
intrinsic of this class would result in an implicit function declaration
and a link-time failure.

2) guard architecture level dependent intrinsic header content with:

#ifndef __AVX__
#pragma GCC push_options
#pragma GCC target("avx")
#define __DISABLE_AVX__
#endif /* __AVX__ */
...

#ifdef __DISABLE_AVX__
#undef __DISABLE_AVX__
#pragma GCC pop_options
#endif /* __DISABLE_AVX__ */

So they don't make any attempt to prevent users from including a specific
header. If the compiler version does not support the given "GCC target",
I assume that target did not exist in that version.

If GCC does support that target then the '#pragma GCC target("avx")'
will enable code generation, but the user might get a SIGILL if the
hardware they have does not support those instructions.

In the BMI headers I already guard with:

#ifdef  __PPC64__
#endif

This means that like x86_64, attempting to use _pext_u64 on a 32-bit
compiler will result in an implicit function declaration and cause a
linker error.

This is sufficient for most of BMI and BMI2 (registers only / endian
agnostic). But this does not address the larger issues (for SSE/SSE2+)
of needing a VSX implementation or restricting to LE.

So should I check for:

#ifdef __VSX__
#endif

or 

#ifdef __POWER8_VECTOR__

or 

#ifdef _ARCH_PWR8

and perhaps:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

as well to enforce this. 

And are you suggesting I add an #else clause with #warning or #error? Or
is the implicit function and link failure sufficient?

> > The situation gets more complicated when we start looking at the
> > SSE/SSE2. These headers define many variants of load and store
> > instructions that are decidedly LE and many unaligned forms. While
> > powerpc64le handles this with ease, implementing LE semantics in BE mode
> > gets seriously tricky. I think it is better to avoid this and only
> > support these headers for LE.
> 
> Right.
> 
> > And while some SSE intrinsics can be implemented with VMX instructions
> > all the SSE2 double float intrinsics require VSX. And some PowerISA 2.07
> > instructions simplify implementation if available. As power8 is also the
> > first supported powerpc64le system it seems the logical starting point
> > for most of this work. 
> 
> Agreed as well.
> 
> > I don't plan to spend effort on supporting Intel intrinsic functions on
> > older PowerPC machines (before power8) or BE.
> 
> Just make sure if anyone tries anyway, there is a clear error message
> that tells them not to.
> 
> 
> Segher
> 




Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target

2017-05-11 Thread Steven Munroe
On Thu, 2017-05-11 at 09:39 -0500, Segher Boessenkool wrote:
> On Wed, May 10, 2017 at 12:59:28PM -0500, Steven Munroe wrote:
> > > That is just for the testsuite; I meant what happens if a user tries
> > > to use it with an older target (or BE, or 32-bit)?  Is there a useful,
> > > obvious error message?
> > > 
> > So looking at the x86 headers, their current practice falls into two
> > areas.
> > 
> > 1) guard 64-bit dependent intrinsic functions with:
> > 
> > #ifdef __x86_64__
> > #endif
> > 
> > But they do not provide any warnings. I assume that attempting to use an
> > intrinsic of this class would result in an implicit function declaration
> > and a link-time failure.
> 
> Yeah probably.  Which is fine -- it does not silently do the wrong thing,
> and it is easy to find where the problem is.
> 
> > If GCC does support that target then the '#pragma GCC target("avx")'
> > will enable code generation, but the user might get a SIGILL if the
> > hardware they have does not support those instructions.
> 
> That is less friendly, but it still does not silently generate bad code.
> 
> > In the BMI headers I already guard with:
> > 
> > #ifdef  __PPC64__
> > #endif
> > 
> > This means that like x86_64, attempting to use _pext_u64 on a 32-bit
> > compiler will result in an implicit function declaration and cause a
> > linker error.
> 
> Yup, that's fine.
> 
> > This is sufficient for most of BMI and BMI2 (registers only / endian
> > agnostic). But this does not address the larger issues (for SSE/SSE2+)
> > of needing a VSX implementation or restricting to LE.
> 
> Right.
> 
> > So should I check for:
> > 
> > #ifdef __VSX__
> > #endif
> > 
> > or 
> > 
> > #ifdef __POWER8_VECTOR__
> > 
> > or 
> > 
> > #ifdef _ARCH_PWR8
> > 
> > and perhaps:
> > 
> > #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > 
> > as well to enforce this. 
> > 
> > And are you suggesting I add an #else clause with #warning or #error? Or
> > is the implicit function and link failure sufficient?
> 
> The first is friendlier, the second is sufficient I think.
> 
> Maybe it is good enough to check for LE only?  Most unmodified code
> written for x86 (using intrinsics etc.) will not work correctly on BE.
> And if you restrict to LE you get 64-bit and POWER8 automatically.
> 
> So maybe just require LE?
> 
Ok I will add "#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__" guard for
the MMX/SSE and later intrinsic headers.




Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target

2017-05-12 Thread Steven Munroe
On Fri, 2017-05-12 at 11:38 -0700, Mike Stump wrote:
> On May 8, 2017, at 7:49 AM, Steven Munroe  wrote:
> > Of course as part of this process we will port as many of the
> > corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
> > gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
> > tests only require minor source changes, mostly to the platform specific
> > dg-* directives. A few dg-do compile tests are needed to insure we are
> > getting the expected folding/Common subexpression elimination (CSE) to
> > generate the optimum sequence for PowerPC.
> 
> If there is a way to share that seems reasonable and the x86 would like to 
> share...
> 
> I'd let you and the x86 folks figure out what is best.

It is too early to tell, but I have no objections to discussing options.

Are you looking to share source files? This seems like low value because
the files tend to be small and the only difference is the dg-*
directives. I don't know enough about the DejaGnu macros to even guess
at what this might entail.

So far the sharing is mostly one way (./i386/ -> ./powerpc/), but if I
find cases that require a new dg test that might also apply to ./i386/,
I will be willing to share that with x86.




[PATCH rs6000] Fix up dg-options for BMI intrinsic tests

2017-05-17 Thread Steven Munroe
David pointed out that my earlier x86 BMI intrinsic header submission
was causing make check failures on non-powerpc64le platforms. The patch
below tests out on Linux BE powerpc64/32 and should also resolve the
failures on AIX. I don't have access to an AIX system, so David, can you
give this patch a quick test?

Thanks.

[gcc/testsuite]

2017-05-17  Steven Munroe  

* gcc.target/powerpc/bmi-andn-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-andn-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-bextr-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-bextr-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-bextr-4.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-bextr-5.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsi-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsi-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsmsk-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsmsk-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsr-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-blsr-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-tzcnt-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi-tzcnt-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-bzhi32-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-bzhi64-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-bzhi64-1a.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-mulx32-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-mulx32-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-mulx64-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-mulx64-2.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-pdep32-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-pdep64-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-pext32-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-pext64-1.c: Fix-up dg-options.
* gcc.target/powerpc/bmi2-pext64-1a.c: Fix-up dg-options.

Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c   (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -m64" } */
+/* { dg-options "-O3" } */
 /* { dg-require-effective-target lp64 } */
 
 #define NO_WARN_X86_INTRINSICS 1
Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c   (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -m64" } */
+/* { dg-options "-O3" } */
 /* { dg-require-effective-target lp64 } */
 
 #define NO_WARN_X86_INTRINSICS 1
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c  (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
+/* { dg-options "-O2 -fno-inline" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O2 -m64 -fno-inline" } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c  (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
+/* { dg-options "-O3 -fno-inline" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O3 -m64 -fno-inline" } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c  (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
+/* { dg-options "-O3 -fno-inline" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O3 -m64 -fno-inline" } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c  (revision 248166)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run } */
+/* { dg-options "-O3 -fno-inline" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O3 -m64 -fno-inline" } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi

Re: [PATCH rs6000] Fix up dg-options for BMI intrinsic tests

2017-05-18 Thread Steven Munroe
On Wed, 2017-05-17 at 17:22 -0400, David Edelsohn wrote:
> On Wed, May 17, 2017 at 4:56 PM, Steven Munroe
>  wrote:
> > David pointed out that my earlier x86 BMI intrinsic header submission
> > was causing make check failures on non-powerpc64le platforms. The patch
> > below tests out on Linux BE powerpc64/32 and should also resolve the
> > failures on AIX. I don't have access to a AIX so David can you give this
> > patch a quick test.
> 
> This will fix the failures on AIX.
> 
Ok I'll commit this.





[PATCH rs6000] Additional fixes to BMI intrinsic tests

2017-05-24 Thread Steven Munroe
Bill Seurer pointed out that building the BMI tests on a power8 but with
gcc built --with-cpu=power6 fails with link errors. The intrinsics
_pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the
implementation uses bpermd and popcntd instructions introduced with
power7 (PowerISA-2.06).

But if the GCC is built --with-cpu=power6, the compiler is capable of
supporting -mcpu=power7 but will not generate bpermd/popcntd by default.
Then if some code then uses, say, _pext_u64 with -mcpu=power6, the
intrinsic is not supported (needs power7) and so is not defined.

The dg tests are guarded with dg-require-effective-target
powerpc_vsx_ok. This only tests whether GCC and binutils are capable of
generating VSX (and, by extension, PowerISA-2.06 bpermd and popcntd)
instructions.

In this case the result is that the intrinsic functions are implicitly
declared as extern and cause a link failure. The solution is to guard the
test code with #ifdef _ARCH_PWR7 so that it does not attempt to use
instructions that are not there.

However, for the dg-do compile test bmi2-pext64-1a.c we have no
alternative but to add -mcpu=power7 to dg-options.


[gcc/testsuite]

2017-05-24  Steven Munroe  

* gcc.target/powerpc/bmi2-pdep32-1.c [_ARCH_PWR7]: Prevent
implicit function declarations for processors without the bpermd
instruction.
* gcc.target/powerpc/bmi2-pdep64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext32-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7
to dg-options.

Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(revision 248381)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(working copy)
@@ -7,6 +7,7 @@
 #include 
 #include "bmi2-check.h"
 
+#ifdef  _ARCH_PWR7
 __attribute__((noinline))
 unsigned long long
 calc_pdep_u64 (unsigned long long a, unsigned long long mask)
@@ -21,11 +22,13 @@ calc_pdep_u64 (unsigned long long a, unsigned long
 }
   return res;
 }
+#endif /* _ARCH_PWR7 */
 
 static
 void
 bmi2_test ()
 {
+#ifdef  _ARCH_PWR7
   unsigned long long i;
   unsigned long long src = 0xce7acce7acce7ac;
   unsigned long long res, res_ref;
@@ -39,4 +42,5 @@ bmi2_test ()
 if (res != res_ref)
   abort ();
   }
+#endif /* _ARCH_PWR7 */
 }
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(revision 248381)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(working copy)
@@ -7,6 +7,7 @@
 #include 
 #include "bmi2-check.h"
 
+#ifdef  _ARCH_PWR7
 __attribute__((noinline))
 unsigned long long
 calc_pext_u64 (unsigned long long a, unsigned long long mask)
@@ -22,10 +23,12 @@ calc_pext_u64 (unsigned long long a, unsigned long
 
   return res;
 }
+#endif /* _ARCH_PWR7 */
 
 static void
 bmi2_test ()
 {
+#ifdef  _ARCH_PWR7
   unsigned long long i;
   unsigned long long src = 0xce7acce7acce7ac;
   unsigned long long res, res_ref;
@@ -39,4 +42,5 @@ bmi2_test ()
 if (res != res_ref)
   abort();
   }
+#endif /* _ARCH_PWR7 */
 }
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248381)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy)
@@ -7,6 +7,7 @@
 #include 
 #include "bmi2-check.h"
 
+#ifdef  _ARCH_PWR7
 __attribute__((noinline))
 unsigned
 calc_pdep_u32 (unsigned a, int mask)
@@ -22,10 +23,12 @@ calc_pdep_u32 (unsigned a, int mask)
 
   return res;
 }
+#endif /* _ARCH_PWR7 */
 
 static void
 bmi2_test ()
 {
+#ifdef  _ARCH_PWR7
   unsigned i;
   unsigned src = 0xce7acc;
   unsigned res, res_ref;
@@ -39,4 +42,5 @@ bmi2_test ()
 if (res != res_ref)
   abort();
   }
+#endif /* _ARCH_PWR7 */
 }
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (revision 248381)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(revision 248381)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(working copy)
@@ -7,6 +7,7 @@
 #include 
 #include "bmi2-check.h"
 
+#ifdef  _ARCH_PWR7
 __attribute__((noinline))
 unsigned
 calc_pext_u32 (unsigned a, unsigned mask)
@@ -22,10 +23,12 @@ calc_pext_u32 (unsigned

[PATCH rs6000] Additional fixes to BMI intrinsic tests, 2nd edition

2017-05-26 Thread Steven Munroe
Bill Seurer pointed out that building the BMI tests on a power8 but with
gcc built --with-cpu=power6 fails with link errors. The intrinsics
_pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the
implementation uses bpermd and popcntd instructions introduced with
power7 (PowerISA-2.06).

But if the GCC is built --with-cpu=power6, the compiler is capable of
supporting -mcpu=power7 but will not generate bpermd/popcntd by default.
Then if some code then uses, say, _pext_u64 with -mcpu=power6, the
intrinsic is not supported (needs power7) and so is not defined.

The { dg-require-effective-target powerpc_vsx_ok } is not sufficient for
the { dg-do run } tests and needs to be changed to vsx_hw. Also we need
to add -mcpu=power7 to dg-options to ensure the compiler will generate
the bpermd/popcntd instructions.

This is sufficient for all the bmi/bmi2 tests to skip/pass for power6
and later.

[gcc/testsuite]

2017-05-26  Steven Munroe  

* gcc.target/powerpc/bmi2-pdep32-1.c []: Add -mcpu=power7 to
dg-options.  Change dg-require-effective-target powerpc_vsx_ok
to vsx_hw.
* gcc.target/powerpc/bmi2-pdep64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext32-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7 to
dg-options.

Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include 
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 



Re: [PATCH rs6000] Additional fixes to BMI intrinsic tests, 2nd edition

2017-05-30 Thread Steven Munroe
On Tue, 2017-05-30 at 17:26 -0500, Segher Boessenkool wrote:
> On Fri, May 26, 2017 at 10:32:54AM -0500, Steven Munroe wrote:
> > * gcc.target/powerpc/bmi2-pdep32-1.c []: Add -mcpu=power7 to
> > dg-options.  Change dg-require-effective-target powerpc_vsx_ok
> > to vsx_hw.
> 
> Stray "[]"?
Yes, I am still not sure of the ChangeLog conventions for dg options.
> 
> > --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 
> > 248468)
> > +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy)
> > @@ -1,7 +1,7 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O3" } */
> > +/* { dg-options "-O3 -mcpu=power7" } */
> >  /* { dg-require-effective-target lp64 } */
> > -/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-require-effective-target vsx_hw } */
> 
> Other testcases selecting a -mcpu= also use
> 
> /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power7" } } */
> 
> Do you really want -mcpu=power7 always?  Or just at least power7?
> 

I need at least -mcpu=power7 to generate popcntd/bpermd. The
pdep_u32/pext_u32 implementations call the respective 64-bit versions as
the ISA does not provide a 32-bit bpermd.

It is not obvious how to skip unless at least -mcpu=power7 is in effect
for these dg-do run tests.

If the dg-skip-if is required I will add it.

For the dg-do compile test for _pext_u64 I need the -mcpu=power7
specifically to get the correct counts for bpermd, popcntd and cntlzd.




[PATCH rs6000] Additional fixes to BMI intrinsic tests, 3rd edition

2017-05-31 Thread Steven Munroe
Bill Seurer pointed out that building the BMI tests on a power8 but with
gcc built --with-cpu=power6 fails with link errors. The intrinsics
_pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the
implementation uses bpermd and popcntd instructions introduced with
power7 (PowerISA-2.06).

But if the GCC is built --with-cpu=power6, the compiler is capable of
supporting -mcpu=power7 but will not generate bpermd/popcntd by default.
Then if some code uses, say, _pext_u64 with -mcpu=power6, the
intrinsic is not supported (needs power7) and so is not defined.

The { dg-require-effective-target powerpc_vsx_ok } is not sufficient for
the { dg-do run } tests and needs to be changed to vsx_hw. Also we need
to add -mcpu=power7 to dg-options to ensure the compiler will generate
the bpermd/popcntd instructions.

Also added:

{ dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
{ "-mcpu=power7" } }

and 

{ dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } }

To ward off the evil spirits.

Tested on BE --with-cpu=power6 -m32/-m64 and LE --with-cpu=power8. All
bmi/bmi2 intrinsic tests passed.

[gcc/testsuite]

2017-05-31  Steven Munroe  

* gcc.target/powerpc/bmi2-pdep32-1.c: Add -mcpu=power7 to
dg-options.  Change dg-require-effective-target powerpc_vsx_ok
to vsx_hw.  Add dg-skip-if directive to disable this test if
-mcpu is overridden.
* gcc.target/powerpc/bmi2-pdep64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext32-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1.c: Likewise.
* gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7
to dg-options.  Add dg-skip-if directive to disable this test
for darwin.

Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include <x86intrin.h>
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c	(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c	(working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include <x86intrin.h>
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c	(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c	(working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include <x86intrin.h>
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c	(revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c	(working copy)
@@ -1,7 +1,8 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
 
 #define NO_WARN_X86_INTRINSICS 1
 #include <x86intrin.h>
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c
===
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (revision 248468)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c   (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O3 -mcpu=power7" } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 



Re: [PATCH], RFC, add support for __float128/__ibm128 types on PowerPC

2014-05-02 Thread Steven Munroe
On Fri, 2014-05-02 at 12:13 +0200, Jakub Jelinek wrote:
> Hi!
> 
> On Tue, Apr 29, 2014 at 06:30:32PM -0400, Michael Meissner wrote:
> > This patch adds support for a new type (__float128) on the PowerPC to allow
> > people to use the 128-bit IEEE floating point format instead of the 
> > traditional
> > IBM double-double that has been used in the Linux compilers.  At this time,
> > long double still will remain using the IBM double-double format.
> > 
> > There has been an undocumented option to switch long double to IEEE 
> > 128-bit,
> > but right now, there are bugs I haven't ironed out on VSX systems.
> > 
> > In addition, I added another type (__ibm128) so that when the transition is
> > eventually made, people can use this type to get the old long double type.
> > 
> > I was wondering if people had any comments on the code so far, and things I
> > should do differently.  Note, I will be out on vacation May 6th - 14th, so I 
> > don't
> > expect to submit the patches until I get back.
> 
> For mangling, if you are going to mangle it the same as the -mlong-double-64
> long double, is __float128 going to be supported solely for ELFv2 ABI and
> are you sure nobody has ever used -mlong-double-64 or
> --without-long-double-128 configured compiler for it?

> What is the plan for glibc (and for libstdc++)?
> Looking at current ppc64le glibc, it seems it mistakenly still supports
> the -mlong-double-64 stuff (e.g. printf calls are usually redirected to
> __nldbl_printf (and tons of other calls).  So, is the plan to use
> yet another set of symbols?  For __nldbl_* it is about 113 entry points
> in libc.so and 1 in libm.so, but if you are going to support all of
> -mlong-double-64, -mlong-double-128 as well as __float128, that would be far
> more, because the compat -mlong-double-64 support mostly works by
> redirecting, either in headers or through a special *.a library, to
> corresponding double entry points whenever possible.
> So, if you call logl in -mlong-double-64 code, it will be redirected to
> log, because it has the same ABI.  But if you call *printf or nexttowardf
> etc. where there is no ABI compatible double entrypoint, it needs to be a
> new symbol.
> But with __float128 vs. __ibm128 and long double being either of those,
> you need different logl.
> 
Yes, and we will work on a plan to do this. But at this time, and for
the near future, there is no performance advantage to __float128 over
IBM long double.

> Which is why it is so huge problem that this hasn't been resolved initially
> as part of ELFv2 changes.

Because it was a huge problem and there was no way for the required GCC
support to be available in time for GLIBC-2.19.

So we will develop an orderly, step-by-step transition plan. This will
take some time.