Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
Looking at the latest version of the Power Vector Intrinsic Programming Reference (Revision 2.0.0_prd, Bill slipped this to me for review), I see that vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros both specify vector unsigned char only.

On Mon, Aug 5, 2024 at 1:15 AM Kewen.Lin wrote:
> on 2024/8/3 05:48, Peter Bergner wrote:
> > On 7/31/24 10:21 PM, Kewen.Lin wrote:
> >> on 2024/8/1 01:52, Carl Love wrote:
> >>> Yes, I noticed that the built-ins were defined as overloaded but only
> >>> had one definition. It did seem odd to me.
> >>>
> Either is with "vector unsigned char" as argument type, but the
> corresponding instance prototype in the builtin table is with
> "vector signed char". It's inconsistent and weird; I think we can
> just update the prototype in the builtin table to "vector unsigned char"
> and remove the entries in the overload table. It can be a follow-up
> patch.
> >>>
> >>> I didn't notice that it was signed in the instance prototype but
> >>> unsigned in the overloaded definition. That is definitely inconsistent.
> >>>
> >>> That said, should we just go ahead and support both signed and
> >>> unsigned argument versions of the all-ones and all-zeros built-ins?
> >>
> >> Good question, I thought about that but found openxl only supports the
> >> unsigned version, so I felt it's probably better to keep consistent
> >> with it. But I'm fine with either; if we decide to extend it to cover
> >> both signed and unsigned, we should notify the openxl team to extend
> >> it as well.
> >>
> >> openxl doc links:
> >>
> >> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
> >> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros
> >
> > If it makes sense to support vector signed char rather than only the
> > vector unsigned char, then I'm fine adding support for it.
> > It almost seems, since we tried adding an overload for it, that that
> > was our intention (to support both signed and unsigned) and we just
> > had a bug so only unsigned was supported?
>
> Good question, but I'm not sure; it could be an oversight without adding
> one more instance for overloading, or adopting some useless code (only
> for overloading) for a single instance. I found it was introduced by
> r11-2437-gcf5d0fc2d1adcd; CC'ed Will as he contributed this.
>
> BR,
> Kewen
>
> > CC'ing Steve since he noticed the missing documentation when he was
> > trying to use the built-ins. Steve, do you see a need to also support
> > vector signed char with these built-ins?
> >
> > Peter
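For reference, the semantics the thread is discussing can be modeled in plain C. This is an illustration written for this summary (the model_* names are hypothetical), not the GCC built-ins themselves: vec_test_lsbb_all_ones is true when the least significant bit of every byte of the vector is 1, and vec_test_lsbb_all_zeros when every byte's least significant bit is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of vec_test_lsbb_all_ones: returns 1 when the least
   significant bit of every byte in the 16-byte vector is 1.  */
static int
model_test_lsbb_all_ones (const unsigned char v[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((v[i] & 1) == 0)
      return 0;
  return 1;
}

/* Plain-C model of vec_test_lsbb_all_zeros: returns 1 when the least
   significant bit of every byte in the 16-byte vector is 0.  */
static int
model_test_lsbb_all_zeros (const unsigned char v[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Whichever of vector signed char or vector unsigned char the prototype ends up using, the bit-level result is identical, which is why the thread treats the choice as an API-consistency question rather than a behavioral one.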
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, 2017-06-20 at 09:04 +, Hurugalawadi, Naveen wrote:
> Hi Joseph,
>
> Thanks for your review and valuable comments on this issue.
>
> Please find attached the patch that merges x86 intrinsics for the
> AArch64 and PPC architectures.
>
> >> it would seem to me to be a bad idea to duplicate the
> >> implementation for more and more architectures.
> Merged the implementation for the AArch64 and PPC architectures.
>
> The testcases have not been merged yet. Will do it after checking out
> the comments on the current idea of implementation.
>
> Please check the patch and let me know the comments.
>
> Bootstrapped and regression tested on aarch64-thunder-linux and PPC.

I am not sure this works or is even a good idea. By accident, bmiintrin.h can be implemented as C code or common builtins. But bmi2intrin.h depends on __builtin_bpermd, which to my knowledge is PowerISA only. As I work on mmx, sse, sse2, etc. it gets more complicated. There are many X86 intrinsic instances that require altivec.h-unique intrinsics to implement efficiently for the powerpc64le target, and some inline __asm.

Net, the current sample size so far is too small to make a reasonable assessment. And as you can see below, the gcc.target tests have to be duplicated anyway. Even if the C code is common there will be many differences in dg-options and dg-require-effective-target. Trying to common these implementations only creates more small files to manage.

> Thanks,
> Naveen
>
> 2017-06-20 Naveen H.S
>
> [gcc]
> * config.gcc (aarch64*-*-*): Add bmi2intrin.h, bmiintrin.h,
> adxintrin.h and x86intrin.h in Config folder.
> (powerpc*-*-*): Move bmi2intrin.h, bmiintrin.h and x86intrin.h into
> Config folder.
> * config/adxintrin.h: New file.
> * config/bmi2intrin.h: New file.
> * config/bmiintrin.h: New file.
> * config/x86intrin.h: New file.
> * config/rs6000/bmi2intrin.h: Delete file.
> * config/rs6000/bmiintrin.h: Likewise.
> * config/rs6000/x86intrin.h: Likewise.
> > [gcc/testsuite] > > * gcc.target/aarch64/adx-addcarryx32-1.c: New file. > * gcc.target/aarch64/adx-addcarryx32-2.c: New file. > * gcc.target/aarch64/adx-addcarryx32-3.c: New file. > * gcc.target/aarch64/adx-addcarryx64-1.c: New file. > * gcc.target/aarch64/adx-addcarryx64-2.c: New file > * gcc.target/aarch64/adx-addcarryx64-3.c: New file > * gcc.target/aarch64/adx-check.h: New file > * gcc.target/aarch64/bmi-andn-1.c: New file > * gcc.target/aarch64/bmi-andn-2.c: New file. > * gcc.target/aarch64/bmi-bextr-1.c: New file. > * gcc.target/aarch64/bmi-bextr-2.c: New file. > * gcc.target/aarch64/bmi-bextr-4.c: New file. > * gcc.target/aarch64/bmi-bextr-5.c: New file. > * gcc.target/aarch64/bmi-blsi-1.c: New file. > * gcc.target/aarch64/bmi-blsi-2.c: New file. > * gcc.target/aarch64/bmi-blsmsk-1.c: new file. > * gcc.target/aarch64/bmi-blsmsk-2.c: New file. > * gcc.target/aarch64/bmi-blsr-1.c: New file. > * gcc.target/aarch64/bmi-blsr-2.c: New File. > * gcc.target/aarch64/bmi-check.h: New File. > * gcc.target/aarch64/bmi-tzcnt-1.c: new file. > * gcc.target/aarch64/bmi-tzcnt-2.c: New file. > * gcc.target/aarch64/bmi2-bzhi32-1.c: New file. > * gcc.target/aarch64/bmi2-bzhi64-1.c: New file. > * gcc.target/aarch64/bmi2-bzhi64-1a.c: New file. > * gcc.target/aarch64/bmi2-check.h: New file. > * gcc.target/aarch64/bmi2-mulx32-1.c: New file. > * gcc.target/aarch64/bmi2-mulx32-2.c: New file. > * gcc.target/aarch64/bmi2-mulx64-1.c: New file. > * gcc.target/aarch64/bmi2-mulx64-2.c: New file. > * gcc.target/aarch64/bmi2-pdep32-1.c: New file. > * gcc.target/aarch64/bmi2-pdep64-1.c: New file. > * gcc.target/aarch64/bmi2-pext32-1.c: New File. > * gcc.target/aarch64/bmi2-pext64-1.c: New file. > * gcc.target/aarch64/bmi2-pext64-1a.c: New File.
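As background for the __builtin_bpermd point above: the BMI2 parallel-bit-extract (pext) operation that bmi2intrin.h must provide can be described by a short plain-C reference model. This is an illustration written for this summary, not the header implementation (the x86 header maps _pext_u64 to a hardware builtin, and the rs6000 port builds it from bpermd sequences):

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of pext: gather the bits of SRC selected by MASK
   into the low-order bits of the result, preserving their order.  */
static uint64_t
pext_u64_model (uint64_t src, uint64_t mask)
{
  uint64_t result = 0;
  uint64_t bit = 1;
  for (; mask != 0; mask &= mask - 1)
    {
      uint64_t lowest = mask & -mask;	/* lowest set bit of mask */
      if (src & lowest)
	result |= bit;
      bit <<= 1;
    }
  return result;
}
```

The model makes the porting difficulty concrete: each selected bit moves to a data-dependent position, which is cheap with pext or bpermd but expensive in generic scalar code, supporting the argument that little of this header ends up target-independent.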
Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target
On Tue, 2017-06-20 at 17:16 -0500, Segher Boessenkool wrote:
> On Tue, Jun 20, 2017 at 09:34:25PM +, Joseph Myers wrote:
> > On Tue, 20 Jun 2017, Segher Boessenkool wrote:
> > >
> > > > And as you can see below, the gcc.target tests have to be duplicated
> > > > anyway. Even if the C code is common there will be many differences in
> > > > dg-options and dg-require-effective-target. Trying to common these
> > > > implementations only creates more small files to manage.
> > >
> > > So somewhere in the near future we'll have to pull things apart again,
> > > if we go with merging things now.
> >
> > The common part in the intrinsics implementation should be exactly the
> > parts that can be implemented in GNU C without target-specific intrinsics
> > being needed. There should be nothing to pull apart if you start with the
> > right things in the common header. If a particular header has some
> > functions that can be implemented in GNU C and some that need
> > target-specific code, the generic GNU C functions should be in a common
> > header, #included by the target-specific header. The common header should
> > have no conditionals on target architectures whatever (it might have
> > conditionals on things like endianness).
>
> I don't think there is much that will end up in the common header
> eventually. If it was possible to describe most of this in plain C,
> and in such a way that it would optimise well, there would not *be*
> these intrinsics.
>
> > I don't expect many different effective-target / dg-add-options keywords
> > to be needed for common tests (obviously, duplicating tests for each
> > architecture wanting these intrinsics is generally a bad idea).
>
> Yeah, I think it should be possible to share the tests, perhaps with
> some added dg things (so that we don't have to repeat the same things
> over and over).
I don't see how we can share the tests, as this requires platform-unique dg-options and dg-require-effective-target values to enforce the platform restrictions you mentioned earlier.
[PATCH, rs6000] correct implementation of _mm_add_pi32
A small thinko in the implementation of _mm_add_pi32 that only shows up when compiling for power9.

./gcc/ChangeLog:

2017-11-15 Steven Munroe

	* config/rs6000/mmintrin.h (_mm_add_pi32 [_ARCH_PWR9]): Correct
	parameter list for vec_splats.

Index: gcc/config/rs6000/mmintrin.h
===================================================================
--- gcc/config/rs6000/mmintrin.h	(revision 254714)
+++ gcc/config/rs6000/mmintrin.h	(working copy)
@@ -463,8 +463,8 @@ _mm_add_pi32 (__m64 __m1, __m64 __m2)
 #if _ARCH_PWR9
   __vector signed int a, b, c;
-  a = (__vector signed int)vec_splats (__m1, __m1);
-  b = (__vector signed int)vec_splats (__m2, __m2);
+  a = (__vector signed int)vec_splats (__m1);
+  b = (__vector signed int)vec_splats (__m2);
   c = vec_add (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else
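For reference, the semantics the corrected code must produce can be stated as a plain-C model: add the two packed 32-bit elements of each 64-bit __m64 value element-wise, with no carry between elements. This model is an illustration for this summary, not the header's vector implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference model of _mm_add_pi32 with __m64 held as a 64-bit
   scalar: element-wise modular addition of two packed 32-bit lanes.  */
static uint64_t
mm_add_pi32_model (uint64_t m1, uint64_t m2)
{
  uint32_t a[2], b[2];
  memcpy (a, &m1, sizeof a);
  memcpy (b, &m2, sizeof b);
  a[0] += b[0];		/* each lane wraps independently */
  a[1] += b[1];
  memcpy (&m1, a, sizeof m1);
  return m1;
}
```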
[Fwd: [PATCH][Bug target/84266] mmintrin.h intrinsic headers for PowerPC code fails on power9]
--- Begin Message ---

This has a simple fix that I have tested on power8, and Seurer has tested on power9. While there may be a more elegant coding for the required casts, this is the simplest change, considering the current stage.

2018-02-09 Steven Munroe

	* config/rs6000/mmintrin.h (_mm_cmpeq_pi32 [_ARCH_PWR9]): Cast
	vec_cmpeq result to correct type.
	* config/rs6000/mmintrin.h (_mm_cmpgt_pi32 [_ARCH_PWR9]): Cast
	vec_cmpgt result to correct type.

Index: gcc/config/rs6000/mmintrin.h
===================================================================
--- gcc/config/rs6000/mmintrin.h	(revision 257533)
+++ gcc/config/rs6000/mmintrin.h	(working copy)
@@ -854,7 +854,7 @@
   a = (__vector signed int)vec_splats (__m1);
   b = (__vector signed int)vec_splats (__m2);
-  c = (__vector signed short)vec_cmpeq (a, b);
+  c = (__vector signed int)vec_cmpeq (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else
   __m64_union m1, m2, res;
@@ -883,7 +883,7 @@
   a = (__vector signed int)vec_splats (__m1);
   b = (__vector signed int)vec_splats (__m2);
-  c = (__vector signed short)vec_cmpgt (a, b);
+  c = (__vector signed int)vec_cmpgt (a, b);
   return (__builtin_unpack_vector_int128 ((__vector __int128_t)c, 0));
 #else
   __m64_union m1, m2, res;

Ready to commit?

--- End Message ---
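The element semantics the casts must preserve can be written as plain-C reference models: each 32-bit lane of the result is all-ones when the predicate holds for that lane pair, all-zeros otherwise (the vec_cmpeq/vec_cmpgt element width was the bug). These models are illustrations for this summary, not the header code:

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of _mm_cmpeq_pi32: per-32-bit-lane equality mask.  */
static uint64_t
mm_cmpeq_pi32_model (uint64_t m1, uint64_t m2)
{
  uint64_t lo = ((uint32_t) m1 == (uint32_t) m2) ? 0xFFFFFFFFULL : 0;
  uint64_t hi = ((uint32_t) (m1 >> 32) == (uint32_t) (m2 >> 32)) ? 0xFFFFFFFFULL : 0;
  return (hi << 32) | lo;
}

/* Reference model of _mm_cmpgt_pi32: per-lane signed greater-than mask.  */
static uint64_t
mm_cmpgt_pi32_model (uint64_t m1, uint64_t m2)
{
  uint64_t lo = ((int32_t) m1 > (int32_t) m2) ? 0xFFFFFFFFULL : 0;
  uint64_t hi = ((int32_t) (m1 >> 32) > (int32_t) (m2 >> 32)) ? 0xFFFFFFFFULL : 0;
  return (hi << 32) | lo;
}
```

With the buggy `(__vector signed short)` cast the compare itself was still done on 32-bit elements, but the wrong result type could mislead later conversions; the fix keeps the whole expression at the 32-bit element width these models describe.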
Re: [PATCH] PR target/66224 _GLIBC_READ_MEM_BARRIER
On Wed, 2015-05-20 at 14:40 -0400, David Edelsohn wrote: > The current definition of _GLIBC_READ_MEM_BARRIER in libstdc++ is too > weak for an ACQUIRE FENCE, which is what it is intended to be. The > original code emitted an "isync" instead of "lwsync". > > All of the guard acquire and set code needs to be cleaned up to use > GCC atomic intrinsics, but this is necessary for correctness. > > Steve, any comment about the Linux part? > This is correct for the PowerISA V2 (POWER4 and later) processors. I assume the #ifdef __NO_LWSYNC guard is only set for older (ISA V1) processors. Thanks
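As a sketch of the cleanup direction mentioned above (moving the guard code to GCC atomic intrinsics), an acquire fence and an acquire load look like this with the GCC built-ins; the compiler then chooses the right instruction (lwsync on PowerISA 2.x) itself. This is an illustration, not the libstdc++ code:

```c
#include <assert.h>

/* Acquire load: no later load or store may be reordered before it.
   This is the pattern a guard-variable check wants.  */
static int
load_guard_acquire (const int *guard)
{
  return __atomic_load_n (guard, __ATOMIC_ACQUIRE);
}

/* Standalone acquire fence, the direct analogue of the
   _GLIBC_READ_MEM_BARRIER macro being discussed.  */
static void
read_mem_barrier (void)
{
  __atomic_thread_fence (__ATOMIC_ACQUIRE);
}
```

Expressed this way, the lwsync-versus-isync choice becomes the compiler's problem rather than a hand-maintained #ifdef, which is the point of the suggested cleanup.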
[PATCH, rs6000] 1/2 Add x86 SSE2 intrinsics to GCC PPC64LE target
This is the fourth major contribution of X86 intrinsic equivalent headers for PPC64LE. X86 SSE2 technology adds double float (__m128d) support, fills in a number of 128-bit vector integer (__m128i) operations, and adds some MMX conversions to and from 128-bit vector (XMM) operations.

In general the SSE2 (__m128d) intrinsics are a good match to the PowerISA VSX 128-bit vector double facilities. This allows direct mapping of the __m128d type to the PowerPC __vector double type and allows natural handling of parameter passing, return values, and SIMD double operations.

However, while both ISAs support double and float scalars in vector registers, X86_64 and PowerPC64LE use different formats (and bits within the vector register) for floating point scalars. This requires extra PowerISA operations to exactly match the X86 SSE scalar (intrinsic functions ending in *_sd) semantics. The intent is to provide a functionally correct implementation at some reduction in performance.

Some inline assembler is required. There are several cases where we need to generate Data Cache Block instructions; there are no existing builtins for flush and touch for store transient. Also, some of the double to and from 32-bit float and int conversions required assembler to get the correct semantics at reasonable cost. Perhaps these can be revisited when the team completes the builtins for vec_double* and vec_float*.

Part 2 adds the associated 131 DG test cases.

./gcc/ChangeLog:

2017-10-17 Steven Munroe

	* config.gcc (powerpc*-*-*): Add emmintrin.h.
	* config/rs6000/emmintrin.h: New file.
	* config/rs6000/x86intrin.h [__ALTIVEC__]: Include emmintrin.h.
Index: gcc/config.gcc === --- gcc/config.gcc (revision 253786) +++ gcc/config.gcc (working copy) @@ -459,7 +459,7 @@ powerpc*-*-*) extra_objs="rs6000-string.o rs6000-p8swap.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" - extra_headers="${extra_headers} xmmintrin.h mm_malloc.h" + extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" extra_headers="${extra_headers} mmintrin.h x86intrin.h" extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h" extra_headers="${extra_headers} paired.h" Index: gcc/config/rs6000/x86intrin.h === --- gcc/config/rs6000/x86intrin.h (revision 253786) +++ gcc/config/rs6000/x86intrin.h (working copy) @@ -39,6 +39,8 @@ #include #include + +#include #endif /* __ALTIVEC__ */ #include Index: gcc/config/rs6000/emmintrin.h === --- gcc/config/rs6000/emmintrin.h (revision 0) +++ gcc/config/rs6000/emmintrin.h (revision 0) @@ -0,0 +1,2413 @@ +/* Copyright (C) 2003-2017 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. 
*/ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. + It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary. + Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. + + In the specific case of X86 SSE2 (__m128i, __m128d) intrinsics, + the PowerPC VMX/VSX ISA is a good match for vector double SIMD + operations. However scalar double operations in vector (XMM) + registers require the POWER8 VSX ISA (2.07) level. Also there are + important differences for data format and placement of doub
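The scalar-double (*_sd) semantics described in the cover letter are worth pinning down, since they drive the extra element-positioning work on POWER. A plain-C model of one representative case, written for this summary rather than taken from the header:

```c
#include <assert.h>

/* Model of SSE2 _mm_add_sd: add only the low elements; the high
   element of the first operand passes through unchanged.  The PPC64LE
   header needs extra steps here because POWER keeps FP scalars in the
   opposite end of the vector register from x86.  */
typedef struct { double d[2]; } m128d_model;

static m128d_model
mm_add_sd_model (m128d_model a, m128d_model b)
{
  m128d_model r;
  r.d[0] = a.d[0] + b.d[0];	/* low (scalar) element */
  r.d[1] = a.d[1];		/* upper element copied from a */
  return r;
}
```

The full-width _mm_add_pd case has no such pass-through element, which is why the cover letter describes the SIMD double operations as a direct mapping and only the *_sd forms as needing extra PowerISA operations.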
Re: [PATCH, rs6000] 1/2 Add x86 SSE2 intrinsics to GCC PPC64LE target
On Mon, 2017-10-23 at 16:21 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Oct 17, 2017 at 01:24:45PM -0500, Steven Munroe wrote:
> > Some inline assembler is required. There are several cases where we need
> > to generate Data Cache Block instructions. There are no existing builtins
> > for flush and touch for store transient.
>
> Would builtins for those help? Would anything else want to use such
> builtins, I mean?

Yes, I think NVMe and in-memory DB in general will want easy access to these instructions. Intel provides intrinsic functions. Builtins or intrinsics will be easier than finding and reading the PowerISA and trying to write your own inline asm.

> > + For PowerISA Scalar double in FPRs (left most 64-bits of the
> > + low 32 VSRs), while X86_64 SSE2 uses the right most 64-bits of
> > + the XMM. These differences require extra steps on POWER to match
> > + the SSE2 scalar double semantics.
>
> Maybe say "is in FPRs"? (And two spaces after a full stop, here and
> elsewhere.)

Ok.

> > +/* We need definitions from the SSE header files*/
>
> Dot space space.

Ok.

> > +/* Sets the low DPFP value of A from the low value of B. */
> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_move_sd (__m128d __A, __m128d __B)
> > +{
> > +#if 1
> > + __v2df result = (__v2df) __A;
> > + result [0] = ((__v2df) __B)[0];
> > + return (__m128d) result;
> > +#else
> > + return (vec_xxpermdi(__A, __B, 1));
> > +#endif
> > +}
>
> You probably forgot to finish this? Or, what are the two versions,
> and why are they both here? Same question later a few times.

Meant to check what trunk generated and then pick one. Done.

> > +/* Add the lower double-precision (64-bit) floating-point element in
> > + * a and b, store the result in the lower element of dst, and copy
> > + * the upper element from a to the upper element of dst. */
>
> No leading stars on block comments please.
> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_cmpnge_pd (__m128d __A, __m128d __B)
> > +{
> > + return ((__m128d)vec_cmplt ((__v2df ) __A, (__v2df ) __B));
> > +}
>
> You have some spaces before closing parentheses here (and elsewhere --
> please check).

Ok.

> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_cvtpd_epi32 (__m128d __A)
> > +{
> > + __v2df rounded = vec_rint (__A);
> > + __v4si result, temp;
> > + const __v4si vzero =
> > +{ 0, 0, 0, 0 };
> > +
> > + /* VSX Vector truncate Double-Precision to integer and Convert to
> > + Signed Integer Word format with Saturate. */
> > + __asm__(
> > + "xvcvdpsxws %x0,%x1;\n"
> > + : "=wa" (temp)
> > + : "wa" (rounded)
> > + : );
>
> Why the ";\n"? And no empty clobber list please.

Ok.

> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_cvtps_pd (__m128 __A)
> > +{
> > + /* Check if vec_doubleh is defined by . If so use that. */
> > +#ifdef vec_doubleh
> > + return (__m128d) vec_doubleh ((__v4sf)__A);
> > +#else
> > + /* Otherwise the compiler is not current and so need to generate the
> > + equivalent code. */
>
> Do we need this? The compiler will always be current.

The vec_double* and vec_float* builtins were in flux at the time, and this deferred the problem. Not sure what their status is now. We would still need this if we want to backport to GCC 7 (AT 11), and there are more places where now we only have asm.

> > +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_loadl_pd (__m128d __A, double const *__B)
> > +{
> > + __v2df result = (__v2df)__A;
> > + result [0] = *__B;
> > + return (__m128d)result;
> > +}
> > +#ifdef _ARCH_PWR8
> > +/* Intrinsic functions that require PowerISA 2.07 minimum. */
>
> You want an empty line before that #ifdef.
Ok, fixed.

> > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_movemask_pd (__m128d __A)
> > +{
> > + __vector __m64 result;
> > + static const __vector unsigned int perm_mask =
> > +{
> > +#ifdef __LITTLE_ENDIAN__
> > + 0x80800040, 0x80808080, 0x80808080, 0x80808080
> > +#elif __BIG_ENDIAN__
> > + 0x80808080, 0
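For reference, the result the permute-based _mm_movemask_pd above must produce is simply the two sign bits packed into the low bits of an int. A plain-C model (an illustration for this summary, not the header's VMX implementation; it assumes IEEE 754 doubles):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference model of _mm_movemask_pd: bit i of the result is the
   sign bit of double element i.  */
static int
mm_movemask_pd_model (const double a[2])
{
  uint64_t bits0, bits1;
  memcpy (&bits0, &a[0], sizeof bits0);	/* reinterpret, no conversion */
  memcpy (&bits1, &a[1], sizeof bits1);
  return (int) ((bits0 >> 63) | ((bits1 >> 63) << 1));
}
```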
Re: [PATCH, rs6000] 2/2 Add x86 SSE2 intrinsics to GCC PPC64LE target
On Wed, 2017-10-25 at 18:37 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Oct 17, 2017 at 01:27:16PM -0500, Steven Munroe wrote:
> > This is part 2/2 for contributing PPC64LE support for X86 SSE2
> > intrinsics. This patch includes testsuite/gcc.target tests for the
> > intrinsics included by emmintrin.h.
>
> > --- gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/sse2-mmx.c (revision 0)
> > @@ -0,0 +1,83 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -mdirect-move" } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p8vector_hw } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
>
> Why this dg-skip-if? Also, why -mdirect-move?

This is a weird test because it is effectively MMX-style operations, but added to IA under the SSE2 technology. Normally mmintrin.h compare operations require a transfer to/from vector registers with direct move for efficient execution on POWER. The one exception to that is _mm_cmpeq_pi8, which can be implemented directly in GPRs using cmpb. The cmpb instruction is from power6, but I do not want to use -mcpu=power6 here; -mdirect-move is a compromise.

I suspect that the dg-skip-if is an artifact of the early struggles to make this stuff work across various --with-cpu= settings. I think the key is dg-require-effective-target p8vector_hw, which should allow dropping both the -mdirect-move and the whole dg-skip-if clause. Will need to try this change and retest.

> Okay for trunk with that taken care of. Sorry it took a while.
>
> Have you tested this on big endian btw?

Yes. I have tested on P8 BE using --with-cpu=[power6 | power7 | power8].

> Segher
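The cmpb reasoning above can be made concrete: for _mm_cmpeq_pi8, each byte lane of the result is all-ones when the corresponding bytes are equal and all-zeros otherwise, which is exactly what the PowerISA cmpb instruction computes on two 64-bit GPRs. A plain-C model of that operation (an illustration for this summary):

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of cmpb (and of _mm_cmpeq_pi8 on a 64-bit scalar):
   byte lane i of the result is 0xFF if the bytes of RA and RB are
   equal in that lane, 0x00 otherwise.  */
static uint64_t
cmpb_model (uint64_t ra, uint64_t rb)
{
  uint64_t result = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t a = (ra >> (i * 8)) & 0xFF;
      uint64_t b = (rb >> (i * 8)) & 0xFF;
      if (a == b)
	result |= 0xFFULL << (i * 8);
    }
  return result;
}
```

Because one instruction produces the whole 8-lane mask, this intrinsic is the exception that never needs the GPR-to-vector round trip the other mmintrin.h compares require.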
[PATCH, rs6000] 1/2 Add x86 MMX intrinsics to GCC PPC64LE taget
This is the second major contribution of X86 intrinsic equivalent headers for PPC64LE. X86 MMX technology was the earliest integer SIMD and 64-bit scalar extension for IA32. MMX should have largely been replaced by now with X86_64 64-bit scalars and SSE 128-bit SIMD operations in modern applications. However, it is still part of the X86 API and supported via the mmintrin.h header and numerous GCC built-ins. The mmintrin.h header is included from the SSE instruction headers and x86intrin.h, so it needs to be there to simplify porting of existing X86 applications to PPC64LE.

In the specific case of X86 MMX (__m64) intrinsics, the PowerPC target does not support a native __vector_size__ (8) type. Instead we typedef __m64 to a 64-bit unsigned long long, which is natively supported in 64-bit mode. This works well for the _si64 and some _pi32 operations, but starts to generate long sequences for _pi16 and _pi8 operations. For those cases it is better (faster and smaller code) to transfer __m64 data to the PowerPC (VMX/VSX) vector 128-bit unit, perform the operation, and then transfer the result back to the __m64 type. This implies that the direct register move instructions, introduced with power8, are available for efficient implementation of these transfers.

This patch submission includes just the config.gcc and associated MMX header changes to make the review more manageable. A separate patch for the DG test cases will follow.

./gcc/ChangeLog:

2017-07-06 Steven Munroe

	* config.gcc (powerpc*-*-*): Add mmintrin.h.
	* config/rs6000/mmintrin.h: New file.
	* config/rs6000/x86intrin.h: Include mmintrin.h.
Index: gcc/config.gcc === --- gcc/config.gcc (revision 249663) +++ gcc/config.gcc (working copy) @@ -456,7 +456,8 @@ powerpc*-*-*) cpu_type=rs6000 extra_objs="rs6000-string.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" - extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h x86intrin.h" + extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" + extra_headers="${extra_headers} mmintrin.h x86intrin.h" extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h" extra_headers="${extra_headers} paired.h" case x$with_cpu in Index: gcc/config/rs6000/mmintrin.h === --- gcc/config/rs6000/mmintrin.h(revision 0) +++ gcc/config/rs6000/mmintrin.h(revision 0) @@ -0,0 +1,1444 @@ +/* Copyright (C) 2002-2017 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. 
+ It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary. + Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. + + In the specific case of X86 MMX (__m64) intrinsics, the PowerPC + target does not support a native __vector_size__ (8) type. Instead + we typedef __m64 to a 64-bit unsigned long long, which is natively + supported in 64-bit mode. This works well for the _si64 and some + _pi32 operations, but starts to generate long sequences for _pi16 + and _pi8 operations. For those cases it better (faster and + smaller code) to transfer __m64 data to the PowerPC vector 128-bit + unit, perform the operation, and then transfer the result back to + the __m64 type. This implies that the direct register move + instructions, introduced with power8, are available for efficient + implementat
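To illustrate the "long sequences" trade-off described in the header comment above, here is a SWAR-style sketch of _mm_add_pi16 done entirely in a 64-bit scalar, the kind of code needed when __m64 stays in a GPR. This is an illustration written for this summary, not the header's code; the header instead moves the data to a vector register and uses a single vector add:

```c
#include <assert.h>
#include <stdint.h>

/* _mm_add_pi16 in a single 64-bit GPR: mask off the high bit of each
   halfword so the adds cannot carry across lane boundaries, then fold
   the high bits back in with xor (which never carries).  */
static uint64_t
mm_add_pi16_swar (uint64_t a, uint64_t b)
{
  const uint64_t high = 0x8000800080008000ULL;
  uint64_t sum = (a & ~high) + (b & ~high);	/* carry-safe low 15 bits */
  return sum ^ ((a ^ b) & high);		/* restore high bits */
}
```

Even this comparatively compact trick is several dependent scalar operations per intrinsic, versus one vaddúhm-class vector instruction plus two direct moves, which is the cost comparison the header comment is making.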
[PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests
After a recent GCC change the previously submitted BMI/BMI2 intrinsic tests started to fail with the following warning/error:

ppc_cpu_supports_hw_available122373.c: In function 'main':
ppc_cpu_supports_hw_available122373.c:9:10: warning: __builtin_cpu_supports needs GLIBC (2.23 and newer) that exports hardware capability bits

This does not occur on systems with the newer (2.23) GLIBC but is common on older (stable) distros. As this is coming from the bmi-check.h and bmi2-check.h includes (and not the tests directly), it seems simpler to skip the test unless __BUILTIN_CPU_SUPPORTS__ is defined.

[gcc/testsuite]

2017-07-17 Steven Munroe

*gcc.target/powerpc/bmi-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.
*gcc.target/powerpc/bmi2-check.h (main): Skip unless
__BUILTIN_CPU_SUPPORTS__ defined.

Index: gcc/testsuite/gcc.target/powerpc/bmi-check.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-check.h	(revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi-check.h	(working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* Need 64-bit for 64-bit longs as single instruction.  */
   if ( __builtin_cpu_supports ("ppc64") )
     {
@@ -25,6 +26,6 @@ main ()
   else
     printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }

Index: gcc/testsuite/gcc.target/powerpc/bmi2-check.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-check.h	(revision 250212)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-check.h	(working copy)
@@ -13,6 +13,7 @@ do_test (void)
 int
 main ()
 {
+#ifdef __BUILTIN_CPU_SUPPORTS__
   /* The BMI2 test for pext test requires the Bit Permute doubleword
      (bpermd) instruction added in PowerISA 2.06 along with the VSX
      facility.  So we can test for arch_2_06.  */
@@ -27,7 +28,7 @@ main ()
   else
     printf ("SKIPPED\n");
 #endif
-
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
   return 0;
 }
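The guard pattern the patch applies can be sketched outside the testsuite harness as well. This is an illustration under the assumption that __BUILTIN_CPU_SUPPORTS__ is a powerpc-specific macro advertising working GLIBC support; on other targets the sketch simply reports "unknown":

```c
#include <assert.h>

/* Use __builtin_cpu_supports only when the compiler says the runtime
   check will work; otherwise signal the caller to skip.  Returns 1/0
   for supported/unsupported, -1 when no reliable check exists.  */
static int
ppc64_supported (void)
{
#if defined (__powerpc__) && defined (__BUILTIN_CPU_SUPPORTS__)
  return __builtin_cpu_supports ("ppc64") ? 1 : 0;
#else
  return -1;	/* No reliable runtime test; skip.  */
#endif
}
```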
[PATCH, rs6000] Rev 2, 1/2 Add x86 MMX intrinsics to GCC PPC64LE target
Correct the problems Segher found in review, and add changes to deal with the fallout from the __builtin_cpu_supports warning on older distros. Tested on P8 LE and P6/P7/P8 BE. No new test failures.

./gcc/ChangeLog:

2017-07-17 Steven Munroe

	* config.gcc (powerpc*-*-*): Add mmintrin.h.
	* config/rs6000/mmintrin.h: New file.
	* config/rs6000/x86intrin.h [__ALTIVEC__]: Include mmintrin.h.

Index: gcc/config/rs6000/mmintrin.h === --- gcc/config/rs6000/mmintrin.h(revision 0) +++ gcc/config/rs6000/mmintrin.h(working copy) @@ -0,0 +1,1456 @@ +/* Copyright (C) 2002-2017 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. + It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary.
+ Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. + + In the specific case of X86 MMX (__m64) intrinsics, the PowerPC + target does not support a native __vector_size__ (8) type. Instead + we typedef __m64 to a 64-bit unsigned long long, which is natively + supported in 64-bit mode. This works well for the _si64 and some + _pi32 operations, but starts to generate long sequences for _pi16 + and _pi8 operations. For those cases it better (faster and + smaller code) to transfer __m64 data to the PowerPC vector 128-bit + unit, perform the operation, and then transfer the result back to + the __m64 type. This implies that the direct register move + instructions, introduced with power8, are available for efficient + implementation of these transfers. + + Most MMX intrinsic operations can be performed efficiently as + C language 64-bit scalar operation or optimized to use the newer + 128-bit SSE/Altivec operations. We recomend this for new + applications. */ +#warning "Please read comment above. Use -DNO_WARN_X86_INTRINSICS to disable this warning." +#endif + +#ifndef _MMINTRIN_H_INCLUDED +#define _MMINTRIN_H_INCLUDED + +#include +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64; + +typedef __attribute__ ((__aligned__ (8))) +union + { +__m64 as_m64; +char as_char[8]; +signed char as_signed_char [8]; +short as_short[4]; +int as_int[2]; +long long as_long_long; +float as_float[2]; +double as_double; + } __m64_union; + +/* Empty the multimedia state. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_empty (void) +{ + /* nothing to do on PowerPC. 
*/ +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_empty (void) +{ + /* nothing to do on PowerPC. */ +} + +/* Convert I to a __m64 object. The integer is zero-extended to 64-bits. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi32_si64 (int __i) +{ + return (__m64) (unsigned int) __i; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_from_int (int __i) +{ + return _mm_cvtsi32_si64 (__i); +} + +/* Convert the lower 32 bits of the __m64 object into an integer. */ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_si32 (__m64 __i) +{ + return ((int) __i); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_to_int (__m64 __i) +{ + return _mm_cvtsi64_si32 (__i); +}
Re: [PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests
On Tue, 2017-07-18 at 16:54 -0500, Segher Boessenkool wrote: > Hi! > > On Mon, Jul 17, 2017 at 01:28:20PM -0500, Steven Munroe wrote: > > After a recent GCC change the previously submitted BMI/BMI2 intrinsic > > test started to fail with the following warning/error. > > > > ppc_cpu_supports_hw_available122373.c: In function 'main': > > ppc_cpu_supports_hw_available122373.c:9:10: warning: > > __builtin_cpu_supports needs GLIBC (2.23 and newer) that exports hardware capability bits > > > > This does not occur on systems with the newer (2.23) GLIBC but is common > > on older (stable) distros. > > > > As this is coming from the bmi-check.h and bmi2-check.h includes (and > > not the tests directly) it seems simpler to simply skip the test unless > > __BUILTIN_CPU_SUPPORTS__ is defined. > > So this will skip on most current systems; is there no reasonable > way around that? > The workaround would be to add an #else leg where we obtain the address of the auxv then scan for the AT_PLATFORM, AT_HWCAP, and AT_HWCAP2 entries. Then perform the required string compares and / or bit tests. > Okay otherwise. One typo thing: > > > 2017-07-17 Steven Munroe > > > > *gcc.target/powerpc/bmi-check.h (main): Skip unless > > __BUILTIN_CPU_SUPPORTS__ defined. > > *gcc.target/powerpc/bmi2-check.h (main): Skip unless > > __BUILTIN_CPU_SUPPORTS__ defined. > > There should be a space after the asterisks. > > > Segher >
[PATCH, rs6000] 2/2 Add x86 MMX intrinsics DG tests to GCC PPC64LE target
This is part 2/2 of contributing PPC64LE support for the X86 MMX intrinsics. This patch adds the DG tests to verify the headers' contents. Oddly, there are very few MMX-specific tests included in i386, so I had to adapt some of the SSE tests to the smaller vector size. [gcc/testsuite] 2017-07-18 Steven Munroe * gcc.target/powerpc/mmx-check.h: New file. * gcc.target/powerpc/mmx-packs.c: New file. * gcc.target/powerpc/mmx-packssdw-1.c: New file. * gcc.target/powerpc/mmx-packsswb-1.c: New file. * gcc.target/powerpc/mmx-packuswb-1.c: New file. * gcc.target/powerpc/mmx-paddb-1.c: New file. * gcc.target/powerpc/mmx-paddd-1.c: New file. * gcc.target/powerpc/mmx-paddsb-1.c: New file. * gcc.target/powerpc/mmx-paddsw-1.c: New file. * gcc.target/powerpc/mmx-paddusb-1.c: New file. * gcc.target/powerpc/mmx-paddusw-1.c: New file. * gcc.target/powerpc/mmx-paddw-1.c: New file. * gcc.target/powerpc/mmx-pcmpeqb-1.c: New file. * gcc.target/powerpc/mmx-pcmpeqd-1.c: New file. * gcc.target/powerpc/mmx-pcmpeqw-1.c: New file. * gcc.target/powerpc/mmx-pcmpgtb-1.c: New file. * gcc.target/powerpc/mmx-pcmpgtd-1.c: New file. * gcc.target/powerpc/mmx-pcmpgtw-1.c: New file. * gcc.target/powerpc/mmx-pmaddwd-1.c: New file. * gcc.target/powerpc/mmx-pmulhw-1.c: New file. * gcc.target/powerpc/mmx-pmullw-1.c: New file. * gcc.target/powerpc/mmx-pslld-1.c: New file. * gcc.target/powerpc/mmx-psllw-1.c: New file. * gcc.target/powerpc/mmx-psrad-1.c: New file. * gcc.target/powerpc/mmx-psraw-1.c: New file. * gcc.target/powerpc/mmx-psrld-1.c: New file. * gcc.target/powerpc/mmx-psrlw-1.c: New file. * gcc.target/powerpc/mmx-psubb-2.c: New file. * gcc.target/powerpc/mmx-psubd-2.c: New file. * gcc.target/powerpc/mmx-psubsb-1.c: New file. * gcc.target/powerpc/mmx-psubsw-1.c: New file. * gcc.target/powerpc/mmx-psubusb-1.c: New file. * gcc.target/powerpc/mmx-psubusw-1.c: New file. * gcc.target/powerpc/mmx-psubw-2.c: New file. * gcc.target/powerpc/mmx-punpckhbw-1.c: New file. * gcc.target/powerpc/mmx-punpckhdq-1.c: New file.
* gcc.target/powerpc/mmx-punpckhwd-1.c: New file. * gcc.target/powerpc/mmx-punpcklbw-1.c: New file. * gcc.target/powerpc/mmx-punpckldq-1.c: New file. * gcc.target/powerpc/mmx-punpcklwd-1.c: New file. Index: gcc/testsuite/gcc.target/powerpc/mmx-check.h === --- gcc/testsuite/gcc.target/powerpc/mmx-check.h(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/mmx-check.h(working copy) @@ -0,0 +1,35 @@ +#include <stdio.h> +#include <stdlib.h> + +static void mmx_test (void); + +static void +__attribute__ ((noinline)) +do_test (void) +{ + mmx_test (); +} + +int +main () + { +#ifdef __BUILTIN_CPU_SUPPORTS__ +/* Many MMX intrinsics are simpler / faster to implement by + * transferring the __m64 (long int) to vector registers for SIMD + * operations. To be efficient we also need the direct register + * transfer instructions from POWER8. So we can test for + * arch_2_07. */ +if ( __builtin_cpu_supports ("arch_2_07") ) + { + do_test (); +#ifdef DEBUG + printf ("PASSED\n"); +#endif + } +#ifdef DEBUG +else + printf ("SKIPPED\n"); +#endif +#endif /* __BUILTIN_CPU_SUPPORTS__ */ +return 0; + } Index: gcc/testsuite/gcc.target/powerpc/mmx-packs.c === --- gcc/testsuite/gcc.target/powerpc/mmx-packs.c(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/mmx-packs.c(working copy) @@ -0,0 +1,91 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw { target powerpc*-*-* } } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include <mmintrin.h> +#include "mmx-check.h" + +#ifndef TEST +#define TEST mmx_test +#endif + +static void +__attribute__ ((noinline)) +check_packs_pu16 (unsigned long long int src1, unsigned long long int src2, + unsigned long long int res_ref) +{ + unsigned long long int res; + + res = (unsigned long long int) _mm_packs_pu16 ((__m64 ) src1, (__m64 ) src2); + + if (res != res_ref) +abort (); +} + +static void +__attribute__ ((noinline)) +check_packs_pi16 (unsigned long long int src1, unsigned long long int src2, + 
unsigned long long int res_ref) +{ + unsigned long long int res; + + res = (unsigned long long int) _mm_packs_pi16 ((__m64 ) src1, (__m64 ) src2); + + + if (res != res_ref) +abort (); +} + +static void +__attribute__ ((noinline)) +check_packs_pi32 (unsigned long long int src1, unsigned long long int src2, + unsigned long long int r
Re: [PATCH rs6000] Fix up BMI/BMI2 intrinsic DG tests
On Wed, 2017-07-19 at 12:45 -0500, Segher Boessenkool wrote: > On Tue, Jul 18, 2017 at 05:10:42PM -0500, Steven Munroe wrote: > > On Tue, 2017-07-18 at 16:54 -0500, Segher Boessenkool wrote: > > > On Mon, Jul 17, 2017 at 01:28:20PM -0500, Steven Munroe wrote: > > > > After a recent GCC change the previously submitted BMI/BMI2 intrinsic > > > > test started to fail with the following warning/error. > > > > > > > > ppc_cpu_supports_hw_available122373.c: In function 'main': > > > > ppc_cpu_supports_hw_available122373.c:9:10: warning: > > > > __builtin_cpu_supports needs GLIBC (2.23 and newer) that exports hardware capability bits > > > > > > > > This does not occur on systems with the newer (2.23) GLIBC but is common > > > > on older (stable) distros. > > > > > > > > As this is coming from the bmi-check.h and bmi2-check.h includes (and > > > > not the tests directly) it seems simpler to simply skip the test unless > > > > __BUILTIN_CPU_SUPPORTS__ is defined. > > > > > > So this will skip on most current systems; is there no reasonable > > > way around that? > > > > > The workaround would be to add an #else leg where we obtain the address > > of the auxv then scan for the AT_PLATFORM, AT_HWCAP, and AT_HWCAP2 > > entries. Then perform the required string compares and / or bit tests. > > Yeah let's not do that. We'll just have to live with less test > coverage by random testers, for now. It's no different from any other > new feature in that regard. > So proceed with check-in?
Re: [PATCH, rs6000] 2/2 Add x86 MMX intrinsics DG tests to GCC PPC64LE target
On Wed, 2017-07-19 at 16:42 -0500, Segher Boessenkool wrote: > Hi Steve, > > On Wed, Jul 19, 2017 at 10:14:01AM -0500, Steven Munroe wrote: > > This is part 2/2 of contributing PPC64LE support for X86 MMX > > intrinsics. This patch adds the DG tests to verify the headers' contents. > > Oddly, there are very few MMX-specific tests included in i386, so I had to adapt > > some of the SSE tests to the smaller vector size. > > Just two comments... > > > +/* Many MMX intrinsics are simpler / faster to implement by > > + * transferring the __m64 (long int) to vector registers for SIMD > > + * operations. To be efficient we also need the direct register > > + * transfer instructions from POWER8. So we can test for > > + * arch_2_07. */ > > We don't use leading * in block comments. Not that I care in test > cases, but you seem to be following the coding standards otherwise :-) > This is the Eclipse CDT GNU formatter. Seems it is acceptable, most of the time. I will try to convince it not to add the leading * in the future. For now I'll fix the comment manually before I commit. > > --- gcc/testsuite/gcc.target/powerpc/mmx-packs.c(nonexistent) > > +++ gcc/testsuite/gcc.target/powerpc/mmx-packs.c(working copy) > > @@ -0,0 +1,91 @@ > > +/* { dg-do run } */ > > +/* { dg-options "-O3 -mpower8-vector" } */ > > +/* { dg-require-effective-target lp64 } */ > > +/* { dg-require-effective-target p8vector_hw { target powerpc*-*-* } } > > */ > > Why have the target selector here, and not on the dg-options line as > well? Don't we need it in both places, or neither? (I think you don't > need it, same for all other files here). > I was backed into this because we don't have the /* { dg-require-effective-target p8vector_min } */ yet. And we don't want to use -mcpu=power8 if we mean power8 or power9 and later. The { target powerpc*-*-* } bit is there to enable possible future sharing of DG tests across platforms. So I should either remove the target selector or add it to any line that is platform-specific? 
For example: /* { dg-options "-O3 -mpower8-vector" { target powerpc*-*-* } } */ If you agree with the above, I will correct and commit.
[PATCH, rs6000] 1/3 Add x86 SSE intrinsics to GCC PPC64LE target
This is the third major contribution of X86 intrinsic equivalent headers for PPC64LE. X86 SSE technology was the second SIMD extension, which added wider 128-bit vector (XMM) registers and single precision float capability. It also addressed missing MMX capabilities and provided transfer (move, pack, unpack) operations between MMX and XMM registers. This was embodied in the xmmintrin.h header (in part 2/3). The implementation also provided the mm_malloc.h API to allow for correct 16-byte alignment where the system malloc may only provide 8-byte alignment. PowerPC64LE can assume the PowerPC quadword (16-byte) alignment, but we provide this header and API to ease the application porting process. The mm_malloc.h header is implicitly included by xmmintrin.h. In general the SSE (__m128) intrinsics are a better match to the PowerISA VMX/VSX 128-bit vector facilities. This allows direct mapping of the __m128 type to PowerPC __vector float and allows natural handling of parameter passing, return values, and SIMD float operations. However, while both ISAs support float scalars in vector registers, X86_64 and PowerPC64LE use different formats (and bits within the vector register) for float scalars. This requires extra PowerISA operations to exactly match the X86 scalar float (intrinsics ending in *_ss) semantics. The intent is to provide a functionally correct implementation at some reduction in performance. This patch just adds the mm_malloc.h header, which will be needed by xmmintrin.h, and cleans up some noisy warnings from the previous MMX commit. Part 2 adds the xmmintrin.h include and associated config.gcc and x86intrin.h changes; part 3 adds the associated DG test cases. ./gcc/ChangeLog: 2017-08-16 Steven Munroe * config/rs6000/mm_malloc.h: New file. [gcc/testsuite] 2017-07-21 Steven Munroe * gcc.target/powerpc/mmx-packuswb-1.c [NO_WARN_X86_INTRINSICS]: Define. Suppress warning during tests. 
Index: gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c === --- gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c (revision 250986) +++ gcc/testsuite/gcc.target/powerpc/mmx-packuswb-1.c (working copy) @@ -3,6 +3,8 @@ /* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target p8vector_hw } */ +#define NO_WARN_X86_INTRINSICS 1 + #ifndef CHECK_H #define CHECK_H "mmx-check.h" #endif Index: gcc/config/rs6000/mm_malloc.h === --- gcc/config/rs6000/mm_malloc.h (revision 0) +++ gcc/config/rs6000/mm_malloc.h (revision 0) @@ -0,0 +1,62 @@ +/* Copyright (C) 2004-2017 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _MM_MALLOC_H_INCLUDED +#define _MM_MALLOC_H_INCLUDED + +#include <stdlib.h> + +/* We can't depend on <stdlib.h> since the prototype of posix_memalign + may not be visible. */ +#ifndef __cplusplus +extern int posix_memalign (void **, size_t, size_t); +#else +extern "C" int posix_memalign (void **, size_t, size_t) throw (); +#endif + +static __inline void * +_mm_malloc (size_t size, size_t alignment) +{ + /* PowerPC64 ELF V2 ABI requires quadword alignment. 
*/ + size_t vec_align = sizeof (__vector float); + /* Linux GLIBC malloc alignment is at least 2 X ptr size. */ + size_t malloc_align = (sizeof (void *) + sizeof (void *)); + void *ptr; + + if (alignment == malloc_align && alignment == vec_align) +return malloc (size); + if (alignment < vec_align) +alignment = vec_align; + if (posix_memalign (&ptr, alignment, size) == 0) +return ptr; + else +return NULL; +} + +static __inline void +_mm_free (void * ptr) +{ + free (ptr); +} + +#endif /* _MM_MALLOC_H_INCLUDED */
[PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target
This is part 2/3 of contributing PPC64LE support for X86 SSE intrinsics. This patch includes the new (for PPC) xmmintrin.h and associated config.gcc changes. This submission implements all the SSE Technology intrinsic functions except those associated with directly accessing and updating the MXCSR status and control register. 1) The features and layout of the MXCSR are specific to the Intel Architecture. 2) Not all the control and status bits of the MXCSR have equivalents in the PowerISA's FPSCR. 3) And using the POSIX floating point environment API is a better cross-platform solution. ./gcc/ChangeLog: 2017-08-16 Steven Munroe * config.gcc (powerpc*-*-*): Add xmmintrin.h and mm_malloc.h. * config/rs6000/xmmintrin.h: New file. * config/rs6000/x86intrin.h [__ALTIVEC__]: Include xmmintrin.h. Index: gcc/config.gcc === --- gcc/config.gcc (revision 250986) +++ gcc/config.gcc (working copy) @@ -457,6 +457,7 @@ powerpc*-*-*) extra_objs="rs6000-string.o rs6000-p8swap.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" + extra_headers="${extra_headers} xmmintrin.h mm_malloc.h" extra_headers="${extra_headers} mmintrin.h x86intrin.h" extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h" extra_headers="${extra_headers} paired.h" Index: gcc/config/rs6000/x86intrin.h === --- gcc/config/rs6000/x86intrin.h (revision 250986) +++ gcc/config/rs6000/x86intrin.h (working copy) @@ -37,6 +37,8 @@ #ifdef __ALTIVEC__ #include <mmintrin.h> + +#include <xmmintrin.h> #endif /* __ALTIVEC__ */ #include Index: gcc/config/rs6000/xmmintrin.h === --- gcc/config/rs6000/xmmintrin.h (revision 0) +++ gcc/config/rs6000/xmmintrin.h (revision 0) @@ -0,0 +1,1815 @@ +/* Copyright (C) 2002-2017 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. + It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary. + Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. + + In the specific case of X86 SSE (__m128) intrinsics, the PowerPC + VMX/VSX ISA is a good match for vector float SIMD operations. + However scalar float operations in vector (XMM) registers require + the POWER8 VSX ISA (2.07) level. Also there are important + differences for data format and placement of float scalars in the + vector register. For PowerISA, scalar floats in FPRs (the leftmost + 64 bits of the low 32 VSRs) are in double format, while X86_64 SSE + uses the rightmost 32 bits of the XMM. 
These differences require + extra steps on POWER to match the SSE scalar float semantics. + + Most SSE scalar float intrinsic operations can be performed more + efficiently as C language float scalar operations or optimized to + use vector SIMD operations. We recommend this for new applications. + + Another difference is the format and details of the X86_64 MXCSR vs + the PowerISA FPSCR / VSCR registers. We recommend applications + replace direct access to the MXCSR with the more portable + POSIX APIs. */ +#warning "Please read comment above. Use -DNO_WARN_X86_INTRINSICS to disable this warning." +#endif + +#ifndef _XMMINTRIN_H_INCLUDED +#define _XMMINTRIN_H_INCLUDED + +#inclu
[PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target
This is part 3/3 of contributing PPC64LE support for X86 SSE intrinsics. This patch includes testsuite/gcc.target tests for the intrinsics included by xmmintrin.h. For these tests I added -Wno-psabi to dg-options to suppress warnings associated with the vector ABI change in GCC5. These warnings are associated with unions defined in m128-check.h (ported with minimal change from i386). This removes some noise from make check. [gcc/testsuite] 2017-08-16 Steven Munroe * gcc.target/powerpc/m128-check.h: New file. * gcc.target/powerpc/sse-check.h: New file. * gcc.target/powerpc/sse-movmskps-1.c: New file. * gcc.target/powerpc/sse-movlps-2.c: New file. * gcc.target/powerpc/sse-pavgw-1.c: New file. * gcc.target/powerpc/sse-cvttss2si-1.c: New file. * gcc.target/powerpc/sse-cvtpi32x2ps-1.c: New file. * gcc.target/powerpc/sse-cvtss2si-1.c: New file. * gcc.target/powerpc/sse-divss-1.c: New file. * gcc.target/powerpc/sse-movhps-1.c: New file. * gcc.target/powerpc/sse-cvtsi2ss-2.c: New file. * gcc.target/powerpc/sse-subps-1.c: New file. * gcc.target/powerpc/sse-minps-1.c: New file. * gcc.target/powerpc/sse-pminub-1.c: New file. * gcc.target/powerpc/sse-cvtpu16ps-1.c: New file. * gcc.target/powerpc/sse-shufps-1.c: New file. * gcc.target/powerpc/sse-ucomiss-2.c: New file. * gcc.target/powerpc/sse-maxps-1.c: New file. * gcc.target/powerpc/sse-pmaxub-1.c: New file. * gcc.target/powerpc/sse-movmskb-1.c: New file. * gcc.target/powerpc/sse-ucomiss-4.c: New file. * gcc.target/powerpc/sse-unpcklps-1.c: New file. * gcc.target/powerpc/sse-mulps-1.c: New file. * gcc.target/powerpc/sse-rcpps-1.c: New file. * gcc.target/powerpc/sse-pminsw-1.c: New file. * gcc.target/powerpc/sse-ucomiss-6.c: New file. * gcc.target/powerpc/sse-subss-1.c: New file. * gcc.target/powerpc/sse-movss-2.c: New file. * gcc.target/powerpc/sse-pmaxsw-1.c: New file. * gcc.target/powerpc/sse-minss-1.c: New file. * gcc.target/powerpc/sse-movaps-2.c: New file. * gcc.target/powerpc/sse-movlps-1.c: New file. 
* gcc.target/powerpc/sse-maxss-1.c: New file. * gcc.target/powerpc/sse-movhlps-1.c: New file. * gcc.target/powerpc/sse-cvttss2si-2.c: New file. * gcc.target/powerpc/sse-cvtpi8ps-1.c: New file. * gcc.target/powerpc/sse-cvtpi32ps-1.c: New file. * gcc.target/powerpc/sse-mulss-1.c: New file. * gcc.target/powerpc/sse-cvtsi2ss-1.c: New file. * gcc.target/powerpc/sse-cvtss2si-2.c: New file. * gcc.target/powerpc/sse-movlhps-1.c: New file. * gcc.target/powerpc/sse-movhps-2.c: New file. * gcc.target/powerpc/sse-rsqrtps-1.c: New file. * gcc.target/powerpc/sse-xorps-1.c: New file. * gcc.target/powerpc/sse-cvtpspi8-1.c: New file. * gcc.target/powerpc/sse-orps-1.c: New file. * gcc.target/powerpc/sse-addps-1.c: New file. * gcc.target/powerpc/sse-cvtpi16ps-1.c: New file. * gcc.target/powerpc/sse-ucomiss-1.c: New file. * gcc.target/powerpc/sse-ucomiss-3.c: New file. * gcc.target/powerpc/sse-pmulhuw-1.c: New file. * gcc.target/powerpc/sse-andps-1.c: New file. * gcc.target/powerpc/sse-cmpss-1.c: New file. * gcc.target/powerpc/sse-divps-1.c: New file. * gcc.target/powerpc/sse-andnps-1.c: New file. * gcc.target/powerpc/sse-ucomiss-5.c: New file. * gcc.target/powerpc/sse-movss-1.c: New file. * gcc.target/powerpc/sse-sqrtps-1.c: New file. * gcc.target/powerpc/sse-cvtpu8ps-1.c: New file. * gcc.target/powerpc/sse-cvtpspi16-1.c: New file. * gcc.target/powerpc/sse-movaps-1.c: New file. * gcc.target/powerpc/sse-movss-3.c: New file. * gcc.target/powerpc/sse-unpckhps-1.c: New file. * gcc.target/powerpc/sse-addss-1.c: New file. * gcc.target/powerpc/sse-psadbw-1.c: New file. 
Index: gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c === --- gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/sse-movmskps-1.c (revision 0) @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mpower8-vector" } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 + +#ifndef CHECK_H +#define CHECK_H "sse-check.h" +#endif + +#include CHECK_H + +#ifndef TEST +#define TEST sse_test_movmskps_1 +#endif + +#include + +static int +__attribute__((noinline, unused)) +test (__m128 a) +{ + return _mm_movemask_ps (a); +} + +static void +TEST (void) +{ + union128 u; + float s[4] = {-2134.3343, 1234.635654, 1.2234, -876.8976}; + int d; + int e = 0; + int i; + + u.x = _mm_loadu_ps
Re: [PATCH, rs6000] 2/3 Add x86 SSE intrinsics to GCC PPC64LE target
On Thu, 2017-08-17 at 00:28 -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Aug 16, 2017 at 03:35:40PM -0500, Steven Munroe wrote: > > +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > +_mm_add_ss (__m128 __A, __m128 __B) > > +{ > > +#ifdef _ARCH_PWR7 > > + __m128 a, b, c; > > + static const __vector unsigned int mask = {0x, 0, 0, 0}; > > + /* PowerISA VSX does not allow partial (for just lower double) > > + * results. So to ensure we don't generate spurious exceptions > > + * (from the upper double values) we splat the lower double > > + * before we do the operation. */ > > No leading stars in comments please. Fixed > > > + a = vec_splat (__A, 0); > > + b = vec_splat (__B, 0); > > + c = a + b; > > + /* Then we merge the lower float result with the original upper > > + * float elements from __A. */ > > + return (vec_sel (__A, c, mask)); > > +#else > > + __A[0] = __A[0] + __B[0]; > > + return (__A); > > +#endif > > +} > > It would be nice if we could just write the #else version and get the > more optimised code, but I guess we get something horrible going through > memory, instead? > No, even with GCC8-trunk this field access is going through storage. The generated code for splat, op, select is shorter even when you include loading the constant. vector <-> scalar float is just nasty! > > +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > +_mm_rcp_ps (__m128 __A) > > +{ > > + __v4sf result; > > + > > + __asm__( > > + "xvresp %x0,%x1;\n" > > + : "=v" (result) > > + : "v" (__A) > > + : ); > > + > > + return (result); > > +} > > There is a builtin for this (__builtin_vec_re). Yes, not sure how I missed that. Fixed. > > > +/* Convert the lower SPFP value to a 32-bit integer according to the > > current > > + rounding mode. 
*/ > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > +_mm_cvtss_si32 (__m128 __A) > > +{ > > + __m64 res = 0; > > +#ifdef _ARCH_PWR8 > > + __m128 vtmp; > > + __asm__( > > + "xxsldwi %x1,%x2,%x2,3;\n" > > + "xscvspdp %x1,%x1;\n" > > + "fctiw %1,%1;\n" > > + "mfvsrd %0,%x1;\n" > > + : "=r" (res), > > + "=&wi" (vtmp) > > + : "wa" (__A) > > + : ); > > +#endif > > + return (res); > > +} > > Maybe it could do something better than return the wrong answer for non-p8? Ok, this gets tricky. Before _ARCH_PWR8 the vector to scalar transfer would go through storage. But that is not the worst of it. The semantics of cvtss require rint or llrint. But __builtin_rint will generate a call to libm unless we assert -ffast-math. And we don't have builtins to generate fctiw/fctid directly. So I will add the #else using __builtin_rint if that libm dependency is ok (this will pop up in the DG test for older machines). > > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > > +#ifdef __LITTLE_ENDIAN__ > > + return result[1]; > > +#elif __BIG_ENDIAN__ > > + return result [0]; > > Remove the extra space here? > > > +_mm_max_pi16 (__m64 __A, __m64 __B) > > > + res.as_short[0] = (m1.as_short[0] > m2.as_short[0])? m1.as_short[0]: > > m2.as_short[0]; > > + res.as_short[1] = (m1.as_short[1] > m2.as_short[1])? m1.as_short[1]: > > m2.as_short[1]; > > + res.as_short[2] = (m1.as_short[2] > m2.as_short[2])? m1.as_short[2]: > > m2.as_short[2]; > > + res.as_short[3] = (m1.as_short[3] > m2.as_short[3])? m1.as_short[3]: > > m2.as_short[3]; > > Space before ? and : . done > > > +_mm_min_pi16 (__m64 __A, __m64 __B) > > In this function, too. 
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > +_m_pmovmskb (__m64 __A) > > +{ > > + return _mm_movemask_pi8 (__A); > > +} > > +/* Multiply four unsigned 16-bit values in A by four unsigned 16-bit values > > + in B and produce the high 16 bits of the 32-bit results. */ > > +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, > > __artificial__)) > > Newline before the comment? done > > > +_mm_sad_pu8 (__m64 __A, __m64 __B) > > > + /* Sum four group
Re: [PATCH, rs6000] 3/3 Add x86 SSE intrinsics to GCC PPC64LE target
On Thu, 2017-08-17 at 00:47 -0500, Segher Boessenkool wrote: > On Wed, Aug 16, 2017 at 03:50:55PM -0500, Steven Munroe wrote: > > This it part 3/3 for contributing PPC64LE support for X86 SSE > > instrisics. This patch includes testsuite/gcc.target tests for the > > intrinsics included by xmmintrin.h. > > > +#define CHECK_EXP(UINON_TYPE, VALUE_TYPE, FMT) \ > > Should that be UNION_TYPE? It is spelled 'UINON_TYPE' in ./gcc/testsuite/gcc.target/i386/m128-check.h which the source for the powerpc version. There is no obvious reason why it could not be spelled UNION_TYPE. Unless there is some symbol collision further up the SSE/AVX stack. Bingo: avx512f-helper.h:#define UNION_TYPE(SIZE, NAME) EVAL(union, SIZE, NAME) I propose not to change this.
[PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
A common issue in porting applications and packages is that someone may have forgotten that there is more than one hardware platform. A specific example is applications using Intel x86 intrinsic functions without appropriate conditional compile guards. Another example is a developer tasked to port a large volume of code containing important functions "optimized" with Intel x86 intrinsics, but without the skill or time to perform the same optimization for another platform. Often the developer who wrote the original optimization has moved on and those left to maintain the application / package lack understanding of the original x86 intrinsic code or design. For PowerPC this can be acute, especially for HPC vector SIMD codes. The PowerISA (as implemented for POWER and OpenPOWER servers) has extensive vector hardware facilities and GCC provides a large set of vector intrinsics. Thus I would like to restrict this support to PowerPC targets that support VMX/VSX and PowerISA-2.07 (power8) and later. But the difference in (intrinsic) spelling alone is enough to stop many application developers in their tracks. So I propose to submit a series of patches to implement the PowerPC64LE equivalent of a useful subset of the x86 intrinsics. The final size and usefulness of this effort is to be determined. The proposal is to incrementally port intrinsic header files from the ./config/i386 tree to the ./config/rs6000 tree. This naturally provides the same header structure and intrinsic names, which will simplify code porting. It seems natural to work from the bottom (oldest) up. For example, starting with mmintrin.h and working our way up the following headers:
smmintrin.h (SSE4.1) includes tmmintrin.h
tmmintrin.h (SSSE3) includes pmmintrin.h
pmmintrin.h (SSE3) includes emmintrin.h
emmintrin.h (SSE2) includes xmmintrin.h
xmmintrin.h (SSE) includes mmintrin.h and mm_malloc.h
mmintrin.h (MMX)
There is a smattering of non-vector intrinsics in common use. 
Like the Bit Manipulation Instructions (BMI & BMI2):
bmiintrin.h
bmi2intrin.h
x86intrin.h (collector; includes the BMI headers and many others)
The older intrinsic (BMI/MMX/SSE) instructions have been integrated into GCC and many of the intrinsic implementations are simple C code or GCC built-ins. The remaining intrinsic functions are implemented as platform specific builtins (__builtin_ia32_*) and need to be mapped to an equivalent PowerPC built-in or vector intrinsic from altivec.h. Of course as part of this process we will port as many of the corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run tests only require minor source changes, mostly to the platform specific dg-* directives. A few dg-do compile tests are needed to ensure we are getting the expected folding/Common subexpression elimination (CSE) to generate the optimum sequence for PowerPC. To get the ball rolling I include the BMI intrinsics ported to PowerPC for review, as they are a reasonable size (31 intrinsic implementations). [gcc] 2017-05-04 Steven Munroe * config.gcc (powerpc*-*-*): Add bmi2intrin.h, bmiintrin.h, and x86intrin.h. * config/rs6000/bmiintrin.h: New file. * config/rs6000/bmi2intrin.h: New file. * config/rs6000/x86intrin.h: New file. [gcc/testsuite] 2017-05-04 Steven Munroe * gcc.target/powerpc/bmi-andn-1.c: New file. * gcc.target/powerpc/bmi-andn-2.c: New file. * gcc.target/powerpc/bmi-bextr-1.c: New file. * gcc.target/powerpc/bmi-bextr-2.c: New file. * gcc.target/powerpc/bmi-bextr-4.c: New file. * gcc.target/powerpc/bmi-bextr-5.c: New file. * gcc.target/powerpc/bmi-blsi-1.c: New file. * gcc.target/powerpc/bmi-blsi-2.c: New file. * gcc.target/powerpc/bmi-blsmsk-1.c: New file. * gcc.target/powerpc/bmi-blsmsk-2.c: New file. * gcc.target/powerpc/bmi-blsr-1.c: New file. * gcc.target/powerpc/bmi-blsr-2.c: New file. * gcc.target/powerpc/bmi-check.h: New file. * gcc.target/powerpc/bmi-tzcnt-1.c: New file. 
* gcc.target/powerpc/bmi-tzcnt-2.c: New file. * gcc.target/powerpc/bmi2-bzhi32-1.c: New file. * gcc.target/powerpc/bmi2-bzhi64-1.c: New file. * gcc.target/powerpc/bmi2-bzhi64-1a.c: New file. * gcc.target/powerpc/bmi2-check.h: New file. * gcc.target/powerpc/bmi2-mulx32-1.c: New file. * gcc.target/powerpc/bmi2-mulx32-2.c: New file. * gcc.target/powerpc/bmi2-mulx64-1.c: New file. * gcc.target/powerpc/bmi2-mulx64-2.c: New file. * gcc.target/powerpc/bmi2-pdep32-1.c: New file. * gcc.target/powerpc/bmi2-pdep64-1.c: New file. * gcc.target/powerpc/bmi2-pext32-1.c: New file. * gcc.target/powerpc/bmi2-pext64-1.c: New file. * gcc.target/powerpc/bmi2-pext64-1a.c: New file. Index: gcc/testsuite/gcc.target/po
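As noted above, many BMI intrinsics need no __builtin_ia32_* mapping at all because they reduce to simple C code or generic GCC built-ins that work on any target. A minimal sketch of that style (the my_* names are made up for illustration; the ported headers' actual code may differ):

```c
#include <stdint.h>

/* BMI ANDN: b AND NOT a, expressible as plain C.  */
static inline uint64_t
my_andn_u64 (uint64_t a, uint64_t b)
{
  return ~a & b;
}

/* BMI TZCNT: trailing zero count, via the generic GCC built-in.
   Unlike __builtin_ctzll alone, TZCNT defines the zero-input case
   as the operand width (64).  */
static inline uint64_t
my_tzcnt_u64 (uint64_t x)
{
  return x ? (uint64_t) __builtin_ctzll (x) : 64;
}
```

Because these compile to ordinary PowerPC instructions (and, cntlzd-based sequences, etc.), no target-specific guard beyond __PPC64__ is needed for them.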
Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote: > Hi! > > On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote: > > Thus I would like to restrict this support to PowerPC > > targets that support VMX/VSX and PowerISA-2.07 (power8) and later. > > What happens if you run it on an older machine, or as BE or 32-bit, > or with vectors disabled? > Well I hope that I set the dg-require-effective-target correctly, because while some of these intrinsics might work on a BE or 32-bit machine, most will not. For example, many of the BMI intrinsic implementations depend on 64-bit instructions and so I use { dg-require-effective-target lp64 }. The BMI2 intrinsic _pext exploits the Bit Permute Doubleword instruction. There is no Bit Permute Word instruction. So for BMI2 I use { dg-require-effective-target powerpc_vsx_ok } as bpermd was introduced in PowerISA 2.06 along with the Vector Scalar Extension facility. The situation gets more complicated when we start looking at SSE/SSE2. These headers define many variants of load and store instructions that are decidedly LE, and many unaligned forms. While powerpc64le handles this with ease, implementing LE semantics in BE mode gets seriously tricky. I think it is better to avoid this and only support these headers for LE. And while some SSE intrinsics can be implemented with VMX instructions, all the SSE2 double float intrinsics require VSX. And some PowerISA 2.07 instructions simplify implementation if available. As power8 is also the first supported powerpc64le system, it seems the logical starting point for most of this work. I don't plan to spend effort on supporting Intel intrinsic functions on older PowerPC machines (before power8) or BE. > > So I propose to submit a series of patches to implement the PowerPC64LE > > equivalent of a useful subset of the x86 intrinsics. The final size and > > usefulness of this effort is to be determined.
The proposal is to > > incrementally port intrinsic header files from the ./config/i386 tree to > > the ./config/rs6000 tree. This naturally provides the same header > > structure and intrinsic names which will simplify code porting. > > Yeah. > > I'd still like to see these headers moved into some subdir (both in > the source tree and in the installed headers tree), to reduce clutter, > but I understand it's not trivial to do. > > > To get the ball rolling I include the BMI intrinsics ported to PowerPC > > for review as they are reasonable size (31 intrinsic implementations). > > This is okay for trunk. Thanks! > Thank you > > --- gcc/config.gcc (revision 247616) > > +++ gcc/config.gcc (working copy) > > @@ -444,7 +444,7 @@ nvptx-*-*) > > ;; > > powerpc*-*-*) > > cpu_type=rs6000 > > - extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h > > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h" > > + extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h > > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h bmi2intrin.h > > bmiintrin.h x86intrin.h" > > (Your mail client wrapped this). > > Write this on a separate line? Like > extra_headers="${extra_headers} htmintrin.h htmxlintrin.h bmi2intrin.h" > (You cannot use += here, pity). > > > Segher >
Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
On Tue, 2017-05-09 at 16:03 -0500, Segher Boessenkool wrote: > On Tue, May 09, 2017 at 02:33:00PM -0500, Steven Munroe wrote: > > On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote: > > > On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote: > > > > Thus I would like to restrict this support to PowerPC > > > > targets that support VMX/VSX and PowerISA-2.07 (power8) and later. > > > > > > What happens if you run it on an older machine, or as BE or 32-bit, > > > or with vectors disabled? > > > > > Well I hope that I set the dg-require-effective-target correctly because > > while some of these intrinsics might work on the BE or 32-bit machine, > > most will not. > > That is just for the testsuite; I meant what happens if a user tries > to use it with an older target (or BE, or 32-bit)? Is there a useful, > obvious error message? > So looking at the X86 headers, their current practice falls into two areas.

1) Guard 64-bit dependent intrinsic functions with:

#ifdef __x86_64__
#endif

But they do not provide any warnings. I assume that attempting to use an intrinsic of this class would result in an implicit function declaration and a link-time failure.

2) Guard architecture-level dependent intrinsic header content with:

#ifndef __AVX__
#pragma GCC push_options
#pragma GCC target("avx")
#define __DISABLE_AVX__
#endif /* __AVX__ */
...
#ifdef __DISABLE_AVX__
#undef __DISABLE_AVX__
#pragma GCC pop_options
#endif /* __DISABLE_AVX__ */

So they don't make any attempt to prevent anyone from using a specific header. If the compiler version does not support the "GCC target", I assume that specific target did not exist in that version. If GCC does support that target then the '#pragma GCC target("avx")' will enable code generation, but the user might get a SIGILL if the hardware they have does not support those instructions.
In the BMI headers I already guard with:

#ifdef __PPC64__
#endif

This means that, like x86_64, attempting to use _pext_u64 on a 32-bit compiler will result in an implicit function declaration and cause a linker error. This is sufficient for most of BMI and BMI2 (registers only / endian agnostic). But this does not address the larger issues (for SSE/SSE2+) of needing a VSX implementation or restricting to LE. So should I check for:

#ifdef __VSX__
#endif

or

#ifdef __POWER8_VECTOR__

or

#ifdef _ARCH_PWR8

and perhaps:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

as well, to enforce this? And are you suggesting I add an #else clause with #warning or #error? Or is the implicit function declaration and link failure sufficient? > > The situation gets more complicated when we start looking at the > > SSE/SSE2. These headers define many variants of load and store > > instructions that are decidedly LE and many unaligned forms. While > > powerpc64le handles this with ease, implementing LE semantics in BE mode > > gets seriously tricky. I think it is better to avoid this and only > > support these headers for LE. > > Right. > > > And while some SSE intrinsics can be implemented with VMX instructions > > all the SSE2 double float intrinsics require VSX. And some PowerISA 2.07 > > instructions simplify implementation if available. As power8 is also the > > first supported powerpc64le system it seems the logical starting point > > for most of this work. > > Agreed as well. > > > I don't plan to spend effort on supporting Intel intrinsic functions on > > older PowerPC machines (before power8) or BE. > > Just make sure if anyone tries anyway, there is a clear error message > that tells them not to. > > > Segher >
Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
On Thu, 2017-05-11 at 09:39 -0500, Segher Boessenkool wrote: > On Wed, May 10, 2017 at 12:59:28PM -0500, Steven Munroe wrote: > > > That is just for the testsuite; I meant what happens if a user tries > > > to use it with an older target (or BE, or 32-bit)? Is there a useful, > > > obvious error message? > > > > > So looking at the X86 headers, their current practice falls into two > > areas. > > > > 1) guard 64-bit dependent intrinsic functions with: > > > > #ifdef __x86_64__ > > #endif > > > > But they do not provide any warnings. I assume that attempting to use an > > intrinsic of this class would result in an implicit function declaration > > and a link-time failure. > > Yeah probably. Which is fine -- it does not silently do the wrong thing, > and it is easy to find where the problem is. > > > If GCC does support that target then the '#pragma GCC target("avx")' > > will enable code generation, but the user might get a SIGILL if the > > hardware they have does not support those instructions. > > That is less friendly, but it still does not silently generate bad code. > > > In the BMI headers I already guard with: > > > > #ifdef __PPC64__ > > #endif > > > > This means that like x86_64, attempting to use _pext_u64 on a 32-bit > > compiler will result in an implicit function declaration and cause a > > linker error. > > Yup, that's fine. > > > This is sufficient for most of BMI and BMI2 (registers only / endian > > agnostic). But this does not address the larger issues (for SSE/SSE2+) > > of needing a VSX implementation or restricting to LE. > > Right. > > > So should I check for: > > > > #ifdef __VSX__ > > #endif > > > > or > > > > #ifdef __POWER8_VECTOR__ > > > > or > > > > #ifdef _ARCH_PWR8 > > > > and perhaps: > > > > #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ > > > > as well to enforce this. > > > > And are you suggesting I add an #else clause with #warning or #error? Or > > is the implicit function and link failure sufficient?
> > The first is friendlier, the second is sufficient I think. > > Maybe it is good enough to check for LE only? Most unmodified code > written for x86 (using intrinsics etc.) will not work correctly on BE. > And if you restrict to LE you get 64-bit and POWER8 automatically. > > So maybe just require LE? > Ok I will add "#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__" guard for the MMX/SSE and later intrinsic headers.
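The LE-only guard agreed on above hinges on GCC's predefined __BYTE_ORDER__ macro. As an illustration of the memory-layout property that macro encodes (the is_little_endian name is made up for this sketch; the real headers would use only the compile-time #if/#error form):

```c
#include <stdint.h>

/* The compile-time guard under discussion looks roughly like:
 *   #if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
 *   #error "This header is only supported on little-endian PowerPC."
 *   #endif
 * The runtime equivalent of that predicate: on a little-endian
 * target the lowest-addressed byte of a multi-byte integer holds
 * its least significant bits.  */
static int
is_little_endian (void)
{
  uint32_t probe = 0x01020304;
  return *(const unsigned char *) &probe == 0x04;
}
```

The #error form is the "friendlier" option mentioned above: a user including the header on a BE or pre-power8 configuration gets an immediate, explicit diagnostic instead of a link failure.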
Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
On Fri, 2017-05-12 at 11:38 -0700, Mike Stump wrote: > On May 8, 2017, at 7:49 AM, Steven Munroe wrote: > > Of course as part of this process we will port as many of the > > corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to > > gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run > > tests only require minor source changes, mostly to the platform specific > > dg-* directives. A few dg-do compile tests are needed to ensure we are > > getting the expected folding/common subexpression elimination (CSE) to > > generate the optimum sequence for PowerPC. > > If there is a way to share that seems reasonable and the x86 would like to > share... > > I'd let you and the x86 folks figure out what is best. It's too early to tell, but I have no objections to discussing options. Are you looking to share source files? This seems like low value because the files tend to be small and the only difference is the dg-* directives. I don't know enough about the DejaGnu macros to even guess at what this might entail. So far the sharing is mostly one way (./i386/ -> ./powerpc/), but if I find cases that require a new dg test that might also apply to ./i386/, I'd be willing to share that with x86.
[PATCH rs6000] Fix up dg-options for BMI intrinsic tests
David pointed out that my earlier X86 BMI intrinsic header submission was causing make check failures on powerpc64le platforms. The patch below tests out on Linux BE powerpc64/32 and should also resolve the failures on AIX. I don't have access to an AIX system, so David, can you give this patch a quick test? Thanks. [gcc/testsuite] 2017-05-17 Steven Munroe * gcc.target/powerpc/bmi-andn-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-andn-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi-bextr-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-bextr-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi-bextr-4.c: Fix-up dg-options. * gcc.target/powerpc/bmi-bextr-5.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsi-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsi-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsmsk-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsmsk-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsr-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-blsr-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi-tzcnt-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi-tzcnt-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-bzhi32-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-bzhi64-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-bzhi64-1a.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-mulx32-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-mulx32-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-mulx64-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-mulx64-2.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-pdep32-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-pdep64-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-pext32-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-pext64-1.c: Fix-up dg-options. * gcc.target/powerpc/bmi2-pext64-1a.c: Fix-up dg-options.
Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -m64" } */ +/* { dg-options "-O3" } */ /* { dg-require-effective-target lp64 } */ #define NO_WARN_X86_INTRINSICS 1 Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c === --- gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -m64" } */ +/* { dg-options "-O3" } */ /* { dg-require-effective-target lp64 } */ #define NO_WARN_X86_INTRINSICS 1 Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do run } */ +/* { dg-options "-O2 -fno-inline" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-options "-O2 -m64 -fno-inline" } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c === --- gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do run } */ +/* { dg-options "-O3 -fno-inline" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-options "-O3 -m64 -fno-inline" } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c === --- gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do run } */ +/* { dg-options "-O3 -fno-inline" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-options "-O3 -m64 -fno-inline" } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c === --- 
gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c (revision 248166) +++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do run } */ +/* { dg-options "-O3 -fno-inline" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-options "-O3 -m64 -fno-inline" } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi
Re: [PATCH rs6000] Fix up dg-options for BMI intrinsic tests
On Wed, 2017-05-17 at 17:22 -0400, David Edelsohn wrote: > On Wed, May 17, 2017 at 4:56 PM, Steven Munroe > wrote: > > David pointed out that my earlier X86 BMI intrinsic header submission > > was causing make check failures on powerpc64le platforms. The patch > > below tests out on Linux BE powerpc64/32 and should also resolve the > > failures on AIX. I don't have access to an AIX system, so David, can you give this > > patch a quick test? > > This will fix the failures on AIX. > Ok, I'll commit this.
[PATCH rs6000] Additional fixes to BMI intrinsic tests
Bill Seurer pointed out that building the BMI tests on a power8, but with gcc built --with-cpu=power6, fails with link errors. The intrinsics _pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the implementation uses the bpermd and popcntd instructions introduced with power7 (PowerISA-2.06). But if GCC is built --with-cpu=power6, the compiler is capable of supporting -mcpu=power7 but will not generate bpermd/popcntd by default. Then if some code uses, say, _pext_u64 with -mcpu=power6, the intrinsic is not supported (needs power7) and so is not defined. The dg tests are guarded with { dg-require-effective-target powerpc_vsx_ok }. This only tests whether GCC and Binutils are capable of generating vsx (and by extension PowerISA-2.06 bpermd and popcntd) instructions. In this case the result is that the intrinsic functions are implicitly declared as extern and cause a link failure. The solution is to guard the test code with #ifdef _ARCH_PWR7 so that it does not attempt to use instructions that are not there. However, for the dg-do compile test bmi2-pext64-1a.c we have no alternative but to add -mcpu=power7 to the dg-options. [gcc/testsuite] 2017-05-24 Steven Munroe * gcc.target/powerpc/bmi2-pdep32-1.c [_ARCH_PWR7]: Prevent implicit function declaration for processors without the bpermd instruction. * gcc.target/powerpc/bmi2-pdep64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext32-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7 to dg-options.
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(revision 248381) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(working copy) @@ -7,6 +7,7 @@ #include #include "bmi2-check.h" +#ifdef _ARCH_PWR7 __attribute__((noinline)) unsigned long long calc_pdep_u64 (unsigned long long a, unsigned long long mask) @@ -21,11 +22,13 @@ calc_pdep_u64 (unsigned long long a, unsigned long } return res; } +#endif /* _ARCH_PWR7 */ static void bmi2_test () { +#ifdef _ARCH_PWR7 unsigned long long i; unsigned long long src = 0xce7acce7acce7ac; unsigned long long res, res_ref; @@ -39,4 +42,5 @@ bmi2_test () if (res != res_ref) abort (); } +#endif /* _ARCH_PWR7 */ } Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(revision 248381) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(working copy) @@ -7,6 +7,7 @@ #include #include "bmi2-check.h" +#ifdef _ARCH_PWR7 __attribute__((noinline)) unsigned long long calc_pext_u64 (unsigned long long a, unsigned long long mask) @@ -22,10 +23,12 @@ calc_pext_u64 (unsigned long long a, unsigned long return res; } +#endif /* _ARCH_PWR7 */ static void bmi2_test () { +#ifdef _ARCH_PWR7 unsigned long long i; unsigned long long src = 0xce7acce7acce7ac; unsigned long long res, res_ref; @@ -39,4 +42,5 @@ bmi2_test () if (res != res_ref) abort(); } +#endif /* _ARCH_PWR7 */ } Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248381) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy) @@ -7,6 +7,7 @@ #include #include "bmi2-check.h" +#ifdef _ARCH_PWR7 __attribute__((noinline)) unsigned calc_pdep_u32 (unsigned a, int mask) @@ -22,10 +23,12 @@ calc_pdep_u32 (unsigned a, int mask) return res; } +#endif /* _ARCH_PWR7 */ static void bmi2_test () { +#ifdef _ARCH_PWR7 unsigned i; unsigned src = 0xce7acc; unsigned res, res_ref; @@ -39,4 +42,5 @@ 
bmi2_test () if (res != res_ref) abort(); } +#endif /* _ARCH_PWR7 */ } Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (revision 248381) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target powerpc_vsx_ok } */ Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(revision 248381) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(working copy) @@ -7,6 +7,7 @@ #include #include "bmi2-check.h" +#ifdef _ARCH_PWR7 __attribute__((noinline)) unsigned calc_pext_u32 (unsigned a, unsigned mask) @@ -22,10 +23,12 @@ calc_pext_u32 (unsigned
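For background on what these tests verify: _pext gathers the src bits selected by mask into the low-order bits of the result, and _pdep scatters the low-order src bits back out to the mask positions. A bit-serial sketch of those semantics (illustrative; similar in spirit to, but not the same code as, the calc_* reference functions in the tests above):

```c
#include <stdint.h>

/* _pext_u64 semantics: gather the bits of src selected by mask
   into the low-order bits of the result.  */
static uint64_t
ref_pext_u64 (uint64_t src, uint64_t mask)
{
  uint64_t res = 0;
  int k = 0;
  for (int i = 0; i < 64; i++)
    if (mask & (1ULL << i))
      res |= ((src >> i) & 1ULL) << k++;   /* selected bit packs low */
  return res;
}

/* _pdep_u64 semantics: scatter the low-order bits of src to the
   positions selected by mask.  */
static uint64_t
ref_pdep_u64 (uint64_t src, uint64_t mask)
{
  uint64_t res = 0;
  int k = 0;
  for (int i = 0; i < 64; i++)
    if (mask & (1ULL << i))
      res |= ((src >> k++) & 1ULL) << i;   /* next low bit deposits at i */
  return res;
}
```

On PowerPC the 64-bit versions map onto bpermd-based sequences, which is why _ARCH_PWR7 is the relevant guard; the 32-bit variants call the 64-bit ones, since there is no 32-bit bpermd.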
[PATCH rs6000] Additional fixes to BMI intrinsic tests, 2nd edition
Bill Seurer pointed out that building the BMI tests on a power8, but with gcc built --with-cpu=power6, fails with link errors. The intrinsics _pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the implementation uses the bpermd and popcntd instructions introduced with power7 (PowerISA-2.06). But if GCC is built --with-cpu=power6, the compiler is capable of supporting -mcpu=power7 but will not generate bpermd/popcntd by default. Then if some code uses, say, _pext_u64 with -mcpu=power6, the intrinsic is not supported (needs power7) and so is not defined. The { dg-require-effective-target powerpc_vsx_ok } is not sufficient for the { dg-do run } tests and needs to be changed to vsx_hw. Also we need to add -mcpu=power7 to the dg-options to ensure the compiler will generate the bpermd/popcntd instructions. This is sufficient for all the bmi/bmi2 tests to skip/pass for power6 and later. [gcc/testsuite] 2017-05-26 Steven Munroe * gcc.target/powerpc/bmi2-pdep32-1.c []: Add -mcpu=power7 to dg-options. Change dg-require-effective-target powerpc_vsx_ok to vsx_hw. * gcc.target/powerpc/bmi2-pdep64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext32-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7 to dg-options.
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy) @@ -1,7 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(working copy) @@ -1,7 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(working copy) @@ -1,7 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(working copy) @@ -1,7 +1,7 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: 
gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target powerpc_vsx_ok } */
Re: [PATCH rs6000] Additional fixes to BMI intrinsic tests, 2nd edition
On Tue, 2017-05-30 at 17:26 -0500, Segher Boessenkool wrote: > On Fri, May 26, 2017 at 10:32:54AM -0500, Steven Munroe wrote: > > * gcc.target/powerpc/bmi2-pdep32-1.c []: Add -mcpu=power7 to > > dg-options. Change dg-require-effective-target powerpc_vsx_ok > > to vsx_hw. > > Stray "[]"? Yes, I am still not sure of the changelog conventions for DG options. > > --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision > > 248468) > > +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy) > > @@ -1,7 +1,7 @@ > > /* { dg-do run } */ > > -/* { dg-options "-O3" } */ > > +/* { dg-options "-O3 -mcpu=power7" } */ > > /* { dg-require-effective-target lp64 } */ > > -/* { dg-require-effective-target powerpc_vsx_ok } */ > > +/* { dg-require-effective-target vsx_hw } */ > > Other testcases selecting a -mcpu= also use > > /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { > "-mcpu=power7" } } */ > > Do you really want -mcpu=power7 always? Or just at least power7? > I need at least -mcpu=power7 to generate popcntd/bpermd. The pdep_u32/pext_u32 implementations call the respective 64-bit versions as the ISA does not provide a 32-bit bpermd. It is not obvious how to skip unless -mcpu is at least power7 for these dg-do run tests. If the dg-skip-if is required I will add it. For the dg-do compile test for _pext_u64 I need the -mcpu=power7 specifically to get the correct counts for bpermd, popcntd and cntlzd.
[PATCH rs6000] Additional fixes to BMI intrinsic tests, 3rd edition
Bill Seurer pointed out that building the BMI tests on a power8, but with gcc built --with-cpu=power6, fails with link errors. The intrinsics _pdep_u64/32 and _pext_u64/32 are guarded with #ifdef _ARCH_PWR7 as the implementation uses the bpermd and popcntd instructions introduced with power7 (PowerISA-2.06). But if GCC is built --with-cpu=power6, the compiler is capable of supporting -mcpu=power7 but will not generate bpermd/popcntd by default. Then if some code uses, say, _pext_u64 with -mcpu=power6, the intrinsic is not supported (needs power7) and so is not defined. The { dg-require-effective-target powerpc_vsx_ok } is not sufficient for the { dg-do run } tests and needs to be changed to vsx_hw. Also we need to add -mcpu=power7 to the dg-options to ensure the compiler will generate the bpermd/popcntd instructions. Also added: { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } and { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } to ward off the evil spirits. Tested on BE --with-cpu=power6 -m32/-m64 and LE --with-cpu=power8. All bmi/bmi2 intrinsic tests passed. [gcc/testsuite] 2017-05-31 Steven Munroe * gcc.target/powerpc/bmi2-pdep32-1.c: Add -mcpu=power7 to dg-options. Change dg-require-effective-target powerpc_vsx_ok to vsx_hw. Add dg-skip-if directive to disable this test if -mcpu is overridden. * gcc.target/powerpc/bmi2-pdep64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext32-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1.c: Likewise. * gcc.target/powerpc/bmi2-pext64-1a.c: Add -mcpu=power7 to dg-options. Add dg-skip-if directive to disable this test for darwin.
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c(working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c(working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c(working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options "-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c(working copy) @@ -1,7 +1,8 @@ /* { dg-do run } */ -/* { dg-options 
"-O3" } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ #define NO_WARN_X86_INTRINSICS 1 #include Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c === --- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (revision 248468) +++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-O3" } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-options "-O3 -mcpu=power7" } */ /* { dg-require-effective-target lp64 } */ /* { dg-require-effective-target powerpc_vsx_ok } */
Re: [PATCH], RFC, add support for __float128/__ibm128 types on PowerPC
On Fri, 2014-05-02 at 12:13 +0200, Jakub Jelinek wrote: > Hi! > > On Tue, Apr 29, 2014 at 06:30:32PM -0400, Michael Meissner wrote: > > This patch adds support for a new type (__float128) on the PowerPC to allow > > people to use the 128-bit IEEE floating point format instead of the > > traditional > > IBM double-double that has been used in the Linux compilers. At this time, > > long double still will remain using the IBM double-double format. > > > > There has been an undocumented option to switch long double to IEEE > > 128-bit, > > but right now, there are bugs I haven't ironed out on VSX systems. > > > > In addition, I added another type (__ibm128) so that when the transition is > > eventually made, people can use this type to get the old long double type. > > > > I was wondering if people had any comments on the code so far, and things I > > should do differently. Note, I will be out on vacation May 6th - 14th, so I > > don't > > expect to submit the patches until I get back. > > For mangling, if you are going to mangle it the same as the -mlong-double-64 > long double, is __float128 going to be supported solely for ELFv2 ABI and > are you sure nobody has ever used -mlong-double-64 or > --without-long-double-128 configured compiler for it? > What is the plan for glibc (and for libstdc++)? > Looking at current ppc64le glibc, it seems it mistakenly still supports > the -mlong-double-64 stuff (e.g. printf calls are usually redirected to > __nldbl_printf (and tons of other calls). So, is the plan to use > yet another set of symbols? For __nldbl_* it is about 113 entry points > in libc.so and 1 in libm.so, but if you are going to support all of > -mlong-double-64, -mlong-double-128 as well as __float128, that would be far > more, because the compat -mlong-double-64 support mostly works by > redirecting, either in headers or through a special *.a library, to > corresponding double entry points whenever possible.
> So, if you call logl in -mlong-double-64 code, it will be redirected to > log, because it has the same ABI. But if you call *printf or nexttowardf > etc. where there is no ABI compatible double entrypoint, it needs to be a > new symbol. > But with __float128 vs. __ibm128 and long double being either of those, > you need different logl. > Yes, and we will work on a plan to do this. But at this time and in the near future there is no performance advantage to __float128 over IBM long double. > Which is why it is such a huge problem that this hasn't been resolved initially > as part of ELFv2 changes. Because it was a huge problem and there was no way for the required GCC support to be available in time for GLIBC-2.19. So we will develop an orderly, step-by-step transition plan. This will take some time.
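For readers unfamiliar with the IBM double-double format discussed above: a long double is kept as an unevaluated sum of two hardware doubles (a high part plus its exact rounding residual). The error-free addition step that such arithmetic is built from can be sketched as follows (an illustration of the idea, not glibc's or GCC's actual implementation):

```c
/* Knuth's two-sum: compute s = round(a + b) together with the exact
   rounding error e, so that a + b == s + e holds exactly in IEEE
   double arithmetic.  Double-double (IBM long double) operations are
   composed from error-free steps like this one.  */
static void
two_sum (double a, double b, double *s, double *e)
{
  double sum = a + b;
  double bb  = sum - a;                      /* b as actually absorbed */
  *e = (a - (sum - bb)) + (b - bb);          /* exact residual         */
  *s = sum;
}
```

This construction gives double-double roughly 106 bits of precision but the exponent range of a double, whereas __float128 is the true IEEE binary128 format; that semantic difference is one reason the two types need distinct symbols (e.g. different logl) in the ABI discussion above.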