Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins
Segher:

On Mon, 2018-07-02 at 11:53 -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jun 29, 2018 at 07:38:39AM -0700, Carl Love wrote:
> > +;; Unpack high elements of float vector to vector of doubles
> > +(define_expand "altivec_unpackh_v4sf"
> > +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> > +        (match_operand:V4SF 1 "register_operand" "v"))]
> > +  "TARGET_VSX"
> > +{
> > +  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
> > +  DONE;
> > +}
> > +  [(set_attr "type" "veccomplex")])
>
> I wondered if these actually work for all VSX registers, not just the
> VMX registers (i.e. "wa" or similar instead of "v").  But constraints
> in define_expand are meaningless anyway; just leave them out please.
>
> Does it help to define these altivec_unpackh_v4sf, when all it does is
> expand as doublehv4sf2?  Can't you more easily put the latter in the
> tables?

Yes, my bad.  It is way cleaner to just do it directly.  My first
attempt needed the define_expand but then I realized I had made things
way more complicated than needed and rewrote the patch.

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> > @@ -0,0 +1,257 @@
> > +/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > +/* { dg-options "-mpower8-vector -maltivec" } */
>
> This needs p8vector_ok then.  Is that correct?  What requires p8?
> Is VSX (p7) enough for everything here?
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> > @@ -0,0 +1,94 @@
> > +/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > +/* { dg-options "-mpower8-vector -mvsx" } */
>
> Same here: required target does not match options.

My bad again, I can't follow my own comments.  altivec-1-runnable.c does
not need power 8.  But altivec-2-runnable.c does, per the comments in the
file.
Fixed the various issues and retested on

    powerpc64le-unknown-linux-gnu (Power 8 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)
    powerpc64le-unknown-linux-gnu (Power 9 LE)

Please let me know if the patch looks OK for GCC mainline.  The patch
also needs to be backported to GCC 8.

                 Carl Love
-----

gcc/ChangeLog:

2018-07-03  Carl Love

        * config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
        float argument to VSX_BUILTIN_DOUBLEH_V4SF.
        Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
        VSX_BUILTIN_DOUBLEL_V4SF.

gcc/testsuite/ChangeLog:

2018-07-03  Carl Love

        * gcc.target/altivec-1-runnable.c: New test file.
        * gcc.target/altivec-2-runnable.c: New test file.
        * gcc.target/vsx-7.c (main2): Change expected instruction
        for tests.
---
 gcc/config/rs6000/rs6000-c.c                       |   4 +-
 .../gcc.target/powerpc/altivec-1-runnable.c        | 257 +
 .../gcc.target/powerpc/altivec-2-runnable.c        |  94
 gcc/testsuite/gcc.target/powerpc/vsx-7.c           |   7 +-
 4 files changed, 356 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f4b1bf7..f37f0b1 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -865,7 +865,7 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_pixel_V8HI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, VSX_BUILTIN_DOUBLEH_V4SF,
     RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
     RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
@@ -897,7 +897,7 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKL, P8V_BUILTIN_VUPKLSW,
     RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VUPKLPX,
+  { ALTIVEC_BUILTIN_VEC_UNPACKL, VSX_BUILTIN_DOUBLEL_V4SF,
     RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKLPX, ALTIVEC_BUILTIN_VUPKLPX,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V8HI, 0, 0 },
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-ru
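
Editorial sketch, not part of the posted patch: the behaviour the float overloads of vec_unpackh and vec_unpackl are expected to have can be modelled in portable scalar C. The struct types and the model_* names below are hypothetical stand-ins invented for illustration; the sample values {0.0, 1.5, 2.5, 3.5} and the expected halves are the ones the runnable test in this thread uses.

```c
#include <assert.h>

/* Hypothetical scalar stand-ins for "vector float" and "vector double".
   These names are illustrative only; they are not GCC or altivec.h types.  */
typedef struct { float  e[4]; } v4sf_model;
typedef struct { double e[2]; } v2df_model;

/* Model of vec_unpackh on a float vector: widen elements 0 and 1,
   using the element numbering the test case relies on.  */
v2df_model
model_unpackh (v4sf_model v)
{
  v2df_model r = { { (double) v.e[0], (double) v.e[1] } };
  return r;
}

/* Model of vec_unpackl on a float vector: widen elements 2 and 3.  */
v2df_model
model_unpackl (v4sf_model v)
{
  v2df_model r = { { (double) v.e[2], (double) v.e[3] } };
  return r;
}

/* Self-check mirroring the expectations in the runnable test.  */
void
model_unpack_selfcheck (void)
{
  v4sf_model in = { { 0.0f, 1.5f, 2.5f, 3.5f } };
  assert (model_unpackh (in).e[0] == 0.0 && model_unpackh (in).e[1] == 1.5);
  assert (model_unpackl (in).e[0] == 2.5 && model_unpackl (in).e[1] == 3.5);
}
```

The model only captures the element selection and the float-to-double widening; it says nothing about which instruction the compiler emits.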
[PATCH,rs6000] Backport of stxvl instruction fix to GCC 7
GCC Maintainers:

The following patch is a backport of a commit made to mainline prior to
the GCC 8 release.

Note, the code fixed by this patch was later modified in commit 256798 as
part of adding vec_xst_len support.  The sldi instruction gets replaced
by an ashift of the operand for the stxvl instruction.  Commit 256798
adds additional functionality and does not fix any functional issues.
Hence it is not being backported, just the original bug fix given below.

The patch has been tested on

    powerpc64le-unknown-linux-gnu (Power 8 LE)

with no regressions.

Please let me know if the patch looks OK for GCC 7.

                     Carl Love
---

2018-07-09  Carl Love

        Backport from mainline
        2017-09-07  Carl Love

        * config/rs6000/vsx.md (define_insn "*stxvl"): Add missing argument to
        the sldi instruction.
---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index eef5357..37d768f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3946,7 +3946,7 @@
    (match_operand:DI 2 "register_operand" "+r")]
   UNSPEC_STXVL))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
-  "sldi %2,%2\;stxvl %x0,%1,%2"
+  "sldi %2,%2,56\;stxvl %x0,%1,%2"
   [(set_attr "length" "8")
    (set_attr "type" "vecstore")])
 
-- 
2.7.4
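
Editorial sketch, not part of the posted patch: the shift count of 56 that the fix supplies moves a byte count from the low byte of a 64-bit register into its most-significant byte, which is where stxvl takes its length from. The helper name model_sldi below is a hypothetical illustration in portable C, not a GCC or ISA identifier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "sldi rd,rs,sh": shift a 64-bit value left by
   sh bits.  The mask keeps the C shift well defined for any sh.  */
uint64_t
model_sldi (uint64_t rs, unsigned sh)
{
  return rs << (sh & 63u);
}

/* Self-check: a 16-byte length ends up in the top byte after "sldi ...,56".  */
void
model_sldi_selfcheck (void)
{
  uint64_t operand = model_sldi (16, 56);
  assert (operand == UINT64_C (0x1000000000000000));
  assert ((operand >> 56) == 16);
}
```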
Re: [PATCH, rs6000] Fix AIX test case failures
On Fri, 2018-07-13 at 16:00 -0500, Segher Boessenkool wrote:
> On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> > On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> > double.  Either all of these tests need
> >
> > /* { dg-require-effective-target longdouble128 } */
> >
> > or
> >
> > /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */
> >
> > along with testing for "tc", e.g., bl .__divtc3
>
> Which would you prefer David?  (I'd do the former).
>
>
> Segher

Segher, David:

I reworked the patch per the first option that David gave.  The tests
divkc3-2.c, divkc3-3.c, mulkc3-2.c and mulkc3-3.c pass on Power 9 Linux
as they did before.  The tests are unsupported on Power8 Linux as they
were before.  Now, the tests are reported as unsupported on AIX rather
than failing on AIX.

Please let me know if you both approve the updated patch below.  Thanks
for the input and help on this.

                       Carl Love
-------

gcc/testsuite/ChangeLog:

2018-07-13  Carl Love

        * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
        longdouble128.
        * gcc.target/powerpc/divkc3-3.c: Ditto.
        * gcc.target/powerpc/mulkc3-2.c: Ditto.
        * gcc.target/powerpc/mulkc3-3.c: Ditto.
        * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
        * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.
--- gcc/testsuite/gcc.target/powerpc/divkc3-2.c| 1 + gcc/testsuite/gcc.target/powerpc/divkc3-3.c| 1 + gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +--- gcc/testsuite/gcc.target/powerpc/mulkc3-2.c| 1 + gcc/testsuite/gcc.target/powerpc/mulkc3-3.c| 1 + gcc/testsuite/gcc.target/powerpc/pr85456.c | 3 ++- 6 files changed, 7 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c index d3fcbedac..e34ed40ba 100644 --- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c +++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c index 45695fef8..c0fda8b24 100644 --- a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c +++ b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c index 25f4bc6aa..14f944817 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c @@ -19,7 +19,5 @@ testd_h (vector double vd2, vector double vd3) return vec_mergeh (vd2, vd3); } -/* vec_merge with doubles tend to just use xxpermdi (3 ea for BE, 1 ea for LE). 
*/ -/* { dg-final { scan-assembler-times "xxpermdi" 2 { target { powerpc*le-*-* } }} } */ -/* { dg-final { scan-assembler-times "xxpermdi" 6 { target { powerpc-*-* } } } } */ +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c index 9ba577a0c..eee6de9e2 100644 --- a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c +++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c index db8730158..b6d2bdf73 100644 --- a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c +++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longd
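
Editorial sketch, not part of the posted patch: the reason these tests cannot run everywhere is that the width and format of long double differ between targets. One portable way to see which format a given compiler configuration uses is to inspect LDBL_MANT_DIG from <float.h>. The function name long_double_format is hypothetical, and this is only an illustration of the underlying property; it is not how the DejaGnu longdouble128 check itself is implemented.

```c
#include <float.h>

/* Hypothetical helper: describe the long double format selected by the
   current compiler options, based on the number of mantissa digits.  */
const char *
long_double_format (void)
{
#if LDBL_MANT_DIG == 53
  return "64-bit, same as double";
#elif LDBL_MANT_DIG == 64
  return "x87 80-bit extended";
#elif LDBL_MANT_DIG == 106
  return "IBM 128-bit double-double";
#elif LDBL_MANT_DIG == 113
  return "IEEE 128-bit binary128";
#else
  return "other";
#endif
}
```

On a configuration with a 64 bit long double, such as the AIX default mentioned above, this would be expected to take the first branch.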
[PATCH, rs6000] Fix AIX test case failures
Segher: I was requested to backport the patch for the AIX test case failures to GCC 8. The trunk patch applied cleanly to GCC 8. I updated the changelog patch, built and retested the patch on: powerpc64le-unknown-linux-gnu (Power 8 LE) powerpc64-unknown-linux-gnu (Power 8 BE) AIX 7200-00-01-1543 (Power 8 BE) With no regressions. Please let me know if it is OK to apply the patch to the GCC 8 branch. Thanks. Carl Love - gcc/testsuite/ChangeLog: 2018-07-17 Carl Love Backport from mainline 2018-07-16 Carl Love PR target/86414 * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target longdouble128. * gcc.target/powerpc/divkc3-3.c: Ditto. * gcc.target/powerpc/mulkc3-2.c: Ditto. * gcc.target/powerpc/mulkc3-3.c: Ditto. * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts. * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific. --- gcc/testsuite/gcc.target/powerpc/divkc3-2.c| 1 + gcc/testsuite/gcc.target/powerpc/divkc3-3.c| 1 + gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +--- gcc/testsuite/gcc.target/powerpc/mulkc3-2.c| 1 + gcc/testsuite/gcc.target/powerpc/mulkc3-3.c| 1 + gcc/testsuite/gcc.target/powerpc/pr85456.c | 3 ++- 6 files changed, 7 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c index d3fcbed..e34ed40 100644 --- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c +++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c index 45695fe..c0fda8b 100644 --- a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c +++ 
b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c index 25f4bc6..14f9448 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c @@ -19,7 +19,5 @@ testd_h (vector double vd2, vector double vd3) return vec_mergeh (vd2, vd3); } -/* vec_merge with doubles tend to just use xxpermdi (3 ea for BE, 1 ea for LE). */ -/* { dg-final { scan-assembler-times "xxpermdi" 2 { target { powerpc*le-*-* } }} } */ -/* { dg-final { scan-assembler-times "xxpermdi" 6 { target { powerpc-*-* } } } } */ +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c index 9ba577a..eee6de9 100644 --- a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c +++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c index db87301..b6d2bdf 100644 --- a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c +++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c @@ -1,5 +1,6 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } 
*/ +/* { dg-require-effective-target longdouble128 } */ /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */ /* Check that complex multiply generates the right call when long double is diff --git a/gcc/testsuite/gcc.target/powerpc/pr85456.c b/gcc/testsuite/gcc.target/powerpc/pr85456.c index b9df16a..b928292 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr85456.c +++ b/gcc/testsuite/gcc.target/powerpc/pr85456.c @@ -11,4 +11,5 @@ do_powl (long double a, int i) return __builtin_powil (a, i); } -/* { dg-final { scan-assembler "bl __powikf2" } } */ +/* { dg-final { scan-assembl
[PATCH,rs6000] AIX test fixes 2
GCC maintainers:

The following patch fixes errors on AIX for the "vector double" tests
in the altivec-1-runnable.c file.  The type "vector double" requires the
use of the GCC command line option -mvsx.  The vector double tests in
altivec-1-runnable.c should be in altivec-2-runnable.c.  It looks like
my Linux testing of the original patch worked because I configured GCC
by default with -mcpu=power8.  AIX is not using that as the default
processor, thus causing the compile of altivec-1-runnable.c to fail.

The vec_or tests in builtins-1.c were moved to another file by a
previous patch.  The vec_or test generated the xxlor instruction.  The
count of the xxlor instruction varies depending on the target as it is
used as a move instruction.  No other tests generate the xxlor
instruction.  Hence, the count check was removed.

The patch has been tested on

    powerpc64le-unknown-linux-gnu (Power 8 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)
    AIX (Power 8)

with no regressions.

Please let me know if the patch looks OK for trunk.

                       Carl Love

gcc/testsuite/ChangeLog:

2018-07-20  Carl Love

        * gcc.target/powerpc/altivec-1-runnable.c: Move vector double tests to
        file altivec-2-runnable.c.
        * gcc.target/powerpc/altivec-2-runnable.c: Add vector double tests.
        * gcc.target/powerpc/builtins-1.c: Remove check for xxlor.  Add linux and
        AIX targets for divdi3 and udivdi3 instructions.
--- .../gcc.target/powerpc/altivec-1-runnable.c| 50 -- .../gcc.target/powerpc/altivec-2-runnable.c| 49 - gcc/testsuite/gcc.target/powerpc/builtins-1.c | 9 ++-- 3 files changed, 52 insertions(+), 56 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c index bb913d2..da8ebbc 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c @@ -31,16 +31,9 @@ int main () vector signed int vec_si_result, vec_si_expected; vector signed char vec_sc_arg; vector signed char vec_sc_result, vec_sc_expected; - vector float vec_float_arg; - vector double vec_double_result, vec_double_expected; vector pixel vec_pixel_arg; vector unsigned int vec_ui_result, vec_ui_expected; - union conv { - double d; - unsigned long long l; - } conv_exp, conv_val; - vec_bs_arg = (vector bool short){ 0, 101, 202, 303, 404, 505, 606, 707 }; vec_bi_expected = (vector bool int){ 0, 101, 202, 303 }; @@ -209,49 +202,6 @@ int main () abort(); #endif } - - - vec_float_arg = (vector float){ 0.0, 1.5, 2.5, 3.5 }; - - vec_double_expected = (vector double){ 0.0, 1.5 }; - - vec_double_result = vec_unpackh (vec_float_arg); - - for (i = 0; i < 2; i++) { -if (vec_double_expected[i] != vec_double_result[i]) - { -#if DEBUG - printf("ERROR: vec_unpackh(), vec_double_expected[%d] = %f does not match vec_double_result[%d] = %f\n", - i, vec_double_expected[i], i, vec_double_result[i]); - conv_val.d = vec_double_result[i]; - conv_exp.d = vec_double_expected[i]; - printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not match vec_double_result[%d] = 0x%llx\n", - i, conv_exp.l, i,conv_val.l); -#else - abort(); -#endif -} - } - - vec_double_expected = (vector double){ 2.5, 3.5 }; - - vec_double_result = vec_unpackl (vec_float_arg); - - for (i = 0; i < 2; i++) { -if (vec_double_expected[i] != vec_double_result[i]) - { -#if DEBUG - printf("ERROR: vec_unpackl() 
vec_double_expected[%d] = %f does not match vec_double_result[%d] = %f\n", - i, vec_double_expected[i], i, vec_double_result[i]); - conv_val.d = vec_double_result[i]; - conv_exp.d = vec_double_expected[i]; - printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not match vec_double_result[%d] = 0x%llx\n", - i, conv_exp.l, i,conv_val.l); -#else - abort(); -#endif - } - } return 0; } diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c index 9d8aad4..041edcb 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c @@ -23,8 +23,15 @@ int main () vector signed int vec_si_arg; vector signed long long int vec_slli_result, vec_slli_expected; + vector float vec_float_arg; + vector double vec_double_result, vec_double_expected; - /* use of ‘long long’ in AltiVec types requires -mvsx */ + union conv { + double d; + unsigned long long l; + } conv_exp, conv_val; + +
[PATCH] rs6000, Add missing overloaded bcd builtin tests
GCC maintainers: The following patch adds tests for two of the rs6000 overloaded built- ins that do not have tests. Additionally the GCC documentation file doc/extend.texi is updated to include the built-in definitions as they were missing. The patch has been tested on a Power 10 system with no regressions. Please let me know if this patch is acceptable for mainline. Carl --- rs6000, Add missing overloaded bcd builtin tests The two BCD overloaded built-ins __builtin_bcdsub_ge and __builtin_bcdsub_le do not have a corresponding test. Add tests to existing test file and update the documentation with the built-in definitions. gcc/ChangeLog: * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add documentation for the builti-ins. gcc/testsuite/ChangeLog: * bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins __builtin_bcdsub_ge and __builtin_bcdsub_le). --- gcc/doc/extend.texi | 4 gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index cf0d0c63cce..fa7402813e7 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int); vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int); vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char, const int); +int __builtin_bcdsub_le (vector __int128, vector __int128, const int); +int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int); int __builtin_bcdsub_lt (vector __int128, vector __int128, const int); int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int); int __builtin_bcdsub_eq (vector __int128, vector __int128, const int); int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int); int __builtin_bcdsub_gt (vector __int128, vector __int128, const int); int __builtin_bcdsub_gt (vector 
unsigned char, vector unsigned char, const int); +int __builtin_bcdsub_ge (vector __int128, vector __int128, const int); +int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int); int __builtin_bcdsub_ov (vector __int128, vector __int128, const int); int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int); @end smallexample diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c b/gcc/testsuite/gcc.target/powerpc/bcd-3.c index 7948a0c95e2..9891f4ff08e 100644 --- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c +++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c @@ -3,7 +3,7 @@ /* { dg-require-effective-target powerpc_p8vector_ok } */ /* { dg-options "-mdejagnu-cpu=power8 -O2" } */ /* { dg-final { scan-assembler-times "bcdadd\[.\] " 4 } } */ -/* { dg-final { scan-assembler-times "bcdsub\[.\] " 4 } } */ +/* { dg-final { scan-assembler-times "bcdsub\[.\] " 6 } } */ /* { dg-final { scan-assembler-not "bl __builtin" } } */ /* { dg-final { scan-assembler-not "mtvsr" } } */ /* { dg-final { scan-assembler-not "mfvsr" } } */ @@ -93,6 +93,26 @@ do_sub_gt (vector_128_t a, vector_128_t b, int *p) return ret; } +vector_128_t +do_sub_ge (vector_128_t a, vector_128_t b, int *p) +{ + vector_128_t ret = __builtin_bcdsub (a, b, 0); + if (__builtin_bcdsub_ge (a, b, 0)) +*p = 1; + + return ret; +} + +vector_128_t +do_sub_le (vector_128_t a, vector_128_t b, int *p) +{ + vector_128_t ret = __builtin_bcdsub (a, b, 0); + if (__builtin_bcdsub_le (a, b, 0)) +*p = 1; + + return ret; +} + vector_128_t do_sub_ov (vector_128_t a, vector_128_t b, int *p) { -- 2.37.2
Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests
On Tue, 2023-10-31 at 10:34 +0800, Kewen.Lin wrote:
> Hi Carl,
>
> on 2023/10/31 08:08, Carl Love wrote:
> > GCC maintainers:
> >
> > The following patch adds tests for two of the rs6000 overloaded
> > built-ins that do not have tests.  Additionally the GCC documentation
> > file
>
> I just found that actually they have the test coverage, because we
> have
>
> #define __builtin_bcdcmpeq(a,b) __builtin_vec_bcdsub_eq(a,b,0)
> #define __builtin_bcdcmpgt(a,b) __builtin_vec_bcdsub_gt(a,b,0)
> #define __builtin_bcdcmplt(a,b) __builtin_vec_bcdsub_lt(a,b,0)
> #define __builtin_bcdcmpge(a,b) __builtin_vec_bcdsub_ge(a,b,0)
> #define __builtin_bcdcmple(a,b) __builtin_vec_bcdsub_le(a,b,0)
>
> in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> these

OK, my simple scripts are not going to pick up the stuff in altivec.h.
They were just grepping for the built-in name in the test file directory.

> __builtin_bcdcmp* ...
>
> > doc/extend.texi is updated to include the built-in definitions as
> > they were missing.
>
> ... since we already document __builtin_vec_bcdsub_{eq,gt,lt}, I
> think it's still good to supplement the documentation and add the
> explicit testing cases.
>
> > The patch has been tested on a Power 10 system with no
> > regressions.
> > Please let me know if this patch is acceptable for mainline.
> >
> > Carl
> >
> > ---
> > rs6000, Add missing overloaded bcd builtin tests
> >
> > The two BCD overloaded built-ins __builtin_bcdsub_ge and
> > __builtin_bcdsub_le
> > do not have a corresponding test.  Add tests to existing test file
> > and update
> > the documentation with the built-in definitions.
>
> As above, this commit log doesn't describe the actuality well, please
> update it with something like:
>
> Currently we have the documentation for
> __builtin_vec_bcdsub_{eq,gt,lt} but not for __builtin_bcdsub_[gl]e,
> this patch is to supplement the descriptions for them.  Although they
> are mainly for __builtin_bcdcmp{ge,le}, we already have some testing
> coverage for __builtin_vec_bcdsub_{eq,gt,lt}, this patch adds the
> corresponding explicit test cases as well.
>

OK, replaced the commit log with the suggestion.

> > gcc/ChangeLog:
> > * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge):
> > Add
> > documentation for the builti-ins.
> >
> > gcc/testsuite/ChangeLog:
> > * bcd-3.c (do_sub_ge, do_suble): Add functions to test builtins
> > __builtin_bcdsub_ge and __builtin_bcdsub_le).
>
> 1) Unexpected ")" at the end.
>
> 2) I supposed git gcc-verify would complain on this changelog entry.
>
> Should be starting with:
>
> * gcc.target/powerpc/bcd-3.c (
>
> , no?
>

Yes, I meant to run the commit check but obviously got distracted and
didn't.  Sorry about that.

> OK for trunk with the above comments addressed, thanks!
>

OK, thanks.

                     Carl

> BR,
> Kewen
>
> > ---
> >  gcc/doc/extend.texi                      |  4
> >  gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +-
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index cf0d0c63cce..fa7402813e7 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
> >  vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
> >  vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
> >  const int);
> > +int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
> > +int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
> >  int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
> >  int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
> >  int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
> >  int __builtin_bcdsub_eq (vector unsigned char, vector 
unsigned > > char, const int); > > int __builtin_bcdsub_gt (vector __int128, vector __int128, const > > int); > > int __builtin_bcdsub_gt (vector unsigned char, vector unsigned > > char, const int); > > +int __builtin_bcdsub_ge (vect
Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests
Segher:

On Tue, 2023-10-31 at 11:17 -0500, Segher Boessenkool wrote:
> You could use gcov to see which rs6000 builtins are not exercised by
> anything in the testsuite, maybe.  This probably can be automated
> pretty nicely.

I will take a look at gcov.  I just did some relatively simple scripts
to go look for test cases.  For the non-overloaded built-ins, the
scripts had to exclude built-ins referenced by the overloaded built-ins.

This patch is just the first of a series of patches that I am working on
to try and clean up the built-in stuff per some comments in a PR.  The
internal LTC issue is

   https://github.ibm.com/ltc-toolchain/power-gcc/issues/1288

The goal is to make sure there are test cases and documentation for all
of the overloaded and non-overloaded built-in definitions.  Just a low
priority project to fill any spare cycles.  :-)

                     Carl
rs6000, built-in cleanup patch series
GCC maintainers:

The following series of patches cleans up some of the rs6000 built-in
support.

Some of the first patches fix errors in the definition of a few of the
built-ins.  The built-ins are supposed to have unsigned arguments but
are listed as signed.  Some of the built-ins are supposed to return
unsigned values but were defined to return a signed value.

There are a number of built-ins that are not documented but are
duplicates of other documented built-ins.  The duplicate definitions are
removed so users will only use the supported documented built-ins.

There are a number of built-ins that are not documented in either the
Power Vector Intrinsic Reference manual or in the gcc/doc/extend.texi
file.  The patches add the missing documentation as needed.

Also, most of the built-ins do not have test cases.  The patches add
test cases for the various built-ins.

                     Carl
[PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins
GCC maintainers: This patch fixes the arguments and return type for the various __builtin_vsx_cmple* built-ins. They were defined as signed but should have been defined as unsigned. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl - rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take unsigned arguments and return an unsigned result. This patch changes the arguments and return type from signed to unsigned. The documentation for the signed and unsigned versions of __builtin_vsx_cmple is missing from extend.texi. This patch adds the missing documentation. Test cases are added for each of the signed and unsigned built-ins. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si): Change arguments and return from signed to unsigned. * doc/extend.texi (__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si, __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi, __builtin_vsx_cmple_u4si): Add documentation. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-cmple.c: New test file. 
--- gcc/config/rs6000/rs6000-builtins.def| 10 +- gcc/doc/extend.texi | 23 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++ 3 files changed, 155 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3bc7fed6956..d66a53a0fab 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1349,16 +1349,16 @@ const vss __builtin_vsx_cmple_8hi (vss, vss); CMPLE_8HI vector_ngtv8hi {} - const vsc __builtin_vsx_cmple_u16qi (vsc, vsc); + const vuc __builtin_vsx_cmple_u16qi (vuc, vuc); CMPLE_U16QI vector_ngtuv16qi {} - const vsll __builtin_vsx_cmple_u2di (vsll, vsll); + const vull __builtin_vsx_cmple_u2di (vull, vull); CMPLE_U2DI vector_ngtuv2di {} - const vsi __builtin_vsx_cmple_u4si (vsi, vsi); + const vui __builtin_vsx_cmple_u4si (vui, vui); CMPLE_U4SI vector_ngtuv4si {} - const vss __builtin_vsx_cmple_u8hi (vss, vss); + const vus __builtin_vsx_cmple_u8hi (vus, vus); CMPLE_U8HI vector_ngtuv8hi {} const vd __builtin_vsx_concat_2df (double, double); @@ -1769,7 +1769,7 @@ const vf __builtin_vsx_xvcvuxdsp (vull); XVCVUXDSP vsx_xvcvuxdsp {} - const vd __builtin_vsx_xvcvuxwdp (vsi); + const vd __builtin_vsx_xvcvuxwdp (vui); XVCVUXWDP vsx_xvcvuxwdp {} const vf __builtin_vsx_xvcvuxwsp (vsi); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 2b8ba1949bf..4d8610f6aa8 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22522,6 +22522,29 @@ if the VSX instruction set is available. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. 
+ +@smallexample +vector signed char __builtin_vsx_cmple_16qi (vector signed char, + vector signed char); +vector signed short __builtin_vsx_cmple_8hi (vector signed short, + vector signed short); +vector signed int __builtin_vsx_cmple_4si (vector signed int, + vector signed int); +vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char, +vector unsigned char); +vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short, +vector unsigned short); +vector unsigned int __builtin_vsx_cmple_u4si (vector unsigned int, + vector unsigned int); +@end smallexample + +The builti-ins @code{__builtin_vsx_cmple_16qi}, @code{__builtin_vsx_cmple_8hi}, +@code{__builtin_vsx_cmple_4si}, @code{__builtin_vsx_cmple_u16qi}, +@code{__builtin_vsx_cmple_u8hi} and @code{__builtin_vsx_cmple_u4si} compare +vectors of their defined type. The corresponding result element is set to +all ones if the two argument elements are less than or equal and all zeros +otherwise. + @node PowerPC AltiVec Built-in Functions Available on ISA 2.07 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07 diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c new file mode 100644 index 000..081817b4ba3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c @@ -0,0 +1,127 @@ +/* { dg
[PATCH 02/11] rs6000, fix arguments, add documentation for vector element conversions
GCC maintainers: This patch fixes the return type for the __builtin_vsx_xvcvdpuxws and __builtin_vsx_xvcvspuxds built-ins. They were defined as signed but should have been defined as unsigned. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl - rs6000, fix arguments, add documentation for vector element conversions The return type for the __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvspuxws built-ins should be unsigned. This patch changes the return values from signed to unsigned. The documentation for the vector element conversion built-ins: __builtin_vsx_xvcvspsxws __builtin_vsx_xvcvspsxds __builtin_vsx_xvcvspuxds __builtin_vsx_xvcvdpsxws __builtin_vsx_xvcvdpuxws __builtin_vsx_xvcvdpuxds_uns __builtin_vsx_xvcvspdp __builtin_vsx_xvcvdpsp __builtin_vsx_xvcvspuxws __builtin_vsx_xvcvsxwdp __builtin_vsx_xvcvuxddp_uns __builtin_vsx_xvcvuxwdp is missing from extend.texi. This patch adds the missing documentation. This patch also adds runnable test cases for each of the built-ins. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvspuxws): Change return type from signed to unsigned. * doc/extend.texi (__builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspdp, __builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvspuxws, __builtin_vsx_xvcvsxwdp, __builtin_vsx_xvcvuxddp_uns, __builtin_vsx_xvcvuxwdp): Add documentation for builtins. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-runnable-1.c: New test file. 
--- gcc/config/rs6000/rs6000-builtins.def | 6 +- gcc/doc/extend.texi | 135 ++ .../powerpc/vsx-builtin-runnable-1.c | 233 ++ 3 files changed, 371 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-1.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index d66a53a0fab..fd316f629e5 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1724,7 +1724,7 @@ const vull __builtin_vsx_xvcvdpuxds_uns (vd); XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {} - const vsi __builtin_vsx_xvcvdpuxws (vd); + const vui __builtin_vsx_xvcvdpuxws (vd); XVCVDPUXWS vsx_xvcvdpuxws {} const vd __builtin_vsx_xvcvspdp (vf); @@ -1736,10 +1736,10 @@ const vsi __builtin_vsx_xvcvspsxws (vf); XVCVSPSXWS vsx_fix_truncv4sfv4si2 {} - const vsll __builtin_vsx_xvcvspuxds (vf); + const vull __builtin_vsx_xvcvspuxds (vf); XVCVSPUXDS vsx_xvcvspuxds {} - const vsi __builtin_vsx_xvcvspuxws (vf); + const vui __builtin_vsx_xvcvspuxws (vf); XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {} const vd __builtin_vsx_xvcvsxddp (vsll); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 4d8610f6aa8..583b1d890bf 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21360,6 +21360,141 @@ __float128 __builtin_sqrtf128 (__float128); __float128 __builtin_fmaf128 (__float128, __float128, __float128); @end smallexample +@smallexample +vector int __builtin_vsx_xvcvspsxws (vector float); +@end smallexample + +The @code{__builtin_vsx_xvcvspsxws} converts the single precision floating +point vector element i to a signed single-precision integer value using +round to zero storing the result in element i. If the source element is NaN +the result is set to 0x8000 and VXCI is set to 1. If the source +element is SNaN then VXSNAN is also set to 1. If the rounded value is greater +than 2^31 - 1 the result is 0x7FFF and VXCVI is set to 1. 
If the +rounded value is less than -2^31, the result is set to 0x8000 and +VXCVI is set to 1. If the rounded result is inexact then XX is set to 1. + +@smallexample +vector signed long long int __builtin_vsx_xvcvspsxds (vector float); +@end smallexample + +The @code{__builtin_vsx_xvcvspsxds} converts the single precision floating +point vector element to a double precision signed integer value using the +round to zero rounding mode. If the source element is NaN the result +is set to 0x8000 and VXCI is set to 1. If the source element is +SNaN then VXSNAN is also set to 1. If the rounded value is greater than +2^63 - 1 the result is 0x7FFF and VXCVI is set to 1. If the +rounded value is less than zero, the result is set to 0x8000 and +VXCVI is set to 1. If the rounded result is inexact then XX is set to 1. + +@smallexample +vector unsigned long long __builtin_vsx_xvcvspuxds (vector float); +@end smallexample + +The @code{__builtin_vsx_xvcvspuxds} conv
[PATCH 05/11] rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases
GCC maintainers: The patch adds documentation and test cases for the __builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp built-ins. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases Add documentation to the extend.texi file for the two built-ins __builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp. Add test cases for the two built-ins. gcc/ChangeLog: * doc/extend.texi (__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp): Add documentation. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-runnable-2.c: New test case. --- gcc/doc/extend.texi | 13 + .../powerpc/vsx-builtin-runnable-2.c | 51 +++ 2 files changed, 64 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 583b1d890bf..83eed9e334b 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21495,6 +21495,19 @@ The @code{__builtin_vsx_xvcvuxwdp} converts single precision unsigned integer value to a double precision floating point value. Input element at index 2*i is stored in the destination element i. +@smallexample +vector float __builtin_vsx_xvnegsp (vector float); +vector double __builtin_vsx_xvnegdp (vector double); +@end smallexample + +The @code{__builtin_vsx_xvnegsp} and @code{__builtin_vsx_xvnegdp} negate each +vector element. 
+ +@smallexample +vector __int128 __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128, +const int); + +@end smallexample @node Basic PowerPC Built-in Functions Available on ISA 2.07 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07 diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c new file mode 100644 index 000..7906a8e01d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c @@ -0,0 +1,51 @@ +/* { dg-do run { target { lp64 } } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power7" } */ + +#define DEBUG 0 + +#if DEBUG +#include +#include +#endif + +void abort (void); + +int main () +{ + int i; + vector double vd_arg1, vd_result, vd_expected_result; + vector float vf_arg1, vf_result, vf_expected_result; + + /* VSX Vector Negate Single-Precision. */ + + vf_arg1 = (vector float) {-1.0, 12345.98, -2.1234, 238.9}; + vf_result = __builtin_vsx_xvnegsp (vf_arg1); + vf_expected_result = (vector float) {1.0, -12345.98, 2.1234, -238.9}; + + for (i = 0; i < 4; i++) +if (vf_result[i] != vf_expected_result[i]) +#if DEBUG + printf("ERROR, __builtin_vsx_xvnegsp: vf_result[%d] = %f, vf_expected_result[%d] = %f\n", +i, vf_result[i], i, vf_expected_result[i]); +#else + abort(); +#endif + + /* VSX Vector Negate Double-Precision. */ + + vd_arg1 = (vector double) {12345.98, -2.1234}; + vd_result = __builtin_vsx_xvnegdp (vd_arg1); + vd_expected_result = (vector double) {-12345.98, 2.1234}; + + for (i = 0; i < 2; i++) +if (vd_result[i] != vd_expected_result[i]) +#if DEBUG + printf("ERROR, __builtin_vsx_xvnegdp: vd_result[%d] = %f, vd_expected_result[%d] = %f\n", +i, vd_result[i], i, vd_expected_result[i]); +#else + abort(); +#endif + + return 0; +} -- 2.43.0
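For readers without POWER hardware, the element-wise behaviour that the new documentation describes for __builtin_vsx_xvnegsp and __builtin_vsx_xvnegdp can be modelled in portable C. This is only a sketch and not part of the patch: model_xvnegsp, model_xvnegdp and check_model_xvneg are hypothetical names, plain arrays stand in for the vector types, and the inputs are chosen to be exactly representable rather than copied from the test case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar stand-in for __builtin_vsx_xvnegsp:
   negate each of the four single-precision elements.  */
static void
model_xvnegsp (const float src[4], float dst[4])
{
  for (size_t i = 0; i < 4; i++)
    dst[i] = -src[i];
}

/* Hypothetical scalar stand-in for __builtin_vsx_xvnegdp:
   negate each of the two double-precision elements.  */
static void
model_xvnegdp (const double src[2], double dst[2])
{
  for (size_t i = 0; i < 2; i++)
    dst[i] = -src[i];
}

/* Small self-check; negation is exact, so == comparisons are safe here.  */
static void
check_model_xvneg (void)
{
  const float vf[4] = { -1.0f, 12345.5f, -2.25f, 238.75f };
  const double vd[2] = { 12345.5, -2.25 };
  float rf[4];
  double rd[2];

  model_xvnegsp (vf, rf);
  model_xvnegdp (vd, rd);

  assert (rf[0] == 1.0f && rf[1] == -12345.5f);
  assert (rf[2] == 2.25f && rf[3] == -238.75f);
  assert (rd[0] == -12345.5 && rd[1] == 2.25);
}
```

Calling check_model_xvneg () from a host-side main is enough to exercise the model; it says nothing about the code the compiler emits for the real built-ins.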
[PATCH 06/11] rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case
GCC maintainers: The patch adds documentation and test case for the __builtin_vsx_xxpermdi_1ti built-in. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case Add documentation to the extend.texi file for the __builtin_vsx_xxpermdi_1ti built-in. Add test cases for the __builtin_vsx_xxpermdi_1ti built-in. gcc/ChangeLog: * doc/extend.texi (__builtin_vsx_xxpermdi_1ti): Add documentation. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-runnable-3.c: New test case. --- gcc/doc/extend.texi | 7 +++ .../powerpc/vsx-builtin-runnable-3.c | 48 +++ 2 files changed, 55 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 83eed9e334b..22f67ebab31 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21508,6 +21508,13 @@ vector __int128 __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128, const int); @end smallexample + +The @code{__builtin_vsx_xxpermdi_1ti} Let srcA[127:0] be the 128-bit first +argument and srcB[127:0] be the 128-bit second argument. Let sel[1:0] be the +least significant bits of the const int argument (third input argument). The +result bits [127:64] is srcB[127:64] if sel[1] = 0, srcB[63:0] otherwise. The +result bits [63:0] is srcA[127:64] if sel[0] = 0, srcA[63:0] otherwise. 
+ @node Basic PowerPC Built-in Functions Available on ISA 2.07 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07 diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c new file mode 100644 index 000..ba287597cec --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c @@ -0,0 +1,48 @@ +/* { dg-do run { target { lp64 } } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power7" } */ + +#include + +#define DEBUG 0 + +#if DEBUG +#include +#include +#endif + +void abort (void); + +int main () +{ + int i; + + vector signed __int128 vsq_arg1, vsq_arg2, vsq_result, vsq_expected_result; + + vsq_arg1[0] = (__int128) 0x; + vsq_arg1[0] = vsq_arg1[0] << 64 | (__int128) 0x; + vsq_arg2[0] = (__int128) 0x1100110011001100; + vsq_arg2[0] = (vsq_arg2[0] << 64) | (__int128) 0x; + + vsq_expected_result[0] = (__int128) 0x; + vsq_expected_result[0] = (vsq_expected_result[0] << 64) +| (__int128) 0x; + + vsq_result = __builtin_vsx_xxpermdi_1ti (vsq_arg1, vsq_arg2, 2); + + if (vsq_result[0] != vsq_expected_result[0]) +{ +#if DEBUG + printf("ERROR, __builtin_vsx_xxpermdi_1ti: vsq_result = 0x%016llx %016llx\n", + (unsigned long long) (vsq_result[0] >> 64), + (unsigned long long) vsq_result[0]); + printf(" vsq_expected_resultd = 0x%016llx %016llx\n", + (unsigned long long)(vsq_expected_result[0] >> 64), + (unsigned long long) vsq_expected_result[0]); +#else + abort(); +#endif + } + + return 0; +} -- 2.43.0
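The selection rule in the documentation paragraph of this patch can be written as a small scalar model. The sketch below follows the wording of that paragraph literally; it has not been checked against the ISA description of xxpermdi or against hardware, so treat it as a reading aid for review rather than a reference. The type u128_model and the functions model_xxpermdi_1ti and check_model_xxpermdi_1ti are hypothetical names, and the test values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into its two 64-bit halves:
   hi holds bits [127:64], lo holds bits [63:0].  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Scalar model of the paragraph as worded in the patch:
     result[127:64] = sel[1] == 0 ? srcB[127:64] : srcB[63:0]
     result[63:0]   = sel[0] == 0 ? srcA[127:64] : srcA[63:0]
   Only the two least significant bits of sel are used.  */
static u128_model
model_xxpermdi_1ti (u128_model src_a, u128_model src_b, int sel)
{
  u128_model r;

  r.hi = (sel & 2) ? src_b.lo : src_b.hi;
  r.lo = (sel & 1) ? src_a.lo : src_a.hi;
  return r;
}

/* Small self-check with arbitrary, easily recognised halves.  */
static void
check_model_xxpermdi_1ti (void)
{
  const u128_model a = { 0xA1, 0xA0 };
  const u128_model b = { 0xB1, 0xB0 };
  u128_model r;

  r = model_xxpermdi_1ti (a, b, 0);
  assert (r.hi == 0xB1 && r.lo == 0xA1);

  r = model_xxpermdi_1ti (a, b, 2);
  assert (r.hi == 0xB0 && r.lo == 0xA1);

  r = model_xxpermdi_1ti (a, b, 3);
  assert (r.hi == 0xB0 && r.lo == 0xA0);
}
```

If the model and the hardware result disagree for some selector, that would point at the documentation wording rather than at the built-in.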
[PATCH 04/11] rs6000, Update comment for the __builtin_vsx_vper* built-ins.
GCC maintainers: The patch expands an existing comment to document that the duplicates are covered by an overloaded built-in. Should we just go ahead and remove the duplicates instead? The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl - rs6000, Update comment for the __builtin_vsx_vper* built-ins. There is a comment about the __builtin_vsx_vper* built-ins being duplicates of the __builtin_altivec_* built-ins. The note says we should consider deprecation/removal of the __builtin_vsx_vper* built-ins. Add a note that the __builtin_vsx_vper* built-ins are covered by the overloaded vec_perm built-in, which uses the __builtin_altivec_* built-in definitions. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_*): Add comment to existing comment about the built-ins. --- gcc/config/rs6000/rs6000-builtins.def | 8 1 file changed, 8 insertions(+) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 96d095da2cb..4c95429f137 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1556,6 +1556,14 @@ ; These are duplicates of __builtin_altivec_* counterparts, and are being ; kept for backwards compatibility. The reason for their existence is ; unclear. TODO: Consider deprecation/removal at some point. +; Note, __builtin_vsx_vperm_16qi, __builtin_vsx_vperm_16qi_uns, +; __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_v1ti_uns, +; __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, __builtin_vsx_vperm_2di, +; __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, +; __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns, +; __builtin_vsx_vperm_8hi, __builtin_altivec_vperm_8hi_uns +; are all covered by the overloaded vec_perm built-in which uses the +; __builtin_altivec_* built-in definitions. const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc); VPERM_16QI_X altivec_vperm_v16qi {} -- 2.43.0
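As background for the question of whether the duplicates can go: the overloaded vec_perm picks each result byte from the 32-byte concatenation of its first two arguments, using the low five bits of the matching selector byte. The sketch below models that for the 16-byte case in portable C. It assumes element-order indexing (element 0 of the first argument is index 0) and is not taken from the patch; model_vec_perm_16qi and check_model_vec_perm are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_perm on byte vectors: out[i] is the
   byte at index (sel[i] & 0x1F) of the concatenation a || b.  */
static void
model_vec_perm_16qi (const uint8_t a[16], const uint8_t b[16],
                     const uint8_t sel[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned int idx = sel[i] & 0x1F;   /* Only the low five bits count.  */
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* Small self-check: a || b holds 0..31, so each selected byte equals
   its own index in the concatenation.  */
static void
check_model_vec_perm (void)
{
  uint8_t a[16], b[16], sel[16], out[16];

  for (int i = 0; i < 16; i++)
    {
      a[i] = (uint8_t) i;
      b[i] = (uint8_t) (16 + i);
      sel[i] = (uint8_t) (31 - i);   /* Reverse order, starting in b.  */
    }
  sel[15] = 0xE3;                    /* High bits ignored: 0xE3 & 0x1F == 3.  */

  model_vec_perm_16qi (a, b, sel, out);

  for (int i = 0; i < 15; i++)
    assert (out[i] == 31 - i);
  assert (out[15] == 3);
}
```

The same model applies to every element width, since the wider __builtin_vsx_vperm_* variants only differ in how the 16 bytes are typed.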
[PATCH 08/11] rs6000, add tests and documentation for various built-ins
GCC maintainers: The patch adds documentation and a test case for a number of built-ins. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, add tests and documentation for various built-ins This patch adds a test case and documentation in extend.texi for the following built-ins: __builtin_altivec_fix_sfsi __builtin_altivec_fixuns_sfsi __builtin_altivec_float_sisf __builtin_altivec_uns_float_sisf __builtin_altivec_vrsqrtfp __builtin_altivec_mask_for_load __builtin_altivec_vsel_1ti __builtin_altivec_vsel_1ti_uns __builtin_vec_init_v16qi __builtin_vec_init_v4sf __builtin_vec_init_v4si __builtin_vec_init_v8hi __builtin_vec_set_v16qi __builtin_vec_set_v4sf __builtin_vec_set_v4si __builtin_vec_set_v8hi gcc/ChangeLog: * doc/extend.texi (__builtin_altivec_fix_sfsi, __builtin_altivec_fixuns_sfsi, __builtin_altivec_float_sisf, __builtin_altivec_uns_float_sisf, __builtin_altivec_vrsqrtfp, __builtin_altivec_mask_for_load, __builtin_altivec_vsel_1ti, __builtin_altivec_vsel_1ti_uns, __builtin_vec_init_v16qi, __builtin_vec_init_v4sf, __builtin_vec_init_v4si, __builtin_vec_init_v8hi, __builtin_vec_set_v16qi, __builtin_vec_set_v4sf, __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Add documentation. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-38.c: New test case. --- gcc/doc/extend.texi | 98 gcc/testsuite/gcc.target/powerpc/altivec-38.c | 503 ++ 2 files changed, 601 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-38.c diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 87fd30bfa9e..89d0a1f77b0 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22678,6 +22678,104 @@ if the VSX instruction set is available. The @samp{vec_vsx_ld} and @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. 
+@smallexample +vector signed int __builtin_altivec_fix_sfsi (vector float); +vector signed int __builtin_altivec_fixuns_sfsi (vector float); +vector float __builtin_altivec_float_sisf (vector int); +vector float __builtin_altivec_uns_float_sisf (vector int); +vector float __builtin_altivec_vrsqrtfp (vector float); +@end smallexample + +The @code{__builtin_altivec_fix_sfsi} converts a vector of single precision +floating point values to a vector of signed integers with round to zero. + +The @code{__builtin_altivec_fixuns_sfsi} converts a vector of single precision +floating point values to a vector of unsigned integers with round to zero. If +the rounded floating point value is less than 0 the result is 0 and VXCVI +is set to 1. + +The @code{__builtin_altivec_float_sisf} converts a vector of single precision +signed integers to a vector of floating point values using the rounding mode +specified by RN. + +The @code{__builtin_altivec_uns_float_sisf} converts a vector of single +precision unsigned integers to a vector of floating point values using the +rounding mode specified by RN. + +The @code{__builtin_altivec_vrsqrtfp} returns a vector of floating point +estimates of the reciprocal square root of each floating point source vector +element. + +@smallexample +vector signed char test_altivec_mask_for_load (const void *); +@end smallexample + +The @code{__builtin_altivec_mask_for_load} returns a vector mask based on the +bottom four bits of the argument. Let X be the 32-byte value: +0x00 || 0x01 || 0x02 || ... || 0x1D || 0x1E || 0x1F. +Bytes sh to sh+15 are returned where sh is given by the least significant 4 +bits of the argument. See description of lvsl, lvsr instructions. 
+ +@smallexample +vector signed __int128 __builtin_altivec_vsel_1ti (vector signed __int128, + vector signed __int128, + vector unsigned __int128); +vector unsigned __int128 + __builtin_altivec_vsel_1ti_uns (vector unsigned __int128, + vector unsigned __int128, + vector unsigned __int128) +@end smallexample + +Let the arguments of @code{__builtin_altivec_vsel_1ti} and +@code{__builtin_altivec_vsel_1ti_uns} be src1, src2, mask. The result is +given by (src1 & ~mask) | (src2 & mask). + +@smallexample +vector signed char +__builtin_vec_init_v16qi (signed char, signed char, signed char, signed char, + signed char, signed char, signed char, signed char, + signed char, signed char, signed char, signed char, + signed char, signed char, signed char, signed char); + +vector short int __builtin_vec_init_v8hi (short int, short int, short int, + short int, short int, short int, + short int, short int);
[PATCH 03/11] rs6000, remove duplicated built-ins
GCC maintainers: There are a number of undocumented built-ins that are duplicates of other documented built-ins. This patch removes the duplicates so users will only use the documented built-in. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl - rs6000, remove duplicated built-ins The following undocumented built-ins are same as existing documented overloaded builtins. const vf __builtin_vsx_xxmrghw (vf, vf); same as vf __builtin_vec_mergeh (vf, vf); (overloaded vec_mergeh) const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi); same as vsi __builtin_vec_mergeh (vsi, vsi); (overloaded vec_mergeh) const vf __builtin_vsx_xxmrglw (vf, vf); same as vf __builtin_vec_mergel (vf, vf); (overloaded vec_mergel) const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi); same as vsi __builtin_vec_mergel (vsi, vsi); (overloaded vec_mergel) const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc); same as vsc __builtin_vec_sel (vsc, vsc, vuc); (overloaded vec_sel) const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); same as vuc __builtin_vec_sel (vuc, vuc, vuc); (overloaded vec_sel) const vd __builtin_vsx_xxsel_2df (vd, vd, vd); same as vd __builtin_vec_sel (vd, vd, vull); (overloaded vec_sel) const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll); same as vsll __builtin_vec_sel (vsll, vsll, vsll); (overloaded vec_sel) const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull); same as vull __builtin_vec_sel (vull, vull, vsll); (overloaded vec_sel) const vf __builtin_vsx_xxsel_4sf (vf, vf, vf); same as vf __builtin_vec_sel (vf, vf, vsi) (overloaded vec_sel) const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi); same as vsi __builtin_vec_sel (vsi, vsi, vbi); (overloaded vec_sel) const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui); same as vui __builtin_vec_sel (vui, vui, vui); (overloaded vec_sel) const vss __builtin_vsx_xxsel_8hi (vss, vss, vss); same as vss __builtin_vec_sel (vss, vss, vbs); (overloaded vec_sel) const 
vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus); same as vus __builtin_vec_sel (vus, vus, vus); (overloaded vec_sel) This patch removed the duplicate built-in definitions so only the documented built-ins will be available for use. The case statements in rs6000_gimple_fold_builtin that ar no longer needed are also removed. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw, __builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw, __builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Removed built-in definition. * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): remove case entries RS6000_BIF_XXMRGLW_4SI, RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI, RS6000_BIF_XXMRGHW_4SF. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test cases for removed built-ins. --- gcc/config/rs6000/rs6000-builtin.cc | 4 -- gcc/config/rs6000/rs6000-builtins.def | 42 --- .../gcc.target/powerpc/vsx-builtin-3.c| 6 --- 3 files changed, 52 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index 6698274031b..e436cbe4935 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -2110,20 +2110,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) /* vec_mergel (integrals). */ case RS6000_BIF_VMRGLH: case RS6000_BIF_VMRGLW: -case RS6000_BIF_XXMRGLW_4SI: case RS6000_BIF_VMRGLB: case RS6000_BIF_VEC_MERGEL_V2DI: -case RS6000_BIF_XXMRGLW_4SF: case RS6000_BIF_VEC_MERGEL_V2DF: fold_mergehl_helper (gsi, stmt, 1); return true; /* vec_mergeh (integrals). 
*/ case RS6000_BIF_VMRGHH: case RS6000_BIF_VMRGHW: -case RS6000_BIF_XXMRGHW_4SI: case RS6000_BIF_VMRGHB: case RS6000_BIF_VEC_MERGEH_V2DI: -case RS6000_BIF_XXMRGHW_4SF: case RS6000_BIF_VEC_MERGEH_V2DF: fold_mergehl_helper (gsi, stmt, 0); return true; diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index fd316f629e5..96d095da2cb 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1925,18 +1925,6 @@ const signed int __builtin_vsx_xvtsqrtsp_fg (vf); XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
[PATCH 07/11] rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case
GCC maintainers: The patch adds documentation and test case for the __builtin_vsx_xvcmpeq[sp, dp, sp_p] built-ins. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case Add a test case for the __builtin_vsx_xvcmpeqsp_p built-in. Add documentation for the __builtin_vsx_xvcmpeqsp_p, __builtin_vsx_xvcmpeqdp, and __builtin_vsx_xvcmpeqsp builtins. gcc/ChangeLog: * doc/extend.texi (__builtin_vsx_xvcmpeqsp_p, __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpeqsp): Add documentation. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-runnable-4.c: New test case. --- gcc/doc/extend.texi | 23 +++ .../powerpc/vsx-builtin-runnable-4.c | 135 ++ 2 files changed, 158 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 22f67ebab31..87fd30bfa9e 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22700,6 +22700,18 @@ vectors of their defined type. The corresponding result element is set to all ones if the two argument elements are less than or equal and all zeros otherwise. +@smallexample +const vf __builtin_vsx_xvcmpeqsp (vf, vf); +const vd __builtin_vsx_xvcmpeqdp (vd, vd); +@end smallexample + +The builti-ins @code{__builtin_vsx_xvcmpeqdp} and +@code{__builtin_vsx_xvcmpeqdp} compare two floating point vectors and return +a vector. If the corresponding elements are equal then the corresponding +vector element of the result is set to all ones, it is set to all zeros +otherwise. + + @node PowerPC AltiVec Built-in Functions Available on ISA 2.07 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07 @@ -23989,6 +24001,17 @@ is larger than 128 bits, the result is undefined. The result is the modulo result of dividing the first input by the second input. 
+@smallexample +const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); +@end smallexample + +The first argument of the builti-in @code{__builtin_vsx_xvcmpeqdp_p} is an +integer in the range of 0 to 1. The second and third arguments are floating +point vectors to be compared. The result is 1 if the first argument is a 1 +and one or more of the corresponding vector elements are equal. The result is +1 if the first argument is 0 and all of the corresponding vector elements are +not equal. The result is zero otherwise. + The following builtins perform 128-bit vector comparisons. The @code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c new file mode 100644 index 000..8ac07c7c807 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c @@ -0,0 +1,135 @@ +/* { dg-do run { target { power10_hw } } } */ +/* { dg-do link { target { ! power10_hw } } } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */ +/* { dg-require-effective-target power10_ok } */ + +#define DEBUG 0 + +#if DEBUG +#include +#include +#endif + +void abort (void); + +int main () +{ + int i; + int result; + vector float vf_arg1, vf_arg2; + vector double d_arg1, d_arg2; + + /* Compare vectors with one equal element, check + for all elements unequal, i.e. first arg is 1. 
*/ + vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0}; + vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0}; + result = __builtin_vsx_xvcmpeqsp_p (1, vf_arg1, vf_arg2); + +#if DEBUG + printf("result = 0x%x\n", (unsigned int) result); +#endif + + if (result != 1) +for (i = 0; i < 4; i++) +#if DEBUG + printf("ERROR, __builtin_vsx_xvcmpeqsp_p 1: arg 1 = 1, varg3[%d] = %f, varg3[%d] = %f\n", +i, vf_arg1[i], i, vf_arg2[i]); +#else + abort(); +#endif + /* Compare vectors with one equal element, check + for all elements unequal, i.e. first arg is 0. */ + vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0}; + vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0}; + result = __builtin_vsx_xvcmpeqsp_p (0, vf_arg1, vf_arg2); + +#if DEBUG + printf("result = 0x%x\n", (unsigned int) result); +#endif + + if (result != 0) +for (i = 0; i < 4; i++) +#if DEBUG + printf("ERROR, __builtin_vsx_xvcmpeqsp_p 2: arg 1 = 0, varg3[%d] = %f, varg3[%d] = %f\n", +i, vf_arg1[i], i, vf_arg2[i]); +#else + abort(); +#endif + + /* Compare vectors with all unequal elements, check + for all elements unequal, i.e. first arg is 1. */ + vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0}; + vf_arg2 = (vector float) {8.0, 3.0, 2.0, 8.0}; + result = __builtin_vsx_xvcmpeqsp_p (1
[PATCH 09/11] rs6000, add test cases for the vec_cmpne built-ins
GCC maintainers: The patch adds test cases for the vec_cmpne built-ins. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, add test cases for the vec_cmpne built-ins Add test cases for the signed int, unsigned int, signed short, unsigned short, signed char and unsigned char built-ins. Note, the built-ins are documented in the Power Vector Intrinsic Programming Reference manual. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec-cmple.c: New test case. * gcc.target/powerpc/vec-cmple.h: New test case include file. --- gcc/testsuite/gcc.target/powerpc/vec-cmple.c | 35 gcc/testsuite/gcc.target/powerpc/vec-cmple.h | 84 2 files changed, 119 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.c create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.h diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.c b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c new file mode 100644 index 000..766a1c770e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c @@ -0,0 +1,35 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-maltivec -O2" } */ + +/* Test that the vec_cmpne builtin generates the expected Altivec + instructions. */ + +#include "vec-cmple.h" + +int main () +{ + /* Note macro expansions for "signed long long int" and + "unsigned long long int" do not work for the vec_vsx_ld builtin. 
*/ + define_test_functions (int, signed int, signed int, si); + define_test_functions (int, unsigned int, unsigned int, ui); + define_test_functions (short, signed short, signed short, ss); + define_test_functions (short, unsigned short, unsigned short, us); + define_test_functions (char, signed char, signed char, sc); + define_test_functions (char, unsigned char, unsigned char, uc); + + define_init_verify_functions (int, signed int, signed int, si); + define_init_verify_functions (int, unsigned int, unsigned int, ui); + define_init_verify_functions (short, signed short, signed short, ss); + define_init_verify_functions (short, unsigned short, unsigned short, us); + define_init_verify_functions (char, signed char, signed char, sc); + define_init_verify_functions (char, unsigned char, unsigned char, uc); + + execute_test_functions (int, signed int, signed int, si); + execute_test_functions (int, unsigned int, unsigned int, ui); + execute_test_functions (short, signed short, signed short, ss); + execute_test_functions (short, unsigned short, unsigned short, us); + execute_test_functions (char, signed char, signed char, sc); + execute_test_functions (char, unsigned char, unsigned char, uc); + return 0; +} diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.h b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h new file mode 100644 index 000..4126706b99a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h @@ -0,0 +1,84 @@ +#include "altivec.h" + +#define N 4096 + +#include +void abort (); + +#define PRAGMA(X) _Pragma (#X) +#define UNROLL0 PRAGMA (GCC unroll 0) + +#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \ +\ +RTYPE result_le_##NAME[N] __attribute__((aligned(16))); \ +STYPE operand1_##NAME[N] __attribute__((aligned(16))); \ +STYPE operand2_##NAME[N] __attribute__((aligned(16))); \ +RTYPE expected_##NAME[N] __attribute__((aligned(16))); \ +\ +__attribute__((noinline)) void vector_tests_##NAME () \ +{ \ + vector STYPE v1_##NAME, v2_##NAME; \ + 
vector bool VBTYPE tmp_##NAME; \ + int i; \ + UNROLL0 \ + for (i = 0; i < N; i+=16/sizeof (STYPE)) \ +{ \ + /* result_le = operand1!=operand2. */ \ + v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \ + v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \ +\ + tmp_##NAME = vec_cmple (v1_##NAME, v2_##NAME); \ + vec_vsx_st (tmp_##NAME, 0, &result_le_##NAME[i]); \ +} \ +} + +#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME) \ +__attribute__((noinline)) void init_##NAME () \ +{ \ + int i; \ + for (i = 0; i < N; ++i) \ +{ \ + result_le_##NAME[i] = 7; \ + if (i%3 == 0) \ + { \ + /* op1 < op2. */ \ + operand1_##NAME[i] = 1; \ + operand2_##NAME[i] = 2; \ + } \ + else if (i%3 == 1) \ + { \ + /* op1 > op2. */ \ + operand1_##NAME[i] = 2; \ + operand2_##NAME[i] = 1; \ + } \ + else if (i%3 == 2) \ + { \ + /* op1 == op2. */ \ + operand1_##NAME[i] = 3; \ + operand2_##NAME[i] = 3; \ + } \ + /* For vector comparisons: "For each element of the result_le, the \ + value of each bit is 1 if the corresponding elements of ARG1 and \ + ARG2 are equal." {or whatever the
[PATCH 10/11] rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*
GCC maintainers: The patch adds test cases for the __builtin_vec_init* and __builtin_vec_set* built-ins. The patch has been tested on Power 10 with no regressions. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set* Add test cases for the following built-ins: __builtin_vec_init_v1ti __builtin_vec_init_v2df __builtin_vec_init_v2di __builtin_vec_set_v1ti __builtin_vec_set_v2df __builtin_vec_set_v2di Note, the above built-ins are documented in extend.texi. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-21.c: New test file. --- .../gcc.target/powerpc/vsx-builtin-21.c | 181 ++ 1 file changed, 181 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c new file mode 100644 index 000..b7e1201f37e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c @@ -0,0 +1,181 @@ +/* { dg-do run { target int128 } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-mvsx" } */ + +/* This test should run the same on any target that supports vsx + instructions. Intentionally not specifying cpu in order to test + all code generation paths. 
*/ + +#define DEBUG 0 + +#include + +#if DEBUG +#include +#include + +void print_i128 (__int128_t val) +{ + printf(" %lld %llu (0x%llx %llx)", +(signed long long)(val >> 64), +(unsigned long long)(val & 0x), +(unsigned long long)(val >> 64), +(unsigned long long)(val & 0x)); +} +#endif + +void abort (void); + +void test_vec_init_v1ti (__int128_t ti_arg, +vector __int128_t v1ti_expected_result) +{ + vector __int128_t v1ti_result; + + v1ti_result = __builtin_vec_init_v1ti (ti_arg); + if (v1ti_result[0] != v1ti_expected_result[0]) +{ +#if DEBUG + printf ("test_vec_init_v1ti: v1ti_result[0] = "); + print_i128 (v1ti_result[0]); + printf( "vf_expected_result[0] = "); + print_i128 (v1ti_expected_result[0]); + printf("\n"); +#else + abort(); +#endif +} +} + +void test_vec_init_v2df (double d_arg1, double d_arg2, +vector double v2df_expected_result) +{ + vector double v2df_result; + int i; + + v2df_result = __builtin_vec_init_v2df (d_arg1, d_arg2); + + for ( i= 0; i < 2; i++) +if (v2df_result[i] != v2df_expected_result[i]) +#if DEBUG + printf ("test_vec_init_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n", + i, v2df_result[i], i, v2df_expected_result[i]); +#else + abort(); +#endif +} + +void test_vec_init_v2di (signed long long sl_arg1, signed long long sl_arg2, +vector signed long long v2di_expected_result) +{ + vector signed long long v2di_result; + int i; + + v2di_result = __builtin_vec_init_v2di (sl_arg1, sl_arg2); + + for ( i= 0; i < 2; i++) +if (v2di_result[i] != v2di_expected_result[i]) +#if DEBUG + printf ("test_vec_init_v2di: v2di_result[%d] = %lld, v2df_expected_result[%d] = %lld\n", + i, v2di_result[i], i, v2di_expected_result[i]); +#else + abort(); +#endif +} + +void test_vec_set_v1ti (vector __int128_t v1ti_arg, __int128_t ti_arg, + vector __int128_t v1ti_expected_result) +{ + vector __int128_t v1ti_result; + + v1ti_result = __builtin_vec_set_v1ti (v1ti_arg, ti_arg, 0); + if (v1ti_result[0] != v1ti_expected_result[0]) +{ +#if DEBUG + printf 
("test_vec_set_v1ti: v1ti_result[0] = "); + print_i128 (v1ti_result[0]); + printf( "vf_expected_result[0] = "); + print_i128 (v1ti_expected_result[0]); + printf("\n"); +#else + abort(); +#endif +} +} + +void test_vec_set_v2df (vector double v2df_arg, double d_arg, + vector double v2df_expected_result) +{ + vector double v2df_result; + int i; + + v2df_result = __builtin_vec_set_v2df (v2df_arg, d_arg, 0); + + for ( i= 0; i < 2; i++) +if (v2df_result[i] != v2df_expected_result[i]) +#if DEBUG + printf ("test_vec_set_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n", + i, v2df_result[i], i, v2df_expected_result[i]); +#else + abort(); +#endif +} + +void test_vec_set_v2di (vector signed long long v2di_arg, signed long long sl_arg, + vector signed long long v2di_expected_result) +{ + vector signed long long v2di_result; + int i; + + v2di_result = __builtin_vec_set_v2di (v2di_arg, sl_arg, 1); + + for ( i= 0; i < 2; i++) +if (v2di_result[i] != v2di_expected_result[i]) +#if DEBUG + printf ("test_vec_set_v2di: v2di_result[%d] = %lld, v2df_expected_result[%d] = %lld\n", + i, v2di_result[i], i, v2di_expected_result[
[PATCH 11/11] rs6000, make test vec-cmpne.c a runnable test
GCC maintainers:

The patch changes vec-cmpne.c from a compile-only test to a runnable test. The macros to create the functions needed to test the built-ins and verify the results are all there in the include file. The .c file just needed to have the macro definitions inserted and the header changed from compile to run. The test can now do functional verification of the results in addition to verifying that the expected instructions are generated.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

rs6000, make test vec-cmpne.c a runnable test

The macros in vec-cmpne.h define test functions. They also set up test value functions, verification functions and execute test functions. The test is set up as a compile-only test, so none of the verification and execute functions are being used. The patch adds the macro definitions to create the initialization, verify and execute functions to a main program, so the test can not only verify that the correct instructions are generated but also run the tests and verify the results. The test is then changed from a compile to a run test.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/vec-cmpne.c (main): Add main function with macro calls to define the test functions, create the verify functions and execute functions. Update scan-assembler-times (vcmpequ): Updated count to include instructions used to generate expected test results.
	* gcc.target/powerpc/vec-cmpne.h (vector_tests_##NAME): Remove line continuation after closing bracket. Remove extra blank line.
--- gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 41 +++- gcc/testsuite/gcc.target/powerpc/vec-cmpne.h | 3 +- 2 files changed, 32 insertions(+), 12 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c index b57e0ac8638..2c369976a44 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c @@ -1,20 +1,41 @@ -/* { dg-do compile } */ +/* { dg-do run } */ /* { dg-require-effective-target powerpc_altivec_ok } */ -/* { dg-options "-maltivec -O2" } */ +/* { dg-options "-maltivec -O2 -save-temps" } */ /* Test that the vec_cmpne builtin generates the expected Altivec instructions. */ #include "vec-cmpne.h" -define_test_functions (int, signed int, signed int, si); -define_test_functions (int, unsigned int, unsigned int, ui); -define_test_functions (short, signed short, signed short, ss); -define_test_functions (short, unsigned short, unsigned short, us); -define_test_functions (char, signed char, signed char, sc); -define_test_functions (char, unsigned char, unsigned char, uc); -define_test_functions (int, signed int, float, ff); +int main () +{ + define_test_functions (int, signed int, signed int, si); + define_test_functions (int, unsigned int, unsigned int, ui); + define_test_functions (short, signed short, signed short, ss); + define_test_functions (short, unsigned short, unsigned short, us); + define_test_functions (char, signed char, signed char, sc); + define_test_functions (char, unsigned char, unsigned char, uc); + define_test_functions (int, signed int, float, ff); + + define_init_verify_functions (int, signed int, signed int, si); + define_init_verify_functions (int, unsigned int, unsigned int, ui); + define_init_verify_functions (short, signed short, signed short, ss); + define_init_verify_functions (short, unsigned short, unsigned short, us); + define_init_verify_functions (char, signed char, signed char, sc); + define_init_verify_functions (char, 
unsigned char, unsigned char, uc); + define_init_verify_functions (int, signed int, float, ff); + + execute_test_functions (int, signed int, signed int, si); + execute_test_functions (int, unsigned int, unsigned int, ui); + execute_test_functions (short, signed short, signed short, ss); + execute_test_functions (short, unsigned short, unsigned short, us); + execute_test_functions (char, signed char, signed char, sc); + execute_test_functions (char, unsigned char, unsigned char, uc); + execute_test_functions (int, signed int, float, ff); + + return 0; +} /* { dg-final { scan-assembler-times {\mvcmpequb\M} 2 } } */ /* { dg-final { scan-assembler-times {\mvcmpequh\M} 2 } } */ -/* { dg-final { scan-assembler-times {\mvcmpequw\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 32 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h index a304de01d86..374cca360b3 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h @@ -33,7 +33,7 @@ __attribute__((noinline)) void vector_tests_##NAME () \ tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
Re: [PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins
Kewen:

Thanks for the review. From the review, it looks like a few of the built-ins just need to be replaced with an overloaded version of an existing PVIPR-documented built-in. Most of the rest can just be removed.

I will work on redoing the patch set accordingly. We can then look at the new patch set after stage 4 is over.

Carl

On 2/20/24 09:55, Carl Love wrote:
>
> GCC maintainers:
>
> This patch fixes the arguments and return type for the various
> __builtin_vsx_cmple* built-ins. They were defined as signed but should have
> been defined as unsigned.
>
> The patch has been tested on Power 10 with no regressions.
>
> Please let me know if this patch is acceptable for mainline. Thanks.
>
> Carl
>
> -
>
> rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins
>
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result. This patch changes
> the arguments and return type from signed to unsigned.
>
> The documentation for the signed and unsigned versions of
> __builtin_vsx_cmple is missing from extend.texi. This patch adds the
> missing documentation.
>
> Test cases are added for each of the signed and unsigned built-ins.
>
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
> __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si): Change
> arguments and return from signed to unsigned.
> * doc/extend.texi (__builtin_vsx_cmple_16qi,
> __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
> __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
> __builtin_vsx_cmple_u4si): Add documentation.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/vsx-cmple.c: New test file.
> --- > gcc/config/rs6000/rs6000-builtins.def| 10 +- > gcc/doc/extend.texi | 23 > gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++ > 3 files changed, 155 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 3bc7fed6956..d66a53a0fab 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1349,16 +1349,16 @@ >const vss __builtin_vsx_cmple_8hi (vss, vss); > CMPLE_8HI vector_ngtv8hi {} > > - const vsc __builtin_vsx_cmple_u16qi (vsc, vsc); > + const vuc __builtin_vsx_cmple_u16qi (vuc, vuc); > CMPLE_U16QI vector_ngtuv16qi {} > > - const vsll __builtin_vsx_cmple_u2di (vsll, vsll); > + const vull __builtin_vsx_cmple_u2di (vull, vull); > CMPLE_U2DI vector_ngtuv2di {} > > - const vsi __builtin_vsx_cmple_u4si (vsi, vsi); > + const vui __builtin_vsx_cmple_u4si (vui, vui); > CMPLE_U4SI vector_ngtuv4si {} > > - const vss __builtin_vsx_cmple_u8hi (vss, vss); > + const vus __builtin_vsx_cmple_u8hi (vus, vus); > CMPLE_U8HI vector_ngtuv8hi {} > >const vd __builtin_vsx_concat_2df (double, double); > @@ -1769,7 +1769,7 @@ >const vf __builtin_vsx_xvcvuxdsp (vull); > XVCVUXDSP vsx_xvcvuxdsp {} > > - const vd __builtin_vsx_xvcvuxwdp (vsi); > + const vd __builtin_vsx_xvcvuxwdp (vui); > XVCVUXWDP vsx_xvcvuxwdp {} > >const vf __builtin_vsx_xvcvuxwsp (vsi); > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 2b8ba1949bf..4d8610f6aa8 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -22522,6 +22522,29 @@ if the VSX instruction set is available. The > @samp{vec_vsx_ld} and > @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, > @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. 
> > + > +@smallexample > +vector signed char __builtin_vsx_cmple_16qi (vector signed char, > + vector signed char); > +vector signed short __builtin_vsx_cmple_8hi (vector signed short, > + vector signed short); > +vector signed int __builtin_vsx_cmple_4si (vector signed int, > + vector signed int); > +vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char, > +vector unsigned char); > +vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short, > +vector unsigned short); > +vector unsigned i
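To illustrate why the signedness of these prototypes matters, here is a scalar sketch in plain C. It does not use the rs6000 built-ins. The helper names cmple_s8 and cmple_u8 are hypothetical, and the all-ones/all-zeros lane result is an assumption about how one lane of an element-wise less-than-or-equal compare behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of a "less than or equal"
   vector compare on signed bytes: all bits set when the relation
   holds, zero otherwise.  */
static uint8_t cmple_s8 (int8_t a, int8_t b)
{
  return (a <= b) ? 0xFF : 0x00;
}

/* The same model for unsigned bytes.  */
static uint8_t cmple_u8 (uint8_t a, uint8_t b)
{
  return (a <= b) ? 0xFF : 0x00;
}

/* The bit pattern 0xFF is -1 as a signed byte but 255 as an unsigned
   byte, so the two interpretations disagree about "<= 1".  */
static int cmple_signedness_demo (void)
{
  assert (cmple_s8 (-1, 1) == 0xFF);   /* -1 <= 1 holds.  */
  assert (cmple_u8 (255, 1) == 0x00);  /* 255 <= 1 does not.  */
  return 0;
}
```

If the unsigned variants were declared with signed vector types, a caller passing unsigned data would presumably need casts and could easily pick the wrong interpretation, which is the confusion the prototype change is meant to avoid.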
Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
On 7/3/24 2:36 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:

GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins. If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte aligned to work properly. Add some additional text to the existing documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment requirements based on the size of the vector element. Update the documentation to make this clear to the user.

gcc/ChangeLog:
	* doc/extend.texi: Add clarification for the use of the vec_ld, vec_st, vec_lde and vec_ste built-ins.
---
 gcc/doc/extend.texi | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
 @end smallexample

 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available. The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The

This change removed "even if the VSX instruction set is available.", I think it's not intentional? vec_ld and vec_st are well defined in PVIPR, this paragraph is not to document them IMHO.
Since we document vec_vsx_ld and vec_vsx_st here, it aims to note the difference between these two pairs. But I'm not opposed to adding more words to emphasize the special masking off, I prefer to use the same words as PVIPR: "ignoring the four low-order bits of the calculated address". And IMHO we should not say "it requires the data to be 16-byte aligned to work properly" in case the users are well aware of this behavior, have some data that is not 16-byte aligned and expect it to behave like that; it's arguable to define that as not working properly.

Yea, probably should have left "even if the VSX instruction set is available." I was looking to make it clear that if the data is not 16-byte aligned you may not get the expected data loaded/stored. So how about the following instead:

Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even if the VSX instruction set is available. The instructions mask off the lower 4 bits of the calculated address. The use of these instructions on data that is not 16-byte aligned may result in unexpected bytes being loaded or stored.

+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly. The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float. The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
As above, there was a reason to mention vec_ld and vec_st here, but not one for vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and users should read the description in PVIPR instead (it's more recommended).

The goal of mentioning the vec_lde and vec_ste built-ins was to give the user a pointer to built-ins that will work as expected on unaligned data. It will probably save them a lot of time and frustration if they are given a hint of what built-ins they should look at. So, how about the following:

See the PVIPR description of the vec_lde and vec_ste for loading and storing data that is not 16-byte aligned.

Carl
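The masking behavior discussed in this thread can be sketched in portable C. This is only an address-arithmetic model, assuming the PVIPR wording "ignoring the four low-order bits of the calculated address"; the function names lvx_effective_address and lvx_misalignment are hypothetical and nothing here touches the actual vec_ld or vec_st built-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the address an LVX/STVX-style access would use: the
   calculated address (base + offset) with its four low-order bits
   ignored, i.e. rounded down to a 16-byte boundary.  */
static uintptr_t lvx_effective_address (uintptr_t base, intptr_t offset)
{
  return (base + (uintptr_t) offset) & ~(uintptr_t) 0xF;
}

/* Number of bytes by which the access starts before the address the
   caller asked for.  Zero means the data was 16-byte aligned.  */
static unsigned lvx_misalignment (uintptr_t addr)
{
  return (unsigned) (addr & 0xF);
}
```

For example, under this model a request at address 0x1003 accesses the 16 bytes starting at 0x1000, so the first three bytes belong to whatever precedes the intended data. That is the "unexpected bytes being loaded or stored" case described in the proposed wording.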
[PATCH 0/13 ver5] rs6000, built-in cleanup patch series
GCC maintainers:

The following are the updates to the three patches that have yet to be approved.

Patches 1, 3, 5, 6, 8, 9, 10, and 12 were approved in version 3 or earlier. Patches 7 and 11 from version 4 were approved with minor nits fixed. This leaves patches 2, 4 and 13 still to be approved. Only these unapproved patches are posted in the version 5 series.

The goal is to commit the entire series all at once as they are all related. So I am holding off committing the approved patches.

Thank you for your time and feedback on these patches.

The entire patch series has been tested on Power 10 LE as the changes are fairly minor.

Please let me know if the remaining patches are acceptable for mainline. Thanks.

Carl
Re: [PATCH 2/13 ver5] rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}
GCC maintainers:

Per the comments on patch 2 from version 4, I have moved the removal of built-ins __builtin_vsx_xvcvdpsxws and __builtin_vsx_xvcvdpuxws from patch 4 to this patch.

Please let me know if this patch is acceptable. Thanks.

Carl

rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

The built-in __builtin_vsx_xvcvspsxws is covered by the vec_signed built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by vec_unsigned, remove.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by vec_unsigned{e,o}, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws, __builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Remove built-in definitions.
--- gcc/config/rs6000/rs6000-builtins.def | 15 --- 1 file changed, 15 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 7c36976a089..60ccc5542be 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1688,36 +1688,21 @@ const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} - const vsi __builtin_vsx_xvcvdpsxws (vd); - XVCVDPSXWS vsx_xvcvdpsxws {} - const vsll __builtin_vsx_xvcvdpuxds (vd); XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} - const vull __builtin_vsx_xvcvdpuxds_uns (vd); - XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {} - - const vsi __builtin_vsx_xvcvdpuxws (vd); - XVCVDPUXWS vsx_xvcvdpuxws {} - const vd __builtin_vsx_xvcvspdp (vf); XVCVSPDP vsx_xvcvspdp {} const vsll __builtin_vsx_xvcvspsxds (vf); XVCVSPSXDS vsx_xvcvspsxds {} - const vsi __builtin_vsx_xvcvspsxws (vf); - XVCVSPSXWS vsx_fix_truncv4sfv4si2 {} - const vsll __builtin_vsx_xvcvspuxds (vf); XVCVSPUXDS vsx_xvcvspuxds {} - const vsi __builtin_vsx_xvcvspuxws (vf); - XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {} - const vd __builtin_vsx_xvcvsxddp (vsll); XVCVSXDDP vsx_floatv2div2df2 {} -- 2.45.0
Re: [PATCH 4/13 ver5] rs6000, extend the current vec_{un,}signed{e,o} built-ins
GCC maintainers:

I moved the removal of built-ins __builtin_vsx_xvcvdpsxws and __builtin_vsx_xvcvdpuxws from patch 4 to patch 2. I fixed various issues with the ChangeLog wording, spaces and descriptions. Fixed the comments in file gcc/config/rs6000/vsx.md. Updated the built-in description in gcc/doc/extend.texi.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds convert a vector of floats to a vector of signed/unsigned long long ints. Extend the existing vec_{un,}signed{e,o} built-ins to handle the argument vector of floats to return a vector of even/odd signed/unsigned integers.

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are now for internal use only. They are not documented and they do not have test cases.

Add testcases and update documentation.

gcc/ChangeLog:
	(__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Rename to __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
	(XVCVSPSXDS, XVCVSPUXDS): Rename to VEC_VSIGNEDE_V4SF, VEC_VUNSIGNEDE_V4SF respectively.
	(__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New built-in definitions.
	* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): Add new overloaded specifications.
	* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
	* doc/extend.texi (vec_signedo, vec_signede, vec_unsignedo, vec_unsignede): Add documentation for new overloaded built-ins to convert vector float to vector {un,}signed long long.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/builtins-3-runnable.c (test_unsigned_int_result, test_ll_unsigned_int_result): Add new argument.
(vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New tests for the overloaded built-ins. --- gcc/config/rs6000/rs6000-builtins.def | 14 +++- gcc/config/rs6000/rs6000-overload.def | 8 ++ gcc/config/rs6000/vsx.md | 84 +++ gcc/doc/extend.texi | 10 +++ .../gcc.target/powerpc/builtins-3-runnable.c | 49 +-- 5 files changed, 154 insertions(+), 11 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 43d5c229dc3..29a9deb3410 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1697,11 +1697,17 @@ const vd __builtin_vsx_xvcvspdp (vf); XVCVSPDP vsx_xvcvspdp {} - const vsll __builtin_vsx_xvcvspsxds (vf); - XVCVSPSXDS vsx_xvcvspsxds {} + const vsll __builtin_vsignede_v4sf (vf); + VEC_VSIGNEDE_V4SF vsignede_v4sf {} - const vsll __builtin_vsx_xvcvspuxds (vf); - XVCVSPUXDS vsx_xvcvspuxds {} + const vsll __builtin_vsignedo_v4sf (vf); + VEC_VSIGNEDO_V4SF vsignedo_v4sf {} + + const vull __builtin_vunsignede_v4sf (vf); + VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {} + + const vull __builtin_vunsignedo_v4sf (vf); + VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {} const vd __builtin_vsx_xvcvsxddp (vsll); XVCVSXDDP vsx_floatv2div2df2 {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 84bd9ae6554..4d857bb1af3 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -3307,10 +3307,14 @@ [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede] vsi __builtin_vec_vsignede (vd); VEC_VSIGNEDE_V2DF + vsll __builtin_vec_vsignede (vf); + VEC_VSIGNEDE_V4SF [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo] vsi __builtin_vec_vsignedo (vd); VEC_VSIGNEDO_V2DF + vsll __builtin_vec_vsignedo (vf); + VEC_VSIGNEDO_V4SF [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti] vsi __builtin_vec_signexti (vsc); @@ -4433,10 +4437,14 @@ [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede] vui __builtin_vec_vunsignede (vd); 
VEC_VUNSIGNEDE_V2DF + vull __builtin_vec_vunsignede (vf); + VEC_VUNSIGNEDE_V4SF [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo] vui __builtin_vec_vunsignedo (vd); VEC_VUNSIGNEDO_V2DF + vull __builtin_vec_vunsignedo (vf); + VEC_VUNSIGNEDO_V4SF [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp] vui __builtin_vec_extract_exp (vf); diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 48ba262f7e4..0f0837a1d43 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -2704,6 +2704,90 @@ DONE; }) +;; Convert float vector even elements to signed long long vector +(define_expand "vsignede_v4sf" + [(match_operand:V2DI 0 "vsx_register_operand") + (match_
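As a rough illustration of what the new float overloads are described as doing, the following scalar C model converts the even-indexed or odd-indexed elements of a four-float array to signed long long. The names signede_model and signedo_model are hypothetical. The model assumes array index order for "even" and "odd", whereas the real built-ins may number elements differently depending on endianness, and a C cast truncates toward zero and is undefined for out-of-range values, which may differ from the hardware instruction.

```c
#include <assert.h>

/* Scalar model: convert elements 0 and 2 (the even-indexed ones, in
   array order) of a four-float array to signed long long.  */
static void signede_model (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Scalar model: convert elements 1 and 3 (the odd-indexed ones, in
   array order) of a four-float array to signed long long.  */
static void signedo_model (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}
```

The point of the sketch is only the shape of the operation: four 32-bit inputs produce two 64-bit results, so a single call can cover only half of the source elements, hence the separate even and odd variants.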
Re: [PATCH 13/13 ver5] rs6000, remove vector set and vector init built-ins.
GCC maintainers:

The patch has been updated to remove the customized vec_init built-in code. Specifically, it removes the init identifier, the related generated code for the init built-in attribute bit, the function altivec_expand_vec_init_builtin and the calls to that function.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

---
rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code. For example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical assembly instructions both with no optimization and with -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in C code. For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code with no optimization but is identical with -O3 optimizations.

None of the removed built-ins have test cases or documentation.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and __builtin_vec_set_v2df are not removed as they are used in function resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just using C code.

The code to define the bif_init_bit, bif_is_init, as well as their uses is removed. The function altivec_expand_vec_init_builtin is also removed.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtin.cc (altivec_expand_vec_init_builtin): Removed the function.
	(rs6000_expand_builtin): Removed the if bif_is_init check to call the altivec_expand_vec_init_builtin function.
	* config/rs6000/rs6000-builtins.def: Removed the attribute string comment for init.
	(__builtin_vec_init_v16qi, __builtin_vec_init_v4sf, __builtin_vec_init_v4si, __builtin_vec_init_v8hi, __builtin_vec_init_v1ti, __builtin_vec_init_v2df, __builtin_vec_init_v2di, __builtin_vec_set_v16qi, __builtin_vec_set_v4sf, __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove built-in definitions.
	* config/rs6000/rs6000-gen-builtins.cc: Removed comment for init attribute string.
	(struct attrinfo): Removed isinit entry.
	(parse_bif_attrs): Removed the if statement to check for attribute init.
	(ifdef DEBUG): Removed print for init attribute string.
	(write_decls): Removed print for define bif_init_bit and define for bif_is_init.
	(write_bif_static_init): Removed if bifp->attrs.isinit statement.
---
 gcc/config/rs6000/rs6000-builtin.cc | 40 -
 gcc/config/rs6000/rs6000-builtins.def | 45 +++-
 gcc/config/rs6000/rs6000-gen-builtins.cc | 16 +++--
 3 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 646e740774e..0a24d20a58c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,43 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code icode, tree exp, rtx target)
   return target;
 }

-/* Expand vec_init builtin.  */
-static rtx
-altivec_expand_vec_init_builtin (tree type, tree exp, rtx target)
-{
-  machine_mode tmode = TYPE_MODE (type);
-  machine_mode inner_mode = GET_MODE_INNER (tmode);
-  int i, n_elt = GET_MODE_NUNITS (tmode);
-
-  gcc_assert (VECTOR_MODE_P (tmode));
-  gcc_assert (n_elt == call_expr_nargs (exp));
-
-  if (!target || !register_operand (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  /* If we have a vector compromised of a single element, such as V1TImode, do
-     the initialization directly.  */
-  if (n_elt == 1 && GET_MODE_SIZE (tmode) == GET_MODE_SIZE (inner_mode))
-    {
-      rtx x = expand_normal (CALL_EXPR_ARG (exp, 0));
-      emit_move_insn (target, gen_lowpart (tmode, x));
-    }
-  else
-    {
-      rtvec v = rtvec_alloc (n_elt);
-
-      for (i = 0; i < n_elt; ++i)
-	{
-	  rtx x = expand_normal (CALL_EXPR_ARG (exp, i));
-	  RTVEC_ELT (v, i) = gen_lowpart (inner_mode, x);
-	}
-
-      rs6000_expand_vector_init (target, gen_rtx_PARALLEL (tmode, v));
-    }
-
-  return target;
-}
-
 /* Return the integer constant in ARG.  Constrain it to be in the range
    of the subparts of VEC_TYPE; issue an error if not.  */

@@ -3401,9 +3364,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /
Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins
Kewen:

On 6/18/24 20:04, Kewen.Lin wrote:

Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:

GCC maintainers:

The patch has been updated per the feedback from version 3.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

--
rs6000, remove vector set and vector init built-ins

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code. For example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical assembly instructions both with no optimization and with -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in C code. For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code with no optimization but is identical with -O3 optimizations.

None of the removed built-ins have test cases or documentation.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and __builtin_vec_set_v2df are not removed as they are used in function resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just using C code.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi, __builtin_vec_init_v4sf, __builtin_vec_init_v4si, __builtin_vec_init_v8hi, __builtin_vec_init_v1ti, __builtin_vec_init_v2df, __builtin_vec_init_v2di, __builtin_vec_set_v16qi, __builtin_vec_set_v4sf, __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove built-in definitions.
--- gcc/config/rs6000/rs6000-builtins.def | 44 +++ 1 file changed, 4 insertions(+), 40 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 02aa04e5698..053dc0115d2 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1118,37 +1118,6 @@ const signed short __builtin_vec_ext_v8hi (vss, signed int); VEC_EXT_V8HI nothing {extract} - const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char); -VEC_INIT_V16QI nothing {init} I just realized this {init} is customized for vec_init only, these removed vec_init bifs are the only users of it, so we should remove this attribute as well. Sorry that I should have found and pointed out this in the previous review. I think it means some removals are needed on: 1) comments in rs6000-builtins.def ; init Process as a vec_init function 2) related gen code for this attribute bit, like: fprintf (header_file, "#define bif_init_bit\t\t(0x0001)\n"); fprintf (header_file, "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n"); if (bifp->attrs.isinit) fprintf (init_file, " | bif_init_bit"); OK, Yes, we can remove the attribute string for the vec_init built-in. In addition to the code you mentioned, we will need to remove the uses of bif_init_bit, bif_is_init and the function altivec_expand_vec_init_builtin. Carl
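For readers without a Power system at hand, the plain-C constructs that replace the removed built-ins can be tried with GCC's generic vector extensions. This is only a rough sketch: the type name v4si_t and the helper names make_v4si, set_elem_v4si and get_elem_v4si are made up for illustration and do not appear in the patch, and the generic vector type merely stands in for the rs6000 "vector signed int" type.

```c
#include <assert.h>

/* Generic GCC vector type: four 32-bit ints in 16 bytes.  Assumption: this
   stands in for the rs6000 "vector signed int" type so the sketch builds
   on any target that supports GCC vector extensions.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Plain C initialization, the construct suggested in place of
   __builtin_vec_init_v4si.  */
v4si_t
make_v4si (int a, int b, int c, int d)
{
  v4si_t result = { a, b, c, d };
  return result;
}

/* Plain C element assignment, the construct suggested in place of
   __builtin_vec_set_v4si.  The caller must keep index in the range 0..3.  */
v4si_t
set_elem_v4si (v4si_t src, int val, int index)
{
  assert (index >= 0 && index < 4);
  src[index] = val;
  return src;
}

/* Read one element back with ordinary subscripting.  */
int
get_elem_v4si (v4si_t src, int index)
{
  assert (index >= 0 && index < 4);
  return src[index];
}
```

Calling set_elem_v4si (make_v4si (1, 2, 3, 4), 9, 1) changes only element 1; the other three elements keep their initial values.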
[PATCH] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
GCC maintainers: The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c generate the following errors when run on a 32-bit BE Power system with GCC configured with multilib enabled. FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors) FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors) FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors) The tests use the __int128 type which is not supported on 32-bit systems. The test for int128 and lp64 was added to the test cases to disable the test on 32-bit systems and systems that do not support the __int128 type. The three tests now report "# of unsupported tests 1". The patch has been tested on a Power 9 BE system with multilib enabled for GCC and on a Power 10 LE 64-bit configuration with no regression failures. Please let me know if the patch is acceptable for mainline. Thanks. Carl -- rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c use __int128 types that are not supported on all platforms. The __int128 type is only supported on 64-bit platforms. Need to check that the platform is 64-bits and support the __int128 type. Add the int128 and lp64 flags to the target test. gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-10-runnable.c: Add target int128 and lp64. * gcc.target/powerpc/builtins-10.c: Add target int128 and lp64. * gcc.target/powerpc/vec_perm-runnable-i128: Add target int128 and lp64.
--- gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c | 2 +- gcc/testsuite/gcc.target/powerpc/builtins-10.c | 2 +- gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c index dede08358e1..da3011d4c00 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target { lp64 } && { int128 } } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c index b00f53cfc62..bc3cdb69305 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target { lp64 } && { int128 } } } */ /* { dg-options "-O2 -maltivec" } */ /* { dg-require-effective-target powerpc_altivec } */ /* { dg-final { scan-assembler-times "xxsel" 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c index 0e0d77bcb84..c9b8a2053b7 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target { lp64 } && { int128 } } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ -- 2.45.2
Re: [PATCH] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
Peter: On 7/15/24 4:14 PM, Peter Bergner wrote:
> On 7/15/24 5:43 PM, Carl Love wrote:
>> -/* { dg-do run } */
>> +/* { dg-do run { target { lp64 } && { int128 } } } */
>
> Why isn't this just:
>
> /* { dg-do run { target int128 } } */
>
> ??? The int128 test should disable this on 32-bit systems just fine.

I agree it seems like that should work. I had tried just the int128 initially but was still getting errors so I added the { lp64 } and that fixed it. That said, I went back and tried dg-do run { target int128 } again on one of the files. Now it seems to work? Hmm, I guess I must have had a typo or something when I first tried it.

I will try fixing the patch for all of the test files and retest to see if just int128 works.

Carl
[PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
GCC maintainers: Version 2, removed the lp64 from the target per discussion. Tested and it is not needed. The int128 qualifier is sufficient for the test to report as unsupported on a 32-bit Power system. The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c generate the following errors when run on a 32-bit BE Power system with GCC configured with multilib enabled. FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors) FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors) FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors) The tests use the __int128 type which is not supported on 32-bit systems. The test for int128 and lp64 was added to the test cases to disable the test on 32-bit systems and systems that do not support the __int128 type. The three tests now report "# of unsupported tests 1". The patch has been tested on a Power 9 BE system with multilib enabled for GCC and on a Power 10 LE 64-bit configuration with no regression failures. Please let me know if the patch is acceptable for mainline. Thanks. Carl [PATCH] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c use __int128 types that are not supported on all platforms. The __int128 type is only supported on 64-bit platforms. Need to check that the platform is 64-bits and support the __int128 type. Add the int128 and lp64 flags to the target test. gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-10-runnable.c: Add target int128 and lp64. * gcc.target/powerpc/builtins-10.c: Add target int128 and lp64. * gcc.target/powerpc/vec_perm-runnable-i128: Add target int128 and lp64.
--- gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c | 2 +- gcc/testsuite/gcc.target/powerpc/builtins-10.c | 2 +- gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c index dede08358e1..e2d3c990852 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target int128 } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c index b00f53cfc62..007892e2731 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target int128 } } */ /* { dg-options "-O2 -maltivec" } */ /* { dg-require-effective-target powerpc_altivec } */ /* { dg-final { scan-assembler-times "xxsel" 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c index 0e0d77bcb84..df1bf873cfc 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target int128 } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ -- 2.45.2
[PATCH] rs6000, remove __builtin_vsx_xvcmp* built-ins
GCC maintainers: The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp builtins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins. The difference is that the overloaded built-ins return a vector of booleans or a vector of long long booleans, whereas the removed built-ins returned a vector of floats or a vector of doubles. The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and __builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded vec_cmp[eq|ge|gt] built-in with the required changes for the return type. Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally. The patch has been tested on a Power 10 LE system with no regressions. Please let me know if the patch is acceptable for mainline. Thanks. Carl - rs6000, remove __builtin_vsx_xvcmp* built-ins This patch removes the built-ins: __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp, which are similar to the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins. The difference is that the overloaded built-ins return a vector of booleans or a vector of long long booleans depending on whether the inputs were a vector of floats or a vector of doubles. The removed built-ins returned a vector of floats or a vector of doubles for the vector float and vector double inputs respectively. The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are not removed as they are used by the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins. The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are changed to use the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins. Use of the overloaded built-ins requires the result to be stored in a vector of booleans of the appropriate size or the result must be cast to the return type used by the original __builtin_vsx_xvcmp* built-ins.
--- gcc/config/rs6000/rs6000-builtins.def | 10 --- .../gcc.target/powerpc/vsx-builtin-3.c | 28 ++- 2 files changed, 21 insertions(+), 17 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 77eb0f7e406..896d9686ac6 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1579,30 +1579,20 @@ const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); XVCMPEQDP_P vector_eq_v2df_p {pred} - const vf __builtin_vsx_xvcmpeqsp (vf, vf); - XVCMPEQSP vector_eqv4sf {} - const vd __builtin_vsx_xvcmpgedp (vd, vd); XVCMPGEDP vector_gev2df {} const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd); XVCMPGEDP_P vector_ge_v2df_p {pred} - const vf __builtin_vsx_xvcmpgesp (vf, vf); - XVCMPGESP vector_gev4sf {} - const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf); XVCMPGESP_P vector_ge_v4sf_p {pred} const vd __builtin_vsx_xvcmpgtdp (vd, vd); XVCMPGTDP vector_gtv2df {} - const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd); XVCMPGTDP_P vector_gt_v2df_p {pred} - const vf __builtin_vsx_xvcmpgtsp (vf, vf); - XVCMPGTSP vector_gtv4sf {} - const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf); XVCMPGTSP_P vector_gt_v4sf_p {pred} diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c index 60f91aad23c..d67f97c8011 100644 --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c @@ -156,13 +156,27 @@ int do_cmp (void) { int i = 0; - d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++; - d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++; - d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++; - - f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++; - f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++; - f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++; + /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and 
__builtin_vsx_xvcmp[gt|ge|eq]sp + have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and + vec_cmpge built-ins. The __builtin_vsx_xvcmp* builtins returned a vector + result of the same type as the arguments. The vec_cmp* built-ins return + a vector of boolenas of the same size as the arguments. Thus the result + assignment must be to a boolean or cast to a boolean. Test both cases. + */ + + d[i][0] = (vector double) vec_cmpeq (d[i][1], d[i][2]); i++; + d[i][0] = (vector double) vec_cmpgt (d[i][1], d[i][2]); i++; + d[i][0] = (vector double) vec_cmpge (d[i][1], d[i][2]); i++; + bl[i][0] = vec_cmpeq (d[i][1], d[i][2]); i++; + bl[i][0] = vec_cmpgt (d[i][1], d[i][2]); i++; + bl[i][0] = vec_cmpge (d[i][1], d[i][2]); i++; +
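The return-type difference described in this patch (a mask-like boolean vector versus a float vector holding the same bits) can be illustrated off-target with GCC's generic vector extensions. This is a rough sketch under stated assumptions: the `==` operator on generic vectors stands in for vec_cmpeq and is not the rs6000 built-in itself, and the names v4sf_t, v4si_mask_t, make_v4sf, cmpeq_lane, cmpeq_as_float and float_lane_bits are hypothetical helpers invented for this note.

```c
#include <assert.h>
#include <string.h>

/* Assumed stand-ins for "vector float" and "vector bool int".  */
typedef float v4sf_t __attribute__ ((vector_size (16)));
typedef int v4si_mask_t __attribute__ ((vector_size (16)));

/* Build a float vector from four scalars.  */
v4sf_t
make_v4sf (float a, float b, float c, float d)
{
  v4sf_t v = { a, b, c, d };
  return v;
}

/* Element-wise compare; the natural result is an integer mask vector in
   which a true lane is all ones (-1) and a false lane is 0.  */
int
cmpeq_lane (v4sf_t a, v4sf_t b, int lane)
{
  v4si_mask_t m = (a == b);
  assert (lane >= 0 && lane < 4);
  return m[lane];
}

/* Storing the result in a float vector needs an explicit cast, which
   reinterprets the mask bits rather than converting the values.  */
v4sf_t
cmpeq_as_float (v4sf_t a, v4sf_t b)
{
  return (v4sf_t) (a == b);
}

/* Return the raw 32 bits of one float lane.  */
unsigned int
float_lane_bits (v4sf_t v, int lane)
{
  float f;
  unsigned int bits;
  assert (lane >= 0 && lane < 4);
  f = v[lane];
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

The cast variant keeps the bit pattern of the mask, so a true lane read back as a float is an all-ones pattern rather than the value 1.0.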
[PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di
GCC maintainers: This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins. The users should just use normal C-code to update the various vector elements. This change was originally intended to be part of the earlier series of cleanup patches. It was initially thought that some additional work would be needed to do some gimple generation instead of these built-ins. However, the existing default code generation does produce the needed code. The code generated with normal C-code is as good or better than the code generated with these built-ins. The patch has been tested on Power 10 LE with no regressions. Please let me know if the patch is acceptable for mainline. Thanks. Carl --- rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di Remove the built-ins, use the default gimple generation instead. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in definitions. * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove if statemnts for mode == V2DFmode, mode == V2DImode and mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF, RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI. --- gcc/config/rs6000/rs6000-builtins.def | 13 - gcc/config/rs6000/rs6000-c.cc | 40 --- 2 files changed, 53 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 896d9686ac6..0ebc940f395 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1263,19 +1263,6 @@ const signed long long __builtin_vec_ext_v2di (vsll, signed int); VEC_EXT_V2DI nothing {extract} -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in -;; resolve_vec_insert(), rs6000-c.cc -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses -;; in resolve_vec_insert are replaced by the equivalent gimple statements. 
- const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>); - VEC_SET_V1TI nothing {set} - - const vd __builtin_vec_set_v2df (vd, double, const int<1>); - VEC_SET_V2DF nothing {set} - - const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>); - VEC_SET_V2DI nothing {set} - const vsc __builtin_vsx_cmpge_16qi (vsc, vsc); CMPGE_16QI vector_nltv16qi {} diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 6229c503bd0..c288acc200b 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -1522,46 +1522,6 @@ resolve_vec_insert (resolution *res, vecva_gc> *arglist, return error_mark_node; } - /* If we can use the VSX xxpermdi instruction, use that for insert. */ - machine_mode mode = TYPE_MODE (arg1_type); - - if ((mode == V2DFmode || mode == V2DImode) - && VECTOR_UNIT_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST) - { - wide_int selector = wi::to_wide (arg2); - selector = wi::umod_trunc (selector, 2); - arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector); - - tree call = NULL_TREE; - if (mode == V2DFmode) - call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF]; - else if (mode == V2DImode) - call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI]; - - /* Note, __builtin_vec_insert_ has vector and scalar types - reversed. */ - if (call) - { - *res = resolved; - return build_call_expr (call, 3, arg1, arg0, arg2); - } - } - - else if (mode == V1TImode - && VECTOR_UNIT_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST) - { - tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI]; - wide_int selector = wi::zero(32); - arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector); - - /* Note, __builtin_vec_insert_ has vector and scalar types - reversed. */ - *res = resolved; - return build_call_expr (call, 3, arg1, arg0, arg2); - } - /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 with VIEW_CONVERT_EXPR. i.e.: D.3192 = v1; -- 2.45.2
Re: [PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
On 7/16/24 6:01 PM, Peter Bergner wrote:
> On 7/16/24 6:19 PM, Carl Love wrote:
>> use __int128 types that are not supported on all platforms. The __int128 type is only supported on 64-bit platforms. Need to check that the platform is 64-bits and support the __int128 type. Add the int128 and lp64 flags to the target test.
>
> The test cases themselves look good, but you need to update your git log entry to not mention the lp64/64-bits since you removed them.

Yea, I didn't get the lp64 references cleaned up properly. Sorry about that.

> Yes, currently, only 64-bit targets support __int128, but our hope is that one day, even 32-bit targets will as well. So how about the following text instead?
>
> ... use __int128 types that are not supported on all platforms. Update the tests to check int128 effective target to avoid unsupported type errors on unsupported platforms.

OK, changed.

Carl
[PATCH ver 3] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
GCC maintainers: Version 3, in version 2, the ChangeLog didn't get updated to remove the LP64 references. Fixed that and updated the patch description per the feedback from Peter. Version 2, removed the lp64 from the target per discussion. Tested and it is not needed. The int128 qualifier is sufficient for the test to report as unsupported on a 32-bit Power system. The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c generate the following errors when run on a 32-bit BE Power system with GCC configured with multilib enabled. FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors) FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors) FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors) The tests use the __int128 type which is not supported on 32-bit systems. The test for int128 and lp64 was added to the test cases to disable the test on 32-bit systems and systems that do not support the __int128 type. The three tests now report "# of unsupported tests 1". The patch has been tested on a Power 9 BE system with multilib enabled for GCC and on a Power 10 LE 64-bit configuration with no regression failures. Please let me know if the patch is acceptable for mainline. Thanks. Carl -- rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c The tests builtins-10-runnable.c, builtins-10.c and vec_perm-runnable-i128.c use __int128 types that are not supported on all platforms. Update the tests to check int128 effective target to avoid unsupported type errors on unsupported platforms. gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-10-runnable.c: Add target int128. * gcc.target/powerpc/builtins-10.c: Add target int128. * gcc.target/powerpc/vec_perm-runnable-i128: Add target int128.
--- gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c | 2 +- gcc/testsuite/gcc.target/powerpc/builtins-10.c | 2 +- gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c index dede08358e1..e2d3c990852 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target int128 } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c index b00f53cfc62..007892e2731 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target int128 } } */ /* { dg-options "-O2 -maltivec" } */ /* { dg-require-effective-target powerpc_altivec } */ /* { dg-final { scan-assembler-times "xxsel" 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c index 0e0d77bcb84..df1bf873cfc 100644 --- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c @@ -1,4 +1,4 @@ -/* { dg-do run } */ +/* { dg-do run { target int128 } } */ /* { dg-require-effective-target vmx_hw } */ /* { dg-options "-maltivec -O2 " } */ -- 2.45.2
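As a minimal illustration of the directive form that the thread settled on, a compile-only test guarded by the int128 effective target might look like the sketch below. The function widen_mul and the chosen options are hypothetical and are not taken from builtins-10.c or the other tests; only the target selector syntax comes from the patch.

```c
/* { dg-do compile { target int128 } } */
/* { dg-options "-O2" } */

/* Hypothetical test body.  On a target without __int128 the selector above
   makes the harness report the file as unsupported instead of compiling it,
   which avoids the "excess errors" failures described in the thread.  */
unsigned __int128
widen_mul (unsigned long long a, unsigned long long b)
{
  /* Widen before multiplying so the full 128-bit product is kept.  */
  return (unsigned __int128) a * b;
}
```

A run test would use dg-do run with the same selector, as the final version of the patch does for builtins-10-runnable.c and vec_perm-runnable-i128.c.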
[PATCH] rs6000, Add new overloaded vector shift builtin int128 variants
GCC developers: The following patch adds the int128 variants to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. These variants were requested by Steve Munroe. The patch has been tested on a Power 10 system with no regressions. Please let me know if the patch is acceptable for mainline. Carl --- rs6000, Add new overloaded vector shift builtin int128 variants Add the signed __int128 and unsigned __int128 argument types for the overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. For each of the new argument types add a testcase and update the documentation for the built-in. Add the missing internal names for the float and double types of the overloaded builtin vec_sld. gcc/ChangeLog: * config/rs6000/altivec.md (vsdb_): Change define_insn iterator to VEC_IC. * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti, __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti, __builtin_altivec_vsrdb_v1ti): New builtin definitions. * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded definitions. (vec_sld): Add missing internal names. * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded built-ins. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
--- gcc/config/rs6000/altivec.md | 6 +- gcc/config/rs6000/rs6000-builtins.def | 12 + gcc/config/rs6000/rs6000-overload.def | 44 ++- gcc/doc/extend.texi | 42 +++ .../vec-shift-double-runnable-int128.c | 349 ++ 5 files changed, 448 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 5af9bf920a2..2a18ee44526 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) (define_insn "vsdb_" - [(set (match_operand:VI2 0 "register_operand" "=v") - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") - (match_operand:VI2 2 "register_operand" "v") + [(set (match_operand:VEC_IC 0 "register_operand" "=v") + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") + (match_operand:VEC_IC 2 "register_operand" "v") (match_operand:QI 3 "const_0_to_12_operand" "n")] VSHIFT_DBL_LR))] "TARGET_POWER10" diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 77eb0f7e406..fbb6e1ddf85 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -964,6 +964,9 @@ const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>); VSLDOI_8HI altivec_vsldoi_v8hi {} + const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>); + VSLDOI_V1TI altivec_vsldoi_v1ti {} + const vss __builtin_altivec_vslh (vss, vus); VSLH vashlv8hi3 {} @@ -1831,6 +1834,9 @@ const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>); XXSLDWI_2DI vsx_xxsldwi_v2di {} + const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>); + XXSLDWI_Q vsx_xxsldwi_v1ti {} + const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>); XXSLDWI_4SF vsx_xxsldwi_v4sf {} @@ -3299,6 +3305,9 @@ const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>); 
VSLDB_V8HI vsldb_v8hi {} + const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>); + VSLDB_V1TI vsldb_v1ti {} + const vsq __builtin_altivec_vslq (vsq, vuq); VSLQ vashlv1ti3 {} @@ -3317,6 +3326,9 @@ const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>); VSRDB_V8HI vsrdb_v8hi {} + const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>); + VSRDB_V1TI vsrdb_v1ti {} + const vsq __builtin_altivec_vsrq (vsq, vuq); VSRQ vlshrv1ti3 {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index c4ecafc6f7e..302e0232533 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -3396,9 +3396,13 @@ vull __builtin_vec_sld (vull, vull, const int); VSLDOI_2DI VSLDOI_VULL vf __builtin_vec_sld (vf, vf, const int); - VSLDOI_4SF + VSLDOI_4SF VSLDOI_VF vd __builtin_vec_sld (vd, vd, const int); - VSLDOI_2DF + VSLDOI_2DF VSLDOI_VD + vsq __builtin_vec_sld (vsq, vsq, const int); + VSLDOI_V1TI VSLDOI_VSQ + vuq __builtin_vec_sld
Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins
Kewen: I am working thru the patches. I made the changes as requested for this patch but have a question about one of your comments. On 5/14/24 00:53, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:17, Carl Love wrote: >> rs6000, extend the current vec_{un,}signed{e,o} built-ins >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds >> convert a vector of floats to signed/unsigned long long ints. Extend the >> existing vec_{un,}signed{e,o} built-ins to handle the argument >> vector of floats to return the even/odd signed/unsigned integers. >> >> Add testcases and update documentation. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low, >> __builtin_vsx_xvcvspuxds_low): New built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo): >> Add new overloaded specifications. >> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand. >> * doc/extend.texi (vec_signedo, vec_signede): Add documentation. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable: New tests for the added > > As the existing instances for vec_signed and vec_unsigned are with > names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated > with similar style, maybe something like: > > VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf Yes, sounds reasonable. Changed XVCVSPUXDS -> VEC_VUNSIGNEDE_V4SF XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF XVCVSPSXDS -> VEC_VSIGNEDE_V4SF XVCVSPSXDSO -> VEC_VSIGNEDO_V4SF QUESTION: I am not sure what you want changed to v{un,}signed{e,o}_v4sf?? The overloaded instance entry names for vd, vf have to match the first line of the definition. The name can't be type specific, i.e. v4sf. So not sure where you want the v{un,}signed{e,o}_v4sf name used? 
For example, file rs6000-overload.def now looks like:

 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
     VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+    VEC_VSIGNEDE_V4SF

 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
     VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+    VEC_VSIGNEDO_V4SF

Carl
Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments
Kewen: On 5/13/24 19:54, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:17, Carl Love wrote: >> rs6000, add overloaded vec_sel with int128 arguments >> >> Extend the vec_sel built-in to take three signed/unsigned int128 arguments >> and return a signed/unsigned int128 result. >> >> Extending the vec_sel built-in makes the existing buit-ins >> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The >> patch removes these built-ins. >> >> The patch adds documentation and test cases for the new overloaded vec_sel >> built-ins. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti, >> __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded >> definitions. >> * doc/extend.texi: Add documentation for new vec_sel arguments. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file. >> --- >> gcc/config/rs6000/rs6000-builtins.def | 6 -- >> gcc/config/rs6000/rs6000-overload.def | 4 + >> gcc/doc/extend.texi | 14 >> .../powerpc/vec-sel-runnable-i128.c | 84 +++ >> 4 files changed, 102 insertions(+), 6 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index d09e21a9151..46d2ae7b7cb 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1931,12 +1931,6 @@ >>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); >> XXSEL_16QI_UNS vector_select_v16qi_uns {} >> >> - const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq); >> -XXSEL_1TI vector_select_v1ti {} >> - >> - const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq); >> -XXSEL_1TI_UNS vector_select_v1ti_uns {} >> - >>const vd __builtin_vsx_xxsel_2df (vd, vd, vd); >> XXSEL_2DF vector_select_v2df {} >> >> diff --git a/gcc/config/rs6000/rs6000-overload.def >> 
b/gcc/config/rs6000/rs6000-overload.def >> index 68501c05289..5912c9452f4 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -3274,6 +3274,10 @@ >> VSEL_2DF VSEL_2DF_B >>vd __builtin_vec_sel (vd, vd, vull); >> VSEL_2DF VSEL_2DF_U >> + vsq __builtin_vec_sel (vsq, vsq, vsq); >> +VSEL_1TI VSEL_1TI_S >> + vuq __builtin_vec_sel (vuq, vuq, vuq); >> +VSEL_1TI_UNS VSEL_1TI_U >> ; The following variants are deprecated. >>vsll __builtin_vec_sel (vsll, vsll, vsll); >> VSEL_2DI_B VSEL_2DI_S >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index 64a43b55e2d..86b8e536dbe 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the >> endianness issues involved >> with the first argument and the result. >> @findex vec_replace_unaligned >> >> +Vector select >> + >> +@smallexample >> +vector signed __int128 vec_sel (vector signed __int128, >> + vector signed __int128, vector signed __int128); >> +vector unsigned __int128 vec_sel (vector unsigned __int128, >> + vector unsigned __int128, vector unsigned __int128); >> +@end smallexample >> + >> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128 >> +arguments and returns a vector selecting bits from the two source vectors >> based >> +on the values of the third input vector. This built-in is an extension of >> the >> +@code{vec_sel} built-in documented in the PVIPR. >> + > > Why did you place this in a section for ISA 3.1 (Power10)? It doesn't really > require this support. The used instance VSEL_1TI and VSEL_1TI_UNS are placed > in altivec stanza, so it looks that we should put it under the section > "PowerPC AltiVec Built-in Functions on ISA 2.05". 
And since it's an extension > of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an > extension of the @code{vec_sel} built-in documented in the PVIPR" and omitting > the description to avoid possible slightly different wording. Honestly, at this point in time I don't remember why I put it there. It has been too long since I created the patch. That said, the test case requires Power 10
Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.
Kewen: On 5/13/24 22:44, Kewen.Lin wrote: >> perform the same operation as setting a specific element in the vector in >> C code. For example: >> >> src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index); >> src_v4si[index] = int_val; >> >> The built-in actually generates more instructions than the inline C code >> with no optimization but is identical with -O3 optimizations. >> >> All of the above built-ins that are removed do not have test cases and >> are not documented. >> >> Built-ins __builtin_vec_set_v1ti __builtin_vec_set_v2di, >> __builtin_vec_set_v2df are not removed as they are used in function >> resolve_vec_insert() in file rs6000-c.cc. > I think we can replace these calls with the equivalent gimple codes > (early expanding it) and then we can get rid of these instances. Hmm, going to need a little coaching here. I am not sure how to do this. Looks like I get to learn something new. Carl
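The equivalence quoted above, between the set built-in and a plain subscript assignment, can be sketched with GCC's generic vector extension. This is a minimal illustration and not the rs6000 implementation: the typedef `v4si_t` and the helper `set_elem` are hypothetical names, and the sketch assumes a compiler that supports `vector_size` and variable-index subscripting on vectors.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not GCC identifiers.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Return a copy of V with element INDEX replaced by VAL, written as the
   plain subscript assignment that the removed built-ins duplicate.  */
v4si_t
set_elem (v4si_t v, int val, int index)
{
  assert (index >= 0 && index < 4);  /* four ints fit in 16 bytes */
  v[index] = val;
  return v;
}
```

Because the subscript form needs no target-specific built-in, the same source compiles on any target that supports the vector extension.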
Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in
On 5/13/24 22:37, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:18, Carl Love wrote: >> rs6000, remove __builtin_vsx_xvcmpeqsp built-in >> >> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded >> vec_cmpeq built-in. The built-in is undocumented. The built-in and >> the test cases are removed. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp): >> Remove built-in definition. >> > > Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for > __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for > considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13. > > >> gcc/testsuite/ChangeLog: >> * vsx-builtin-3.c (do_cmp): Remove test case for >> __builtin_vsx_xvcmpeqsp. >> --- >> gcc/config/rs6000/rs6000-builtins.def| 3 --- >> gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 -- >> 2 files changed, 5 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index 2f6149edd5f..19d05b8043a 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1613,9 +1613,6 @@ >>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); >> XVCMPEQDP_P vector_eq_v2df_p {pred} >> >> - const vf __builtin_vsx_xvcmpeqsp (vf, vf); >> -XVCMPEQSP vector_eqv4sf {} >> - >>const vd __builtin_vsx_xvcmpgedp (vd, vd); >> XVCMPGEDP vector_gev2df {} >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> index 35ea31b2616..245893dc0e3 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> @@ -27,7 +27,6 @@ >> /* { dg-final { scan-assembler "xvcmpeqdp" } } */ >> /* { dg-final { scan-assembler "xvcmpgtdp" } } */ >> /* { dg-final { scan-assembler "xvcmpgedp" } } */ >> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */ >> /* { dg-final { scan-assembler "xvcmpgtsp" } } */ 
>> /* { dg-final { scan-assembler "xvcmpgesp" } } */ >> /* { dg-final { scan-assembler "xxsldwi" } } */ >> @@ -112,7 +111,6 @@ int do_cmp (void) >>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++; >>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++; >> >> - f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++; >>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++; >>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++; >>return i; > > As the other in this patch series, I prefer to change it with > vec_cmpeq here, OK for trunk with this tweaked (also keep the > scan there), thanks! When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp and vec_cmpeq both return a vector where each element is all ones if the comparison is true and all zeros if false. However, the return type for __builtin_vsx_xvcmpeqsp is vector float but vec_cmpeq returns vector bool. The PVIPR says the vec_cmpeq built-in returns a value where each bit in the vector element is a 1 if the comparison is equal and 0 otherwise. However, the documented result is a vector bool int for the floating point comparison. The return value for __builtin_vsx_xvcmpeqsp was vector float. So, the "bit values" returned are the same but not of the same type. So technically vec_cmpeq is not a drop-in replacement for __builtin_vsx_xvcmpeqsp. Given that, perhaps we should not be removing __builtin_vsx_xvcmpeqsp? The testcase has to be changed from: f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++; to: bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++; I am thinking we should drop this patch from the series, i.e. don't remove __builtin_vsx_xvcmpeqsp. Thoughts? Carl > > BR, > Kewen >
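The type mismatch described above can be illustrated without the rs6000 built-ins. The sketch below uses GCC's generic vector extension, where comparing two float vectors yields an integer vector holding all ones (-1) for true and 0 for false. The names `v4sf_t`, `v4si_t`, `cmpeq_mask`, and `mask_as_float` are hypothetical. Viewing that mask as a float vector, which is roughly what a `vf` return type implies, keeps the bits but gives them a type in which all ones reads as a NaN.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for illustration only.  */
typedef float v4sf_t __attribute__ ((vector_size (16)));
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Element-wise equality: each result element is -1 (all ones) when the
   inputs compare equal and 0 otherwise.  */
v4si_t
cmpeq_mask (v4sf_t a, v4sf_t b)
{
  return a == b;
}

/* The same 128 bits reinterpreted as four floats.  */
v4sf_t
mask_as_float (v4si_t mask)
{
  v4sf_t f;
  assert (sizeof f == sizeof mask);
  memcpy (&f, &mask, sizeof f);
  return f;
}
```

An integer or bool mask can be fed directly to bit-wise select operations, which is why a bool-typed result is the more natural interface for building masks.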
Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in
Kewen: On 5/24/24 03:43, Kewen.Lin wrote: > Hi, > > on 2024/5/24 02:21, Carl Love wrote: >> >> >> On 5/13/24 22:37, Kewen.Lin wrote: >>> Hi, >>> >>> on 2024/4/20 05:18, Carl Love wrote: >>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in >>>> >>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded >>>> vec_cmpeq built-in. The built-in is undocumented. The built-in and >>>> the test cases are removed. >>>> >>>> gcc/ChangeLog: >>>>* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp): >>>>Remove built-in definition. >>>> >>> >>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for >>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for >>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13. >>> >>> >>>> gcc/testsuite/ChangeLog: >>>>* vsx-builtin-3.c (do_cmp): Remove test case for >>>>__builtin_vsx_xvcmpeqsp. >>>> --- >>>> gcc/config/rs6000/rs6000-builtins.def| 3 --- >>>> gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 -- >>>> 2 files changed, 5 deletions(-) >>>> >>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def >>>> b/gcc/config/rs6000/rs6000-builtins.def >>>> index 2f6149edd5f..19d05b8043a 100644 >>>> --- a/gcc/config/rs6000/rs6000-builtins.def >>>> +++ b/gcc/config/rs6000/rs6000-builtins.def >>>> @@ -1613,9 +1613,6 @@ >>>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); >>>> XVCMPEQDP_P vector_eq_v2df_p {pred} >>>> >>>> - const vf __builtin_vsx_xvcmpeqsp (vf, vf); >>>> -XVCMPEQSP vector_eqv4sf {} >>>> - >>>>const vd __builtin_vsx_xvcmpgedp (vd, vd); >>>> XVCMPGEDP vector_gev2df {} >>>> >>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >>>> index 35ea31b2616..245893dc0e3 100644 >>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >>>> @@ -27,7 +27,6 @@ >>>> /* { dg-final { scan-assembler "xvcmpeqdp" } } */ >>>> /* { dg-final { 
scan-assembler "xvcmpgtdp" } } */ >>>> /* { dg-final { scan-assembler "xvcmpgedp" } } */ >>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */ >>>> /* { dg-final { scan-assembler "xvcmpgtsp" } } */ >>>> /* { dg-final { scan-assembler "xvcmpgesp" } } */ >>>> /* { dg-final { scan-assembler "xxsldwi" } } */ >>>> @@ -112,7 +111,6 @@ int do_cmp (void) >>>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++; >>>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++; >>>> >>>> - f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++; >>>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++; >>>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++; >>>>return i; >>> >>> As the other in this patch series, I prefer to change it with >>> vec_cmpeq here, OK for trunk with this tweaked (also keep the >>> scan there), thanks! >> >> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp >> and vec_cmpeq both return a vector where the element is all ones if the >> comparison is True and zeros if False. However, the return type for >> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool. >> > > Ah, so they are not equivalent from prototype perspective. > >> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the >> vector element is a 1 if the comparison is equal and 0 otherwise. However, >> the documented result is a vector bool int for the floating point >> comparison. The return value for __builtin_vsx_xvcmpeqsp was vector float. > > IMHO PVIPR prototype (returning vector bool) makes more sense, > it does match better with what the result holds. Yes, I tend to agree. I think the user would most likely be using the test to create a mask to selectively replace vector elements. A bool type makes more sense in that case. > >> >> So, the "bit values" returned are the same but not of the same type. S
Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in
Kewen: On 5/14/24 01:43, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:17, Carl Love wrote: >> rs6000, Remove __builtin_vsx_xvcvspsxws built-in >> >> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed >> built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws >> built-in is not documented and there are no test cases for it. >> >> This patch removes the redundant built-in. > > By revisiting the comments on the previous version: > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html The comments from the previous version: - I think we should recommend users to adopt the recommended built-ins in PVIPR, by checking the corresponding mnemonic in PVIPR, I got: __builtin_vsx_xvcvspsxws -> vec_signed __builtin_vsx_xvcvspsxds -> N/A __builtin_vsx_xvcvspuxds -> N/A __builtin_vsx_xvcvdpsxws -> vec_signed{e,o} __builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o} __builtin_vsx_xvcvdpuxds_uns -> vec_unsigned __builtin_vsx_xvcvspdp -> vec_double{e,o} __builtin_vsx_xvcvdpsp -> vec_float{e,o} __builtin_vsx_xvcvspuxws -> vec_unsigned __builtin_vsx_xvcvsxwdp -> vec_double{e,o} __builtin_vsx_xvcvuxddp_uns> vec_double For __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds which don't have the according PVIPR built-ins, we can extend the current vec_{un,}signed{e,o} to cover them and document them following the section mentioning PVIPR. are handled by multiple patches in the new series. The main comment on the previous patch series was to remove most of the built-ins as they were redundant. So, basically most of the patches in the previous series were thrown out and a new series to remove the built-ins in the current series. That all said, I distinctly remember addressing each of the above built-ins. The work on the series got interrupted a couple of times and it looks like some of the patches to address the above got lost. My bad. The following is a list of which patch takes care of removing the duplicate built-ins. 
__builtin_vsx_xvcvspsxws
    patch 2 removes this built-in
__builtin_vsx_xvcvspsxds -> N/A
    patch 4 extends vec_{un,}signede to cover this built-in. Built-in used in rs6000-overload.def. Built-in now for internal use only.
__builtin_vsx_xvcvspuxds -> N/A
    patch 4 extends vec_{un,}signedo to cover this built-in. Built-in used in rs6000-overload.def. Built-in now for internal use only.
__builtin_vsx_xvcvdpsxws -> vec_signed{e,o}
    removed in patch 4
__builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o}
    removed in patch 4
__builtin_vsx_xvcvdpuxds_uns -> vec_unsigned
    removed in patch 4
__builtin_vsx_xvcvspuxws -> vec_unsigned
    removed in patch 4

The following changes will be put into a new patch when the series is reposted. It appears they got lost in the current series. My bad.

__builtin_vsx_xvcvspdp -> vec_double{e,o}
    removed in new patch number 5
__builtin_vsx_xvcvdpsp -> vec_float{e,o}
    removed in new patch number 5
__builtin_vsx_xvcvsxwdp -> vec_double{e,o}
    removed in new patch number 5
__builtin_vsx_xvcvuxddp_uns -> vec_double
    removed in new patch number 5

> > I wonder if it's intentional to keep the others, at least bifs > __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and > __builtin_vsx_xvcvuxddp_uns looks removable, users can just uses the > equivalent ones in PVIPR. And for the others, users can still use > the PVIPR ones by considering endianness (controlling with endianness > macros). >

Hopefully that makes it clearer where the various changes are. The next series will add a new patch 5. The remaining patches in this series, patches 5, 6, ..., will move to patches 6, 7, ... in the next posting of the built-in cleanup patch series. Carl
Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins
Kewen: On 5/14/24 00:53, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:17, Carl Love wrote: >> rs6000, extend the current vec_{un,}signed{e,o} built-ins >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds >> convert a vector of floats to signed/unsigned long long ints. Extend the >> existing vec_{un,}signed{e,o} built-ins to handle the argument >> vector of floats to return the even/odd signed/unsigned integers. >> >> Add testcases and update documentation. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low, >> __builtin_vsx_xvcvspuxds_low): New built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo): >> Add new overloaded specifications. >> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand. >> * doc/extend.texi (vec_signedo, vec_signede): Add documentation. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable: New tests for the added >> overloaded built-ins. > > This part is missing, there are no test case changes in this patch. Yes, the new tests are missing. Not sure what happened to them. Fixed. > >> --- >> gcc/config/rs6000/rs6000-builtins.def | 6 ++ >> gcc/config/rs6000/rs6000-overload.def | 8 >> gcc/config/rs6000/vsx.md | 23 +++ >> gcc/doc/extend.texi | 13 + >> 4 files changed, 50 insertions(+) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index bf9a0ae22fc..5b7237a2327 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1709,9 +1709,15 @@ >>const vsll __builtin_vsx_xvcvspsxds (vf); >> XVCVSPSXDS vsx_xvcvspsxds {} >> >> + const vsll __builtin_vsx_xvcvspsxds_low (vf); >> +XVCVSPSXDSO vsx_xvcvspsxds_low {} >> + >>const vsll __builtin_vsx_xvcvspuxds (vf); >> XVCVSPUXDS vsx_xvcvspuxds {} > > This existing should return with type vull, ... Fixed. 
> >> >> + const vsll __builtin_vsx_xvcvspuxds_low (vf); >> +XVCVSPUXDSO vsx_xvcvspuxds_low {} > > ... so this copied one should be vull too. Fixed. > > As the existing instances for vec_signed and vec_unsigned are with > names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated > with similar style, maybe something like: > > VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf Yes, sounds reasonable. Changed XVCVSPUXDS -> VEC_VUNSIGNEDE_V4SF XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF XVCVSPSXDS -> VEC_VSIGNEDE_V4SF XVCVSPSXDSO -> VEC_VSIGNEDO_V4SF NEED TO ADDRESS RESPONSE TO QUESTION I ASKED. > >>const vsi __builtin_vsx_xvcvspuxws (vf); >> XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {} >> > diff --git a/gcc/config/rs6000/rs6000-overload.def >> b/gcc/config/rs6000/rs6000-overload.def >> index 84bd9ae6554..68501c05289 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -3307,10 +3307,14 @@ >> [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede] >>vsi __builtin_vec_vsignede (vd); >> VEC_VSIGNEDE_V2DF >> + vsll __builtin_vec_vsignede (vf); >> +XVCVSPSXDS >> >> [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo] >>vsi __builtin_vec_vsignedo (vd); >> VEC_VSIGNEDO_V2DF >> + vsll __builtin_vec_vsignedo (vf); >> +XVCVSPSXDSO >> >> [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti] >>vsi __builtin_vec_signexti (vsc); >> @@ -4433,10 +4437,14 @@ >> [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede] >>vui __builtin_vec_vunsignede (vd); >> VEC_VUNSIGNEDE_V2DF >> + vull __builtin_vec_vunsignede (vf); >> +XVCVSPUXDS >> >> [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo] >>vui __builtin_vec_vunsignedo (vd); >> VEC_VUNSIGNEDO_V2DF >> + vull __builtin_vec_vunsignedo (vf); >> +XVCVSPUXDSO >> > As above, the name can be tweaked. Fixed. 
> >> [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp] >>vui __builtin_vec_extract_exp (vf); >> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >> index f135fa079bd..3d39ae7995f 100644 >> --- a/gcc/config/rs6000/vsx.md >> +++ b/gcc/config/rs6000/vsx.md >> @@ -2704,6
Re: [PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions
Kewen: On 5/14/24 00:00, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:17, Carl Love wrote: >> rs6000, fix error in unsigned vector float to unsigned int built-in >> definitions >> >> The built-ins __builtin_vsx_vunsigned_v2df and__builtin_vsx_vunsigned_v4sf >> are supposed to take a vector of floats and return a vector of unsigned >> long long ints. The definitions are using the signed version of the > > Sorry for nitpicking, here __builtin_vsx_vunsigned_v2df takes vector of > doubles > and returns vector of unsigned long long ints while > __builtin_vsx_vunsigned_v4sf > takes vector of floats and returns vector of unsigned ints. That is not nitpicking, the description is wrong. Changed float to double. > >> instructions not the unsigned version of the instruction. The results >> should also be unsigned. The builtins are used by the overloaded >> vec_unsigned builtin which has an unsigned result. >> >> Similarly the built-ins __builtin_vsx_vunsignede_v2df and >> __builtin_vsx_vunsignedo_v2df are supposed to retun an unsigned result. > > Nit: s/retun/return/ Fixed. > >> If the floating point argument is negative, the unsigned result is zero. >> The built-ins are used in the overloaded built-in vec_unsignede and >> vec_unsignedo respectively. >> >> Add a test cases for a negative floating point arguments for each of the >> above built-ins. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df, >> __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df, >> __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable.c: Add tests for >> vec_unsignede and vec_unsignedo with negative arguments.
>> --- >> gcc/config/rs6000/rs6000-builtins.def | 12 +- >> .../gcc.target/powerpc/builtins-3-runnable.c | 23 --- >> 2 files changed, 26 insertions(+), 9 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index c6d2ea1bc39..bf9a0ae22fc 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1580,16 +1580,16 @@ >>const vsi __builtin_vsx_vsignedo_v2df (vd); >> VEC_VSIGNEDO_V2DF vsignedo_v2df {} >> >> - const vsll __builtin_vsx_vunsigned_v2df (vd); >> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {} >> + const vull __builtin_vsx_vunsigned_v2df (vd); >> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {} >> >> - const vsi __builtin_vsx_vunsigned_v4sf (vf); >> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {} >> + const vui __builtin_vsx_vunsigned_v4sf (vf); >> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {} >> >> - const vsi __builtin_vsx_vunsignede_v2df (vd); >> + const vui __builtin_vsx_vunsignede_v2df (vd); >> VEC_VUNSIGNEDE_V2DF vunsignede_v2df {} >> >> - const vsi __builtin_vsx_vunsignedo_v2df (vd); >> + const vui __builtin_vsx_vunsignedo_v2df (vd); >> VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {} >> >>const vf __builtin_vsx_xscvdpsp (double); >> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c >> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c >> index 0231a1fd086..6d4fe84c8a1 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c >> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c >> @@ -313,6 +313,15 @@ int main() >> test_unsigned_int_result (ALL, vec_uns_int_result, >>vec_uns_int_expected); >> >> +/* Convert single precision float to unsigned int. 
Negative >> + arguments >> + */ >> +vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4}; >> +vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0}; >> +vec_uns_int_result = vec_unsigned (vec_flt0); >> +test_unsigned_int_result (ALL, vec_uns_int_result, >> + vec_uns_int_expected); >> + >> /* Convert double precision float to long long unsigned int */ >> vec_dble0 = (vector double){124.930, 8134.49}; >> vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134}; >> @@ -321,9 +330,9 @@ int main() >> vec_ll_uns_int_expected); > > Nit: Similar coverage on negative for vector double can be added here. Added. Carl
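The negative-argument behaviour exercised by the new test (inputs such as -14.930 expected to give 0) can be modelled in portable scalar C. This is a sketch of the saturating behaviour as described in the thread, not a definition of the instruction: the helper names `sat_float_to_u32` and `convert_v4sf_to_v4ui` are hypothetical, and the handling of NaN and of values at or above 2^32 is an assumption added so that the function is well defined. A plain C cast of a negative float to an unsigned type is undefined behaviour, so it cannot stand in for the built-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturating float to 32-bit unsigned conversion.  */
uint32_t
sat_float_to_u32 (float x)
{
  if (x != x || x <= 0.0f)   /* NaN (assumed) or non-positive gives 0 */
    return 0;
  if (x >= 4294967296.0f)    /* 2^32 and above (assumed) gives all ones */
    return UINT32_MAX;
  return (uint32_t) x;       /* in range: truncate toward zero */
}

/* Element-wise model of a four-element float to unsigned int conversion.  */
void
convert_v4sf_to_v4ui (const float in[4], uint32_t out[4])
{
  assert (in != NULL && out != NULL);
  for (int i = 0; i < 4; i++)
    out[i] = sat_float_to_u32 (in[i]);
}
```

With the inputs from the added test, every element of the modelled result is 0, matching the expected vector in the patch.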
Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates
Kewen: On 5/13/24 19:55, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:18, Carl Love wrote: >> rs6000, remove the vec_xxsel built-ins, they are duplicates >> -int do_sel(void) >> -{ >> - int i = 0; >> - >> - si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++; [third argument changed to ui] >> - ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++; [third argument changed to ui] >> - sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++; [third argument changed to uc] >> - f[i][0] = __builtin_vsx_xxsel_4sf (f[i][1], f[i][2], f[i][3]); i++; >> - d[i][0] = __builtin_vsx_xxsel_2df (d[i][1], d[i][2], d[i][3]); i++; >> - >> - si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], bi[i][3]); i++; >> - ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], bs[i][3]); i++; >> - sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], bc[i][3]); i++; >> - f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], bi[i][3]); i++; >> - d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], bl[i][3]); i++; >> - >> - si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], ui[i][3]); i++; >> - ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], us[i][3]); i++; >> - sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], uc[i][3]); i++; >> - f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], ui[i][3]); i++; >> - d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], ul[i][3]); i++; >> - >> - return i; >> -} >> - > > I prefer to keep them but just replacing the call with vec_sel. > > OK with the above nits tweaked, thanks. OK, changed __builtin_vsx_xxsel_4si_* to vec_sel and changed __builtin_vsx_xxsel to vec_sel. Had to add #include . Finally, changed the third argument for the first three calls, as noted above, to be compatible with the vec_sel built-in specification. Carl > > BR, > Kewen > >> int do_perm(void) >> { >>int i = 0; >
Re: [PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
On 5/13/24 22:26, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:18, Carl Love wrote: >> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in >> >> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded >> __builtin_altivec_vcmpeqfp_p built-in. The built-in is undocumented and >> there are no test cases for it. The patch removes built-in >> __builtin_vsx_xvcmpeqsp_p. > As the previous review comments in the v1 (this is actually v2): > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646728.html > , both __builtin_vsx_xvcmpeqsp_p and __builtin_vsx_xvcmpeqsp can be > dropped, so please consider __builtin_vsx_xvcmpeqsp as well. Yes, as you noted, __builtin_vsx_xvcmpeqsp is removed in the next patch. > >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT): >> Remove case statement. > > It seems you mixed this with some other patch, this line doesn't > belong to this patch, ... Took that out of this patch. Didn't get the changes separated cleanly. > >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p): >> Remove built-in definition. >> --- >> gcc/config/rs6000/rs6000-builtin.cc | 6 -- >> gcc/config/rs6000/rs6000-builtins.def | 6 -- >> 2 files changed, 12 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtin.cc >> b/gcc/config/rs6000/rs6000-builtin.cc >> index f83d65b06ef..74ed8fc1805 100644 >> --- a/gcc/config/rs6000/rs6000-builtin.cc >> +++ b/gcc/config/rs6000/rs6000-builtin.cc >> @@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, >> tree type_out, >> = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl); >>switch (fn) >> { >> -case RS6000_BIF_RSQRTF: >> - if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode) >> - && out_mode == SFmode && out_n == 4 >> - && in_mode == SFmode && in_n == 4) >> -return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP]; >> - break; > > ... and this ... 
Ditto > >> case RS6000_BIF_RSQRT: >>if (VECTOR_UNIT_VSX_P (V2DFmode) >>&& out_mode == DFmode && out_n == 2 >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index d65c858ac0c..2f6149edd5f 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -917,9 +917,6 @@ >>fpmath vf __builtin_altivec_vrsqrtefp (vf); >> VRSQRTEFP rsqrtev4sf2 {} >> >> - fpmath vf __builtin_altivec_vrsqrtfp (vf); >> -VRSQRTFP rsqrtv4sf2 {} >> - > > ..., also this. Ditto > > BR, > Kewen > >>const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc); >> VSEL_16QI vector_select_v16qi {} >> >> @@ -1619,9 +1616,6 @@ >>const vf __builtin_vsx_xvcmpeqsp (vf, vf); >> XVCMPEQSP vector_eqv4sf {} >> >> - const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf); >> -XVCMPEQSP_P vector_eq_v4sf_p {pred} >> - >>const vd __builtin_vsx_xvcmpgedp (vd, vd); >> XVCMPGEDP vector_gev2df {} >>
Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments
Kewen: On 5/21/24 20:05, Kewen.Lin wrote: > Hi Carl, > > on 2024/5/22 08:13, Carl Love wrote: >> Kewen: >>> Why did you place this in a section for ISA 3.1 (Power10)? It doesn't >>> really >>> require this support. The used instance VSEL_1TI and VSEL_1TI_UNS are >>> placed >>> in altivec stanza, so it looks that we should put it under the section >>> "PowerPC AltiVec Built-in Functions on ISA 2.05". And since it's an >>> extension >>> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an >>> extension of the @code{vec_sel} built-in documented in the PVIPR" and >>> omitting >>> the description to avoid possible slightly different wording. >> >> Honestly, at this point in time I don't remember why I put it there. It has >> been too long since I created the patch. That said, the test case requires >> Power 10 do to the comparison check using built-in vec_all_eq but that is >> another issue. >> The built-in generates the xxsel instruction that is an ISA 2.06 >> instruction. So, I would say it should to into the ISA 2.06 section. I >> moved it to the ISA 2.06 section. > > But the underlying implementation is: > > const vsq __builtin_altivec_vsel_1ti (vsq, vsq, vuq); > VSEL_1TI vector_select_v1ti {} > > const vuq __builtin_altivec_vsel_1ti_uns (vuq, vuq, vuq); > VSEL_1TI_UNS vector_select_v1ti_uns {} > > , it's under altivec stanza and can result with insn vsel (so not xxsel), > vsel is ISA 2.03, so I think ISA 2.05 better matches the implementation. OK, moved to ISA 2.05 > >> >> Sounds like there was some issue that you noticed on >> r14-10011-g6e62ede7aaccc6. The new version of >> print_i128 should be functionally equivalent but perhaps is "safer"? > > Thanks for checking! Looking into this more closely, I realized you didn't > apply the previously > adopted way for printing (the way used in > gcc.target/powerpc/builtins-6-p9-runnable.c), sorry for > the false alarm! So your supposed print_i128 is fine to me. OK, no problem. 
Will go with the original print_i128 function. Carl
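The bit-wise selection that the extended `vec_sel` documentation in this patch describes can be modelled on a scalar 128-bit integer. A minimal sketch, assuming a 64-bit GCC-compatible compiler that provides `unsigned __int128`; the names `u128` and `sel_bits` are hypothetical, and the convention that a 1 bit in the third argument takes the bit from the second argument follows the usual description of `vec_sel` rather than anything stated in this thread.

```c
#include <assert.h>

/* Hypothetical names for illustration only.  */
typedef unsigned __int128 u128;

/* For each bit position: take the bit from B where MASK is 1 and from A
   where MASK is 0.  */
u128
sel_bits (u128 a, u128 b, u128 mask)
{
  u128 r = (a & ~mask) | (b & mask);
  /* Sanity check: bits outside the mask still come from A.  */
  assert ((r & ~mask) == (a & ~mask));
  return r;
}
```

The signed and unsigned overloads differ only in the element type seen by the caller; the selection itself is purely bit-wise.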
Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins
Kewen: On 5/13/24 19:59, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:18, Carl Love wrote: >> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> index 01f35dad713..35ea31b2616 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c >> @@ -2,7 +2,6 @@ >> /* { dg-skip-if "" { powerpc*-*-darwin* } } */ >> /* { dg-require-effective-target powerpc_vsx_ok } */ >> /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ >> -/* { dg-final { scan-assembler "vperm" } } */ >> /* { dg-final { scan-assembler "xvrdpi" } } */ >> /* { dg-final { scan-assembler "xvrdpic" } } */ >> /* { dg-final { scan-assembler "xvrdpim" } } */ >> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4]; >> extern __vector __bool long bl[][4]; >> #endif >> >> -int do_perm(void) >> -{ >> - int i = 0; >> - >> - si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++; >> - ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++; >> - sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++; >> - f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++; >> - d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++; >> - >> - si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++; >> - ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++; >> - sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++; >> - f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++; >> - d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++; >> - >> - return i; >> -} >> - > > I prefer to just relace these __builtin_vsx_vperm with vec_perm, > OK with this tweaked (also keep the above removed vperm scan), thanks! OK, sounds good. Updated the patch to change built-in calls to vec_perm. Updated ChangeLog message to match change. Carl
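What the test's `vec_perm` calls compute can be modelled on plain byte arrays. This is a hedged sketch, not GCC's implementation: `perm_bytes` is a hypothetical helper, and it assumes the commonly documented behaviour in which the low five bits of each control byte index into the 32-byte concatenation of the two inputs. Array order is used as the element order, so endianness adjustments on real hardware are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte permute over the concatenation of A and B.
   OUT must not alias A or B.  */
void
perm_bytes (const uint8_t a[16], const uint8_t b[16],
            const uint8_t ctl[16], uint8_t out[16])
{
  assert (a != NULL && b != NULL && ctl != NULL && out != NULL);
  for (int i = 0; i < 16; i++)
    {
      unsigned int idx = ctl[i] & 0x1F;        /* index 0..31 */
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

The control vector is always a byte vector regardless of the element type being permuted, which matches the `uc[i][3]` third argument used in every call of the test.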
Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args
On 5/13/24 22:14, Kewen.Lin wrote: > Hi, > > on 2024/4/20 05:18, Carl Love wrote: >> rs6000, extend vec_xxpermdi built-in for __int128 args >> >> Add a new overloaded instance for vec_xxpermdi >> >>__int128 vec_xxpermdi (__int128, __int128, const int); >> >> Update the documentation to include a reference to the new built-in >> instance. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new >> overloaded built-in instance. >> --- >> gcc/config/rs6000/rs6000-overload.def | 2 ++ >> gcc/doc/extend.texi | 1 + >> 2 files changed, 3 insertions(+) >> >> diff --git a/gcc/config/rs6000/rs6000-overload.def >> b/gcc/config/rs6000/rs6000-overload.def >> index 5912c9452f4..49962e2f2a2 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -4932,6 +4932,8 @@ >> XXPERMDI_4SF XXPERMDI_VF >>vd __builtin_vsx_xxpermdi (vd, vd, const int); >> XXPERMDI_2DF XXPERMDI_VD >> + vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); >> +XXPERMDI_1TI XXPERMDI_1TI > > This actually introduces the signed __int128, considering the other > existing ones, I think we want both signed and unsigned. Added unsigned as well. 
> >> >> [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi] >>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int); >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index 86b8e536dbe..47cf2f3bc8b 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool >> char *); >> void vec_vsx_st (vector bool char, int, unsigned char *); >> void vec_vsx_st (vector bool char, int, signed char *); >> >> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int); >> vector double vec_xxpermdi (vector double, vector double, const int); >> vector float vec_xxpermdi (vector float, vector float, const int); > > Nit: Considering the existing ones sorted by element size descending, I guess > it's better to move the above here (and with the explicit signed and > unsigned). OK, moved the new prototype down below the float prototype and added the unsigned prototype. > > And we need a test case for it as well? Yes, we need a test case for both. Added a new runnable test file. Carl
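How the `const int` argument of `vec_xxpermdi` picks doublewords can be sketched on a two-element view of each input, which is also how a single `__int128` element splits into halves. This is an illustrative model under assumptions, not GCC's code: the type `dw_pair` and the function `permdi` are hypothetical, index 0 denotes the first doubleword in the instruction's own numbering, and the mapping (high control bit selects from the first input, low control bit from the second) follows the usual description of `xxpermdi`. Little-endian adjustments are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 128-bit value viewed as two doublewords.  */
typedef struct { uint64_t dw[2]; } dw_pair;

/* Result doubleword 0 comes from A, result doubleword 1 comes from B;
   the two control bits choose which half of each input is taken.  */
dw_pair
permdi (dw_pair a, dw_pair b, int ctl)
{
  assert (ctl >= 0 && ctl <= 3);
  dw_pair r;
  r.dw[0] = a.dw[(ctl >> 1) & 1];
  r.dw[1] = b.dw[ctl & 1];
  return r;
}
```

Under this model a control value of 0 pairs the first halves of both inputs and a value of 3 pairs the second halves.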
[PATCH 0/13 ver 3] rs6000, built-in cleanup patch series
GCC maintainers:

The following is an updated patch series to remove duplicate built-ins. The series also contains patches that extend an existing overloaded built-in to cover additional input types.

A new patch, 0005-rs6000-Remove-redundant-float-double-type-conversion.patch, was added to remove built-ins that were inadvertently missed in the last version.

Patch 12 in the previous series was dropped as the built-in __builtin_vsx_xvcmpeqsp is not a duplicate of the overloaded vec_cmpeq built-in. Specifically, the return values are different. The goal in this series is to remove built-ins that are functionally equivalent. Patch 12 from the previous series will be reworked and submitted later.

Some of the patches in the previous series were approved, but everything is being reposted for completeness.

The following gives the mapping of the patches from the previous version to the current version of the series, with notes on the patches.

Version 2   Version 3   Notes
patch 1     patch 1     Approved, no changes
patch 2     patch 2     Responded to comments, no changes to the patch
patch 3     patch 3     Updated changelog, no functional changes
patch 4     patch 4     Updated patch
            patch 5     New patch to remove built-ins missed in the series.
patch 5     patch 6     Updated patch
patch 6     patch 7     Updated patch
patch 7     patch 8     Updated patch
patch 8     patch 9     Approved, no changes to this patch
patch 9     patch 10    Approved, no changes to this patch
patch 10    patch 11    Updated, added test file.
patch 11    patch 12    Updated
patch 12                Patch from previous series removed
patch 13    patch 13    Comments said built-ins __builtin_vec_set_v1ti,
                        __builtin_vec_set_v2di, __builtin_vec_set_v2df can
                        also get removed with equivalent gimple codes. This
                        is somewhat more involved than a simple removal of
                        redundant built-ins. The built-ins will be removed
                        in a separate future patch.

The patch series has been tested on Power 10 LE and Power 9 BE with no regression failures. The last patch was also tested on Power 8 BE. No regressions were seen.

Please let me know if the patches are acceptable for mainline. Thanks.

Carl
Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins
This patch was approved in the previous series. There are no changes to this patch. Reposting for completeness.

Carl
---
rs6000, Remove __builtin_vsx_cmple* builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take unsigned arguments and return an unsigned result. The current definitions take signed arguments and return signed results, which is incorrect.

The signed and unsigned versions of __builtin_vsx_cmple* are not documented in extend.texi. Also, there are no test cases for the built-ins.

Users can use the existing vec_cmple, as defined in the PVIPR, instead of __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi, __builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi, __builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti. Hence these built-ins are redundant and are removed by this patch.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI, RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI, RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI, RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI): Remove case statements.
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si, __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si, __builtin_vsx_cmple_u8hi): Remove built-in definitions.
--- gcc/config/rs6000/rs6000-builtin.cc | 13 gcc/config/rs6000/rs6000-builtins.def | 30 --- 2 files changed, 43 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index 320affd79e3..ac9f16fe51a 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) fold_compare_helper (gsi, GT_EXPR, stmt); return true; -case RS6000_BIF_CMPLE_16QI: -case RS6000_BIF_CMPLE_U16QI: -case RS6000_BIF_CMPLE_8HI: -case RS6000_BIF_CMPLE_U8HI: -case RS6000_BIF_CMPLE_4SI: -case RS6000_BIF_CMPLE_U4SI: -case RS6000_BIF_CMPLE_2DI: -case RS6000_BIF_CMPLE_U2DI: -case RS6000_BIF_CMPLE_1TI: -case RS6000_BIF_CMPLE_U1TI: - fold_compare_helper (gsi, LE_EXPR, stmt); - return true; - /* flavors of vec_splat_[us]{8,16,32}. */ case RS6000_BIF_VSPLTISB: case RS6000_BIF_VSPLTISH: diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3bc7fed6956..7c36976a089 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1337,30 +1337,6 @@ const vss __builtin_vsx_cmpge_u8hi (vus, vus); CMPGE_U8HI vector_nltuv8hi {} - const vsc __builtin_vsx_cmple_16qi (vsc, vsc); -CMPLE_16QI vector_ngtv16qi {} - - const vsll __builtin_vsx_cmple_2di (vsll, vsll); -CMPLE_2DI vector_ngtv2di {} - - const vsi __builtin_vsx_cmple_4si (vsi, vsi); -CMPLE_4SI vector_ngtv4si {} - - const vss __builtin_vsx_cmple_8hi (vss, vss); -CMPLE_8HI vector_ngtv8hi {} - - const vsc __builtin_vsx_cmple_u16qi (vsc, vsc); -CMPLE_U16QI vector_ngtuv16qi {} - - const vsll __builtin_vsx_cmple_u2di (vsll, vsll); -CMPLE_U2DI vector_ngtuv2di {} - - const vsi __builtin_vsx_cmple_u4si (vsi, vsi); -CMPLE_U4SI vector_ngtuv4si {} - - const vss __builtin_vsx_cmple_u8hi (vss, vss); -CMPLE_U8HI vector_ngtuv8hi {} - const vd __builtin_vsx_concat_2df (double, double); CONCAT_2DF vsx_concat_v2df {} @@ -3117,12 +3093,6 @@ const vbq 
__builtin_altivec_cmpge_u1ti (vuq, vuq); CMPGE_U1TI vector_nltuv1ti {} - const vbq __builtin_altivec_cmple_1ti (vsq, vsq); -CMPLE_1TI vector_ngtv1ti {} - - const vbq __builtin_altivec_cmple_u1ti (vuq, vuq); -CMPLE_U1TI vector_ngtuv1ti {} - const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>); VCNTMBB vec_cntmb_v16qi {} -- 2.45.0
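For anyone following the signed/unsigned point in the commit message above, here is a minimal scalar sketch, assuming a single 8-bit lane and a mask-style result, of why the signedness of the arguments matters for a less-than-or-equal compare. The result is computed as NOT (a > b), mirroring the vector_ngt* pattern names in the removed definitions. The names cmple_s8, cmple_u8 and cmple_demo are made up for illustration and are not GCC or PVIPR interfaces; real code would simply call vec_cmple from altivec.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of a "less than or equal"
   vector compare with signed inputs: all ones when a <= b, all zeros
   otherwise.  Illustration only, not GCC code.  */
uint8_t cmple_s8 (int8_t a, int8_t b)
{
  return (a > b) ? 0x00 : 0xFF;
}

/* The same lane model with unsigned inputs.  */
uint8_t cmple_u8 (uint8_t a, uint8_t b)
{
  return (a > b) ? 0x00 : 0xFF;
}

/* The bit pattern 0xFF is -1 when read as signed and 255 when read as
   unsigned, so the two compares disagree on the same bits.  */
void cmple_demo (void)
{
  assert (cmple_s8 ((int8_t) -1, 1) == 0xFF);  /* -1 <= 1 holds.  */
  assert (cmple_u8 (0xFF, 1) == 0x00);         /* 255 <= 1 does not.  */
}
```

This is the reason a definition that declares signed arguments for the _u variants gives the wrong answer for inputs with the top bit set.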
Re: [PATCH 2/13 ver 3] rs6000, Remove __builtin_vsx_xvcvspsxws built-in
I responded to comments about the patch from the previous patch series. No functional changes were made to this patch.

Carl
--
rs6000, Remove __builtin_vsx_xvcvspsxws built-in.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws built-in is not documented and there are no test cases for it.

This patch removes the redundant built-in.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws): Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..c6d2ea1bc39 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1709,9 +1709,6 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
     XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-    XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
     XVCVSPUXDS vsx_xvcvspuxds {}
 
-- 
2.45.0
Re: [PATCH 3/13 ver 3] rs6000, fix error in unsigned vector float to unsigned int built-in definition
This patch was updated per the feedback comments on the previous version of the series (version 2).

Carl
---
rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of doubles and return a vector of unsigned long long ints. Similarly, __builtin_vsx_vunsigned_v4sf takes a vector of floats and is supposed to return a vector of unsigned ints. The definitions use the signed versions of the instructions instead of the unsigned versions. The results should also be unsigned. The built-ins are used by the overloaded vec_unsigned built-in, which has an unsigned result.

Similarly, the built-ins __builtin_vsx_vunsignede_v2df and __builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result. If the floating point argument is negative, the unsigned result is zero. The built-ins are used in the overloaded built-ins vec_unsignede and vec_unsignedo respectively.

Add test cases with negative floating point arguments for each of the above built-ins.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df, __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df, __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/builtins-3-runnable.c: Add tests for vec_unsignede and vec_unsignedo with negative arguments.
--- gcc/config/rs6000/rs6000-builtins.def | 12 .../gcc.target/powerpc/builtins-3-runnable.c | 30 +-- 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index c6d2ea1bc39..bf9a0ae22fc 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1580,16 +1580,16 @@ const vsi __builtin_vsx_vsignedo_v2df (vd); VEC_VSIGNEDO_V2DF vsignedo_v2df {} - const vsll __builtin_vsx_vunsigned_v2df (vd); -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {} + const vull __builtin_vsx_vunsigned_v2df (vd); +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {} - const vsi __builtin_vsx_vunsigned_v4sf (vf); -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {} + const vui __builtin_vsx_vunsigned_v4sf (vf); +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {} - const vsi __builtin_vsx_vunsignede_v2df (vd); + const vui __builtin_vsx_vunsignede_v2df (vd); VEC_VUNSIGNEDE_V2DF vunsignede_v2df {} - const vsi __builtin_vsx_vunsignedo_v2df (vd); + const vui __builtin_vsx_vunsignedo_v2df (vd); VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {} const vf __builtin_vsx_xscvdpsp (double); diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c index 0231a1fd086..5dcdfbee791 100644 --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c @@ -313,6 +313,14 @@ int main() test_unsigned_int_result (ALL, vec_uns_int_result, vec_uns_int_expected); + /* Convert single precision float to unsigned int. Negative + arguments. 
*/ + vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4}; + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0}; + vec_uns_int_result = vec_unsigned (vec_flt0); + test_unsigned_int_result (ALL, vec_uns_int_result, + vec_uns_int_expected); + /* Convert double precision float to long long unsigned int */ vec_dble0 = (vector double){124.930, 8134.49}; vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134}; @@ -320,10 +328,18 @@ int main() test_ll_unsigned_int_result (vec_ll_uns_int_result, vec_ll_uns_int_expected); + /* Convert double precision float to long long unsigned int. Negative + arguments. */ + vec_dble0 = (vector double){-24.93, -134.9}; + vec_ll_uns_int_expected = (vector long long unsigned int){0, 0}; + vec_ll_uns_int_result = vec_unsigned (vec_dble0); + test_ll_unsigned_int_result (vec_ll_uns_int_result, +vec_ll_uns_int_expected); + /* Convert double precision vector float to vector unsigned int, - even words */ - vec_dble0 = (vector double){3124.930, 8234.49}; - vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0}; + even words. Negative arguments */ + vec_dble0 = (vector double){-124.930, -234.49}; + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0}; vec_uns_int_result = vec_unsignede (vec_dble0); test_unsigned_int_result (EVEN, vec_uns_int_result, vec_uns_int_expected); @@ -335,5 +351,13 @@ int main() vec_uns_int_resul
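As an aside on the negative-argument tests above, the behaviour they check can be modelled in portable C. This is a minimal sketch, assuming one lane and a saturating conversion; the clamp to zero for negative inputs comes from the commit message, while the handling of NaN and of values too large for 32 bits is my assumption about the unsigned convert instruction and is not stated in the patch. The names model_cvt_f32_to_u32 and cvt_demo are hypothetical and are not GCC interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a saturating float -> unsigned int
   conversion for a single vector lane.  Illustration only.  */
uint32_t model_cvt_f32_to_u32 (float x)
{
  if (x != x || x <= 0.0f)
    return 0;                 /* NaN (assumed) and negatives clamp to 0.  */
  if (x >= 4294967296.0f)
    return UINT32_MAX;        /* Assumed: too large clamps to the maximum.  */
  return (uint32_t) x;        /* In range: truncate toward zero.  */
}

/* Uses the negative inputs from the new test case in the patch.  */
void cvt_demo (void)
{
  const float in[4] = { -14.930f, -834.49f, -3.3f, -5.4f };
  int i;

  for (i = 0; i < 4; i++)
    assert (model_cvt_f32_to_u32 (in[i]) == 0);

  assert (model_cvt_f32_to_u32 (124.930f) == 124);
}
```

A signed conversion of the same negative inputs would produce nonzero bit patterns such as -14, which is why the expected vectors in the test are all zero only once the unsigned instruction is used.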
Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins
Updated the patch per the feedback comments from the previous version.

Carl
---
rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds convert a vector of floats to signed/unsigned long long ints. Extend the existing vec_{un,}signed{e,o} built-ins to accept a vector of floats and return the even/odd elements converted to signed/unsigned integers.

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf and vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are now for internal use only. They are not documented and they do not have test cases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by vec_unsigned{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by vec_unsigned, remove.

The built-in __builtin_vsx_xvcvspuxws is redundant as it is covered by vec_unsigned, remove.

Add test cases and update documentation.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low, __builtin_vsx_xvcvspuxds_low): New built-in definitions.
	(__builtin_vsx_xvcvspuxds): Fix return type.
	(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF, VEC_VUNSIGNEDE_V4SF respectively.
	(vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf, vunsignede_v4sf respectively.
	(__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
	* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): Add new overloaded specifications.
	* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
	* doc/extend.texi (vec_signedo, vec_signede): Add documentation.
gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added overloaded built-ins. --- gcc/config/rs6000/rs6000-builtins.def | 25 ++ gcc/config/rs6000/rs6000-overload.def | 8 ++ gcc/config/rs6000/vsx.md | 88 +++ gcc/doc/extend.texi | 10 +++ .../gcc.target/powerpc/builtins-3-runnable.c | 51 +-- 5 files changed, 157 insertions(+), 25 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index bf9a0ae22fc..cea2649b86c 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1688,32 +1688,23 @@ const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} - const vsi __builtin_vsx_xvcvdpsxws (vd); -XVCVDPSXWS vsx_xvcvdpsxws {} - - const vsll __builtin_vsx_xvcvdpuxds (vd); -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} - const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} - const vull __builtin_vsx_xvcvdpuxds_uns (vd); -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {} - - const vsi __builtin_vsx_xvcvdpuxws (vd); -XVCVDPUXWS vsx_xvcvdpuxws {} - const vd __builtin_vsx_xvcvspdp (vf); XVCVSPDP vsx_xvcvspdp {} const vsll __builtin_vsx_xvcvspsxds (vf); -XVCVSPSXDS vsx_xvcvspsxds {} +VEC_VSIGNEDE_V4SF vsignede_v4sf {} + + const vsll __builtin_vsx_xvcvspsxds_low (vf); +VEC_VSIGNEDO_V4SF vsignedo_v4sf {} - const vsll __builtin_vsx_xvcvspuxds (vf); -XVCVSPUXDS vsx_xvcvspuxds {} + const vull __builtin_vsx_xvcvspuxds (vf); +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {} - const vsi __builtin_vsx_xvcvspuxws (vf); -XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {} + const vull __builtin_vsx_xvcvspuxds_low (vf); +VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {} const vd __builtin_vsx_xvcvsxddp (vsll); XVCVSXDDP vsx_floatv2div2df2 {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 84bd9ae6554..4d857bb1af3 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ 
b/gcc/config/rs6000/rs6000-overload.def @@ -3307,10 +3307,14 @@ [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede] vsi __builtin_vec_vsignede (vd); VEC_VSIGNEDE_V2DF + vsll __builtin_vec_vsignede (vf); +VEC_VSIGNEDE_V4SF [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo] vsi __builtin_vec_vsignedo (vd); VEC_VSIGNEDO_V2DF + vsll __builtin_vec_vsignedo (vf); +VEC_VSIGNEDO_V4SF [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti] vsi __builtin_vec_signexti (vsc); @@ -4433,10 +4437,14 @@ [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede] vui __builtin_vec_vunsignede (vd); VEC_VUNSIGNEDE_V2DF + vull __builtin_vec_vunsignede
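To make the even/odd wording in this patch concrete, here is a rough scalar sketch of the new float variants. It assumes that "even" means array indices 0 and 2 and "odd" means indices 1 and 3 in source element order, and that inputs are in range for a long long; neither the element numbering on little-endian targets nor the out-of-range behaviour is covered. The names model_signede_v4sf, model_signedo_v4sf and signed_eo_demo are hypothetical and not part of GCC.

```c
#include <assert.h>

/* Hypothetical model: convert the even-indexed floats (0 and 2) of a
   four-element vector to two signed long long values.  Assumes the
   inputs are in range; truncation is toward zero.  */
void model_signede_v4sf (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Hypothetical model: same for the odd-indexed floats (1 and 3).  */
void model_signedo_v4sf (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}

void signed_eo_demo (void)
{
  const float in[4] = { 1.5f, -2.5f, 3.5f, -4.5f };
  long long even[2], odd[2];

  model_signede_v4sf (in, even);
  model_signedo_v4sf (in, odd);

  assert (even[0] == 1 && even[1] == 3);
  assert (odd[0] == -2 && odd[1] == -4);
}
```

Since a vector of four floats holds twice as many elements as a vector of two long longs, one call can only convert half of the input, which is why the even and odd variants come as a pair.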
Re: [PATCH 5/13 ver 3] rs6000, Remove redundant float/double type conversions
This is a new patch to remove the built-ins that were inadvertently missed in the previous series.

Carl
--
rs6000, Remove redundant float/double type conversions

The following built-ins are redundant as they are covered by another overloaded built-in.

  __builtin_vsx_xvcvspdp covered by vec_double{e,o}
  __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
  __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
  __builtin_vsx_xvcvuxddp_uns covered by vec_double

Remove the redundant built-ins. They are not documented nor do they have test cases.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp, __builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp, __builtin_vsx_xvcvuxddp_uns): Remove.
---
 gcc/config/rs6000/rs6000-builtins.def | 12
 1 file changed, 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index cea2649b86c..6049f3a4599 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1679,9 +1679,6 @@
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
     XVCMPGTSP_P vector_gt_v4sf_p {pred}
 
-  const vf __builtin_vsx_xvcvdpsp (vd);
-    XVCVDPSP vsx_xvcvdpsp {}
-
   const vsll __builtin_vsx_xvcvdpsxds (vd);
     XVCVDPSXDS vsx_fix_truncv2dfv2di2 {}
 
@@ -1691,9 +1688,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
     XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vd __builtin_vsx_xvcvspdp (vf);
-    XVCVSPDP vsx_xvcvspdp {}
-
   const vsll __builtin_vsx_xvcvspsxds (vf);
     VEC_VSIGNEDE_V4SF vsignede_v4sf {}
 
@@ -1715,9 +1709,6 @@
   const vf __builtin_vsx_xvcvsxdsp (vsll);
     XVCVSXDSP vsx_xvcvsxdsp {}
 
-  const vd __builtin_vsx_xvcvsxwdp (vsi);
-    XVCVSXWDP vsx_xvcvsxwdp {}
-
   const vf __builtin_vsx_xvcvsxwsp (vsi);
     XVCVSXWSP vsx_floatv4siv4sf2 {}
 
@@ -1727,9 +1718,6 @@
   const vd __builtin_vsx_xvcvuxddp_scale (vsll, const int<5>);
     XVCVUXDDP_SCALE vsx_xvcvuxddp_scale {}
 
-  const vd __builtin_vsx_xvcvuxddp_uns (vull);
-    XVCVUXDDP_UNS vsx_floatunsv2div2df2 {}
-
   const vf __builtin_vsx_xvcvuxdsp (vull);
     XVCVUXDSP vsx_xvcvuxdsp {}
 
-- 
2.45.0
Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments
This was patch 6 in the previous series. Updated the documentation file per the comments. No functional changes to the patch.

Carl

rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned int128 arguments and return a signed/unsigned int128 result.

Extending the vec_sel built-in makes the existing built-ins __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded vec_sel built-ins.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti, __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
	* config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded definitions.
	* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
--- gcc/config/rs6000/rs6000-builtins.def | 6 - gcc/config/rs6000/rs6000-overload.def | 4 + gcc/doc/extend.texi | 12 ++ .../powerpc/vec-sel-runnable-i128.c | 129 ++ 4 files changed, 145 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 13e36df008d..ea0da77f13e 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1904,12 +1904,6 @@ const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); XXSEL_16QI_UNS vector_select_v16qi_uns {} - const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq); -XXSEL_1TI vector_select_v1ti {} - - const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq); -XXSEL_1TI_UNS vector_select_v1ti_uns {} - const vd __builtin_vsx_xxsel_2df (vd, vd, vd); XXSEL_2DF vector_select_v2df {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 4d857bb1af3..a210c5ad10d 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ 
b/gcc/config/rs6000/rs6000-overload.def @@ -3274,6 +3274,10 @@ VSEL_2DF VSEL_2DF_B vd __builtin_vec_sel (vd, vd, vull); VSEL_2DF VSEL_2DF_U + vsq __builtin_vec_sel (vsq, vsq, vsq); +VSEL_1TI VSEL_1TI_S + vuq __builtin_vec_sel (vuq, vuq, vuq); +VSEL_1TI_UNS VSEL_1TI_U ; The following variants are deprecated. vsll __builtin_vec_sel (vsll, vsll, vsll); VSEL_2DI_B VSEL_2DI_S diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index b88e61641a2..0756230b19e 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 64-bit PowerPC family of processors, for efficient use of 128-bit floating point (@code{__float128}) values. +Vector select + +@smallexample +vector signed __int128 vec_sel (vector signed __int128, + vector signed __int128, vector signed __int128); +vector unsigned __int128 vec_sel (vector unsigned __int128, + vector unsigned __int128, vector unsigned __int128); +@end smallexample + +The instance is an extension of the exiting overloaded built-in @code{vec_sel} +that is documented in the PVIPR. 
+ @node Basic PowerPC Built-in Functions Available on ISA 2.06 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06 diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c new file mode 100644 index 000..d82225cc847 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c @@ -0,0 +1,129 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-options "-save-temps" } */ +/* { dg-final { scan-assembler-times "xxsel" 2 } } */ + +#include + +#define DEBUG 0 + +#if DEBUG +#include +void print_i128 (unsigned __int128 val) +{ + printf(" 0x%016llx%016llx", + (unsigned long long)(val >> 64), + (unsigned long long)(val & 0x)); +} +#endif + +extern void abort (void); + +union convert_union { + vector signed __int128s128; + vector unsigned __int128 u128; + char val[16]; +} convert; + +int check_u128_result(vector unsigned __int128 vresult_u128, + vector unsigned __int128 expected_vresult_u128) +{ + /* Use a for loop to check each byte manually so the test case will run + with ISA 2.06. + + Return 1 if they match, 0 otherwise. */ + + int i; + + union convert_union result; + union convert_union expected; + + result.u128 = vresult_u128; + expected.u128 = expected_vresult_u128; + + /* Check if each byte of the result and expected match. */ + for (i = 0; i < 16; i++) +{ + if (result.val[i] != expected.val[i]) + return 0; +} + return 1;
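For reference while reading the test above, the operation being extended is a plain bitwise select: each result bit comes from the second operand where the mask bit is set and from the first operand otherwise. A minimal portable sketch follows, assuming the 128-bit value is modelled as two 64-bit halves so it does not depend on __int128 support. The names u128_model, model_sel_u128 and sel_demo are hypothetical and not GCC interfaces; real code would call vec_sel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit value: two 64-bit halves.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Bitwise select: where a mask bit is 1 take the bit from b,
   where it is 0 take the bit from a.  */
u128_model model_sel_u128 (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;

  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}

void sel_demo (void)
{
  u128_model a = { UINT64_C (0x1111111111111111),
		   UINT64_C (0x2222222222222222) };
  u128_model b = { UINT64_C (0xAAAAAAAAAAAAAAAA),
		   UINT64_C (0xBBBBBBBBBBBBBBBB) };
  u128_model mask = { UINT64_C (0xFFFFFFFF00000000),
		      UINT64_C (0x00000000FFFFFFFF) };
  u128_model r = model_sel_u128 (a, b, mask);

  assert (r.hi == UINT64_C (0xAAAAAAAA11111111));
  assert (r.lo == UINT64_C (0x22222222BBBBBBBB));
}
```

Because the select works bit by bit, the signed and unsigned instances differ only in their types, not in the bits produced.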
Re: [PATCH 6/13 ver 3] rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh
This was patch 5 in the previous series. It was previously approved. No changes in this version. Being posted for completeness.

Carl

rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

The following undocumented built-ins are the same as existing documented overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
  same as vf __builtin_vec_mergeh (vf, vf); (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
  same as vsi __builtin_vec_mergeh (vsi, vsi); (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
  same as vf __builtin_vec_mergel (vf, vf); (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
  same as vsi __builtin_vec_mergel (vsi, vsi); (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the documented built-ins will be available for use. The case statements in rs6000_gimple_fold_builtin are removed as they are no longer needed. The patch removes the now unused define_expands for vsx_xxmrghw_ and vsx_xxmrglw_.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw, __builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw, __builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove built-in definition.
	* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Remove case entries RS6000_BIF_XXMRGLW_4SI, RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI, RS6000_BIF_XXMRGHW_4SF.
	* config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_): Remove unused define_expands.
--- gcc/config/rs6000/rs6000-builtin.cc | 4 --- gcc/config/rs6000/rs6000-builtins.def | 12 gcc/config/rs6000/vsx.md | 41 --- 3 files changed, 57 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index ac9f16fe51a..f83d65b06ef 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) /* vec_mergel (integrals). 
*/ case RS6000_BIF_VMRGLH: case RS6000_BIF_VMRGLW: -case RS6000_BIF_XXMRGLW_4SI: case RS6000_BIF_VMRGLB: case RS6000_BIF_VEC_MERGEL_V2DI: -case RS6000_BIF_XXMRGLW_4SF: case RS6000_BIF_VEC_MERGEL_V2DF: fold_mergehl_helper (gsi, stmt, 1); return true; /* vec_mergeh (integrals). */ case RS6000_BIF_VMRGHH: case RS6000_BIF_VMRGHW: -case RS6000_BIF_XXMRGHW_4SI: case RS6000_BIF_VMRGHB: case RS6000_BIF_VEC_MERGEH_V2DI: -case RS6000_BIF_XXMRGHW_4SF: case RS6000_BIF_VEC_MERGEH_V2DF: fold_mergehl_helper (gsi, stmt, 0); return true; diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 6049f3a4599..13e36df008d 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1877,18 +1877,6 @@ const signed int __builtin_vsx_xvtsqrtsp_fg (vf); XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {} - const vf __builtin_vsx_xxmrghw (vf, vf); -XXMRGHW_4SF vsx_xxmrghw_v4sf {} - - const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi); -XXMRGHW_4SI vsx_xxmrghw_v4si {} - - const vf __builtin_vsx_xxmrglw (vf, vf); -XXMRGLW_4SF vsx_xxmrglw_v4sf {} - - const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi); -XXMRGLW_4SI vsx_xxmrglw_v4si {} - const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>); XXPERMDI_16QI vsx_xxpermdi_v16qi {} diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index a8f3d459232..4402b8b01d5 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4875,47 +4875,6 @@ (define_insn "vsx_xxspltd_" } [(set_attr "type" "vecperm")]) -;; V4SF/V4SI interleave -(define_expand "vsx_xxmrghw_" - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") -(vec_select:VSX_W - (vec_concat: - (match_operand:VSX_W 1 "vsx_register_operand" "wa") - (match_operand:VSX_W 2 "vsx_register_operand" "wa")) - (parallel [(const_int 0) (const_int 4) -(const_int 1) (const_int 5)])))] - "VECTOR_MEM_VSX_P (mode)" -{ - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_ -: gen_altivec_vmrglw_direct_; - if (!BYTES_BIG_ENDIAN) -std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); - DONE; -} - [(set_attr "type" "vecperm")]) - -(define_expand "vsx_xxmrglw_" - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") - (vec_select:VSX_W - (vec_concat: - (match_operand:VSX_W 1 "vsx_register_operand" "wa") - (match_operand:VSX_W 2 "vsx_register_operand" "wa")) - (parallel [(const_int 2) (const_int 6) -(cons
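The interleave that both the removed built-ins and the overloaded vec_mergeh/vec_mergel perform can be written out as a scalar sketch. It follows the selection indices visible in the removed define_expand (0, 4, 1, 5 of the concatenated operands for the high merge) and assumes source element order; it does not model the endian-dependent instruction choice that the removed expander handled. The names model_mergeh, model_mergel and merge_demo are hypothetical and not GCC interfaces.

```c
#include <assert.h>

/* Hypothetical model of a word merge-high: interleave the first two
   elements of a and b, i.e. elements 0, 4, 1, 5 of (a concat b).  */
void model_mergeh (const int a[4], const int b[4], int out[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* Hypothetical model of a word merge-low: interleave the last two
   elements of a and b, i.e. elements 2, 6, 3, 7 of (a concat b).  */
void model_mergel (const int a[4], const int b[4], int out[4])
{
  out[0] = a[2];
  out[1] = b[2];
  out[2] = a[3];
  out[3] = b[3];
}

void merge_demo (void)
{
  const int a[4] = { 0, 1, 2, 3 };
  const int b[4] = { 10, 11, 12, 13 };
  int h[4], l[4];

  model_mergeh (a, b, h);
  model_mergel (a, b, l);

  assert (h[0] == 0 && h[1] == 10 && h[2] == 1 && h[3] == 11);
  assert (l[0] == 2 && l[1] == 12 && l[2] == 3 && l[3] == 13);
}
```

Since the float and int variants move the same 32-bit lanes, one overloaded built-in per direction is enough, which is the point of the removal.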
Re: [PATCH 8/13 ver 3] rs6000, remove the vec_xxsel built-ins, they are duplicates
This was patch 7 in the previous series. Patch was updated to address the feedback comments.

Carl

rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
  same as vsc __builtin_vec_sel (vsc, vsc, vuc); (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
  same as vuc __builtin_vec_sel (vuc, vuc, vuc); (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
  same as vd __builtin_vec_sel (vd, vd, vull); (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
  same as vsll __builtin_vec_sel (vsll, vsll, vsll); (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
  same as vull __builtin_vec_sel (vull, vull, vsll); (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
  same as vf __builtin_vec_sel (vf, vf, vsi); (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
  same as vsi __builtin_vec_sel (vsi, vsi, vbi); (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
  same as vui __builtin_vec_sel (vui, vui, vui); (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
  same as vss __builtin_vec_sel (vss, vss, vbs); (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
  same as vus __builtin_vec_sel (vus, vus, vus); (overloaded vec_sel)

This patch removes the duplicate built-in definitions so users will only use the documented vec_sel built-in. The __builtin_vsx_xxsel_[4si, 8hi, 16qi, 4sf, 2df] tests are also removed.
gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel): Change built-in call to overloaded built-in call vec_sel.
--- gcc/config/rs6000/rs6000-builtins.def | 30 .../gcc.target/powerpc/vsx-builtin-3.c| 36 ++- 2 files changed, 19 insertions(+), 47 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index ea0da77f13e..a78c52183bc 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1898,36 +1898,6 @@ const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>); XXPERMDI_8HI vsx_xxpermdi_v8hi {} - const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc); -XXSEL_16QI vector_select_v16qi {} - - const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); -XXSEL_16QI_UNS vector_select_v16qi_uns {} - - const vd __builtin_vsx_xxsel_2df (vd, vd, vd); -XXSEL_2DF vector_select_v2df {} - - const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll); -XXSEL_2DI vector_select_v2di {} - - const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull); -XXSEL_2DI_UNS vector_select_v2di_uns {} - - const vf __builtin_vsx_xxsel_4sf (vf, vf, vf); -XXSEL_4SF vector_select_v4sf {} - - const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi); -XXSEL_4SI vector_select_v4si {} - - const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui); -XXSEL_4SI_UNS vector_select_v4si_uns {} - - const vss __builtin_vsx_xxsel_8hi (vss, vss, vss); -XXSEL_8HI vector_select_v8hi {} - - const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus); 
-XXSEL_8HI_UNS vector_select_v8hi_uns {} - const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>); XXSLDWI_16QI vsx_xxsldwi_v16qi {} diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c index ff875c55304..e20d3f03c86 100644 --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c @@ -37,6 +37,8 @@ /* { dg-final { scan-assembler "xvcvsxdsp" } } */ /* { dg-final { scan-assembler "xvcvuxdsp" } } */ +#include + extern __vector int si[][4]; extern __vector short ss[][4]; extern __vector signed char sc[][4]; @@ -61,23 +63,23 @@ int do_sel(void) { int i = 0; - si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++; - ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++; - sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++; - f[i][0] = __built
Re: [PATCH 10/13 ver 3] rs6000, remove __builtin_vsx_xvnegdp and, __builtin_vsx_xvnegsp built-ins
This was patch 9 in the previous series. It was previously approved. Reposting for completeness.

Carl

-
rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are redundant. The overloaded vec_neg built-in provides the same functionality. The two built-ins are not documented and have no test cases.

Remove the definitions so users will use the overloaded vec_neg built-in which is documented in the PVIPR.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
	__builtin_vsx_xvnegsp): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f02a8c4de45..64690b9b9b5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1736,12 +1736,6 @@
   const vf __builtin_vsx_xvnabssp (vf);
     XVNABSSP vsx_nabsv4sf2 {}
 
-  const vd __builtin_vsx_xvnegdp (vd);
-    XVNEGDP negv2df2 {}
-
-  const vf __builtin_vsx_xvnegsp (vf);
-    XVNEGSP negv4sf2 {}
-
   const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
     XVNMADDDP nfmav2df4 {}
 
-- 
2.45.0
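For readers following the thread, here is a minimal sketch of the element-wise negation that the removed built-ins performed. It assumes the GCC/Clang generic vector extensions so that it builds on any target. The type names v2df and v4sf and the helpers negate_v2df and negate_v4sf are hypothetical and do not appear in the patch. On a VSX target the same result would normally be written as vec_neg (v) after including altivec.h.

```c
/* Sketch only: generic vector extensions stand in for the PowerPC types.
   v2df, v4sf and the negate_* helpers are illustrative names.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Negate both double elements, as __builtin_vsx_xvnegdp did.  */
v2df
negate_v2df (v2df v)
{
  return -v;
}

/* Negate all four float elements, as __builtin_vsx_xvnegsp did.  */
v4sf
negate_v4sf (v4sf v)
{
  return -v;
}
```

Unary minus on a vector type negates each element, which is the behaviour the commit message attributes to the overloaded vec_neg built-in.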
Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args
This was patch 10 from the previous series. The patch was updated to address feedback comments. Carl --- rs6000, extend vec_xxpermdi built-in for __int128 args Add new signed and unsigned overloaded instances of vec_xxpermdi: __int128 vec_xxpermdi (__int128, __int128, const int); __uint128 vec_xxpermdi (__uint128, __uint128, const int); Update the documentation to include a reference to the new built-in instances. Add test cases for the new overloaded instances. gcc/ChangeLog: * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new overloaded built-in instances. * doc/extend.texi: Add documentation for new overloaded built-in instances. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file. --- gcc/config/rs6000/rs6000-overload.def | 4 + gcc/doc/extend.texi | 2 + .../powerpc/vec_perm-runnable-i128.c | 229 ++ 3 files changed, 235 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index a210c5ad10d..45000f161e4 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4932,6 +4932,10 @@ XXPERMDI_4SF XXPERMDI_VF vd __builtin_vsx_xxpermdi (vd, vd, const int); XXPERMDI_2DF XXPERMDI_VD + vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); +XXPERMDI_1TI XXPERMDI_1TI + vuq __builtin_vsx_xxpermdi (vuq, vuq, const int); +XXPERMDI_1TI XXPERMDI_1TUI [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi] vsc __builtin_vsx_xxsldwi (vsc, vsc, const int); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0756230b19e..edfef1bdab7 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char *); vector double vec_xxpermdi (vector double, vector double, const int); vector float vec_xxpermdi (vector float, vector float, const int); vector long long vec_xxpermdi (vector long long,
vector long long, const int); +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int); +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const int); vector unsigned long long vec_xxpermdi (vector unsigned long long, vector unsigned long long, const int); vector int vec_xxpermdi (vector int, vector int, const int); diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c new file mode 100644 index 000..2d5dce09404 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c @@ -0,0 +1,229 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-options "-save-temps" } */ + +#include + +#define DEBUG 0 + +#if DEBUG +#include +void print_i128 (unsigned __int128 val) +{ + printf(" 0x%016llx%016llx", + (unsigned long long)(val >> 64), + (unsigned long long)(val & 0x)); +} +#endif + +extern void abort (void); + +union convert_union { + vector signed __int128s128; + vector unsigned __int128 u128; + char val[16]; +} convert; + +int check_u128_result(vector unsigned __int128 vresult_u128, + vector unsigned __int128 expected_vresult_u128) +{ + /* Use a for loop to check each byte manually so the test case will + run with ISA 2.06. + + Return 1 if they match, 0 otherwise. */ + + int i; + + union convert_union result; + union convert_union expected; + + result.u128 = vresult_u128; + expected.u128 = expected_vresult_u128; + + /* Check if each byte of the result and expected match. */ + for (i = 0; i < 16; i++) +{ + if (result.val[i] != expected.val[i]) + return 0; +} + return 1; +} + +int check_s128_result(vector signed __int128 vresult_s128, + vector signed __int128 expected_vresult_s128) +{ + /* Convert the arguments to unsigned, then check equality. 
*/ + union convert_union result; + union convert_union expected; + + result.s128 = vresult_s128; + expected.s128 = expected_vresult_s128; + + return check_u128_result (result.u128, expected.u128); +} + + +int +main (int argc, char *argv []) +{ + int i; + + vector signed __int128 src_va_s128; + vector signed __int128 src_vb_s128; + vector signed __int128 vresult_s128; + vector signed __int128 expected_vresult_s128; + + vector unsigned __int128 src_va_u128; + vector unsigned __int128 src_vb_u128; + vector unsigned __int128 src_vc_u128; + vector unsigned __int128 vresult_u128; + vector unsigned __int128 expected_vresult_u128; + + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = src_va_s128 << 64; + src_va_s128
Re: [PATCH 9/13 ver 3] rs6000, remove __builtin_vsx_vperm_* built-ins
This was patch 8 in the previous series. Updated patch per the feedback comments. Carl rs6000, remove __builtin_vsx_vperm_* built-ins The undocumented built-ins: __builtin_vsx_vperm_16qi_uns, __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns, __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns are duplicates of the __builtin_altivec_* built-ins that are used by the overloaded vec_perm built-in that is documented in the PVIPR. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns, __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns, __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove built-in definitions and comments. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns, __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns, __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns, __builtin_vsx_vperm): Change call to built-in to the overloaded built-in vec_perm. --- gcc/config/rs6000/rs6000-builtins.def | 33 --- .../gcc.target/powerpc/vsx-builtin-3.c| 22 ++--- 2 files changed, 11 insertions(+), 44 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index a78c52183bc..f02a8c4de45 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1529,39 +1529,6 @@ const vf __builtin_vsx_uns_floato_v2di (vsll); UNS_FLOATO_V2DI unsfloatov2di {} -; These are duplicates of __builtin_altivec_* counterparts, and are being -; kept for backwards compatibility. The reason for their existence is -; unclear. TODO: Consider deprecation/removal at some point.
- const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc); -VPERM_16QI_X altivec_vperm_v16qi {} - - const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc); -VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {} - - const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc); -VPERM_1TI_X altivec_vperm_v1ti {} - - const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc); -VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {} - - const vd __builtin_vsx_vperm_2df (vd, vd, vuc); -VPERM_2DF_X altivec_vperm_v2df {} - - const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc); -VPERM_2DI_X altivec_vperm_v2di {} - - const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc); -VPERM_2DI_UNS_X altivec_vperm_v2di_uns {} - - const vf __builtin_vsx_vperm_4sf (vf, vf, vuc); -VPERM_4SF_X altivec_vperm_v4sf {} - - const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc); -VPERM_4SI_X altivec_vperm_v4si {} - - const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc); -VPERM_4SI_UNS_X altivec_vperm_v4si_uns {} - const vss __builtin_vsx_vperm_8hi (vss, vss, vuc); VPERM_8HI_X altivec_vperm_v8hi {} diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c index e20d3f03c86..f06d871b6b1 100644 --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c @@ -88,17 +88,17 @@ int do_perm(void) { int i = 0; - si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++; - ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++; - sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++; - f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++; - d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++; - - si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++; - ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++; - sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++; - f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++; - 
d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++; + si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++; + ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++; + sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++; + f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++; + d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++; + + si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++; + ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++; + sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++; + f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++; + d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++; return i; } -- 2.45.0
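To make the byte-selection behaviour behind the test changes concrete, here is a scalar reference sketch of a byte permute. It is not the GCC implementation and does not use altivec.h. The helper name perm_bytes_model is hypothetical, and the model assumes the element-order description usually given for vec_perm: each selector byte, taken modulo 32, indexes into the 32-byte concatenation of the two inputs.

```c
#include <stdint.h>

/* Scalar model of a 16-byte permute.  Hypothetical helper, not a GCC API.
   Bytes 0..15 of the concatenation come from a, bytes 16..31 from b.  */
void
perm_bytes_model (uint8_t result[16], const uint8_t a[16],
                  const uint8_t b[16], const uint8_t sel[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* Only the low five bits of each selector byte are used.  */
      unsigned int idx = sel[i] & 0x1f;

      result[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

The overloaded built-in accepts many element types, but the selection is always done byte by byte, which is why a single scalar loop is enough for a model.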
Re: [PATCH 12/13 ver 3] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
This was patch 11 from the previous series. Patch was updated to address feedback comments.

Carl

--
rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded __builtin_altivec_vcmpeqfp_p built-in. The built-in is undocumented and there are no test cases for it. The patch removes built-in __builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
	Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 64690b9b9b5..48ebc018a8d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1619,9 +1619,6 @@
   const vf __builtin_vsx_xvcmpeqsp (vf, vf);
     XVCMPEQSP vector_eqv4sf {}
 
-  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
-    XVCMPEQSP_P vector_eq_v4sf_p {pred}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
     XVCMPGEDP vector_gev2df {}
 
-- 
2.45.0
Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.
This was patch 13 from the previous series. Note the previous series patch 12 was dropped. This patch is the same as the previous version. The additional work to replace __builtin_vec_set_v1ti, __builtin_vec_set_v2di, __builtin_vec_set_v2df with equivalent gimple code, per the feedback comments, is being deferred to a future patch. The goal of this series was simply to remove duplicated built-ins, extending overloaded built-ins as needed. Adding the needed gimple code to remove the additional built-ins is beyond the goal of this patch series. Carl --- rs6000, remove vector set and vector init built-ins. The vector init built-ins: __builtin_vec_init_v16qi, __builtin_vec_init_v8hi, __builtin_vec_init_v4si, __builtin_vec_init_v4sf, __builtin_vec_init_v2di, __builtin_vec_init_v2df, __builtin_vec_init_v1ti perform the same operation as initializing the vector in C code. For example: result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4); result_v4si = {1, 2, 3, 4}; These two constructs were tested and verified to generate identical assembly instructions both with no optimization and with -O3. The vector set built-ins: __builtin_vec_set_v16qi, __builtin_vec_set_v8hi, __builtin_vec_set_v4si, __builtin_vec_set_v4sf perform the same operation as setting a specific element in the vector in C code. For example: src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index); src_v4si[index] = int_val; The built-in actually generates more instructions than the inline C code with no optimization, but the generated code is identical with -O3. None of the removed built-ins have test cases or documentation. Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di, __builtin_vec_set_v2df are not removed as they are used in function resolve_vec_insert() in file rs6000-c.cc. The other built-ins are removed as they don't provide any benefit over just using C code.
gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi, __builtin_vec_init_v8hi, __builtin_vec_init_v4si, __builtin_vec_init_v4sf, __builtin_vec_init_v2di, __builtin_vec_init_v2df, __builtin_vec_init_v1ti, __builtin_vec_set_v16qi, __builtin_vec_set_v8hi, __builtin_vec_set_v4si, __builtin_vec_set_v4sf): Remove built-in definitions. --- gcc/config/rs6000/rs6000-builtins.def | 42 ++- 1 file changed, 2 insertions(+), 40 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 48ebc018a8d..8349d45169f 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1118,37 +1118,6 @@ const signed short __builtin_vec_ext_v8hi (vss, signed int); VEC_EXT_V8HI nothing {extract} - const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char); -VEC_INIT_V16QI nothing {init} - - const vf __builtin_vec_init_v4sf (float, float, float, float); -VEC_INIT_V4SF nothing {init} - - const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \ - signed int); -VEC_INIT_V4SI nothing {init} - - const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\ - signed short, signed short, signed short, signed short, \ - signed short); -VEC_INIT_V8HI nothing {init} - - const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>); -VEC_SET_V16QI nothing {set} - - const vf __builtin_vec_set_v4sf (vf, float, const int<2>); -VEC_SET_V4SF nothing {set} - - const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>); -VEC_SET_V4SI nothing {set} - - const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>); -VEC_SET_V8HI nothing {set} - - ; Cell builtins.
[cell] pure vsc __builtin_altivec_lvlx (signed long, const void *); @@ -1295,15 +1264,8 @@ const signed long long __builtin_vec_ext_v2di (vsll, signed int); VEC_EXT_V2DI nothing {extract} - const vsq __builtin_vec_init_v1ti (signed __int128); -VEC_INIT_V1TI nothing {init} - - const vd __builtin_vec_init_v2df (double, double); -VEC_INIT_V2DF nothing {init} - - const vsll __builtin_vec_init_v2di (signed long long, signed long long); -VEC_INIT_V2DI nothing {init} - +;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in +;; resolve_vec_insert(), rs6000-c.cc const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>); VEC_SET_V1TI nothing {set} -- 2.45.0
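The plain C replacements named in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not code from the patch. It uses the GCC/Clang generic vector extensions so that it builds on any target, and the type name v4si and the helpers init_v4si and set_v4si are hypothetical.

```c
#include <assert.h>

/* Generic vector type standing in for the PowerPC "vector signed int".  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain C equivalent of __builtin_vec_init_v4si (a, b, c, d):
   a brace initializer.  */
v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Plain C equivalent of __builtin_vec_set_v4si (v, val, index):
   subscript assignment.  The index must be in the range 0..3.  */
v4si
set_v4si (v4si v, int val, int index)
{
  assert (index >= 0 && index < 4);
  v[index] = val;
  return v;
}
```

For example, set_v4si (init_v4si (1, 2, 3, 4), 42, 2) yields the vector {1, 2, 42, 4}. The removed set built-ins took a compile-time constant index, whereas the subscript form also accepts a run-time value.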
[PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
Gcc maintainers: Version 2: based on discussion, additional overloaded instances of the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been added. The additional instances are for arguments of vector signed char and vector bool char. The patch has been tested on Power 10 LE and BE with no regressions. Per a report from a user, the existing vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation file. The following patch adds missing documentation for the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins. Please let me know if the patch is acceptable for mainline. Thanks. Carl rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros The built-ins currently support unsigned char arguments. Extend the built-ins to also support vector signed char and vector bool char aruments. Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros. The vec_test_lsbb_all_ones built-in returns 1 if the least significant bit in each byte is a 1, returns 0 otherwise. Similarly, vec_test_lsbb_all_zeros returns a 1 if the least significant bit in each byte is a zero and 0 otherwise. Add additional test cases for the built-ins in files: gcc/testsuite/gcc.target/powerpc/lsbb.c gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c gcc/ChangeLog: * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add built-in instances for vector signed char and vector bool char. * doc/extend.texi (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add documentation for the existing built-ins. gcc/testsuite/ChangeLog: * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
* gcc.target/powerpc/lsbb.c: Add compile test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins. --- gcc/config/rs6000/rs6000-overload.def | 12 +- gcc/doc/extend.texi | 19 +++ .../gcc.target/powerpc/lsbb-runnable.c | 131 ++ gcc/testsuite/gcc.target/powerpc/lsbb.c | 24 +++- 4 files changed, 156 insertions(+), 30 deletions(-) diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 87495aded49..7d9e31c3f9e 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4403,12 +4403,20 @@ XXEVAL XXEVAL_VUQ [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones] + signed int __builtin_vec_xvtlsbb_all_ones (vsc); + XVTLSBB_ONES LSBB_ALL_ONES_VSC signed int __builtin_vec_xvtlsbb_all_ones (vuc); - XVTLSBB_ONES + XVTLSBB_ONES LSBB_ALL_ONES_VUC + signed int __builtin_vec_xvtlsbb_all_ones (vbc); + XVTLSBB_ONES LSBB_ALL_ONES_VBC [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros] + signed int __builtin_vec_xvtlsbb_all_zeros (vsc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC signed int __builtin_vec_xvtlsbb_all_zeros (vuc); - XVTLSBB_ZEROS + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC + signed int __builtin_vec_xvtlsbb_all_zeros (vbc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC [VEC_TRUNC, vec_trunc, __builtin_vec_trunc] vf __builtin_vec_trunc (vf); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 89fe5db7aed..5ca87889831 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword. 
The following additional built-in functions are also available for the PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones + +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns a 0 otherwise. + +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros + +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns a 0 otherwise. @smallexample @exdent vector unsigned long long int diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c b/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c index 2e97cc17b60..3e4f71bed12 100644 -
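The behaviour documented for the two built-ins can be modelled in plain C. The following is a scalar reference sketch, not the Power10 implementation and not code from the patch. The helper names lsbb_all_ones_model and lsbb_all_zeros_model are hypothetical, and the 16-byte array stands in for a vector char argument.

```c
#include <stdint.h>

/* Return 1 if the least significant bit of every byte is 1, else 0.
   Hypothetical model of vec_test_lsbb_all_ones.  */
int
lsbb_all_ones_model (const uint8_t bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Return 1 if the least significant bit of every byte is 0, else 0.
   Hypothetical model of vec_test_lsbb_all_zeros.  */
int
lsbb_all_zeros_model (const uint8_t bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Only bit 0 of each byte is examined, so the signedness of the element type does not affect the result. Both functions return 0 for an input whose bytes have mixed low bits, so the two tests are not complements of each other.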
Re: [PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float
Kewen: Ping. Carl On 8/7/24 10:15 AM, Carl Love wrote: GCC maintainers: The following patch fixes errors in the definition of the __builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins. The arguments should be unsigned but are listed as signed. In addition, test cases are missing for several instances of the built-ins, and the documentation for the built-ins is missing. This patch adds the missing test cases and documentation. The patch has been tested on Power 10 LE and BE with no regressions. Please let me know if it is acceptable for mainline. Thanks. Carl - rs6000, Add tests and documentation for vector conversions between integer and float The arguments for the __builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins should be unsigned. Add tests for the following existing integer and long long int to float built-ins: __builtin_altivec_float_sisf (vsi); __builtin_altivec_uns_float_sisf (vui); __builtin_vsx_floate_v2di (vsll); __builtin_vsx_uns_floate_v2di (vull); __builtin_vsx_floato_v2di (vsll); __builtin_vsx_uns_floato_v2di (vull); __builtin_vsx_float2_v2di (vsll, vsll); __builtin_vsx_uns_float2_v2di (vull, vull); Add tests for the vector float to vector int built-ins: __builtin_altivec_fix_sfsi __builtin_altivec_fixuns_sfsi The various built-ins are not documented. The patch adds the missing documentation for the various built-ins. This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di argument types and adds test cases for each of the built-ins listed above. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di): Change argument from signed to unsigned. * doc/extend.texi: Add documentation for each of the built-ins. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
--- gcc/config/rs6000/rs6000-builtins.def | 6 +- gcc/doc/extend.texi | 37 +++ .../powerpc/vsx-int-to-float-runnable.c | 260 ++ 3 files changed, 300 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index f2bebd299b2..1227daa1555 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1463,10 +1463,10 @@ const vd __builtin_vsx_uns_doubleo_v4si (vsi); UNS_DOUBLEO_V4SI unsdoubleov4si2 {} - const vf __builtin_vsx_uns_floate_v2di (vsll); + const vf __builtin_vsx_uns_floate_v2di (vull); UNS_FLOATE_V2DI unsfloatev2di {} - const vf __builtin_vsx_uns_floato_v2di (vsll); + const vf __builtin_vsx_uns_floato_v2di (vull); UNS_FLOATO_V2DI unsfloatov2di {} const vsll __builtin_vsx_vsigned_v2df (vd); @@ -2272,7 +2272,7 @@ const vss __builtin_vsx_revb_v8hi (vss); REVB_V8HI revb_v8hi {} - const vf __builtin_vsx_uns_float2_v2di (vsll, vsll); + const vf __builtin_vsx_uns_float2_v2di (vull, vull); UNS_FLOAT2_V2DI uns_float2_v2di {} const vsi __builtin_vsx_vsigned2_v2df (vd, vd); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index bf6f4094040..7ec4f19a6bf 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22919,6 +22919,43 @@ but the index value must be 0. Only functions excluded from the PVIPR are listed here. +The following built-ins convert signed and unsigned vectors of ints and +long long ints to a vector of 32-bit floating point values. 
+ +@smallexample +vector float __builtin_altivec_float_sisf (vector int); +vector float __builtin_altivec_uns_float_sisf (vector unsigned int); +vector float __builtin_vsx_floate_v2di (vector signed long long int); +vector float __builtin_vsx_uns_floate_v2di (vector unsigned long long int); +vector float __builtin_vsx_floato_v2di (vector signed long long int); +vector float __builtin_vsx_uns_floato_v2di (vector unsigned long long int); +vector float __builtin_vsx_float2_v2di (vector signed long long int, + vector signed long long int); +vector float __builtin_vsx_uns_float2_v2di (vector unsigned long long int, + vector signed long long int); +@end smallexample + +The @code{__builtin_altivec_float_sisf} and +@code{__builtin_altivec_uns_float_sisf} built-ins convert signed and +unsigned vectors of 32-bit integers to a vector of 32-bit floating point +values. The @code{__builtin_vsx_floate_v2di} and +@code{__builtin_vsx_uns_floate_v2di} built-ins converts a vector +long long ints to 32-bit floating point values
Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
Ping. Carl On 8/9/24 8:57 AM, Carl Love wrote: Gcc maintainers: Version 2: based on discussion, additional overloaded instances of the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been added. The additional instances are for arguments of vector signed char and vector bool char. The patch has been tested on Power 10 LE and BE with no regressions. Per a report from a user, the existing vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation file. The following patch adds missing documentation for the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins. Please let me know if the patch is acceptable for mainline. Thanks. Carl rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros The built-ins currently support unsigned char arguments. Extend the built-ins to also support vector signed char and vector bool char aruments. Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros. The vec_test_lsbb_all_ones built-in returns 1 if the least significant bit in each byte is a 1, returns 0 otherwise. Similarly, vec_test_lsbb_all_zeros returns a 1 if the least significant bit in each byte is a zero and 0 otherwise. Add additional test cases for the built-ins in files: gcc/testsuite/gcc.target/powerpc/lsbb.c gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c gcc/ChangeLog: * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add built-in instances for vector signed char and vector bool char. * doc/extend.texi (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add documentation for the existing built-ins. gcc/testsuite/ChangeLog: * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
* gcc.target/powerpc/lsbb.c: Add compile test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins. --- gcc/config/rs6000/rs6000-overload.def | 12 +- gcc/doc/extend.texi | 19 +++ .../gcc.target/powerpc/lsbb-runnable.c | 131 ++ gcc/testsuite/gcc.target/powerpc/lsbb.c | 24 +++- 4 files changed, 156 insertions(+), 30 deletions(-) diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 87495aded49..7d9e31c3f9e 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4403,12 +4403,20 @@ XXEVAL XXEVAL_VUQ [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones] + signed int __builtin_vec_xvtlsbb_all_ones (vsc); + XVTLSBB_ONES LSBB_ALL_ONES_VSC signed int __builtin_vec_xvtlsbb_all_ones (vuc); - XVTLSBB_ONES + XVTLSBB_ONES LSBB_ALL_ONES_VUC + signed int __builtin_vec_xvtlsbb_all_ones (vbc); + XVTLSBB_ONES LSBB_ALL_ONES_VBC [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros] + signed int __builtin_vec_xvtlsbb_all_zeros (vsc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC signed int __builtin_vec_xvtlsbb_all_zeros (vuc); - XVTLSBB_ZEROS + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC + signed int __builtin_vec_xvtlsbb_all_zeros (vbc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC [VEC_TRUNC, vec_trunc, __builtin_vec_trunc] vf __builtin_vec_trunc (vf); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 89fe5db7aed..5ca87889831 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword. 
The following additional built-in functions are also available for the PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones + +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns a 0 otherwise. + +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros + +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns a 0 otherwise. @smallexample @exdent vector unsigned long long int diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c b/gcc/testsuite
Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
Kewen: On 8/20/24 12:56 AM, Kewen.Lin wrote: Hi Carl, on 2024/8/9 23:57, Carl Love wrote: Gcc maintainers: Version 2, based on discussion additional overloaded instances of the vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros built-ins has been added. The additional instances are for arguments of vector signed char and vector bool char. The patch has been tested on Power 10 LE and BE with no regressions. Per a report from a user, the existing vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation file. The following patch adds missing documentation for the vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros built-ins. Please let me know if the patch is acceptable for mainline. Thanks. Carl rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros The built-ins currently support unsigned char arguments. Extend the Nit: /unsigned char/vector unsigned char/ Fixed. built-ins to also support vector signed char and vector bool char aruments. Nit: /aruments/arguments/ Fixed ndex 89fe5db7aed..5ca87889831 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword. The following additional built-in functions are also available for the PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones + +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns a 0 otherwise. 
Nit: s/a 0/0/ Fixed + +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros + +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns a 0 otherwise. Nit: s/a 0/0/ Fixed diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb.c b/gcc/testsuite/gcc.target/powerpc/lsbb.c index b5c037094a5..650e944e082 100644 --- a/gcc/testsuite/gcc.target/powerpc/lsbb.c +++ b/gcc/testsuite/gcc.target/powerpc/lsbb.c @@ -9,16 +9,32 @@ /* { dg-require-effective-target power10_ok } */ Nit: This power10_ok isn't needed, could you also remove it together? OK, removed. /* { dg-options "-fno-inline -mdejagnu-cpu=power10 -O2" } */ ... and this "-fno-inline". Removed -/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */ -/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 3 } } */ +/* { dg-final { scan-assembler-times {\msetbc\M} 3 } } */ I would expect the times are changed to 6 rather than 3, was this test case really tested? Or am I missing something? BR, Kewen I retested and yes it fails. Should be 6. Not sure why my original testing didn't catch that. Perhaps I looked at the wrong output file??? Changed to -/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */ -/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 6 } } */ +/* { dg-final { scan-assembler-times {\msetbc\M} 6 } } */ and retested. It now passes. Carl
[PATCH ver 3] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
GCC maintainers: Version 3, fixed a few typos per Kewen's review. Fixed the expected number of scan-assembler-times for xvtlsbb and setbc. Retested on Power 10 LE. Version 2, based on discussion, additional overloaded instances of the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been added. The additional instances are for arguments of vector signed char and vector bool char. The patch has been tested on Power 10 LE and BE with no regressions. Per a report from a user, the existing vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation file. The following patch adds missing documentation for the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins. Please let me know if the patch is acceptable for mainline. Thanks. Carl rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros The built-ins currently support vector unsigned char arguments. Extend the built-ins to also support vector signed char and vector bool char arguments. Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros. The vec_test_lsbb_all_ones built-in returns 1 if the least significant bit in each byte is a 1, and returns 0 otherwise. Similarly, vec_test_lsbb_all_zeros returns 1 if the least significant bit in each byte is a zero, and 0 otherwise. Add additional test cases for the built-ins in files: gcc/testsuite/gcc.target/powerpc/lsbb.c gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c gcc/ChangeLog: * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add built-in instances for vector signed char and vector bool char. * doc/extend.texi (vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros): Add documentation for the existing built-ins. 
gcc/testsuite/ChangeLog: * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins. * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector signed char and vector bool char instances of vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins. --- gcc/config/rs6000/rs6000-overload.def | 12 +- gcc/doc/extend.texi | 19 +++ .../gcc.target/powerpc/lsbb-runnable.c | 131 ++ gcc/testsuite/gcc.target/powerpc/lsbb.c | 28 +++- 4 files changed, 158 insertions(+), 32 deletions(-) diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 87495aded49..7d9e31c3f9e 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4403,12 +4403,20 @@ XXEVAL XXEVAL_VUQ [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones] + signed int __builtin_vec_xvtlsbb_all_ones (vsc); + XVTLSBB_ONES LSBB_ALL_ONES_VSC signed int __builtin_vec_xvtlsbb_all_ones (vuc); - XVTLSBB_ONES + XVTLSBB_ONES LSBB_ALL_ONES_VUC + signed int __builtin_vec_xvtlsbb_all_ones (vbc); + XVTLSBB_ONES LSBB_ALL_ONES_VBC [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros] + signed int __builtin_vec_xvtlsbb_all_zeros (vsc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC signed int __builtin_vec_xvtlsbb_all_zeros (vuc); - XVTLSBB_ZEROS + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC + signed int __builtin_vec_xvtlsbb_all_zeros (vbc); + XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC [VEC_TRUNC, vec_trunc, __builtin_vec_trunc] vf __builtin_vec_trunc (vf); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 89fe5db7aed..8971d9fbf3c 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword. 
The following additional built-in functions are also available for the PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}): +@smallexample +@exdent int vec_test_lsbb_all_ones (vector signed char); +@exdent int vec_test_lsbb_all_ones (vector unsigned char); +@exdent int vec_test_lsbb_all_ones (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_ones + +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant +bit in each byte is equal to 1. It returns 0 otherwise. + +@smallexample +@exdent int vec_test_lsbb_all_zeros (vector signed char); +@exdent int vec_test_lsbb_all_zeros (vector unsigned char); +@exdent int vec_test_lsbb_all_zeros (vector bool char); +@end smallexample +@findex vec_test_lsbb_all_zeros + +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant +bit in each byte is equal to zero. It returns 0 otherwise. @smallexample @exdent vector unsi
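For readers without Power 10 hardware, the documented behaviour of the two built-ins can be sketched as a plain scalar model. This is only an illustration of the semantics described in the extend.texi text above; the helper names model_lsbb_all_ones and model_lsbb_all_zeros are made up for this sketch, and the real built-ins take vector char arguments and need -mcpu=power10.

```c
/* Scalar model only, not the rs6000 implementation.  The helper names
   are hypothetical; the real built-ins are vec_test_lsbb_all_ones and
   vec_test_lsbb_all_zeros and operate on a 16-byte vector.  */

/* Return 1 if the least significant bit of every byte is 1, else 0.  */
static int
model_lsbb_all_ones (const unsigned char bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Return 1 if the least significant bit of every byte is 0, else 0.  */
static int
model_lsbb_all_zeros (const unsigned char bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that the two results are not complements of each other: a vector with a mix of odd and even bytes makes both models return 0.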
Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins
Kewen: On 6/3/24 23:00, Kewen.Lin wrote: > Hi Carl, > > on 2024/5/29 23:52, Carl Love wrote: >> This patch was approved in the previous series. There are no changes to >> this patch. Reposting for completeness. > I guess you can just push the approved ones, as there is no dependency > between any two of them? It can help to reduce the size of this series. The patches do touch some similar files so they are not completely independent from a patch standpoint. Functionally they are all independent. I tried applying the approved patches only to the current mainline tree. The approved patches were: 1, 3, 5 (with tweak), 6, 8, 9, 10, 12. Patch 5 requires a little rebasing due to a little fuzz in the lines. Not a big deal. Patch 8 also doesn't apply cleanly with git. The patch command gets a little confused when I tried to use it, so I had to manually "recreate" the patch. The changes are straightforward so that is fairly easy. The rest of the patches applied cleanly with git. I am guessing there will be some rebasing needed for the non-approved patches to apply them after the approved patches. The main reason that I reposted everything was that the patch numbers changed and I wanted it to be fairly clear what was going on. I toyed with the idea of committing the 8 approved patches and then working on the additional 5, but I think that is hard: I would either have to manually adjust the patch numbers to keep them lined up with version 3, or give version 4 a new numbering of patches 1 to 5 (i.e. a remapping of the version 3 patch numbers). Either way I think it would be hard/confusing. Given that separating out the approved and non-approved patches causes some rebasing issues, it is probably best to just update the 5 patches, posting them as version 4, and not re-post the whole series. I will just note in the header patch 0/13 the patches that have already been approved. I hope that is ok? Carl
Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins
Kewen: On 6/4/24 00:19, Kewen.Lin wrote: > Hi, > > on 2024/5/29 23:58, Carl Love wrote: >> Updated the patch per the feedback comments from the previous version. >> >> Carl >> --- >> >> rs6000, extend the current vec_{un,}signed{e,o} built-ins >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds >> convert a vector of floats to signed/unsigned long long ints. Extend the >> existing vec_{un,}signed{e,o} built-ins to handle the argument >> vector of floats to return the even/odd signed/unsigned integers. >> >> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, >> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} >> built-ins. >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are >> now for internal use only. They are not documented and they do not >> have testcases. >>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by >> vec_signed{e,o}, remove. >> >> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by >> vec_unsigned{e,o}, remove. >> >> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by >> vec_unsigned, remove. >> >> The __builtin_vsx_xvcvspuxws is redundante as it is covered by >> vec_unsigned, remove. > > I perfer to move these removals into sub-patch 2/13 or split them out into > a new patch, since they don't match the subject of this patch. Moving it > to sub-patch 2/13 looks good as they are all about vec_{un,}signed{,e,o}. Yes, we need to have all of the vec_unsigned in the same patch. Moved __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws to patch 2. > >> >> Add testcases and update documentation. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low, >> __builtin_vsx_xvcvspuxds_low): New built-in definitions. >> (__builtin_vsx_xvcvspuxds): Fix return type. >> (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF, >> VEC_VUNSIGNEDE_V4SF respectively. 
>> (vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf, >> vunsignede_v4sf respectively. >> (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws, >> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed. >> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, >> vec_unsignede,vec_unsignedo): Add new overloaded specifications. >> * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, >> vunsignede_v4sf, vunsignedo_v4sf): New define_expands. >> * doc/extend.texi (vec_signedo, vec_signede): Add documentation. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added >> overloaded built-ins. >> --- >> gcc/config/rs6000/rs6000-builtins.def | 25 ++ >> gcc/config/rs6000/rs6000-overload.def | 8 ++ >> gcc/config/rs6000/vsx.md | 88 +++ >> gcc/doc/extend.texi | 10 +++ >> .../gcc.target/powerpc/builtins-3-runnable.c | 51 +-- >> 5 files changed, 157 insertions(+), 25 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index bf9a0ae22fc..cea2649b86c 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1688,32 +1688,23 @@ >>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); >> XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} >> >> - const vsi __builtin_vsx_xvcvdpsxws (vd); >> -XVCVDPSXWS vsx_xvcvdpsxws {} >> - >> - const vsll __builtin_vsx_xvcvdpuxds (vd); >> -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} >> - >>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); >> XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} >> >> - const vull __builtin_vsx_xvcvdpuxds_uns (vd); >> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {} >> - >> - const vsi __builtin_vsx_xvcvdpuxws (vd); >> -XVCVDPUXWS vsx_xvcvdpuxws {} >> - >>const vd __builtin_vsx_xvcvspdp (vf); >> XVCVSPDP vsx_xvcvspdp {} >> >>const vsll __builtin_vsx_xvcvspsxds (vf); >> -XVCVSPSXDS vsx_xvcvspsxds {} >> +VEC_VSIGNEDE_V4SF vsignede_v4sf {} > > We should 
rename __builtin_vsx_xvcvspsxds to > __builtin_vsx_vsignede_v4sf, one reason is to align with
Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments
Kewen: On 6/3/24 22:58, Kewen.Lin wrote: > Hi, > > on 2024/5/30 00:03, Carl Love wrote: >> This was patch 6 in the previous series. Updated the documentation file per >> the comments. No functional changes to the patch. >> >> Carl >> >> >> rs6000, add overloaded vec_sel with int128 arguments >> >> Extend the vec_sel built-in to take three signed/unsigned int128 arguments >> and return a signed/unsigned int128 result. >> >> Extending the vec_sel built-in makes the existing buit-ins >> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The >> patch removes these built-ins. >> >> The patch adds documentation and test cases for the new overloaded vec_sel >> built-ins. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti, >> __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded >> definitions. >> * doc/extend.texi: Add documentation for new vec_sel instances. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/vec-sel-runnable-i128.c: New test file. 
>> --- >> gcc/config/rs6000/rs6000-builtins.def | 6 - >> gcc/config/rs6000/rs6000-overload.def | 4 + >> gcc/doc/extend.texi | 12 ++ >> .../powerpc/vec-sel-runnable-i128.c | 129 ++ >> 4 files changed, 145 insertions(+), 6 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index 13e36df008d..ea0da77f13e 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1904,12 +1904,6 @@ >>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); >> XXSEL_16QI_UNS vector_select_v16qi_uns {} >> >> - const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq); >> -XXSEL_1TI vector_select_v1ti {} >> - >> - const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq); >> -XXSEL_1TI_UNS vector_select_v1ti_uns {} >> - >>const vd __builtin_vsx_xxsel_2df (vd, vd, vd); >> XXSEL_2DF vector_select_v2df {} >> >> diff --git a/gcc/config/rs6000/rs6000-overload.def >> b/gcc/config/rs6000/rs6000-overload.def >> index 4d857bb1af3..a210c5ad10d 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -3274,6 +3274,10 @@ >> VSEL_2DF VSEL_2DF_B >>vd __builtin_vec_sel (vd, vd, vull); >> VSEL_2DF VSEL_2DF_U >> + vsq __builtin_vec_sel (vsq, vsq, vsq); >> +VSEL_1TI VSEL_1TI_S >> + vuq __builtin_vec_sel (vuq, vuq, vuq); >> +VSEL_1TI_UNS VSEL_1TI_U > > I just noticed that for integral types, such as: signed/unsigned int, we have > six instances: > > vsi __builtin_vec_sel (vsi, vsi, vbi); > VSEL_4SI VSEL_4SI_B > vsi __builtin_vec_sel (vsi, vsi, vui); > VSEL_4SI VSEL_4SI_U > vui __builtin_vec_sel (vui, vui, vbi); > VSEL_4SI_UNS VSEL_4SI_UB > vui __builtin_vec_sel (vui, vui, vui); > VSEL_4SI_UNS VSEL_4SI_UU > vbi __builtin_vec_sel (vbi, vbi, vbi); > VSEL_4SI_UNS VSEL_4SI_BB > vbi __builtin_vec_sel (vbi, vbi, vui); > > It considers the control vector can only have unsigned and bool 
types, also > consider the > return type can be bool. It aligns with what PVIPR defines, so here we > should have: > > vsq __builtin_vec_sel (vsq, vsq, vbq); > vsq __builtin_vec_sel (vsq, vsq, vuq); > vuq __builtin_vec_sel (vuq, vuq, vbq); > vuq __builtin_vec_sel (vuq, vuq, vuq); > vbq __builtin_vec_sel (vbq, vbq, vbq); > vbq __builtin_vec_sel (vbq, vbq, vuq); > > Sorry that I didn't find this in the previous review. Yea, my bad I missed that as well. Fixed to add all six instances. > > >> ; The following variants are deprecated. >>vsll __builtin_vec_sel (vsll, vsll, vsll); >> VSEL_2DI_B VSEL_2DI_S >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index b88e61641a2..0756230b19e 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the >> 64-bit PowerPC >> family of processors, for efficient use of 128-bit floating point >> (@code{__float128}) values. >> >> +Vector select >> + >>
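As a reminder of what the new int128 instances compute, vec_sel is a bitwise select: each result bit comes from the first argument where the control bit is 0 and from the second argument where it is 1. Below is a scalar sketch of that rule on a single 128-bit value. The name model_sel_u128 is hypothetical, and the code assumes a 64-bit target where GCC provides unsigned __int128; it does not use the rs6000 built-in itself.

```c
/* Scalar model of a bitwise select on one 128-bit value.  The helper
   name is made up for illustration; assumes unsigned __int128 is
   available (64-bit GCC or Clang targets).  */
typedef unsigned __int128 u128;

/* For every bit position: take the bit of A when the MASK bit is 0,
   and the bit of B when the MASK bit is 1.  */
static u128
model_sel_u128 (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}
```

The same expression covers the signed and bool instances, since the select works on the raw bits and only the declared types of the arguments differ.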
Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args
Kewen: On 6/3/24 22:58, Kewen.Lin wrote: > Hi, > > on 2024/5/30 00:10, Carl Love wrote: >> This was patch 10 from the previous series. The patch was updated to >> address feedback comments. >> >> Carl >> --- >> >> rs6000, extend vec_xxpermdi built-in for __int128 args >> >> Add a new signed and unsigned overloaded instances for vec_xxpermdi >> >>__int128 vec_xxpermdi (__int128, __int128, const int); >>__uint128 vec_xxpermdi (__uint128, __uint128, const int); >> >> Update the documentation to include a reference to the new built-in >> instances. >> >> Add test cases for the new overloaded instances. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new >> overloaded built-in instances. >> * doc/extend.texi: Add documentation for new overloaded built-in >> instances. >> >> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file. >> --- >> gcc/config/rs6000/rs6000-overload.def | 4 + >> gcc/doc/extend.texi | 2 + >> .../powerpc/vec_perm-runnable-i128.c | 229 ++ >> 3 files changed, 235 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c >> >> diff --git a/gcc/config/rs6000/rs6000-overload.def >> b/gcc/config/rs6000/rs6000-overload.def >> index a210c5ad10d..45000f161e4 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -4932,6 +4932,10 @@ >> XXPERMDI_4SF XXPERMDI_VF >>vd __builtin_vsx_xxpermdi (vd, vd, const int); >> XXPERMDI_2DF XXPERMDI_VD >> + vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); >> +XXPERMDI_1TI XXPERMDI_1TI >> + vuq __builtin_vsx_xxpermdi (vuq, vuq, const int); >> +XXPERMDI_1TI XXPERMDI_1TUI > > Nits: > - Move them before "vf __builtin_vsx_xxpermdi (vf, vf, const int);" so > they are close to instances for other integral types. > - As the existing name convention, _{SQ,UQ} are better. 
> > vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); >XXPERMDI_1TI XXPERMDI_1SQ > vuq __builtin_vsx_xxpermdi (vuq, vuq, const int); >XXPERMDI_1TI XXPERMDI_1UQ > OK, moved the definitions up and changed the names. >> >> [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi] >>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int); >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index 0756230b19e..edfef1bdab7 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char >> *); >> vector double vec_xxpermdi (vector double, vector double, const int); >> vector float vec_xxpermdi (vector float, vector float, const int); >> vector long long vec_xxpermdi (vector long long, vector long long, const >> int); > >> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int); >> +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const >> int); > > Nit: These two lines break the long long and unsigned long long lines, can > you move > them one line upward? Also using the explicit "signed" and "unsigned" would > be > better than "__{u,}int128". > Yup, I didn't get them in the right place. Fixed. >> vector unsigned long long vec_xxpermdi (vector unsigned long long, >> vector unsigned long long, const >> int); >> vector int vec_xxpermdi (vector int, vector int, const int); >> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c >> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c >> new file mode 100644 >> index 000..2d5dce09404 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c >> @@ -0,0 +1,229 @@ >> +/* { dg-do run } */ >> +/* { dg-require-effective-target vmx_hw } */ >> +/* { dg-options "-save-temps" } */ > > Nit: dg-options line isn't needed as it doesn't check assembly. Removed the save-temps. 
> > BR, > Kewen > >> + >> +#include >> + >> +#define DEBUG 0 >> + >> +#if DEBUG >> +#include >> +void print_i128 (unsigned __int128 val) >> +{ >> + printf(" 0x%016llx%016llx", >> +
[PATCH] rs6000, altivec-2-runnable.c should be a runnable test
GCC maintainers: The test gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c is supposed to be a runnable test to verify the execution of the vec_unpackl and vec_unpackh built-ins. The dg-do command is set to compile, not run. This patch fixes the dg-do command argument. The patch has been verified on a P10. The test runs without errors. Please let me know if the patch is acceptable. Thanks. Carl - rs6000, altivec-2-runnable.c should be a runnable test The test case has "dg-do compile" set, not "dg-do run", for a runnable test. This patch changes the dg-do command argument to run. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do argument to run. --- gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c index 6975ea57e65..3e66435d0d2 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-do run { target powerpc*-*-* } } */ /* { dg-options "-mvsx" } */ /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */ /* { dg-require-effective-target powerpc_vsx } */ -- 2.45.0
[PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.
GCC maintainers: Per the comments on patch 0004 from version 3, the removal of the built-ins __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to this patch. The rest of the patch is unchanged from version 3. There were no comments on this patch for version 3. Please let me know if this patch is acceptable. Thanks. Carl - rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins. The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws built-in is not documented and there are no test cases for it. The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by vec_unsigned, remove. The built-in __builtin_vsx_xvcvspuxws is redundant as it is covered by vec_unsigned, remove. This patch removes the redundant built-ins. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Remove built-in definitions. 
--- gcc/config/rs6000/rs6000-builtins.def | 9 - 1 file changed, 9 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 7c36976a089..8cf0b715898 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1697,9 +1697,6 @@ const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} - const vull __builtin_vsx_xvcvdpuxds_uns (vd); -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {} - const vsi __builtin_vsx_xvcvdpuxws (vd); XVCVDPUXWS vsx_xvcvdpuxws {} @@ -1709,15 +1706,9 @@ const vsll __builtin_vsx_xvcvspsxds (vf); XVCVSPSXDS vsx_xvcvspsxds {} - const vsi __builtin_vsx_xvcvspsxws (vf); -XVCVSPSXWS vsx_fix_truncv4sfv4si2 {} - const vsll __builtin_vsx_xvcvspuxds (vf); XVCVSPUXDS vsx_xvcvspuxds {} - const vsi __builtin_vsx_xvcvspuxws (vf); -XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {} - const vd __builtin_vsx_xvcvsxddp (vsll); XVCVSXDDP vsx_floatv2div2df2 {} -- 2.45.0
[PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args
GCC maintainers: The patch has been updated per the comments from version 3. Please let me know if the patch is acceptable for mainline. Thanks. Carl - rs6000, extend vec_xxpermdi built-in for __int128 args Add new signed and unsigned overloaded instances for vec_xxpermdi __int128 vec_xxpermdi (__int128, __int128, const int); __uint128 vec_xxpermdi (__uint128, __uint128, const int); Update the documentation to include a reference to the new built-in instances. Add test cases for the new overloaded instances. gcc/ChangeLog: * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new overloaded built-in instances. * doc/extend.texi: Add documentation for new overloaded built-in instances. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file. --- gcc/config/rs6000/rs6000-overload.def | 4 + gcc/doc/extend.texi | 4 + .../powerpc/vec_perm-runnable-i128.c | 229 ++ 3 files changed, 237 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 6cec1ad4f1a..354f8fabe0f 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4936,6 +4936,10 @@ XXPERMDI_2DI XXPERMDI_VSLL vull __builtin_vsx_xxpermdi (vull, vull, const int); XXPERMDI_2DI XXPERMDI_VULL + vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); +XXPERMDI_1TI XXPERMDI_1SQ + vuq __builtin_vsx_xxpermdi (vuq, vuq, const int); +XXPERMDI_1TI XXPERMDI_1UQ vf __builtin_vsx_xxpermdi (vf, vf, const int); XXPERMDI_4SF XXPERMDI_VF vd __builtin_vsx_xxpermdi (vd, vd, const int); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index d7d8d149a43..9e45976436b 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char *); vector double vec_xxpermdi (vector double, vector double, const int); vector float vec_xxpermdi (vector 
float, vector float, const int); +vector __int128 vec_xxpermdi (vector signed __int128, + vector signed __int128, const int); +vector __int128 vec_xxpermdi (vector unsigned __int128, + vector unsigned __int128, const int); vector long long vec_xxpermdi (vector long long, vector long long, const int); vector unsigned long long vec_xxpermdi (vector unsigned long long, vector unsigned long long, const int); diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c new file mode 100644 index 000..0e0d77bcb84 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c @@ -0,0 +1,229 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-options "-maltivec -O2 " } */ + +#include + +#define DEBUG 0 + +#if DEBUG +#include +void print_i128 (unsigned __int128 val) +{ + printf(" 0x%016llx%016llx", + (unsigned long long)(val >> 64), + (unsigned long long)(val & 0x)); +} +#endif + +extern void abort (void); + +union convert_union { + vector signed __int128s128; + vector unsigned __int128 u128; + char val[16]; +} convert; + +int check_u128_result(vector unsigned __int128 vresult_u128, + vector unsigned __int128 expected_vresult_u128) +{ + /* Use a for loop to check each byte manually so the test case will + run with ISA 2.06. + + Return 1 if they match, 0 otherwise. */ + + int i; + + union convert_union result; + union convert_union expected; + + result.u128 = vresult_u128; + expected.u128 = expected_vresult_u128; + + /* Check if each byte of the result and expected match. */ + for (i = 0; i < 16; i++) +{ + if (result.val[i] != expected.val[i]) + return 0; +} + return 1; +} + +int check_s128_result(vector signed __int128 vresult_s128, + vector signed __int128 expected_vresult_s128) +{ + /* Convert the arguments to unsigned, then check equality. 
*/ + union convert_union result; + union convert_union expected; + + result.s128 = vresult_s128; + expected.s128 = expected_vresult_s128; + + return check_u128_result (result.u128, expected.u128); +} + + +int +main (int argc, char *argv []) +{ + int i; + + vector signed __int128 src_va_s128; + vector signed __int128 src_vb_s128; + vector signed __int128 vresult_s128; + vector signed __int128 expected_vresult_s128; + + vector unsigned __int128 src_va_u128; + vector unsigned __int128 src_vb_u128; + vector unsigned __int128 src_vc_u128; + vector unsigned __int128 vresult_u128; + vector unsigned __int128 expected_
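For context on what the test above exercises, here is a rough scalar model of the doubleword selection done by the xxpermdi instruction, which backs vec_xxpermdi. It is a sketch under assumptions: the struct and function names are invented for illustration, doublewords are numbered as in the ISA (doubleword 0 is the most significant half), and any element reordering the built-in applies on little-endian targets is not modelled here.

```c
#include <stdint.h>

/* Hypothetical model type: a 128-bit register as two doublewords,
   with dw[0] being doubleword 0 in ISA (big-endian) numbering.  */
struct model_dwords
{
  uint64_t dw[2];
};

/* Model of xxpermdi with a 2-bit selector DM: the high selector bit
   picks which doubleword of A becomes result doubleword 0, the low
   selector bit picks which doubleword of B becomes result
   doubleword 1.  */
static struct model_dwords
model_xxpermdi (struct model_dwords a, struct model_dwords b, int dm)
{
  struct model_dwords r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}
```

With both sources set to the same register, a selector of 2 swaps the two doublewords, which is how the xxswapd mnemonic is defined.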
Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins
GCC maintainers: The patch has been updated per the feedback from version 3. Please let me know if the patch is acceptable for mainline. Thanks. Carl -- rs6000, remove vector set and vector init built-ins The vector init built-ins: __builtin_vec_init_v16qi, __builtin_vec_init_v8hi, __builtin_vec_init_v4si, __builtin_vec_init_v4sf, __builtin_vec_init_v2di, __builtin_vec_init_v2df, __builtin_vec_init_v1ti perform the same operation as initializing the vector in C code. For example: result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4); result_v4si = {1, 2, 3, 4}; These two constructs were tested and verified to generate identical assembly instructions, both with no optimization and with -O3 optimization. The vector set built-ins: __builtin_vec_set_v16qi, __builtin_vec_set_v8hi, __builtin_vec_set_v4si, __builtin_vec_set_v4sf, __builtin_vec_set_v1ti, __builtin_vec_set_v2di, __builtin_vec_set_v2df perform the same operation as setting a specific element in the vector in C code. For example: src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index); src_v4si[index] = int_val; The built-in actually generates more instructions than the inline C code with no optimization, but the generated code is identical with -O3 optimization. None of the built-ins that are removed have test cases or documentation. Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di, __builtin_vec_set_v2df are not removed as they are used in function resolve_vec_insert() in file rs6000-c.cc. The built-ins are removed as they don't provide any benefit over just using C code. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi, __builtin_vec_init_v4sf, __builtin_vec_init_v4si, __builtin_vec_init_v8hi, __builtin_vec_init_v1ti, __builtin_vec_init_v2df, __builtin_vec_init_v2di, __builtin_vec_set_v16qi, __builtin_vec_set_v4sf, __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove built-in definitions. 
--- gcc/config/rs6000/rs6000-builtins.def | 44 +++ 1 file changed, 4 insertions(+), 40 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 02aa04e5698..053dc0115d2 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1118,37 +1118,6 @@ const signed short __builtin_vec_ext_v8hi (vss, signed int); VEC_EXT_V8HI nothing {extract} - const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char, signed char, signed char, \ -signed char, signed char, signed char); -VEC_INIT_V16QI nothing {init} - - const vf __builtin_vec_init_v4sf (float, float, float, float); -VEC_INIT_V4SF nothing {init} - - const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \ - signed int); -VEC_INIT_V4SI nothing {init} - - const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\ - signed short, signed short, signed short, signed short, \ - signed short); -VEC_INIT_V8HI nothing {init} - - const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>); -VEC_SET_V16QI nothing {set} - - const vf __builtin_vec_set_v4sf (vf, float, const int<2>); -VEC_SET_V4SF nothing {set} - - const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>); -VEC_SET_V4SI nothing {set} - - const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>); -VEC_SET_V8HI nothing {set} - - ; Cell builtins. 
[cell] pure vsc __builtin_altivec_lvlx (signed long, const void *); @@ -1295,15 +1264,10 @@ const signed long long __builtin_vec_ext_v2di (vsll, signed int); VEC_EXT_V2DI nothing {extract} - const vsq __builtin_vec_init_v1ti (signed __int128); -VEC_INIT_V1TI nothing {init} - - const vd __builtin_vec_init_v2df (double, double); -VEC_INIT_V2DF nothing {init} - - const vsll __builtin_vec_init_v2di (signed long long, signed long long); -VEC_INIT_V2DI nothing {init} - +;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in +;; resolve_vec_insert(), rs6000-c.cc +;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses +;; in resolve_vec_insert are replaced by the equivalent gimple statements. const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>); VEC_SET_V1TI nothing {set} -- 2.45.0
[PATCH 0/13 ver4] rs6000, built-in cleanup patch series
GCC maintainers: I have addressed the comments on the five patches in the series that have not yet been approved. The patches that have already been approved are 1, 3, 5, 6, 8, 9, 10, and 12. The remaining patches all have fairly minor fixes requested. I will just post version 4 of these patches here. The goal is to commit the entire series all at once as they are all related. So I am holding off committing the approved patches. Thank you for your time and feedback on these patches. The entire patch series has been tested on Power 10 LE and Power 9 BE with no regression failures. Carl
Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.
Kewen: On 6/3/24 22:59, Kewen.Lin wrote: > Hi, > > on 2024/5/30 00:16, Carl Love wrote: >> This was patch 13 from the previous series. Note the previous series patch >> 12 was dropped. This patch is the same as the previous version. The >> additional work to remove __builtin_vec_set_v1ti, __builtin_vec_set_v2di, >> __builtin_vec_set_v2d per the feedback comments with equivalent gimple code >> is being deferred to a future patch. The goal of this series was simply to >> remove duplicated built-ins, extending overloaded built-ins as needed. >> Adding the needed gimple code to remove the additional built-ins is beyond >> the goal of this patch series. >> >> Carl >> --- >> >> rs6000, remove vector set and vector init built-ins. >> >> The vector init built-ins: >> >> __builtin_vec_init_v16qi, __builtin_vec_init_v8hi, >> __builtin_vec_init_v4si, __builtin_vec_init_v4sf, >> __builtin_vec_init_v2di, __builtin_vec_init_v2df, >> __builtin_vec_set_v1ti > > Typo here, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/ Fixed. > >> >> perform the same operation as initializing the vector in C code. For >> example: >> >> result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4); >> result_v4si = {1, 2, 3, 4}; >> >> These two constructs were tested and verified they generate identical >> assembly instructions with no optimization and -O3 optimization. >> >> The vector set built-ins: >> >> __builtin_vec_set_v16qi, __builtin_vec_set_v8hi. >> __builtin_vec_set_v4si, __builtin_vec_set_v4sf > > Please also add the reserved ones (...v1ti/v2di/v2df), as they are the > same too, temporarily reserving them for the uses in resolve_vec_insert() > doesn't affect this. Added the three additional built-ins to the list. > >> >> perform the same operation as setting a specific element in the vector in >> C code. 
For example: >> >> src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index); >> src_v4si[index] = int_val; >> >> The built-in actually generates more instructions than the inline C code >> with no optimization but is identical with -O3 optimizations. >> >> All of the above built-ins that are removed do not have test cases and >> are not documented. >> >> Built-ins __builtin_vec_set_v1ti __builtin_vec_set_v2di, >> __builtin_vec_set_v2df are not removed as they are used in function >> resolve_vec_insert() in file rs6000-c.cc. >> >> The built-ins are removed as they don't provide any benefit over just >> using C code. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi, >> __builtin_vec_init_v8hi, __builtin_vec_init_v4si, >> __builtin_vec_init_v4sf, __builtin_vec_init_v2di, >> __builtin_vec_init_v2df, __builtin_vec_set_v1ti, > > Typo, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/ Fixed > >> __builtin_vec_set_v16qi, __builtin_vec_set_v8hi. >> __builtin_vec_set_v4si, __builtin_vec_set_v4sf, >> __builtin_vec_set_v2di, __builtin_vec_set_v2df, >> __builtin_vec_set_v1ti): Remove built-in definitions. > > The last three ones are not actually removed. OK, fixed. 
> >> --- >> gcc/config/rs6000/rs6000-builtins.def | 42 ++- >> 1 file changed, 2 insertions(+), 40 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index 48ebc018a8d..8349d45169f 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1118,37 +1118,6 @@ >>const signed short __builtin_vec_ext_v8hi (vss, signed int); >> VEC_EXT_V8HI nothing {extract} >> >> - const vsc __builtin_vec_init_v16qi (signed char, signed char, signed >> char, \ >> -signed char, signed char, signed char, signed char, signed >> char, \ >> -signed char, signed char, signed char, signed char, signed >> char, \ >> -signed char, signed char, signed char); >> -VEC_INIT_V16QI nothing {init} >> - >> - const vf __builtin_vec_init_v4sf (float, float, float, float); >> -VEC_INIT_V4SF nothing {init} >> - >> - const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \ >> - signed int); >> -VEC_INIT_V4SI nothing {init} >> - >> - const vss __bu
[PATCH 7/13 ver4] rs6000, add overloaded vec_sel with int128 arguments
GCC maintainers: The patch has been updated per the comments from version 3. Please let me know if the patch is acceptable for mainline. Carl - rs6000, add overloaded vec_sel with int128 arguments Extend the vec_sel built-in to take three signed/unsigned/bool int128 arguments and return a signed/unsigned/bool int128 result. Extending the vec_sel built-in makes the existing built-ins __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The patch removes these built-ins. The patch adds documentation and test cases for the new overloaded vec_sel built-ins. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti, __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions. * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded definitions. * doc/extend.texi: Add documentation for new vec_sel instances. gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-10-runnable.c: New runnable test file. * gcc.target/powerpc/builtins-10.c: New compile only test file.
--- gcc/config/rs6000/rs6000-builtins.def | 6 - gcc/config/rs6000/rs6000-overload.def | 12 + gcc/doc/extend.texi | 20 ++ .../gcc.target/powerpc/builtins-10-runnable.c | 220 ++ .../gcc.target/powerpc/builtins-10.c | 63 + 5 files changed, 315 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10.c diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index b90b3f34167..c969cd0f3f6 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1907,12 +1907,6 @@ const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); XXSEL_16QI_UNS vector_select_v16qi_uns {} - const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq); -XXSEL_1TI vector_select_v1ti {} - - const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq); -XXSEL_1TI_UNS vector_select_v1ti_uns {} - const vd __builtin_vsx_xxsel_2df (vd, vd, vd); XXSEL_2DF vector_select_v2df {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 4d857bb1af3..6cec1ad4f1a 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -3274,6 +3274,18 @@ VSEL_2DF VSEL_2DF_B vd __builtin_vec_sel (vd, vd, vull); VSEL_2DF VSEL_2DF_U + vsq __builtin_vec_sel (vsq, vsq, vbq); +VSEL_1TI VSEL_1TI_B + vsq __builtin_vec_sel (vsq, vsq, vuq); +VSEL_1TI VSEL_1TI_U + vuq __builtin_vec_sel (vuq, vuq, vbq); +VSEL_1TI_UNS VSEL_1TI_UB + vuq __builtin_vec_sel (vuq, vuq, vuq); +VSEL_1TI_UNS VSEL_1TI_UU + vbq __builtin_vec_sel (vbq, vbq, vbq); +VSEL_1TI_UNS VSEL_1TI_BB + vbq __builtin_vec_sel (vbq, vbq, vuq); +VSEL_1TI_UNS VSEL_1TI_BU ; The following variants are deprecated. 
vsll __builtin_vec_sel (vsll, vsll, vsll); VSEL_2DI_B VSEL_2DI_S diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index b1620274285..d7d8d149a43 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -21420,6 +21420,26 @@ Additional built-in functions are available for the 64-bit PowerPC family of processors, for efficient use of 128-bit floating point (@code{__float128}) values. +Vector select + +@smallexample +vector signed __int128 vec_sel (vector signed __int128, + vector signed __int128, vector bool __int128); +vector signed __int128 vec_sel (vector signed __int128, + vector signed __int128, vector unsigned __int128); +vector unsigned __int128 vec_sel (vector unsigned __int128, + vector unsigned __int128, vector bool __int128); +vector unsigned __int128 vec_sel (vector unsigned __int128, + vector unsigned __int128, vector unsigned __int128); +vector bool __int128 vec_sel (vector bool __int128, + vector bool __int128, vector bool __int128); +vector bool __int128 vec_sel (vector bool __int128, + vector bool __int128, vector unsigned __int128); +@end smallexample + +The instance is an extension of the exiting overloaded built-in @code{vec_sel} +that is documented in the PVIPR. + @node Basic PowerPC Built-in Functions Available on ISA 2.06 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06 diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c new file mode 100644 index 000..b7b4a95ea0e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c @@ -0,0 +1,220 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-options "-maltivec -O2 " } */ + +#include + +#define DEBUG 0 + +#if DEBUG +#include +vo
[PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins
GCC maintainers: As noted, the removal of __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to patch 2 in the series. The patch has been updated per the comments from version 3. Please let me know if this patch is acceptable for mainline. Carl -- rs6000, extend the current vec_{un,}signed{e,o} built-ins The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds convert a vector of floats to signed/unsigned long long ints. Extend the existing vec_{un,}signed{e,o} built-ins to handle the argument vector of floats to return the even/odd signed/unsigned integers. The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} built-ins. The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are now for internal use only. They are not documented and they do not have testcases. The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by vec_signed{e,o}, remove. The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by vec_unsigned{e,o}, remove. Add testcases and update documentation. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Removed. (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively. (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF, VEC_VUNSIGNEDE_V4SF respectively. (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New built-in definitions. * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): Add new overloaded specifications. * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, vunsignedo_v4sf): New define_expands. * doc/extend.texi (vec_signedo, vec_signede): Add documentation for new overloaded built-ins.
gcc/testsuite/ChangeLog: * gcc.target/powerpc/builtins-3-runnable.c (test_unsigned_int_result, test_ll_unsigned_int_result): Add new argument. (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New tests for the overloaded built-ins. --- gcc/config/rs6000/rs6000-builtins.def | 20 ++--- gcc/config/rs6000/rs6000-overload.def | 8 ++ gcc/config/rs6000/vsx.md | 84 +++ gcc/doc/extend.texi | 10 +++ .../gcc.target/powerpc/builtins-3-runnable.c | 49 +-- 5 files changed, 154 insertions(+), 17 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 322d27b7a0d..29a9deb3410 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1688,26 +1688,26 @@ const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} - const vsi __builtin_vsx_xvcvdpsxws (vd); -XVCVDPSXWS vsx_xvcvdpsxws {} - const vsll __builtin_vsx_xvcvdpuxds (vd); XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} - const vsi __builtin_vsx_xvcvdpuxws (vd); -XVCVDPUXWS vsx_xvcvdpuxws {} - const vd __builtin_vsx_xvcvspdp (vf); XVCVSPDP vsx_xvcvspdp {} - const vsll __builtin_vsx_xvcvspsxds (vf); -XVCVSPSXDS vsx_xvcvspsxds {} + const vsll __builtin_vsignede_v4sf (vf); +VEC_VSIGNEDE_V4SF vsignede_v4sf {} + + const vsll __builtin_vsignedo_v4sf (vf); +VEC_VSIGNEDO_V4SF vsignedo_v4sf {} + + const vull __builtin_vunsignede_v4sf (vf); +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {} - const vsll __builtin_vsx_xvcvspuxds (vf); -XVCVSPUXDS vsx_xvcvspuxds {} + const vull __builtin_vunsignedo_v4sf (vf); +VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {} const vd __builtin_vsx_xvcvsxddp (vsll); XVCVSXDDP vsx_floatv2div2df2 {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 84bd9ae6554..4d857bb1af3 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ 
b/gcc/config/rs6000/rs6000-overload.def @@ -3307,10 +3307,14 @@ [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede] vsi __builtin_vec_vsignede (vd); VEC_VSIGNEDE_V2DF + vsll __builtin_vec_vsignede (vf); +VEC_VSIGNEDE_V4SF [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo] vsi __builtin_vec_vsignedo (vd); VEC_VSIGNEDO_V2DF + vsll __builtin_vec_vsignedo (vf); +VEC_VSIGNEDO_V4SF [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti] vsi __builtin_vec_signexti (vsc); @@ -4433,10 +4437,14 @@ [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede] vui __builtin_vec_vunsignede (vd); VEC_VUNSIGNEDE_V2DF + vull __builtin_vec_vunsignede (vf); +VEC_VUNSIGNEDE
Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test
Segher: On 6/13/24 12:51, Segher Boessenkool wrote: > >> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> @@ -1,4 +1,4 @@ >> -/* { dg-do compile { target powerpc*-*-* } } */ >> +/* { dg-do run { target powerpc*-*-* } } */ >> /* { dg-options "-mvsx" } */ >> /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! >> has_arch_pwr8 } } } */ >> /* { dg-require-effective-target powerpc_vsx } */ > > Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*" > already, so you could remove that target clause even (after testing of > course :-) ) > > Okay for trunk with or without that extra tweak. Thank you! I updated the patch by removing the target clause as suggested: -/* { dg-do compile { target powerpc*-*-* } } */ +/* { dg-do run } */ /* { dg-options "-mvsx" } */ /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */ /* { dg-require-effective-target powerpc_vsx } */ Retested on Power 10. Reports 2 passes and no failures. I will go ahead and commit. Thanks. Carl
[PATCH] rs6000, altivec-2-runnable.c update the require-effective-target
GCC maintainers: Per the additional feedback after patch: commit c892525813c94b018464d5a4edc17f79186606b7 Author: Carl Love Date: Tue Jun 11 14:01:16 2024 -0400 rs6000, altivec-2-runnable.c should be a runnable test The test case has "dg-do compile" set not "dg-do run" for a runnable test. This patch changes the dg-do command argument to run. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do argument to run. was approved and committed, I have updated the dg-require-effective-target and dg-options as requested so the test will compile with -O2 on a machine that has a minimum support of Power 8 vector hardware. The patch has been tested on Power 10 with no regression failures. Please let me know if this patch is acceptable for mainline. Thanks. Carl -- rs6000, altivec-2-runnable.c update the require-effective-target The test requires a minimum of Power8 vector HW and a compile level of -O2. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-2-runnable.c: Change the require-effective-target for the test. --- gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c index 17b23eb9d50..04c7d1ac70e 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c @@ -1,7 +1,6 @@ /* { dg-do run } */ -/* { dg-options "-mvsx" } */ -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */ -/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ +/* { dg-require-effective-target p8vector_hw } */ #include -- 2.45.0
Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target
Kewen, Peter, Segher: On 6/17/24 19:56, Kewen.Lin wrote: > Hi, > > on 2024/6/18 00:08, Peter Bergner wrote: >> On 6/14/24 1:37 PM, Carl Love wrote: >>> Per the additional feedback after patch: >>> >>> commit c892525813c94b018464d5a4edc17f79186606b7 >>> Author: Carl Love >>> Date: Tue Jun 11 14:01:16 2024 -0400 >>> >>> rs6000, altivec-2-runnable.c should be a runnable test >>> >>> The test case has "dg-do compile" set not "dg-do run" for a runnable >>> test. This patch changes the dg-do command argument to run. >>> >>> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >>> * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do >>> argument to run. >> >> Test case altivec-1-runnable.c seems to have the same issue, in that it >> is currently a dg-do compile test case rather than the intended dg-do run. > > Good catch! OK, will update that as well. I think it will need the same header as altivec-2-runnable.c so once we have a final change for altivec-2-runnable.c, I will make the header for altivec-1-runnable.c be the same. > >> Can you have a look at changing that to dg-do run too? My guess it that >> this one will want something similar to some other altivec test cases, ala: >> >> /* { dg-do run { target vmx_hw } } */ >> /* { dg-do compile { target { ! vmx_hw } } } */ >> /* { dg-require-effective-target powerpc_altivec_ok } */ >> /* { dg-options "-O2 -maltivec -mabi=altivec" } */ > > I'd expect the "-runnable" test case focuses on testing for run. Normally, > the one without "-runnable" would focus on testing for compiling (scan some > desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test > for different things, maybe we should separate them into different names > if they don't test for a same test point. The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various built-ins that didn't have any test cases. There wasn't an intention that there was any connection to the existing altivec-*.c test files. 
I started creating runnable tests when I started adding support for built-ins that we claimed to support but had never actually been implemented. I created runnable tests to make sure my implementation actually worked. I continued to add runnable tests for built-ins that existed but didn't have a test case. Adding runnable tests did find a couple of issues where the existing implementation had a bug. That all said, if we want to change the names of altivec-1-runnable.c and altivec-2-runnable.c to a different naming scheme, that is fine with me. Perhaps we should finish fixing the header for this test file, then do altivec-1-runnable, and then a final patch that does all the file renaming? > >> >> That said, I don't like not having a -mdejagnu-cpu=... here. >> I think for our server cpus, this is fine, but on an embedded system >> with a old ISA default for -mcpu=... (so we be doing a dg-do compile), >> just adding -maltivec to that default may not make much sense for that >> default and probably should be an error. Maybe something like: > > Yes, for some embedded cpus, there will be some error messages, but since > we have powerpc_altivec_ok effective target, the error would make that > effective target checking fail so I'd expect it'll stop it being tested > (unsupported). > >> >> /* { dg-do run { target vmx_hw } } */ >> /* { dg-do compile { target { ! vmx_hw } } } */ >> /* { dg-require-effective-target powerpc_altivec_ok } */ >> /* { dg-options "-O2 -mdejagnu=power7" } */ >> >> ...makes more sense? Ke Wen & Segher, thoughts on that? >> Ke Wen, should powerpc_altivec_ok be powerpc_altivec here??? > > Yes, I just pushed r15-1390 for this change. > > BR, > Kewen > We had -mdejagnu-cpu=power8 before, but it looks like we want to go to power7 now. It sounds like we want the following: /* { dg-do run { target vmx_hw } } */ /* { dg-do compile { target { ! vmx_hw } } } */ /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ /* { dg-require-effective-target powerpc_altivec } */ Carl
[PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target
GCC maintainers: version 2: Updated per the feedback from Peter, Kewen and Segher. Note, Peter suggested the -mdejagnu-cpu= value must be power7. The test fails if -mdejagnu-cpu= is set to power7, needs to be power8. Patch has been retested on a Power 10 box, it succeeds with 2 passes and no fails. Per the additional feedback after patch: commit c892525813c94b018464d5a4edc17f79186606b7 Author: Carl Love Date: Tue Jun 11 14:01:16 2024 -0400 rs6000, altivec-2-runnable.c should be a runnable test The test case has "dg-do compile" set not "dg-do run" for a runnable test. This patch changes the dg-do command argument to run. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do argument to run. was approved and committed, I have updated the dg-require-effective-target and dg-options as requested so the test will compile with -O2 on a machine that has a minimum support of Power 8 vector hardware. The patch has been tested on Power 10 with no regression failures. Please let me know if this patch is acceptable for mainline. Thanks. Carl rs6000, altivec-2-runnable.c update the require-effective-target The test requires a minimum of Power8 vector HW and a compile level of -O2. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-2-runnable.c: Change the require-effective-target for the test. --- gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c index 17b23eb9d50..9e7ef89327b 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c @@ -1,7 +1,7 @@ -/* { dg-do run } */ -/* { dg-options "-mvsx" } */ -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { !
has_arch_pwr8 } } } */ -/* { dg-require-effective-target powerpc_vsx } */ +/* { dg-do run { target vsx_hw } } */ +/* { dg-do compile { target { ! vmx_hw } } } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ +/* { dg-require-effective-target powerpc_altivec } */ #include -- 2.45.0
Re: [PATCH ver3] rs6000, altivec-2-runnable.c update the require-effective-target
Everyone, Oops, this should be version 3 not 2. Sorry. Carl On 6/19/24 09:13, Carl Love wrote: > GCC maintainers: > > version 2: Updated per the feedback from Peter, Kewen and Segher. Note, > Peter suggested the -mdejagnu-cpu= value must be power7. > The test fails if -mdejagnu-cpu= is set to power7, needs to be power8. Patch > has been retested on a Power 10 box, it succeeds > with 2 passes and no fails. > > Per the additional feedback after patch: > > commit c892525813c94b018464d5a4edc17f79186606b7 > Author: Carl Love > Date: Tue Jun 11 14:01:16 2024 -0400 > > rs6000, altivec-2-runnable.c should be a runnable test > > The test case has "dg-do compile" set not "dg-do run" for a runnable > test. This patch changes the dg-do command argument to run. > > gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: > * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do > argument to run. > > was approved and committed, I have updated the dg-require-effective-target > and dg-options as requested so the test will compile with -O2 on a > machine that has a minimum support of Power 8 vector hardware. > > The patch has been tested on Power 10 with no regression failures. > > Please let me know if this patch is acceptable for mainline. Thanks. > > Carl > > > rs6000, altivec-2-runnable.c update the require-effective-target > > The test requires a minimum of Power8 vector HW and a compile level > of -O2. > > gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: > * gcc.target/powerpc/altivec-2-runnable.c: Change the > require-effective-target for the test. 
> --- > gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c > b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c > index 17b23eb9d50..9e7ef89327b 100644 > --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c > @@ -1,7 +1,7 @@ > -/* { dg-do run } */ > -/* { dg-options "-mvsx" } */ > -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 > } } } */ > -/* { dg-require-effective-target powerpc_vsx } */ > +/* { dg-do run { target vsx_hw } } */ > +/* { dg-do compile { target { ! vmx_hw } } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-require-effective-target powerpc_altivec } */ > > #include >
[PATCH] rs6000, altivec-1-runnable.c update the require-effective-target
GCC maintainers: The dg options for this test should be the same as for altivec-2-runnable.c. This patch updates the dg options to match the settings in altivec-2-runnable.c. The patch has been tested on Power 10 with no regression failures. Please let me know if this patch is acceptable for mainline. Thanks. Carl --From 289e15d215161ad45ae1aae7a5dedd2374737ec4 rs6000, altivec-1-runnable.c update the require-effective-target The test requires a minimum of Power8 vector HW and a compile level of -O2. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-1-runnable.c: Change the require-effective-target for the test. --- gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c index da8ebbc30ba..c113089c13a 100644 --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c @@ -1,6 +1,7 @@ -/* { dg-do compile { target powerpc*-*-* } } */ -/* { dg-require-effective-target powerpc_altivec_ok } */ -/* { dg-options "-maltivec" } */ +/* { dg-do run { target vsx_hw } } */ +/* { dg-do compile { target { ! vmx_hw } } } */ +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ +/* { dg-require-effective-target powerpc_altivec } */ #include -- 2.45.0
Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target
Kewen: On 6/21/24 03:36, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/20 00:13, Carl Love wrote: >> GCC maintainers: >> >> version 2: Updated per the feedback from Peter, Kewen and Segher. Note, >> Peter suggested the -mdejagnu-cpu= value must be power7. >> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8. >> Patch has been retested on a Power 10 box, it succeeds >> with 2 passes and no fails. > > IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for > altivec-1-runnable.c. Both your testing and the comments in the test > case show this altivec-2-runnable.c requires at least power8. OK. Per other thread changed altivec-1-runnable to power7. > >> >> Per the additional feedback after patch: >> >> commit c892525813c94b018464d5a4edc17f79186606b7 >> Author: Carl Love >> Date: Tue Jun 11 14:01:16 2024 -0400 >> >> rs6000, altivec-2-runnable.c should be a runnable test >> >> The test case has "dg-do compile" set not "dg-do run" for a runnable >> test. This patch changes the dg-do command argument to run. >> >> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do >> argument to run. >> >> was approved and committed, I have updated the dg-require-effective-target >> and dg-options as requested so the test will compile with -O2 on a >> machine that has a minimum support of Power 8 vector hardware. >> >> The patch has been tested on Power 10 with no regression failures. >> >> Please let me know if this patch is acceptable for mainline. Thanks. >> >> Carl >> >> >> rs6000, altivec-2-runnable.c update the require-effective-target >> >> The test requires a minimum of Power8 vector HW and a compile level >> of -O2. >> >> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/altivec-2-runnable.c: Change the >> require-effective-target for the test. 
>> --- >> gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 >> 1 file changed, 4 insertions(+), 4 deletions(-) >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> index 17b23eb9d50..9e7ef89327b 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c >> @@ -1,7 +1,7 @@ >> -/* { dg-do run } */ >> -/* { dg-options "-mvsx" } */ >> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! >> has_arch_pwr8 } } } */ >> -/* { dg-require-effective-target powerpc_vsx } */ >> +/* { dg-do run { target vsx_hw } } */ > > As this test case requires power8 and up, and dg-options specifies > -mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here, > otherwise it will fail on power7 env. Changed to p8vector_hw > >> +/* { dg-do compile { target { ! vmx_hw } } } */ > > This condition should be ! , so ! p8vector_hw. Changed. > >> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */> +/* { >> dg-require-effective-target powerpc_altivec } */ > > This should be powerpc_vsx instead, otherwise this case can still be > tested with -mno-vsx -maltivec, then this test case would fail. OK > > Besides, as the discussion on the name of this test case, could you also > rename this to p8vector-builtin-9.c instead? Put the name change in a separate patch to change both test file names. Carl
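Putting the review comments above together, the header for altivec-2-runnable.c would look roughly like the skeleton below. This is assembled only from the changes agreed in this reply (p8vector_hw for the run target, its negation for the compile-only fallback, powerpc_vsx as the effective target); what was finally committed may differ, and the body shown here is a placeholder rather than the real test.

```c
/* Run the test where Power8 vector hardware is available ...  */
/* { dg-do run { target p8vector_hw } } */
/* ... and fall back to a compile-only test everywhere else.  */
/* { dg-do compile { target { ! p8vector_hw } } } */
/* The test needs Power8 instructions and is built at -O2.  */
/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
/* Skip the test if VSX cannot be enabled (e.g. -mno-vsx).  */
/* { dg-require-effective-target powerpc_vsx } */

#include <stdlib.h>

int
main (void)
{
  /* Placeholder body: a real runnable test computes a result with the
     built-in under test and calls abort () on a mismatch.  */
  int expected = 1;
  int actual = 1;

  if (actual != expected)
    abort ();

  return 0;
}
```

Keeping the run target and the negated compile target on the same keyword is the point of Kewen's second comment: with vsx_hw on one line and vmx_hw on the other, some machines would match neither or both.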
[PATCH] rs6000, change altivec*-runnable.c test file names
GCC maintainers: Per the discussion of the dg header changes for test files altivec-1-runnable.c and altivec-2-runnable.c, it was decided to rename the two tests so that their names match the existing test families they belong with. This patch depends on the two patches that update the dg arguments for test files altivec-1-runnable.c and altivec-2-runnable.c being accepted and committed first. The patch has been tested on Power 10 with no regression failures. Please let me know if this patch is acceptable for mainline. Thanks. Carl -- rs6000, change altivec*-runnable.c test file names Rename the two test files to follow the naming of the related tests. gcc/testsuite/ChangeLog: * gcc.target/powerpc/altivec-1-runnable.c: Change the name to altivec-38.c. * gcc.target/powerpc/altivec-2-runnable.c: Change the name to p8vector-builtin-9.c. --- .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0 .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c}| 0 2 files changed, 0 insertions(+), 0 deletions(-) rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} (100%) rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c} (100%) diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-38.c similarity index 100% rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c similarity index 100% rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c -- 2.45.0
Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target
Kewen: On 6/21/24 03:37, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/20 00:18, Carl Love wrote: >> GCC maintainers: >> >> The dg options for this test should be the same as for altivec-2-runnable.c. >> This patch updates the dg options to match >> the settings in altivec-2-runnable.c. >> >> The patch has been tested on Power 10 with no regression failures. >> >> Please let me know if this patch is acceptable for mainline. Thanks. >> >> Carl >> >> --From >> 289e15d215161ad45ae1aae7a5dedd2374737ec4 rs6000, altivec-1-runnable.c >> update the require-effective-target >> >> The test requires a minimum of Power8 vector HW and a compile level >> of -O2. > > This is not true, vec_unpackh and vec_unpackl doesn't require power8, > vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03. > >> >> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/altivec-1-runnable.c: Change the >> require-effective-target for the test. >> --- >> gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 --- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> index da8ebbc30ba..c113089c13a 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> @@ -1,6 +1,7 @@ >> -/* { dg-do compile { target powerpc*-*-* } } */ >> -/* { dg-require-effective-target powerpc_altivec_ok } */ >> -/* { dg-options "-maltivec" } */ >> +/* { dg-do run { target vsx_hw } } */ > > So this line should check for vmx_hw. OK, fingers are used to typing vsx Fixed. > >> +/* { dg-do compile { target { ! vmx_hw } } } */ >> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > > With more thinking, I think it's better to use > "-O2 -maltivec" to be consistent with the others. OK, changed it back. We now have: /* { dg-do run { target vmx_hw } } */ /* { dg-do compile { target { ! 
vmx_hw } } } */ /* { dg-options "-O2 -maltivec" } */ /* { dg-require-effective-target powerpc_altivec } */ The regression test runs fine with the above. Two passes, no failures. > > As mentioned in the other thread, powerpc_altivec > effective target check should guarantee the altivec > feature support, if any default cpu type or user > specified option disable altivec, this test case > will not be tested. If we specify one cpu type > specially here, it may cause confusion why it's > different from the other existing ones. So let's > go without no specified cpu type. > > Besides, similar to the request for altivec-1-runnable.c, > could you also rename this to altivec-38.c? OK, will change the names for the two test cases at the same time in a separate patch. Carl
[PATCH version 2] rs6000, altivec-1-runnable.c update the, require-effective-target
GCC maintainers:

version 2, update the dg options per the feedback. Retested the patch on Power 10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

--
rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/altivec-1-runnable.c: Change the require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..3f084c91798 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
+/* { dg-require-effective-target powerpc_altivec } */
 #include
-- 
2.45.0
[PATCH version 4] rs6000, altivec-2-runnable.c update the, require-effective-target
GCC maintainers:

version 4: Additional dg option updates per the feedback. Retested the patch on Power 10, no regressions.

version 3: Updated per the feedback from Peter, Kewen and Segher. Note, Peter suggested the -mdejagnu-cpu= value must be power7. The test fails if -mdejagnu-cpu= is set to power7; it needs to be power8. The patch has been retested on a Power 10 box; it succeeds with 2 passes and no fails.

Per the additional feedback after patch:

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love
  Date:   Tue Jun 11 14:01:16 2024 -0400

      rs6000, altivec-2-runnable.c should be a runnable test

      The test case has "dg-do compile" set not "dg-do run" for a runnable test. This patch changes the dg-do command argument to run.

      gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/altivec-2-runnable.c: Change dg-do argument to run.

was approved and committed, I have updated the dg-require-effective-target and dg-options as requested so the test will compile with -O2 on a machine that has a minimum support of Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

--
rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level of -O2. Update the dg test directives.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/altivec-2-runnable.c: Change the require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..660669f69fd 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,6 +1,6 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
+/* { dg-do run { target p8vector_hw } } */
+/* { dg-do compile { target { ! p8vector_hw } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
 /* { dg-require-effective-target powerpc_vsx } */
 #include
-- 
2.45.0
Re: [PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins
On 6/18/24 20:03, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/14 03:40, Carl Love wrote: >> >> GCC maintainers: >> >> As noted the removal of __builtin_vsx_xvcvdpuxds_uns and >> __builtin_vsx_xvcvspuxws was moved to patch 2 in the seris. The patch has >> been updated per the comments from version 3. >> >> Please let me know if this patch is acceptable for mainline. >> >> Carl >> >> -- >> >> rs6000, extend the current vec_{un,}signed{e,o} built-ins >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds >> convert a vector of floats to signed/unsigned long long ints. Extend the > > Nit: s/signed/a vector of signed/ Fixed. > >> existing vec_{un,}signed{e,o} built-ins to handle the argument >> vector of floats to return the even/odd signed/unsigned integers. >> > > Likewise. Fixed. > >> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf, >> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o} >> built-ins. >> >> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are >> now for internal use only. They are not documented and they do not >> have testcases. >> > > >> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by >> vec_signed{e,o}, remove. >> >> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by >> vec_unsigned{e,o}, remove. > > As the comments in 2/13 v4 and the previous review comments, I preferred > these two are moved to 2/13 as well (this patch should focus on extending). > Moved to patch 2. >> >> Add testcases and update documentation. >> >> gcc/ChangeLog: >> * config/rs6000/rs6000-builtins.def: __builtin_vsx_xvcvdpsxws, >> __builtin_vsx_xvcvdpuxws): Removed. >> (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed > > Nit: s/Renamed/Rename to/ OK, fixed. > >> __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively. >> (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF, >> VEC_VUNSIGNEDE_V4SF respectively. > > Likewise. OK, fixed. 
> >> (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New >> built-in definitions. >> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo, >> vec_unsignede,vec_unsignedo): Add new overloaded specifications. > > Formatting nits: "..,.." -> ".., ..", " " -> " " OK, I fixed the various spacing issues. > >> * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf, >> vunsignede_v4sf, vunsignedo_v4sf): New define_expands. > > Likewise. dito > >> * doc/extend.texi (vec_signedo, vec_signede): Add documentation >> for new overloaded built-ins. > > Missing vec_unsignedo and vec_unsignede, may be also mention for which > types, like "converting vector float to vector {un,}signed long long". > OK, fixed. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/builtins-3-runnable.c >> (test_unsigned_int_result, test_ll_unsigned_int_result): Add >> new argument. >> (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New >> tests for the overloaded built-ins. >> --- gcc/config/rs6000/rs6000-builtins.def | 20 ++--- >> gcc/config/rs6000/rs6000-overload.def | 8 ++ >> gcc/config/rs6000/vsx.md | 84 +++ >> gcc/doc/extend.texi | 10 +++ >> .../gcc.target/powerpc/builtins-3-runnable.c | 49 +-- >> 5 files changed, 154 insertions(+), 17 deletions(-) >> >> diff --git a/gcc/config/rs6000/rs6000-builtins.def >> b/gcc/config/rs6000/rs6000-builtins.def >> index 322d27b7a0d..29a9deb3410 100644 >> --- a/gcc/config/rs6000/rs6000-builtins.def >> +++ b/gcc/config/rs6000/rs6000-builtins.def >> @@ -1688,26 +1688,26 @@ >>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int); >> XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {} >> >> - const vsi __builtin_vsx_xvcvdpsxws (vd); >> -XVCVDPSXWS vsx_xvcvdpsxws {} >> - >>const vsll __builtin_vsx_xvcvdpuxds (vd); >> XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {} >> >>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int); >> XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {} >> >> - const vsi __builtin_vsx_xvcvdpuxws (vd); >> -XVCVDPUXWS
Re: [PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws,, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.
Kewen: On 6/18/24 20:03, Kewen.Lin wrote: > Hi Carl, > > on 2024/6/14 03:40, Carl Love wrote: >> GCC maintainers: >> >> Per the comments on patch 0004 from version 3, the removal of >> The built-in __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was >> moved to this patch. The rest of the patch is unchanged from version 3. >> There were no comments on this patch for version 3. >> >> Please let me know if this patch is acceptable. Thanks. >> >> Carl >> >> >> - >> >> rs6000, Remove __builtin_vsx_xvcvspsxws, >> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins. > > Nit: Maybe make it shorter like: Remove built-ins > __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns} > >> >> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed > > Nit: Strictly speaking, not a duplicate of vec_signed but covered by it. > >> built-in that is documented in the PVIPR. The __builtin_vsx_xvcvspsxws >> built-in is not documented and there are no test cases for it. >> >> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by >> vec_unsigned, remove. >> >> The __builtin_vsx_xvcvspuxws is redundant as it is covered by >> vec_unsigned, remove. > > As mentioned in the previous review, I'd expect patch 4/13 only focuses on > extending vec_{un,}signed{e,o} for vector float (aka. __builtin_vsx_xvcvspsxds > and __builtin_vsx_xvcvspuxds related), and this patch focuses on some built-in > removals which have been covered by the existing vec_{un,}signed{,e,o}, so > it can also drop the built-ins: > > "The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by > vec_signed{e,o}, remove. > > The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by > vec_unsigned{e,o}, remove." > > // copied from 4/13. Not sure why I didn't move these two with the other two??? Sorry. Moved them from patch 4. Carl
[PATCH ver3] rs6000, altivec-1-runnable.c update the, require-effective-target
GCC maintainers:

version 3, rebased on current mainline tree. Version 2 of the patch was out of sync. Retested the patch on Power 10 with no regressions.

version 2, update the dg options per the feedback. Retested the patch on Power 10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/altivec-1-runnable.c: Change the require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index 4e32860a169..6763ff3ff8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,7 +1,9 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
+
 #include
 #ifdef DEBUG
-- 
2.45.0
Re: [PATCH version 2] rs6000, altivec-1-runnable.c update the, require-effective-target
Kewen: On 6/23/24 19:41, Kewen.Lin wrote: > Hi, > > on 2024/6/22 00:15, Carl Love wrote: >> GCC maintainers: >> >> version 2, update the dg options per the feedback. Retested the patch on >> Power 10 with no regressions. >> >> This patch updates the dg options. >> >> The patch has been tested on Power 10 with no regression failures. >> >> Please let me know if this patch is acceptable for mainline. Thanks. >> >> Carl >> >> -- >> rs6000, altivec-1-runnable.c update the require-effective-target >> >> Update the dg test directives. >> >> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/altivec-1-runnable.c: Change the >> require-effective-target for the test. >> --- >> gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 --- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> index da8ebbc30ba..3f084c91798 100644 >> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c >> @@ -1,6 +1,7 @@ >> -/* { dg-do compile { target powerpc*-*-* } } */ >> -/* { dg-require-effective-target powerpc_altivec_ok } */ >> -/* { dg-options "-maltivec" } */ >> +/* { dg-do run { target vmx_hw } } */ >> +/* { dg-do compile { target { ! vmx_hw } } } */ >> +/* { dg-options "-O2 -maltivec" } */ >> +/* { dg-require-effective-target powerpc_altivec } */ > > This one needs rebasing, "powerpc_altivec" has been adjusted on trunk. Yes, this seems to be out of sync. I will rebase on the current upstream tree and re-post. Carl
[PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation
GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, vec_st and vec_ste built-ins to make it clearer that there are data alignment requirements for these built-ins. If the data alignment requirements are not followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline. Thanks.

Carl

rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte aligned to work properly. Add some additional text to the existing documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment requirements based on the size of the vector element. Update the documentation to make this clear to the user.

gcc/ChangeLog:
	* doc/extend.texi: Add clarification for the use of the vec_ld, vec_st, vec_lde and vec_ste built-ins.
---
 gcc/doc/extend.texi | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
 @end smallexample
 
 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available. The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The
+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly. The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float. The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
-- 
2.45.0
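
To illustrate the address masking described in the documentation text above, here is a minimal sketch in portable C. It does not use the AltiVec built-ins; the helper names model_lvx_address, model_lve_address and check_alignment_model are hypothetical and only model how the low bits of the effective address are cleared, which is why misaligned data is silently read from or written to a different address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the address used by LVX/STVX: the low 4 bits of
   the effective address are cleared, so the access is always 16-byte
   aligned regardless of the pointer that was passed in.  */
uintptr_t
model_lvx_address (uintptr_t ea)
{
  return ea & ~(uintptr_t) 15;
}

/* Hypothetical model of LVEBX/LVEHX/LVEWX and STVEBX/STVEHX/STVEWX: the
   low bits are cleared according to the element size in bytes (1, 2 or
   4), so the access is aligned to the element size.  */
uintptr_t
model_lve_address (uintptr_t ea, unsigned elem_size)
{
  return ea & ~(uintptr_t) (elem_size - 1);
}

/* Self-check of the model for a deliberately misaligned address.  */
void
check_alignment_model (void)
{
  assert (model_lvx_address (0x1007) == 0x1000);    /* 16-byte access */
  assert (model_lve_address (0x1007, 1) == 0x1007); /* byte element */
  assert (model_lve_address (0x1007, 2) == 0x1006); /* halfword element */
  assert (model_lve_address (0x1007, 4) == 0x1004); /* word element */
}
```

With a pointer of 0x1007, the modeled vec_ld access starts at 0x1000, not at the address the caller supplied, which is the failure mode the documentation change is meant to warn about.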
Re: [PATCH] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di
Kewen: On 7/22/24 2:09 AM, Kewen.Lin wrote: Hi Carl, on 2024/7/18 00:01, Carl Love wrote: GCC maintainers: This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins. The users should just use normal C-code to update the various vector elements. This change was originally intended to be part of the earlier series of cleanup patches. It was initially thought that some additional work would be needed to do some gimple generation instead of these built-ins. However, the existing default code generation does produce the needed code. The code generated with normal C-code is as good or better than the code generated with these built-ins. I think we need to expand this a bit: - For vec_set bif, the equivalent C code is as good as or better than it. - For vec_insert bif whose resolving makes use of vec_set bif previously (now get removed), it's as good as before with optimization. The patch has been tested on Power 10 LE with no regressions. Please let me know if the patch is acceptable for mainline. Thanks. Carl --- rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di Remove the built-ins, use the default gimple generation instead. gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in definitions. * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove if statemnts for mode == V2DFmode, mode == V2DImode and Nit: s/statemnts/statements/ OK, fixed Maybe a bit more meaningful like: Remove the handling for constant vec_insert position with VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes. OK, changed mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF, RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI. 
--- gcc/config/rs6000/rs6000-builtins.def | 13 - gcc/config/rs6000/rs6000-c.cc | 40 --- 2 files changed, 53 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 896d9686ac6..0ebc940f395 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1263,19 +1263,6 @@ const signed long long __builtin_vec_ext_v2di (vsll, signed int); VEC_EXT_V2DI nothing {extract} -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in -;; resolve_vec_insert(), rs6000-c.cc -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses -;; in resolve_vec_insert are replaced by the equivalent gimple statements. - const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>); - VEC_SET_V1TI nothing {set} - - const vd __builtin_vec_set_v2df (vd, double, const int<1>); - VEC_SET_V2DF nothing {set} - - const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>); - VEC_SET_V2DI nothing {set} - Unexpected empty line removed. ?? I don't remove the blank line before the removed comment, so there is still a single blank line before the next entry. Specifically, the code with the above removed now looks like: ... const signed long long __builtin_vec_ext_v2di (vsll, signed int); VEC_EXT_V2DI nothing {extract} const vsc __builtin_vsx_cmpge_16qi (vsc, vsc); CMPGE_16QI vector_nltv16qi {} const vsll __builtin_vsx_cmpge_2di (vsll, vsll); CMPGE_2DI vector_nltv2di {} Which looks OK to me? Similar to vec_init removal, we should also get rid of set bif attribute, bif_is_set and altivec_expand_vec_set_builtin etc. 
That will also require removing: const vsq __builtin_vsx_set_1ti (vsq, signed __int128, const int<0,0>); SET_1TI vsx_set_v1ti {set} const vd __builtin_vsx_set_2df (vd, double, const int<0,1>); SET_2DF vsx_set_v2df {set} const vsll __builtin_vsx_set_2di (vsll, signed long long, const int<0,1>); SET_2DI vsx_set_v2di {set} I would assume the C-code generation for the above will be as good or better than the code generation for the built-ins but will need to verify that. I haven't looked at them specifically. Carl
[PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins
GCC maintainers:

version 2, Updated patch comments, added missing ChangeLog. Fixed unintended line removal.

The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins. The difference is that the overloaded built-ins return a vector of booleans or a vector of long long booleans, whereas the removed built-ins returned a vector of floats or a vector of doubles.

The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and __builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded vec_cmp[eq|ge|gt] built-in with the required changes for the return type. Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.

The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp and __builtin_vsx_xvcmpgtsp, which are similar to the recommended PVIPR documented overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins. The difference is that the overloaded built-ins return a vector of 32-bit booleans. The removed built-ins returned a vector of floats.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are not removed as they are used by the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp are changed to use the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins. Use of the overloaded built-ins requires the result to be stored in a vector of boolean of the appropriate size or the result must be cast to the return type used by the original __builtin_vsx_xvcmp* built-ins.
gcc/ChangeLog: * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove definitions. gcc/testsuite/ChangeLog: * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp, __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpgesp): Remove. (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float arguments that store result in boolean and cast result to store result in float. Add tests for double arguments that store the result in long long boolean and cast result to double. --- gcc/config/rs6000/rs6000-builtins.def | 9 -- .../gcc.target/powerpc/vsx-builtin-3.c | 28 ++- 2 files changed, 21 insertions(+), 16 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 77eb0f7e406..47830b7dcb0 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1579,18 +1579,12 @@ const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); XVCMPEQDP_P vector_eq_v2df_p {pred} - const vf __builtin_vsx_xvcmpeqsp (vf, vf); - XVCMPEQSP vector_eqv4sf {} - const vd __builtin_vsx_xvcmpgedp (vd, vd); XVCMPGEDP vector_gev2df {} const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd); XVCMPGEDP_P vector_ge_v2df_p {pred} - const vf __builtin_vsx_xvcmpgesp (vf, vf); - XVCMPGESP vector_gev4sf {} - const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf); XVCMPGESP_P vector_ge_v4sf_p {pred} @@ -1600,9 +1594,6 @@ const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd); XVCMPGTDP_P vector_gt_v2df_p {pred} - const vf __builtin_vsx_xvcmpgtsp (vf, vf); - XVCMPGTSP vector_gtv4sf {} - const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf); XVCMPGTSP_P vector_gt_v4sf_p {pred} diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c index 60f91aad23c..d67f97c8011 100644 --- 
a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c @@ -156,13 +156,27 @@ int do_cmp (void) { int i = 0; - d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++; - d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++; - d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++; - - f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++; - f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++; - f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++; + /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp + have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and + vec_cmpge built-ins. The __builtin_vsx_xvcmp* builtins returned a vector + result of the same type as the
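
The return-type difference described in the commit message (a boolean mask versus a vector of floats) can be sketched in portable C with the GCC generic vector extension. This is only an analogy and is not the AltiVec vec_cmpeq built-in itself; the type names v4sf and v4si and the helpers cmpeq_as_mask, mask_as_float and check_cmpeq are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Generic GCC vector types: four floats and four 32-bit ints.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise equality.  Each lane of the result is -1 (all bits set)
   where the inputs compare equal and 0 otherwise, i.e. a mask rather
   than a floating-point value.  */
v4si
cmpeq_as_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* Reinterpret the mask bits as floats, which mimics storing the result
   in the float vector type the removed built-ins used to return.  */
v4sf
mask_as_float (v4si mask)
{
  v4sf result;
  memcpy (&result, &mask, sizeof result);
  return result;
}

/* Self-check: lanes 0 and 2 are equal, lanes 1 and 3 differ.  */
void
check_cmpeq (void)
{
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 1.0f, 0.0f, 3.0f, 5.0f };
  v4si mask = cmpeq_as_mask (a, b);
  v4sf as_float = mask_as_float (mask);

  assert (mask[0] == -1 && mask[1] == 0 && mask[2] == -1 && mask[3] == 0);
  /* The float view carries exactly the same bits as the mask.  */
  assert (memcmp (&as_float, &mask, sizeof mask) == 0);
}
```

The bit pattern is identical in both views; only the declared type changes, which is why the updated tests either store into a boolean vector or cast the result.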
[PATCH 0/2] rs6000, remove vec and vsx set builtins
GCC maintainers:

The code generated by using C-code to set a vector element versus using a built-in has been investigated. The assembly code generated from the C-code is as good as or better than the assembly code generated for the built-ins for both the -O0 and -O3 levels of optimization. For the vec_insert built-in, whose resolution previously made use of the vec_set bif (now removed), the generated code is as good as before with optimization.

This two-patch series removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di built-ins and the __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di built-ins in favor of using C-code instead. The built-ins use the built-in set attribute in their definitions. With the removal of these 6 built-ins, the set built-in attribute is no longer used and the related code for the attribute is removed.

The first patch in this series, which removes __builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di, was previously posted. The feedback on the patch was that we could also remove the set bif attribute. Removal of the set bif attribute requires also removing the __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di built-ins. The second patch removes the vsx set built-ins and the now unused set built-in attribute and associated code.

The patches have been tested on a Power 10 LE system with no regressions.

Carl
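
For readers who want to see what "using C-code instead" can look like, here is a minimal sketch using the GCC generic vector extension rather than the rs6000 types. The type names v2df and v2di and the helpers set_v2df, set_v2di and check_set_element are hypothetical; the index is masked to the valid range purely to keep the example safe, which is an assumption of this sketch and not something the removed built-ins did.

```c
#include <assert.h>

/* Generic GCC vector types: two doubles and two long longs.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Set one element with plain subscripting; no built-in is needed.  */
v2df
set_v2df (v2df v, double value, int index)
{
  v[index & 1] = value;
  return v;
}

v2di
set_v2di (v2di v, long long value, int index)
{
  v[index & 1] = value;
  return v;
}

/* Self-check: only the selected element changes.  */
void
check_set_element (void)
{
  v2df d = { 1.0, 2.0 };
  v2di l = { 10, 20 };

  d = set_v2df (d, 9.5, 1);
  l = set_v2di (l, -7, 0);

  assert (d[0] == 1.0 && d[1] == 9.5);
  assert (l[0] == -7 && l[1] == 20);
}
```

The compiler is free to lower the subscript assignment to whatever instruction sequence is best for the target, which matches the cover letter's observation that the default code generation is already as good as the built-ins.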
Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di
GCC maintainers:

This patch removes the vsx set built-ins: __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di. With the removal of these built-ins, the built-in attribute "set", used in the built-in definition file, is no longer needed. The "set" attribute and its associated code are removed.

The assembly code generated by using C code to set an element of a vector versus using the vsx set built-in to set an element was investigated. With -O0 optimization the generated assembly code is comparable in terms of the generated assembly instructions and number of instructions. For the -O3 optimization level, for the 2DI and 2DF cases the built-ins and the C code generate identical assembly code. The assembly code generated for the 1TI case for the C code has one fewer instruction. The built-in generates an extra load instruction. Hence, the C code is better as it has fewer load instructions.

The testcase for the __builtin_vsx_set_2df is removed. The other built-ins do not have testcases.

The patch has been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

--
rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

The built-ins set a value in a vector. The same operation can be done in C-code. The assembly code generated from the C-code is as good as or better than the code generated by the built-ins. With default optimization the number of assembly instructions generated for the two methods is similar. With -O3 optimization, the assembly generated for the two approaches is identical for the 2DF and 2DI types. The assembly for the C-code version of the 1TI requires one fewer assembly instruction. It also only uses one load versus two loads for the built-in.

With the removal of the built-ins, there are no other uses of the set built-in attribute. The code associated with the set built-in attribute is removed.

Finally, the testcase for the __builtin_vsx_set_2df is removed. The other built-ins do not have testcases.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtin.cc (get_element_number,
	altivec_expand_vec_set_builtin): Remove functions.
	(rs6000_expand_builtin): Remove the if statement to call
	altivec_expand_vec_set_builtin.
	* config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
	__builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the built-in
	definitions.
	* config/rs6000/rs6000-gen-builtins.cc (struct attrinfo): Remove
	the isset variable from the structure.
	(parse_bif_attrs): Remove the uses of the isset variable.

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
	__builtin_vsx_set_2df built-in.
---
 gcc/config/rs6000/rs6000-builtin.cc      | 53 ---
 gcc/config/rs6000/rs6000-builtins.def    | 10
 gcc/config/rs6000/rs6000-gen-builtins.cc | 29 --
 .../gcc.target/powerpc/vsx-builtin-3.c   | 6 ---
 4 files changed, 11 insertions(+), 87 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 117cf0125f8..099cbc82245 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code icode, tree exp, rtx target)
   return target;
 }
 
-/* Return the integer constant in ARG.  Constrain it to be in the range
-   of the subparts of VEC_TYPE; issue an error if not.  */
-
-static int
-get_element_number (tree vec_type, tree arg)
-{
-  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
-
-  if (!tree_fits_uhwi_p (arg)
-      || (elt = tree_to_uhwi (arg), elt > max))
-    {
-      error ("selector must be an integer constant in the range [0, %wi]", max);
-      return 0;
-    }
-
-  return elt;
-}
-
-/* Expand vec_set builtin.  */
-static rtx
-altivec_expand_vec_set_builtin (tree exp)
-{
-  machine_mode tmode, mode1;
-  tree arg0, arg1, arg2;
-  int elt;
-  rtx op0, op1;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-  arg2 = CALL_EXPR_ARG (exp, 2);
-
-  tmode = TYPE_MODE (TREE_TYPE (arg0));
-  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
-  gcc_assert (VECTOR_MODE_P (tmode));
-
-  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
-  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
-  elt = get_element_number (TREE_TYPE (arg0), arg2);
-
-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
-    op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
-
-  op0 = force_reg (tmode, op0);
-  op1 = force_reg (mode1, op1);
-
-  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
-
-  return op0;
-}
-
 /* Expand vec_ext builtin.  */
 static rtx
 altivec_expan
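
For reference, the "same operation in C-code" that this patch relies on can be sketched roughly as below. This is only an illustration using the GCC generic vector extensions, not code from the patch: the type names (v2df, v2di) and the helpers (set_v2df, set_v2di) are made up for the example. The runtime assert merely stands in for the range check that get_element_number performed on the selector when the built-in was expanded.

```c
#include <assert.h>

/* Generic vector types (GCC vector extension); names are illustrative.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Hypothetical helper: return VEC with element IDX replaced by VAL,
   using ordinary subscripting instead of a set built-in.  */
static inline v2df
set_v2df (v2df vec, double val, unsigned int idx)
{
  assert (idx < 2);   /* two elements in a 2DF vector */
  vec[idx] = val;
  return vec;
}

/* Same idea for the 2DI (two long long elements) case.  */
static inline v2di
set_v2di (v2di vec, long long val, unsigned int idx)
{
  assert (idx < 2);
  vec[idx] = val;
  return vec;
}
```

A caller would write something like `v = set_v2df (v, 5.0, 1);`, or simply `v[1] = 5.0;` directly. Which instructions the compiler emits for it depends on the target and the optimization level.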
Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di
GCC maintainers:

This patch was previously posted. Per the feedback, it is now the first of two patches to remove the set built-ins.

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins. The users should just use normal C-code to update the various vector elements.

This change was originally intended to be part of the earlier series of cleanup patches. It was initially thought that some additional work would be needed to do some gimple generation instead of these built-ins. However, the existing default code generation does produce the needed code. For the vec_set bif, the equivalent C code is as good or better than the built-in. For the vec_insert bif, whose resolving previously made use of the vec_set bif, the assembly code generation is as good as before with the -O3 optimization.

The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
	* config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
	__builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
	definitions.
	* config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the handling
	for constant vec_insert position with VECTOR_UNIT_VSX_P V1TImode,
	V2DFmode and V2DImode modes.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc         | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 47830b7dcb0..75c33aa9ffc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
     VEC_EXT_V2DI nothing {extract}
 
-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
     CMPGE_16QI vector_nltv16qi {}
 
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 68519e1397f..04882c396bf 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vec<tree, va_gc> *arglist,
       return error_mark_node;
     }
 
-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-      && VECTOR_UNIT_VSX_P (mode)
-      && TREE_CODE (arg2) == INTEGER_CST)
-    {
-      wide_int selector = wi::to_wide (arg2);
-      selector = wi::umod_trunc (selector, 2);
-      arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-      tree call = NULL_TREE;
-      if (mode == V2DFmode)
-        call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-      else if (mode == V2DImode)
-        call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-      /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-         reversed.  */
-      if (call)
-        {
-          *res = resolved;
-          return build_call_expr (call, 3, arg1, arg0, arg2);
-        }
-    }
-
-  else if (mode == V1TImode
-           && VECTOR_UNIT_VSX_P (mode)
-           && TREE_CODE (arg2) == INTEGER_CST)
-    {
-      tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-      wide_int selector = wi::zero(32);
-      arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-      /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-         reversed.  */
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 with
      VIEW_CONVERT_EXPR.  i.e.:
        D.3192 = v1;
-- 
2.45.2
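
As an illustration of what the removed constant-selector path computed for the two-element types: the selector was reduced modulo 2 (wi::umod_trunc) before the call was built, and the vec_insert-style argument order (scalar first, vector second) was swapped for the vec_set built-in. A minimal sketch of that behavior in plain C, assuming the GCC generic vector extensions, might look like the following. The names v2df, insert_v2df and demo_insert_wraps are hypothetical and are not GCC functions; this models the semantics only, not how GCC generates the code.

```c
#include <assert.h>

/* Generic two-element double vector (GCC vector extension).  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical model of a vec_insert-style operation: scalar first,
   vector second, selector reduced modulo the element count (2).  */
static inline v2df
insert_v2df (double val, v2df vec, unsigned int sel)
{
  vec[sel % 2] = val;
  return vec;
}

/* Small self-check: a selector of 3 wraps around to element 1.  */
static void
demo_insert_wraps (void)
{
  v2df v = { 1.0, 2.0 };
  v = insert_v2df (9.0, v, 3);
  assert (v[0] == 1.0 && v[1] == 9.0);
}
```

For the single-element 1TI case the removed code always used selector 0, which corresponds to writing element 0 of a one-element vector.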