Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-07-03 Thread Carl Love
Segher:

On Mon, 2018-07-02 at 11:53 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 29, 2018 at 07:38:39AM -0700, Carl Love wrote:
> > +;; Unpack high elements of float vector to vector of doubles
> > +(define_expand "altivec_unpackh_v4sf"
> > +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> > +(match_operand:V4SF 1 "register_operand" "v"))]
> > +  "TARGET_VSX"
> > +{
> > +  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
> > +  DONE;
> > +}
> > +  [(set_attr "type" "veccomplex")])
> 
> I wondered if these actually work for all VSX registers, not just the
> VMX registers (i.e. "wa" or similar instead of "v").  But constraints
> in define_expand are meaningless anyway; just leave them out please.
> 
> Does it help to define these altivec_unpackh_v4sf, when all it does is
> expand as doublehv4sf2?  Can't you more easily put the latter in the
> tables?

Yes, my bad.  It is way cleaner to just do it directly.  My first
attempt needed the define_expand, but then I realized I had made things
way more complicated than needed and rewrote the patch.

> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> > @@ -0,0 +1,257 @@
> > +/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > +/* { dg-options "-mpower8-vector -maltivec" } */
> 
> This needs p8vector_ok then.  Is that correct?  What requires p8?
> Is VSX (p7) enough for everything here?
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> > @@ -0,0 +1,94 @@
> > +/* { dg-do compile { target powerpc*-*-* } } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > +/* { dg-options "-mpower8-vector -mvsx" } */
> 
> Same here: required target does not match options.
> 
My bad again, I can't follow my own comments.  altivec-1-runnable.c does
not need Power 8, but altivec-2-runnable.c does, per the comments in
the file.

Fixed the various issues and retested on

    powerpc64le-unknown-linux-gnu (Power 8 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)
    powerpc64le-unknown-linux-gnu (Power 9 LE)

Please let me know if the patch looks OK for GCC mainline. The patch
also needs to be backported to GCC 8.
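
For readers without a Power system at hand, the behaviour the float
overloads are expected to have can be modelled in portable C.  This is
only a sketch under stated assumptions: the names model_unpackh and
model_unpackl are hypothetical, element numbering follows the
initializer order used in the test file (elements 0 and 1 form the
"high" half), and the real vec_unpackh/vec_unpackl built-ins are not
used at all.

```c
/* Hypothetical scalar model of vec_unpackh on a vector float:
   widen elements 0 and 1 (initializer order) to double.  */
void
model_unpackh (const float in[4], double out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (double) in[i];
}

/* Hypothetical scalar model of vec_unpackl on a vector float:
   widen elements 2 and 3 (initializer order) to double.  */
void
model_unpackl (const float in[4], double out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (double) in[i + 2];
}
```

With the input { 0.0, 1.5, 2.5, 3.5 } that the test uses, the first
model yields { 0.0, 1.5 } and the second { 2.5, 3.5 }, which are the
expected values in the test.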

 Carl Love
-----


gcc/ChangeLog:

2018-07-03  Carl Love  

* config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
float argument to VSX_BUILTIN_DOUBLEH_V4SF.
Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
VSX_BUILTIN_DOUBLEL_V4SF.

gcc/testsuite/ChangeLog:

2018-07-03  Carl Love  
* gcc.target/powerpc/altivec-1-runnable.c: New test file.
* gcc.target/powerpc/altivec-2-runnable.c: New test file.
* gcc.target/powerpc/vsx-7.c (main2): Change expected instruction
for tests.
---
 gcc/config/rs6000/rs6000-c.c   |   4 +-
 .../gcc.target/powerpc/altivec-1-runnable.c| 257 +
 .../gcc.target/powerpc/altivec-2-runnable.c|  94 
 gcc/testsuite/gcc.target/powerpc/vsx-7.c   |   7 +-
 4 files changed, 356 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f4b1bf7..f37f0b1 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -865,7 +865,7 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_pixel_V8HI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, VSX_BUILTIN_DOUBLEH_V4SF,
 RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
 RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
@@ -897,7 +897,7 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
 RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKL, P8V_BUILTIN_VUPKLSW,
 RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKL, ALTIVEC_BUILTIN_VUPKLPX,
+  { ALTIVEC_BUILTIN_VEC_UNPACKL, VSX_BUILTIN_DOUBLEL_V4SF,
 RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKLPX, ALTIVEC_BUILTIN_VUPKLPX,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V8HI, 0, 0 },
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-ru

[PATCH,rs6000] Backport of stxvl instruction fix to GCC 7

2018-07-09 Thread Carl Love
GCC Maintainers:

The following patch is a backport of a commit made to mainline prior to
the GCC 8 release.  Note, the code fixed by this patch was later modified
in commit 256798 as part of adding vec_xst_len support.  In that commit
the sldi instruction is replaced by an ashift of the operand for the
stxvl instruction.  Commit 256798 adds additional functionality and does
not fix any functional issues.  Hence it is not being backported, just
the original bug fix given below.
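
As a rough illustration of why the shift amount matters: stxvl reads
the byte count from the most significant byte (bits 0:7) of the length
register, so a count held in the low bits has to be moved up by 56
before the store.  A minimal C model of that one step follows; the
function name is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "sldi %2,%2,56": place a byte count in the
   top byte of a 64-bit register, where stxvl expects to find it.  */
uint64_t
stxvl_length_operand (uint64_t nbytes)
{
  /* Only the top byte is read, so larger counts cannot be encoded.  */
  assert (nbytes <= 0xff);
  return nbytes << 56;
}
```

Without the third operand the sldi has no shift count at all, which is
the missing argument the ChangeLog entry below refers to.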

The patch has been tested on 

powerpc64le-unknown-linux-gnu (Power 8 LE)  

With no regressions.

Please let me know if the patch looks OK for GCC 7.

 Carl Love
---
2018-07-09  Carl Love  

Backport from mainline
2017-09-07  Carl Love  

* config/rs6000/vsx.md (define_insn "*stxvl"): Add missing argument to
the sldi instruction.
---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index eef5357..37d768f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3946,7 +3946,7 @@
      (match_operand:DI 2 "register_operand" "+r")]
     UNSPEC_STXVL))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
-  "sldi %2,%2\;stxvl %x0,%1,%2"
+  "sldi %2,%2,56\;stxvl %x0,%1,%2"
   [(set_attr "length" "8")
(set_attr "type" "vecstore")])
 
-- 
2.7.4



Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread Carl Love
On Fri, 2018-07-13 at 16:00 -0500, Segher Boessenkool wrote:
> On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> > On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> > double.  Either all of these tests need
> > 
> > /* { dg-require-effective-target longdouble128 } */
> > 
> > or
> > 
> > /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */
> > 
> > along with testing for "tc", e.g., bl .__divtc3
> 
> Which would you prefer David?  (I'd do the former).
> 
> 
> Segher
> 

Segher, David:

I reworked the patch per the first option that David gave.  The tests
divkc3-2.c, divkc3-3.c, mulkc3-2.c and mulkc3-3.c pass on Power 9 Linux
as they did before.  The tests are unsupported on Power8 Linux as they
were before.  Now, the tests are reported as unsupported on AIX rather
than failing on AIX.
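
For context, a sketch of the property the longdouble128 requirement is
about.  It is an assumption here, not checked against
target-supports.exp, that the effective target boils down to a
compile-time check that long double occupies 16 bytes.  AIX defaults to
a 64-bit long double, so such a check would fail there and the tests
would be skipped.  The helper names below are hypothetical.

```c
#include <float.h>

/* Hypothetical helper: nonzero when long double occupies 16 bytes,
   which is the property the tests are assumed to depend on.  */
int
long_double_is_128bit (void)
{
  return sizeof (long double) == 16;
}

/* Mantissa width of long double: 53 when it is the same as double,
   106 for IBM double-double, 113 for IEEE binary128.  */
int
long_double_mantissa_bits (void)
{
  return LDBL_MANT_DIG;
}
```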

Please let me know if you both approve the updated patch below.  Thanks
for the input and help on this.

   Carl Love

-------

gcc/testsuite/ChangeLog:

2018-07-13  Carl Love  

* gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
longdouble128.
* gcc.target/powerpc/divkc3-3.c: Ditto.
* gcc.target/powerpc/mulkc3-2.c: Ditto.
* gcc.target/powerpc/mulkc3-3.c: Ditto.
* gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
* gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.
---
 gcc/testsuite/gcc.target/powerpc/divkc3-2.c| 1 +
 gcc/testsuite/gcc.target/powerpc/divkc3-3.c| 1 +
 gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +---
 gcc/testsuite/gcc.target/powerpc/mulkc3-2.c| 1 +
 gcc/testsuite/gcc.target/powerpc/mulkc3-3.c| 1 +
 gcc/testsuite/gcc.target/powerpc/pr85456.c | 3 ++-
 6 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
index d3fcbedac..e34ed40ba 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
index 45695fef8..c0fda8b24 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
index 25f4bc6aa..14f944817 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
@@ -19,7 +19,5 @@ testd_h (vector double vd2, vector double vd3)
   return vec_mergeh (vd2, vd3);
 }
 
-/* vec_merge with doubles tend to just use xxpermdi (3 ea for BE, 1 ea for LE).  */
-/* { dg-final { scan-assembler-times "xxpermdi" 2  { target { powerpc*le-*-* } }} } */
-/* { dg-final { scan-assembler-times "xxpermdi" 6  { target { powerpc-*-* } }     } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
index 9ba577a0c..eee6de9e2 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
index db8730158..b6d2bdf73 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longd

[PATCH, rs6000] Fix AIX test case failures

2018-07-17 Thread Carl Love
Segher:

I was asked to backport the patch for the AIX test case failures to
GCC 8.  The trunk patch applied cleanly to GCC 8.  I updated the
ChangeLog, rebuilt and retested the patch on:

    powerpc64le-unknown-linux-gnu (Power 8 LE)
    powerpc64-unknown-linux-gnu (Power 8 BE)
    AIX 7200-00-01-1543 (Power 8 BE)

With no regressions.

Please let me know if it is OK to apply the patch to the GCC 8 branch. 
Thanks.

 Carl Love
 -


gcc/testsuite/ChangeLog:

2018-07-17  Carl Love  

Backport from mainline
2018-07-16  Carl Love  

PR target/86414
* gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
longdouble128.
* gcc.target/powerpc/divkc3-3.c: Ditto.
* gcc.target/powerpc/mulkc3-2.c: Ditto.
* gcc.target/powerpc/mulkc3-3.c: Ditto.
* gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
* gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.
---
 gcc/testsuite/gcc.target/powerpc/divkc3-2.c| 1 +
 gcc/testsuite/gcc.target/powerpc/divkc3-3.c| 1 +
 gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +---
 gcc/testsuite/gcc.target/powerpc/mulkc3-2.c| 1 +
 gcc/testsuite/gcc.target/powerpc/mulkc3-3.c| 1 +
 gcc/testsuite/gcc.target/powerpc/pr85456.c | 3 ++-
 6 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
index d3fcbed..e34ed40 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
index 45695fe..c0fda8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
index 25f4bc6..14f9448 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
@@ -19,7 +19,5 @@ testd_h (vector double vd2, vector double vd3)
   return vec_mergeh (vd2, vd3);
 }
 
-/* vec_merge with doubles tend to just use xxpermdi (3 ea for BE, 1 ea for LE).  */
-/* { dg-final { scan-assembler-times "xxpermdi" 2  { target { powerpc*le-*-* } }} } */
-/* { dg-final { scan-assembler-times "xxpermdi" 6  { target { powerpc-*-* } }     } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
index 9ba577a..eee6de9 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
index db87301..b6d2bdf 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/pr85456.c b/gcc/testsuite/gcc.target/powerpc/pr85456.c
index b9df16a..b928292 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr85456.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr85456.c
@@ -11,4 +11,5 @@ do_powl (long double a, int i)
   return __builtin_powil (a, i);
 }
 
-/* { dg-final { scan-assembler "bl __powikf2" } } */
+/* { dg-final { scan-assembl

[PATCH,rs6000] AIX test fixes 2

2018-07-20 Thread Carl Love
GCC maintainers:

The following patch fixes errors on AIX for the "vector double" tests
in the altivec-1-runnable.c file.  The type "vector double" requires the
GCC command line option -mvsx, so the vector double tests in
altivec-1-runnable.c should be in altivec-2-runnable.c.  It looks
like my Linux testing of the original patch worked because I configured
GCC with -mcpu=power8 as the default.  AIX does not use that as the
default processor, which caused the compile of altivec-1-runnable.c to
fail.

The vec_or tests in builtins-1.c were moved to another file by a
previous patch.  The vec_or test generated the xxlor instruction.  The
count of the xxlor instruction varies depending on the target as it is
used as a move instruction.  No other tests generate the xxlor
instruction. Hence, the count check was removed.

The patch has been tested on 

powerpc64le-unknown-linux-gnu (Power 8 LE) 
powerpc64-unknown-linux-gnu (Power 8 BE)
AIX (Power 8)

With no regressions.

Please let me know if the patch looks OK for trunk.

     Carl Love



gcc/testsuite/ChangeLog:

2018-07-20  Carl Love  

* gcc.target/powerpc/altivec-1-runnable.c: Move vector double tests to
file altivec-2-runnable.c.
* gcc.target/powerpc/altivec-2-runnable.c: Add vector double tests.
* gcc.target/powerpc/builtins-1.c: Remove check for xxlor.  Add Linux
and AIX targets for divdi3 and udivdi3 instructions.
---
 .../gcc.target/powerpc/altivec-1-runnable.c| 50 --
 .../gcc.target/powerpc/altivec-2-runnable.c| 49 -
 gcc/testsuite/gcc.target/powerpc/builtins-1.c  |  9 ++--
 3 files changed, 52 insertions(+), 56 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index bb913d2..da8ebbc 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -31,16 +31,9 @@ int main ()
   vector signed int vec_si_result, vec_si_expected;
   vector signed char vec_sc_arg;
   vector signed char vec_sc_result, vec_sc_expected;
-  vector float vec_float_arg;
-  vector double vec_double_result, vec_double_expected;
   vector pixel vec_pixel_arg;
   vector unsigned int vec_ui_result, vec_ui_expected;
 
-  union conv {
-     double d;
-     unsigned long long l;
-  } conv_exp, conv_val;
-
   vec_bs_arg = (vector bool short){ 0, 101, 202, 303,
    404, 505, 606, 707 };
   vec_bi_expected = (vector bool int){ 0, 101, 202, 303 };
@@ -209,49 +202,6 @@ int main ()
abort();
 #endif
   }
-  
-
-  vec_float_arg = (vector float){ 0.0, 1.5, 2.5, 3.5 };
-
-  vec_double_expected = (vector double){ 0.0, 1.5 };
-
-  vec_double_result = vec_unpackh (vec_float_arg);
-
-  for (i = 0; i < 2; i++) {
-if (vec_double_expected[i] != vec_double_result[i])
-  {
-#if DEBUG
-    printf("ERROR: vec_unpackh(), vec_double_expected[%d] = %f does not match vec_double_result[%d] = %f\n",
-   i, vec_double_expected[i], i, vec_double_result[i]);
-    conv_val.d = vec_double_result[i];
-    conv_exp.d = vec_double_expected[i];
-    printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not match vec_double_result[%d] = 0x%llx\n",
-   i, conv_exp.l, i,conv_val.l);
-#else
-    abort();
-#endif
-}
-  }
-
-  vec_double_expected = (vector double){ 2.5, 3.5 };
-
-  vec_double_result = vec_unpackl (vec_float_arg);
-
-  for (i = 0; i < 2; i++) {
-if (vec_double_expected[i] != vec_double_result[i])
-  {
-#if DEBUG
-    printf("ERROR: vec_unpackl() vec_double_expected[%d] = %f does not match vec_double_result[%d] = %f\n",
-   i, vec_double_expected[i], i, vec_double_result[i]);
-    conv_val.d = vec_double_result[i];
-    conv_exp.d = vec_double_expected[i];
-    printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not match vec_double_result[%d] = 0x%llx\n",
-   i, conv_exp.l, i,conv_val.l);
-#else
- abort();
-#endif
-  }
-  }
 
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 9d8aad4..041edcb 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -23,8 +23,15 @@ int main ()
 
   vector signed int vec_si_arg;
   vector signed long long int vec_slli_result, vec_slli_expected;
+  vector float vec_float_arg;
+  vector double vec_double_result, vec_double_expected;
 
-  /*  use of ‘long long’ in AltiVec types requires -mvsx */
+  union conv {
+     double d;
+     unsigned long long l;
+  } conv_exp, conv_val;
+
+  

[PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-30 Thread Carl Love
GCC maintainers:

The following patch adds tests for two of the rs6000 overloaded
built-ins that do not have tests.  Additionally, the GCC documentation
file doc/extend.texi is updated to include the built-in definitions, as
they were missing.

The patch has been tested on a Power 10 system with no regressions. 
Please let me know if this patch is acceptable for mainline.

 Carl

---
rs6000, Add missing overloaded bcd builtin tests

The two BCD overloaded built-ins __builtin_bcdsub_ge and __builtin_bcdsub_le
do not have a corresponding test.  Add tests to existing test file and update
the documentation with the built-in definitions.

gcc/ChangeLog:
* doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge): Add
documentation for the built-ins.

gcc/testsuite/ChangeLog:
* bcd-3.c (do_sub_ge, do_sub_le): Add functions to test builtins
__builtin_bcdsub_ge and __builtin_bcdsub_le).
---
 gcc/doc/extend.texi  |  4 
 gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cf0d0c63cce..fa7402813e7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned char, vector unsigned char, const int);
 vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const int);
 vector unsigned char __builtin_bcdsub (vector unsigned char, vector unsigned char,
                                        const int);
+int __builtin_bcdsub_le (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_le (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_lt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_lt (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_eq (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_eq (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_gt (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_gt (vector unsigned char, vector unsigned char, const int);
+int __builtin_bcdsub_ge (vector __int128, vector __int128, const int);
+int __builtin_bcdsub_ge (vector unsigned char, vector unsigned char, const int);
 int __builtin_bcdsub_ov (vector __int128, vector __int128, const int);
 int __builtin_bcdsub_ov (vector unsigned char, vector unsigned char, const int);
 @end smallexample
diff --git a/gcc/testsuite/gcc.target/powerpc/bcd-3.c b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
index 7948a0c95e2..9891f4ff08e 100644
--- a/gcc/testsuite/gcc.target/powerpc/bcd-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/bcd-3.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 /* { dg-final { scan-assembler-times "bcdadd\[.\] " 4 } } */
-/* { dg-final { scan-assembler-times "bcdsub\[.\] " 4 } } */
+/* { dg-final { scan-assembler-times "bcdsub\[.\] " 6 } } */
 /* { dg-final { scan-assembler-not   "bl __builtin"   } } */
 /* { dg-final { scan-assembler-not   "mtvsr" } } */
 /* { dg-final { scan-assembler-not   "mfvsr" } } */
@@ -93,6 +93,26 @@ do_sub_gt (vector_128_t a, vector_128_t b, int *p)
   return ret;
 }
 
+vector_128_t
+do_sub_ge (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_ge (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
+vector_128_t
+do_sub_le (vector_128_t a, vector_128_t b, int *p)
+{
+  vector_128_t ret = __builtin_bcdsub (a, b, 0);
+  if (__builtin_bcdsub_le (a, b, 0))
+*p = 1;
+
+  return ret;
+}
+
 vector_128_t
 do_sub_ov (vector_128_t a, vector_128_t b, int *p)
 {
-- 
2.37.2




Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
On Tue, 2023-10-31 at 10:34 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/10/31 08:08, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch adds tests for two of the rs6000 overloaded
> > built-ins that do not have tests.  Additionally the GCC documentation
> > file
> 
> I just found that actually they have the test coverage, because we
> have
> 
> #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
> #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
> #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
> #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
> #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
> 
> in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> these

OK, my simple scripts are not going to pick up the stuff in altivec.h.
They were just grepping for the built-in name in the test file
directory.

> __builtin_bcdcmp* ...
> 
> > doc/extend.texi is updated to include the built-in definitions as
> > they
> > were missing.
> 
> ... since we already document __builtin_vec_bcdsub_{eq,gt,lt}, I think
> it's still good to supplement the documentation and add the explicit
> testing cases.
> 
> > The patch has been tested on a Power 10 system with no
> > regressions. 
> > Please let me know if this patch is acceptable for mainline.
> > 
> >  Carl
> > 
> > ---
> > rs6000, Add missing overloaded bcd builtin tests
> > 
> > The two BCD overloaded built-ins __builtin_bcdsub_ge and
> > __builtin_bcdsub_le
> > do not have a corresponding test.  Add tests to existing test file
> > and update
> > the documentation with the built-in definitions.
> 
> As above, this commit log doesn't describe the actuality well, please
> update it with something like:
> 
> Currently we have the documentation for __builtin_vec_bcdsub_{eq,gt,lt}
> but not for __builtin_bcdsub_[gl]e, this patch is to supplement the
> descriptions for them.  Although they are mainly for
> __builtin_bcdcmp{ge,le}, we already have some testing coverage for
> __builtin_vec_bcdsub_{eq,gt,lt}, this patch adds the corresponding
> explicit test cases as well.
> 

OK, replaced the commit log with the suggestion.

> > gcc/ChangeLog:
> > * doc/extend.texi (__builtin_bcdsub_le, __builtin_bcdsub_ge):
> > Add
> > documentation for the built-ins.
> > 
> > gcc/testsuite/ChangeLog:
> > * bcd-3.c (do_sub_ge, do_sub_le): Add functions to test builtins
> > __builtin_bcdsub_ge and __builtin_bcdsub_le).
> 
> 1) Unexpected ")" at the end.
> 
> 2) I supposed git gcc-verify would complain on this changelog entry.
> 
> Should be starting with:
> 
>   * gcc.target/powerpc/bcd-3.c (
> 
> , no?
> 

Yes, I meant to run the commit check but obviously got distracted and
didn't.  Sorry about that.

> OK for trunk with the above comments addressed, thanks!
> 
OK, thanks.

Carl 

> BR,
> Kewen
> 
> > ---
> >  gcc/doc/extend.texi  |  4 
> >  gcc/testsuite/gcc.target/powerpc/bcd-3.c | 22
> > +-
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index cf0d0c63cce..fa7402813e7 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -20205,12 +20205,16 @@ int __builtin_bcdadd_ov (vector unsigned
> > char, vector unsigned char, const int);
> >  vector __int128 __builtin_bcdsub (vector __int128, vector
> > __int128, const int);
> >  vector unsigned char __builtin_bcdsub (vector unsigned char,
> > vector unsigned char,
> > const int);
> > +int __builtin_bcdsub_le (vector __int128, vector __int128, const
> > int);
> > +int __builtin_bcdsub_le (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_lt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_lt (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_eq (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_eq (vector unsigned char, vector unsigned
> > char, const int);
> >  int __builtin_bcdsub_gt (vector __int128, vector __int128, const
> > int);
> >  int __builtin_bcdsub_gt (vector unsigned char, vector unsigned
> > char, const int);
> > +int __builtin_bcdsub_ge (vect

Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Carl Love
Segher:

On Tue, 2023-10-31 at 11:17 -0500, Segher Boessenkool wrote:


> 
> You could use gcov to see which rs6000 builtins are not exercised by
> anything in the testsuite, maybe.  This probably can be automated
> pretty
> nicely.

I will take a look at gcov.  I just wrote some relatively simple scripts
to go look for test cases.  For the non-overloaded built-ins, the
scripts had to exclude built-ins referenced by the overloaded built-ins.

This patch is just the first of a series of patches that I am working
on to try and clean up the built-in stuff per some comments in a PR. 
The internal LTC issue is
 
https://github.ibm.com/ltc-toolchain/power-gcc/issues/1288

The goal is to make sure there are test cases and documentation for all
of the overloaded and non-overloaded built-in definitions.  Just a low
priority project to fill any spare cycles.  :-)

  Carl 




rs6000, built-in cleanup patch series

2024-02-20 Thread Carl Love
GCC maintainers:

The following series of patches cleans up some of the rs6000 built-in support.
Some of the first patches fix errors in the definitions of a few of the
built-ins.  Some built-ins are supposed to take unsigned arguments but are
declared with signed arguments.  Others are supposed to return unsigned values
but were defined to return a signed value.

There are a number of built-ins that are not documented but are duplicates of 
other documented built-ins.  The duplicate definitions are removed so users 
will only use the supported documented built-ins.

A number of the built-ins are not documented in either the Power
Vector Intrinsic Reference manual or the gcc/doc/extend.texi file.  The
patches add the missing documentation as needed.

Also, most of the built-ins do not have test cases.  The patches add test
cases for the various built-ins.

Carl 


[PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the arguments and return type for the various 
__builtin_vsx_cmple* built-ins.  They were defined as signed but should have 
been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  This patch changes
the arguments and return type from signed to unsigned.
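
To see why the signedness of the prototype matters, here is a scalar
sketch of an element-wise less-than-or-equal compare on sixteen byte
elements.  The function names are hypothetical, and the all-ones /
all-zeros result encoding is an assumption based on the usual vector
compare convention, not something taken from this patch.  The same bit
pattern 0xff compares as -1 when signed and as 255 when unsigned, so
the two variants can disagree on identical input bits.

```c
#include <stdint.h>

/* Hypothetical scalar model of a signed byte-wise "a <= b" compare.
   A true lane is assumed to be all ones, a false lane all zeros.  */
void
model_cmple_16qi (const int8_t a[16], const int8_t b[16], int8_t r[16])
{
  for (int i = 0; i < 16; i++)
    r[i] = (a[i] <= b[i]) ? -1 : 0;
}

/* Hypothetical scalar model of the unsigned variant.  */
void
model_cmple_u16qi (const uint8_t a[16], const uint8_t b[16], uint8_t r[16])
{
  for (int i = 0; i < 16; i++)
    r[i] = (a[i] <= b[i]) ? 0xff : 0;
}
```

Feeding the bit pattern 0xff against 1 gives a true lane in the signed
model (-1 <= 1) and a false lane in the unsigned model (255 > 1).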

The documentation for the signed and unsigned versions of
__builtin_vsx_cmple is missing from extend.texi.  This patch adds the
missing documentation.

Test cases are added for each of the signed and unsigned built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Change arguments and return type from
signed to unsigned.
* doc/extend.texi (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_u4si): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-cmple.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def|  10 +-
 gcc/doc/extend.texi  |  23 
 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++
 3 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..d66a53a0fab 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1349,16 +1349,16 @@
   const vss __builtin_vsx_cmple_8hi (vss, vss);
 CMPLE_8HI vector_ngtv8hi {}
 
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
+  const vuc __builtin_vsx_cmple_u16qi (vuc, vuc);
 CMPLE_U16QI vector_ngtuv16qi {}
 
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
+  const vull __builtin_vsx_cmple_u2di (vull, vull);
 CMPLE_U2DI vector_ngtuv2di {}
 
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
+  const vui __builtin_vsx_cmple_u4si (vui, vui);
 CMPLE_U4SI vector_ngtuv4si {}
 
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
+  const vus __builtin_vsx_cmple_u8hi (vus, vus);
 CMPLE_U8HI vector_ngtuv8hi {}
 
   const vd __builtin_vsx_concat_2df (double, double);
@@ -1769,7 +1769,7 @@
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-  const vd __builtin_vsx_xvcvuxwdp (vsi);
+  const vd __builtin_vsx_xvcvuxwdp (vui);
 XVCVUXWDP vsx_xvcvuxwdp {}
 
   const vf __builtin_vsx_xvcvuxwsp (vsi);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2b8ba1949bf..4d8610f6aa8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22522,6 +22522,29 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
 @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
+
+@smallexample
+vector signed char __builtin_vsx_cmple_16qi (vector signed char,
+ vector signed char);
+vector signed short __builtin_vsx_cmple_8hi (vector signed short,
+ vector signed short);
+vector signed int __builtin_vsx_cmple_4si (vector signed int,
+ vector signed int);
+vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char,
+vector unsigned char);
+vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short,
+vector unsigned short);
+vector unsigned int __builtin_vsx_cmple_u4si (vector unsigned int,
+  vector unsigned int);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_cmple_16qi}, @code{__builtin_vsx_cmple_8hi},
+@code{__builtin_vsx_cmple_4si}, @code{__builtin_vsx_cmple_u16qi},
+@code{__builtin_vsx_cmple_u8hi} and @code{__builtin_vsx_cmple_u4si} compare
+vectors of their defined type.  Each result element is set to all ones if the
+corresponding element of the first argument is less than or equal to that of
+the second argument, and to all zeros otherwise.
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
new file mode 100644
index 000..081817b4ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
@@ -0,0 +1,127 @@
+/* { dg

[PATCH 02/11] rs6000, fix arguments, add documentation for vector element conversions

2024-02-20 Thread Carl Love


GCC maintainers:

This patch fixes the return type for the __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvspuxds and __builtin_vsx_xvcvspuxws built-ins.  They were
defined as signed but should have been defined as unsigned.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, fix arguments, add documentation for vector element conversions

The return type for the __builtin_vsx_xvcvdpuxws, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvspuxws built-ins should be unsigned.  This patch changes
the return values from signed to unsigned.

The documentation for the vector element conversion built-ins:

__builtin_vsx_xvcvspsxws
__builtin_vsx_xvcvspsxds
__builtin_vsx_xvcvspuxds
__builtin_vsx_xvcvdpsxws
__builtin_vsx_xvcvdpuxws
__builtin_vsx_xvcvdpuxds_uns
__builtin_vsx_xvcvspdp
__builtin_vsx_xvcvdpsp
__builtin_vsx_xvcvspuxws
__builtin_vsx_xvcvsxwdp
__builtin_vsx_xvcvuxddp_uns
__builtin_vsx_xvcvuxwdp

is missing from extend.texi.  This patch adds the missing documentation.

This patch also adds runnable test cases for each of the built-ins.
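
As a rough illustration of the saturating round-to-zero conversion that the
extend.texi text in this patch describes for __builtin_vsx_xvcvspsxws, here
is a scalar sketch of one element.  The function name cvt_sp_to_sxw is
hypothetical, the status flags (VXCVI, VXSNAN, XX) are not modeled, and the
saturation constants are assumed to be the minimum and maximum signed 32-bit
values.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of a float to signed 32-bit
   conversion with round to zero and saturation.  The range checks come
   first because an out-of-range float-to-int cast is undefined in C.  */
int32_t cvt_sp_to_sxw (float x)
{
  if (x != x)                  /* NaN compares unequal to itself.  */
    return INT32_MIN;
  if (x >= 2147483648.0f)      /* Truncated value would exceed 2^31 - 1.  */
    return INT32_MAX;
  if (x < -2147483648.0f)      /* Truncated value would be below -2^31.  */
    return INT32_MIN;
  return (int32_t) x;          /* In range: the cast truncates toward zero.  */
}
```

Under these assumptions 1.9 converts to 1, -1.9 converts to -1, and values
outside the representable range clamp to the nearest limit.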

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvspuxds, __builtin_vsx_xvcvspuxws): Change
return type from signed to unsigned.
* doc/extend.texi (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds,
__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvspuxws,
__builtin_vsx_xvcvsxwdp, __builtin_vsx_xvcvuxddp_uns,
__builtin_vsx_xvcvuxwdp): Add documentation for builtins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-1.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   | 135 ++
 .../powerpc/vsx-builtin-runnable-1.c  | 233 ++
 3 files changed, 371 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-1.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index d66a53a0fab..fd316f629e5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1724,7 +1724,7 @@
   const vull __builtin_vsx_xvcvdpuxds_uns (vd);
 XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
+  const vui __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
   const vd __builtin_vsx_xvcvspdp (vf);
@@ -1736,10 +1736,10 @@
   const vsi __builtin_vsx_xvcvspsxws (vf);
 XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
+  const vull __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
+  const vui __builtin_vsx_xvcvspuxws (vf);
 XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4d8610f6aa8..583b1d890bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21360,6 +21360,141 @@ __float128 __builtin_sqrtf128 (__float128);
 __float128 __builtin_fmaf128 (__float128, __float128, __float128);
 @end smallexample
 
+@smallexample
+vector int __builtin_vsx_xvcvspsxws (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxws} converts the single precision floating
+point vector element i to a signed 32-bit integer value using round to zero,
+storing the result in element i.  If the source element is NaN the result is
+set to 0x80000000 and VXCVI is set to 1.  If the source element is SNaN then
+VXSNAN is also set to 1.  If the rounded value is greater than 2^31 - 1 the
+result is 0x7FFFFFFF and VXCVI is set to 1.  If the rounded value is less
+than -2^31, the result is set to 0x80000000 and VXCVI is set to 1.  If the
+rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector signed long long int __builtin_vsx_xvcvspsxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspsxds} converts the single precision floating
+point vector element to a signed 64-bit integer value using the round to
+zero rounding mode.  If the source element is NaN the result is set to
+0x8000000000000000 and VXCVI is set to 1.  If the source element is SNaN
+then VXSNAN is also set to 1.  If the rounded value is greater than 2^63 - 1
+the result is 0x7FFFFFFFFFFFFFFF and VXCVI is set to 1.  If the rounded
+value is less than -2^63, the result is set to 0x8000000000000000 and VXCVI
+is set to 1.  If the rounded result is inexact then XX is set to 1.
+
+@smallexample
+vector unsigned long long __builtin_vsx_xvcvspuxds (vector float);
+@end smallexample
+
+The @code{__builtin_vsx_xvcvspuxds} conv

[PATCH 05/11] rs6000, __builtin_vsx_xvneg[sp,dp] add documentation, and test cases

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test cases for the __builtin_vsx_xvnegsp, 
__builtin_vsx_xvnegdp built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, __builtin_vsx_xvneg[sp,dp] add documentation and test cases

Add documentation to the extend.texi file for the two built-ins
__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp.

Add test cases for the two built-ins.
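
For readers unfamiliar with the operation, a scalar sketch of the
element-wise negation follows.  The function name negate_lanes is
hypothetical and works on plain arrays rather than vector registers.

```c
#include <stddef.h>

/* Hypothetical scalar model: negate each of the n elements of src into
   dst.  Negation only flips the sign bit, so the result is exact.  */
void negate_lanes (const float *src, float *dst, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = -src[i];
}
```

Because negation never rounds, comparing the result against the expected
values with an exact equality check is sound here.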

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvnegsp, __builtin_vsx_xvnegdp):
Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-2.c: New test case.
---
 gcc/doc/extend.texi   | 13 +
 .../powerpc/vsx-builtin-runnable-2.c  | 51 +++
 2 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21495,6 +21495,19 @@ The @code{__builtin_vsx_xvcvuxwdp} converts single precision unsigned integer
 value to a double precision floating point value.  Input element at index 2*i
 is stored in the destination element i.
 
+@smallexample
+vector float __builtin_vsx_xvnegsp (vector float);
+vector double __builtin_vsx_xvnegdp (vector double);
+@end smallexample
+
+The @code{__builtin_vsx_xvnegsp} and @code{__builtin_vsx_xvnegdp} built-ins
+negate each vector element.
+
+@smallexample
+vector __int128  __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128,
+const int);
+
+@end smallexample
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
new file mode 100644
index 000..7906a8e01d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  vector double vd_arg1, vd_result, vd_expected_result;
+  vector float vf_arg1, vf_result, vf_expected_result;
+
+  /* VSX Vector Negate Single-Precision.  */
+
+  vf_arg1 = (vector float) {-1.0, 12345.98, -2.1234, 238.9};
+  vf_result = __builtin_vsx_xvnegsp (vf_arg1);
+  vf_expected_result = (vector float) {1.0, -12345.98, 2.1234, -238.9};
+
+  for (i = 0; i < 4; i++)
+if (vf_result[i] != vf_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegsp: vf_result[%d] = %f, vf_expected_result[%d] = %f\n",
+i, vf_result[i], i, vf_expected_result[i]);
+#else
+  abort();
+#endif
+
+  /* VSX Vector Negate Double-Precision.  */
+
+  vd_arg1 = (vector double) {12345.98, -2.1234};
+  vd_result = __builtin_vsx_xvnegdp (vd_arg1);
+  vd_expected_result = (vector double) {-12345.98, 2.1234};
+
+  for (i = 0; i < 2; i++)
+if (vd_result[i] != vd_expected_result[i])
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvnegdp: vd_result[%d] = %f, vd_expected_result[%d] = %f\n",
+i, vd_result[i], i, vd_expected_result[i]);
+#else
+  abort();
+#endif
+
+  return 0;
+}
-- 
2.43.0



[PATCH 06/11] rs6000, __builtin_vsx_xxpermdi_1ti add documentation, and test case

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds documentation and test case for the __builtin_vsx_xxpermdi_1ti 
built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xxpermdi_1ti add documentation and test case

Add documentation to the extend.texi file for the
__builtin_vsx_xxpermdi_1ti built-in.

Add test cases for the __builtin_vsx_xxpermdi_1ti built-in.
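
The doubleword selection can be sketched in scalar C as follows.  This
follows the wording of the extend.texi text in this patch literally and has
not been checked against the ISA or hardware.  The type u128_model and the
function xxpermdi_1ti_model are hypothetical names; a 128-bit value is
represented as two 64-bit halves.

```c
#include <stdint.h>

/* Hypothetical 128-bit value: hi holds bits [127:64], lo holds [63:0].  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Hypothetical model of the documented selection:
   result[127:64] = sel[1] ? srcB[63:0] : srcB[127:64]
   result[63:0]   = sel[0] ? srcA[63:0] : srcA[127:64].  */
u128_model xxpermdi_1ti_model (u128_model src_a, u128_model src_b, int sel)
{
  u128_model result;
  result.hi = (sel & 2) ? src_b.lo : src_b.hi;
  result.lo = (sel & 1) ? src_a.lo : src_a.hi;
  return result;
}
```

With a selector of 2, as used in the test case, this model takes the upper
result half from the low doubleword of the second argument and the lower
result half from the high doubleword of the first argument.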

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xxpermdi_1ti): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-3.c: New test case.
---
 gcc/doc/extend.texi   |  7 +++
 .../powerpc/vsx-builtin-runnable-3.c  | 48 +++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21508,6 +21508,13 @@ vector __int128  __builtin_vsx_xxpermdi_1ti (vector __int128, vector __int128,
 const int);
 
 @end smallexample
+
+For @code{__builtin_vsx_xxpermdi_1ti}, let srcA[127:0] be the 128-bit first
+argument and srcB[127:0] be the 128-bit second argument.  Let sel[1:0] be the
+two least significant bits of the const int argument (third input argument).
+Result bits [127:64] are srcB[127:64] if sel[1] = 0, srcB[63:0] otherwise.
+Result bits [63:0] are srcA[127:64] if sel[0] = 0, srcA[63:0] otherwise.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.07
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.07
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
new file mode 100644
index 000..ba287597cec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-3.c
@@ -0,0 +1,48 @@
+/* { dg-do run { target { lp64 } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector signed __int128 vsq_arg1, vsq_arg2, vsq_result, vsq_expected_result;
+
+  vsq_arg1[0] = (__int128) 0x;
+  vsq_arg1[0] = vsq_arg1[0] << 64 | (__int128) 0x;
+  vsq_arg2[0] = (__int128) 0x1100110011001100;
+  vsq_arg2[0] = (vsq_arg2[0]  << 64) | (__int128) 0x;
+
+  vsq_expected_result[0] = (__int128) 0x;
+  vsq_expected_result[0] = (vsq_expected_result[0] << 64)
+| (__int128) 0x;
+
+  vsq_result = __builtin_vsx_xxpermdi_1ti (vsq_arg1, vsq_arg2, 2);
+
+  if (vsq_result[0] != vsq_expected_result[0])
+{
+#if DEBUG
+   printf("ERROR, __builtin_vsx_xxpermdi_1ti: vsq_result = 0x%016llx %016llx\n",
+ (unsigned long long) (vsq_result[0] >> 64),
+ (unsigned long long) vsq_result[0]);
+   printf(" vsq_expected_result = 0x%016llx %016llx\n",
+ (unsigned long long)(vsq_expected_result[0] >> 64),
+ (unsigned long long) vsq_expected_result[0]);
+#else
+  abort();
+#endif
+ }
+
+  return 0;
+}
-- 
2.43.0



[PATCH 04/11] rs6000, Update comment for the __builtin_vsx_vper* built-ins.

2024-02-20 Thread Carl Love
GCC maintainers:

The patch expands an existing comment to document that the duplicates are 
covered by an overloaded built-in.  I am wondering if we should just go ahead 
and remove the duplicates?

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000, Update comment for the __builtin_vsx_vper* built-ins.

There is a comment about the __builtin_vsx_vper* built-ins being
duplicates of the __builtin_altivec_* built-ins.  The note says we
should consider deprecation/removal of the __builtin_vsx_vper*.  Add a
note that the __builtin_vsx_vper* built-ins are covered by the overloaded
vec_perm built-ins which use the __builtin_altivec_* built-in definitions.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_*):
Add comment to existing comment about the built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 96d095da2cb..4c95429f137 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1556,6 +1556,14 @@
 ; These are duplicates of __builtin_altivec_* counterparts, and are being
 ; kept for backwards compatibility.  The reason for their existence is
 ; unclear.  TODO: Consider deprecation/removal at some point.
+; Note, __builtin_vsx_vperm_16qi, __builtin_vsx_vperm_16qi_uns,
+; __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_v1ti_uns,
+; __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
+; __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
+; __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
+; __builtin_vsx_vperm_8hi, __builtin_altivec_vperm_8hi_uns
+; are all covered by the overloaded vec_perm built-in which uses the
+; __builtin_altivec_* built-in definitions.
   const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
 VPERM_16QI_X altivec_vperm_v16qi {}
 
-- 
2.43.0



[PATCH 08/11] rs6000, add tests and documentation for various built-ins

2024-02-20 Thread Carl Love
 
 GCC maintainers:

The patch adds documentation and a test case for a number of built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

 rs6000, add tests and documentation for various built-ins

This patch adds a test case and documentation in extend.texi for the
following built-ins:

__builtin_altivec_fix_sfsi
__builtin_altivec_fixuns_sfsi
__builtin_altivec_float_sisf
__builtin_altivec_uns_float_sisf
__builtin_altivec_vrsqrtfp
__builtin_altivec_mask_for_load
__builtin_altivec_vsel_1ti
__builtin_altivec_vsel_1ti_uns
__builtin_vec_init_v16qi
__builtin_vec_init_v4sf
__builtin_vec_init_v4si
__builtin_vec_init_v8hi
__builtin_vec_set_v16qi
__builtin_vec_set_v4sf
__builtin_vec_set_v4si
__builtin_vec_set_v8hi
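
Among the built-ins listed above, the two vsel variants are documented in
this patch as computing (src1 & ~mask) | (src2 & mask).  A minimal scalar
sketch of that bitwise select follows; the name vsel_model is hypothetical
and a 64-bit word stands in for the 128-bit vector.

```c
#include <stdint.h>

/* Hypothetical model of a bitwise select: each result bit comes from src2
   where the mask bit is 1 and from src1 where the mask bit is 0.  */
uint64_t vsel_model (uint64_t src1, uint64_t src2, uint64_t mask)
{
  return (src1 & ~mask) | (src2 & mask);
}
```

A mask of all zeros returns src1 unchanged, a mask of all ones returns src2,
and a mixed mask blends the two bit by bit.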

gcc/ChangeLog:
* doc/extend.texi (__builtin_altivec_fix_sfsi,
__builtin_altivec_fixuns_sfsi, __builtin_altivec_float_sisf,
__builtin_altivec_uns_float_sisf, __builtin_altivec_vrsqrtfp,
__builtin_altivec_mask_for_load, __builtin_altivec_vsel_1ti,
__builtin_altivec_vsel_1ti_uns, __builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_set_v16qi,
__builtin_vec_set_v4sf, __builtin_vec_set_v4si,
__builtin_vec_set_v8hi): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-38.c: New test case.
---
 gcc/doc/extend.texi   |  98 
 gcc/testsuite/gcc.target/powerpc/altivec-38.c | 503 ++
 2 files changed, 601 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-38.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22678,6 +22678,104 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
 @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
 
 
+@smallexample
+vector signed int __builtin_altivec_fix_sfsi (vector float);
+vector signed int __builtin_altivec_fixuns_sfsi (vector float);
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector int);
+vector float __builtin_altivec_vrsqrtfp (vector float);
+@end smallexample
+
+The @code{__builtin_altivec_fix_sfsi} converts a vector of single precision
+floating point values to a vector of signed integers with round to zero.
+
+The @code{__builtin_altivec_fixuns_sfsi} converts a vector of single precision
+floating point values to a vector of unsigned integers with round to zero.  If
+the rounded floating point value is less than 0 the result is 0 and VXCVI
+is set to 1.
+
+The @code{__builtin_altivec_float_sisf} converts a vector of single precision
+signed integers to a vector of floating point values using the rounding mode
+specified by RN.
+
+The @code{__builtin_altivec_uns_float_sisf} converts a vector of single
+precision unsigned integers to a vector of floating point values using the
+rounding mode specified by RN.
+
+The @code{__builtin_altivec_vrsqrtfp} returns a vector of floating point
+estimates of the reciprocal square root of each floating point source vector
+element.
+
+@smallexample
+vector signed char __builtin_altivec_mask_for_load (const void *);
+@end smallexample
+
+The @code{__builtin_altivec_mask_for_load} returns a vector mask based on the
+bottom four bits of the argument.  Let X be the 32-byte value:
+0x00 || 0x01 || 0x02 || ... || 0x1D || 0x1E || 0x1F.
+Bytes sh to sh+15 are returned where sh is given by the least significant 4
+bits of the argument.  See the description of the lvsl and lvsr instructions.
+
+@smallexample
+vector signed __int128 __builtin_altivec_vsel_1ti (vector signed __int128,
+   vector signed __int128,
+   vector unsigned __int128);
+vector unsigned __int128
+  __builtin_altivec_vsel_1ti_uns (vector unsigned __int128,
+  vector unsigned __int128,
+  vector unsigned __int128);
+@end smallexample
+
+Let the arguments of @code{__builtin_altivec_vsel_1ti} and
+@code{__builtin_altivec_vsel_1ti_uns} be src1, src2, mask.  The result is
+given by (src1 & ~mask) | (src2 & mask).
+
+@smallexample
+vector signed char
+__builtin_vec_init_v16qi (signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char,
+  signed char, signed char, signed char, signed char);
+
+vector short int __builtin_vec_init_v8hi (short int, short int, short int,
+  short int, short int, short int,
+  short int, short int);

[PATCH 03/11] rs6000, remove duplicated built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

There are a number of undocumented built-ins that are duplicates of other 
documented built-ins.  This patch removes the duplicates so users will only use 
the documented built-in.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

-

rs6000, remove duplicated built-ins

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin that are no longer needed are also removed.
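
As a reminder of what the retained overloads compute, here is a scalar
sketch of the merge-high and merge-low interleaving for four-element
vectors.  The names mergeh_model and mergel_model are hypothetical, and the
result array must not overlap the inputs.

```c
/* Hypothetical model of vec_mergeh on four-element vectors: interleave
   the first two elements of a and b.  */
void mergeh_model (const int a[4], const int b[4], int r[4])
{
  r[0] = a[0];
  r[1] = b[0];
  r[2] = a[1];
  r[3] = b[1];
}

/* Hypothetical model of vec_mergel on four-element vectors: interleave
   the last two elements of a and b.  */
void mergel_model (const int a[4], const int b[4], int r[4])
{
  r[0] = a[2];
  r[1] = b[2];
  r[2] = a[3];
  r[3] = b[3];
}
```

The same interleaving applies to float elements, which is why the removed
__builtin_vsx_xxmrghw and __builtin_vsx_xxmrglw variants are redundant with
the overloaded forms.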

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
Remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
cases for removed built-ins.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 --
 gcc/config/rs6000/rs6000-builtins.def | 42 ---
 .../gcc.target/powerpc/vsx-builtin-3.c|  6 ---
 3 files changed, 52 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 6698274031b..e436cbe4935 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2110,20 +2110,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index fd316f629e5..96d095da2cb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1925,18 +1925,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 

[PATCH 07/11] rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

2024-02-20 Thread Carl Love


 GCC maintainers:

The patch adds documentation and test case for the __builtin_vsx_xvcmpeq[sp,
dp, sp_p] built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, __builtin_vsx_xvcmpeq[sp, dp, sp_p] add documentation and test case

Add a test case for the __builtin_vsx_xvcmpeqsp_p built-in.

Add documentation for the __builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, and __builtin_vsx_xvcmpeqsp builtins.
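
The behaviour described by the documentation text in this patch can be
sketched in scalar C as below.  The names cmpeq_sp_model and
cmpeq_sp_p_model are hypothetical, and the predicate follows the patch's
wording (selector 1 means "any element equal", selector 0 means "all
elements unequal") rather than a verified hardware definition.

```c
#include <stdint.h>

/* Hypothetical model of the element-wise compare: each result lane is all
   ones when the corresponding elements are equal, all zeros otherwise.  */
void cmpeq_sp_model (const float a[4], const float b[4], uint32_t result[4])
{
  for (int i = 0; i < 4; i++)
    result[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical model of the predicate form as worded in this patch.  */
int cmpeq_sp_p_model (int selector, const float a[4], const float b[4])
{
  int equal_count = 0;

  for (int i = 0; i < 4; i++)
    if (a[i] == b[i])
      equal_count++;

  if (selector == 1)
    return equal_count > 0;    /* One or more elements are equal.  */
  if (selector == 0)
    return equal_count == 0;   /* All elements are unequal.  */
  return 0;
}
```

With the inputs used in the test case, vectors sharing exactly one equal
element yield 1 for selector 1 and 0 for selector 0 under this model.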

gcc/ChangeLog:
* doc/extend.texi (__builtin_vsx_xvcmpeqsp_p,
__builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpeqsp): Add
documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-runnable-4.c: New test case.
---
 gcc/doc/extend.texi   |  23 +++
 .../powerpc/vsx-builtin-runnable-4.c  | 135 ++
 2 files changed, 158 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22700,6 +22700,18 @@ vectors of their defined type.  The corresponding result element is set to
 all ones if the two argument elements are less than or equal and all zeros
 otherwise.
 
+@smallexample
+const vf __builtin_vsx_xvcmpeqsp (vf, vf);
+const vd __builtin_vsx_xvcmpeqdp (vd, vd);
+@end smallexample
+
+The built-ins @code{__builtin_vsx_xvcmpeqsp} and
+@code{__builtin_vsx_xvcmpeqdp} compare two floating point vectors and return
+a vector.  If the corresponding elements are equal then the corresponding
+vector element of the result is set to all ones; otherwise it is set to all
+zeros.
+
+
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
 
@@ -23989,6 +24001,17 @@ is larger than 128 bits, the result is undefined.
 The result is the modulo result of dividing the first input  by the second
 input.
 
+@smallexample
+const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
+@end smallexample
+
+The first argument of the built-in @code{__builtin_vsx_xvcmpeqdp_p} is an
+integer in the range of 0 to 1.  The second and third arguments are floating
+point vectors to be compared.  The result is 1 if the first argument is a 1
+and one or more of the corresponding vector elements are equal.  The result is
+1 if the first argument is 0 and all of the corresponding vector elements are
+not equal.  The result is zero otherwise.
+
 The following builtins perform 128-bit vector comparisons.  The
 @code{vec_all_xx}, @code{vec_any_xx}, and @code{vec_cmpxx}, where @code{xx} is
 one of the operations @code{eq, ne, gt, lt, ge, le} perform pairwise
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
new file mode 100644
index 000..8ac07c7c807
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-runnable-4.c
@@ -0,0 +1,135 @@
+/* { dg-do run { target { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -save-temps" } */
+/* { dg-require-effective-target power10_ok } */
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  int result;
+  vector float vf_arg1, vf_arg2;
+  vector double d_arg1, d_arg2;
+
+  /* Compare vectors with one equal element, check
+ for any element equal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (1, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 1)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 1: arg 1 = 1, varg3[%d] = %f, varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+  /* Compare vectors with one equal element, check
+ for all elements unequal, i.e. first arg is 0.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {1.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (0, vf_arg1, vf_arg2);
+
+#if DEBUG
+  printf("result = 0x%x\n", (unsigned int) result);
+#endif
+
+  if (result != 0)
+for (i = 0; i < 4; i++)
+#if DEBUG
+  printf("ERROR, __builtin_vsx_xvcmpeqsp_p 2: arg 1 = 0, varg3[%d] = %f, varg3[%d] = %f\n",
+i, vf_arg1[i], i, vf_arg2[i]);
+#else
+  abort();
+#endif
+
+  /* Compare vectors with all unequal elements, check
+ for any element equal, i.e. first arg is 1.  */
+  vf_arg1 = (vector float) {1.0, 2.0, 3.0, 4.0};
+  vf_arg2 = (vector float) {8.0, 3.0, 2.0, 8.0};
+  result = __builtin_vsx_xvcmpeqsp_p (1

[PATCH 09/11] rs6000, add test cases for the vec_cmpne built-ins

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the vec_cmpne built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, add test cases for the vec_cmpne built-ins

Add test cases for the signed int, unsigned int, signed short, unsigned
short, signed char and unsigned char built-ins.

Note, the built-ins are documented in the Power Vector Intrinsic
Programming Reference manual.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmple.c: New test case.
* gcc.target/powerpc/vec-cmple.h: New test case include file.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmple.c | 35 
 gcc/testsuite/gcc.target/powerpc/vec-cmple.h | 84 
 2 files changed, 119 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmple.h

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.c b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
new file mode 100644
index 000..766a1c770e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+/* Test that the vec_cmpne builtin generates the expected Altivec
+   instructions.  */
+
+#include "vec-cmple.h"
+
+int main ()
+{
+  /* Note macro expansions for "signed long long int" and
+ "unsigned long long int" do not work for the vec_vsx_ld builtin.  */
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmple.h b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
new file mode 100644
index 000..4126706b99a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmple.h
@@ -0,0 +1,84 @@
+#include "altivec.h"
+
+#define N 4096
+
+#include 
+void abort ();
+
+#define PRAGMA(X) _Pragma (#X)
+#define UNROLL0 PRAGMA (GCC unroll 0)
+
+#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME)  \
+\
+RTYPE result_le_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
+RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
+\
+__attribute__((noinline)) void vector_tests_##NAME () \
+{ \
+  vector STYPE v1_##NAME, v2_##NAME; \
+  vector bool VBTYPE tmp_##NAME; \
+  int i; \
+  UNROLL0 \
+  for (i = 0; i < N; i+=16/sizeof (STYPE)) \
+{ \
+  /* result_le = operand1<=operand2.  */ \
+  v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
+  v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
+\
+  tmp_##NAME = vec_cmple (v1_##NAME, v2_##NAME); \
+  vec_vsx_st (tmp_##NAME, 0, &result_le_##NAME[i]); \
+} \
+}
+
+#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME)   \
+__attribute__((noinline)) void init_##NAME () \
+{ \
+  int i; \
+  for (i = 0; i < N; ++i) \
+{ \
+  result_le_##NAME[i] = 7; \
+  if (i%3 == 0) \
+   { \
+ /* op1 < op2.  */ \
+ operand1_##NAME[i] = 1; \
+ operand2_##NAME[i] = 2; \
+   } \
+  else if (i%3 == 1) \
+   { \
+ /* op1 > op2.  */ \
+ operand1_##NAME[i] = 2; \
+ operand2_##NAME[i] = 1; \
+   } \
+  else if (i%3 == 2) \
+   { \
+ /* op1 == op2.  */ \
+ operand1_##NAME[i] = 3; \
+ operand2_##NAME[i] = 3; \
+   } \
+  /* For vector comparisons: "For each element of the result_le, the \
+ value of each bit is 1 if the corresponding elements of ARG1 and \
+ ARG2 are equal." {or whatever the

[PATCH 10/11] rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

2024-02-20 Thread Carl Love
GCC maintainers:

The patch adds test cases for the __builtin_vec_init* and __builtin_vec_set* 
built-ins.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, add test cases for __builtin_vec_init* and __builtin_vec_set*

Add test cases for the following built-ins:

__builtin_vec_init_v1ti
__builtin_vec_init_v2df
__builtin_vec_init_v2di
__builtin_vec_set_v1ti
__builtin_vec_set_v2df
__builtin_vec_set_v2di

Note, the above built-ins are documented in extend.texi.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-21.c: New test file.
---
 .../gcc.target/powerpc/vsx-builtin-21.c   | 181 ++
 1 file changed, 181 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
new file mode 100644
index 000..b7e1201f37e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-21.c
@@ -0,0 +1,181 @@
+/* { dg-do run { target int128 } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx" } */
+
+/* This test should run the same on any target that supports vsx
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#define DEBUG 0
+
+#include 
+
+#if DEBUG
+#include 
+#include 
+
+void print_i128 (__int128_t val)
+{
+  printf(" %lld %llu (0x%llx %llx)",
+(signed long long)(val >> 64),
+(unsigned long long)(val & 0x),
+(unsigned long long)(val >> 64),
+(unsigned long long)(val & 0x));
+}
+#endif
+
+void abort (void);
+
+void test_vec_init_v1ti (__int128_t ti_arg,
+vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_init_v1ti (ti_arg);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_init_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_init_v2df (double d_arg1, double d_arg2,
+vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_init_v2df (d_arg1, d_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_init_v2di (signed long long sl_arg1, signed long long sl_arg2,
+vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_init_v2di (sl_arg1, sl_arg2);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_init_v2di: v2di_result[%d] = %lld, v2di_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, v2di_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v1ti (vector __int128_t v1ti_arg, __int128_t ti_arg,
+   vector __int128_t v1ti_expected_result)
+{
+  vector __int128_t v1ti_result;
+
+  v1ti_result = __builtin_vec_set_v1ti (v1ti_arg, ti_arg, 0);
+  if (v1ti_result[0] != v1ti_expected_result[0])
+{
+#if DEBUG
+   printf ("test_vec_set_v1ti: v1ti_result[0] = ");
+   print_i128 (v1ti_result[0]);
+   printf( "vf_expected_result[0] = ");
+   print_i128 (v1ti_expected_result[0]);
+   printf("\n");
+#else
+   abort();
+#endif
+}
+}
+
+void test_vec_set_v2df (vector double v2df_arg, double d_arg,
+   vector double v2df_expected_result)
+{
+  vector double v2df_result;
+  int i;
+
+  v2df_result = __builtin_vec_set_v2df (v2df_arg, d_arg, 0);
+
+  for ( i= 0; i < 2; i++)
+if (v2df_result[i] != v2df_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2df: v2df_result[%d] = %f, v2df_expected_result[%d] = %f\n",
+ i, v2df_result[i], i, v2df_expected_result[i]);
+#else
+   abort();
+#endif
+}
+
+void test_vec_set_v2di (vector signed long long v2di_arg, signed long long sl_arg,
+   vector signed long long v2di_expected_result)
+{
+  vector signed long long v2di_result;
+  int i;
+
+  v2di_result = __builtin_vec_set_v2di (v2di_arg, sl_arg, 1);
+
+  for ( i= 0; i < 2; i++)
+if (v2di_result[i] != v2di_expected_result[i])
+#if DEBUG
+  printf ("test_vec_set_v2di: v2di_result[%d] = %lld, v2di_expected_result[%d] = %lld\n",
+ i, v2di_result[i], i, v2di_expected_result[

[PATCH 11/11] rs6000, make test vec-cmpne.c a runnable test

2024-02-20 Thread Carl Love
 GCC maintainers:

The patch changes vec-cmpne.c from a compile-only test to a runnable test.
The macros to create the functions needed to test the built-ins and verify the
results are all there in the include file.  The .c file just needed to have
the macro invocations inserted and the header changed from compile to run.  The
test can now do functional verification of the results in addition to verifying
that the expected instructions are generated.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 

rs6000, make test vec-cmpne.c a runnable test

The macros in vec-cmpne.h define test functions.  They also set up
test value functions, verification functions and execute test functions.
The test is set up as a compile-only test, so none of the verification and
execute functions are being used.

The patch adds the macro invocations that create the initialization,
verify and execute functions to a main program, so the test can not only
verify that the correct instructions are generated but also run the
tests and verify the results.  The test is then changed from a compile
to a run test.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmpne.c (main): Add main function with
macro calls to define the test functions, create the verify
functions and execute functions.
Update scan-assembler-times (vcmpequ): Updated count to include
instructions used to generate expected test results.
* gcc.target/powerpc/vec-cmpne.h (vector_tests_##NAME): Remove
line continuation after closing bracket.  Remove extra blank line.
---
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 41 +++-
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h |  3 +-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
index b57e0ac8638..2c369976a44 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
@@ -1,20 +1,41 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec -O2" } */
+/* { dg-options "-maltivec -O2 -save-temps" } */
 
 /* Test that the vec_cmpne builtin generates the expected Altivec
instructions.  */
 
 #include "vec-cmpne.h"
 
-define_test_functions (int, signed int, signed int, si);
-define_test_functions (int, unsigned int, unsigned int, ui);
-define_test_functions (short, signed short, signed short, ss);
-define_test_functions (short, unsigned short, unsigned short, us);
-define_test_functions (char, signed char, signed char, sc);
-define_test_functions (char, unsigned char, unsigned char, uc);
-define_test_functions (int, signed int, float, ff);
+int main ()
+{
+  define_test_functions (int, signed int, signed int, si);
+  define_test_functions (int, unsigned int, unsigned int, ui);
+  define_test_functions (short, signed short, signed short, ss);
+  define_test_functions (short, unsigned short, unsigned short, us);
+  define_test_functions (char, signed char, signed char, sc);
+  define_test_functions (char, unsigned char, unsigned char, uc);
+  define_test_functions (int, signed int, float, ff);
+
+  define_init_verify_functions (int, signed int, signed int, si);
+  define_init_verify_functions (int, unsigned int, unsigned int, ui);
+  define_init_verify_functions (short, signed short, signed short, ss);
+  define_init_verify_functions (short, unsigned short, unsigned short, us);
+  define_init_verify_functions (char, signed char, signed char, sc);
+  define_init_verify_functions (char, unsigned char, unsigned char, uc);
+  define_init_verify_functions (int, signed int, float, ff);
+
+  execute_test_functions (int, signed int, signed int, si);
+  execute_test_functions (int, unsigned int, unsigned int, ui);
+  execute_test_functions (short, signed short, signed short, ss);
+  execute_test_functions (short, unsigned short, unsigned short, us);
+  execute_test_functions (char, signed char, signed char, sc);
+  execute_test_functions (char, unsigned char, unsigned char, uc);
+  execute_test_functions (int, signed int, float, ff);
+
+  return 0;
+}
 
 /* { dg-final { scan-assembler-times {\mvcmpequb\M}  2 } } */
 /* { dg-final { scan-assembler-times {\mvcmpequh\M}  2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequw\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequw\M}  32 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
index a304de01d86..374cca360b3 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
+++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
@@ -33,7 +33,7 @@ __attribute__((noinline)) void vector_tests_##NAME () \
   tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
 

Re: [PATCH 01/11] rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins

2024-02-28 Thread Carl Love
Kewen:

Thanks for the review.  From the review, it looks like a few of the built-ins
just need to be replaced with an overloaded version of an existing PVIPR
documented built-in.  Most of the rest can just be removed.  I will work on
redoing the patch set accordingly.  We can then look at the new patch set after
stage 4 is over.
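
For readers following the signed/unsigned issue in the patch quoted below, here is a minimal scalar sketch of why the declared signedness of a less-than-or-equal compare matters.  The helper names le_signed and le_unsigned are hypothetical and only illustrate the idea; they are not the built-ins themselves.  The cast of 0xFF to int8_t is implementation-defined in C, and the sketch assumes the usual two's-complement result of -1.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the patch): compare the same 8-bit
   patterns once as signed values and once as unsigned values.  */
static int
le_signed (uint8_t a, uint8_t b)
{
  /* Assumes two's-complement conversion, so 0xFF becomes -1.  */
  return (int8_t) a <= (int8_t) b;
}

static int
le_unsigned (uint8_t a, uint8_t b)
{
  return a <= b;
}
```

For the bit patterns 0xFF and 0x01 the signed compare sees -1 <= 1 and is true, while the unsigned compare sees 255 <= 1 and is false, which is why a prototype with the wrong signedness is misleading.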

   Carl 

On 2/20/24 09:55, Carl Love wrote:
> 
> GCC maintainers:
> 
> This patch fixes the arguments and return type for the various 
> __builtin_vsx_cmple* built-ins.  They were defined as signed but should have 
> been defined as unsigned.
> 
> The patch has been tested on Power 10 with no regressions.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> -
> 
> rs6000, Fix __builtin_vsx_cmple* args and documentation, builtins
> 
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result.  This patch changes
> the arguments and return type from signed to unsigned.
> 
> The documentation for the signed and unsigned versions of
> __builtin_vsx_cmple is missing from extend.texi.  This patch adds the
> missing documentation.
> 
> Test cases are added for each of the signed and unsigned built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_u16qi,
>   __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si): Change
>   arguments and return from signed to unsigned.
>   * doc/extend.texi (__builtin_vsx_cmple_16qi,
>   __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_4si,
>   __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u8hi,
>   __builtin_vsx_cmple_u4si): Add documentation.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-cmple.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def|  10 +-
>  gcc/doc/extend.texi  |  23 
>  gcc/testsuite/gcc.target/powerpc/vsx-cmple.c | 127 +++
>  3 files changed, 155 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-cmple.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..d66a53a0fab 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1349,16 +1349,16 @@
>const vss __builtin_vsx_cmple_8hi (vss, vss);
>  CMPLE_8HI vector_ngtv8hi {}
>  
> -  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
> +  const vuc __builtin_vsx_cmple_u16qi (vuc, vuc);
>  CMPLE_U16QI vector_ngtuv16qi {}
>  
> -  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
> +  const vull __builtin_vsx_cmple_u2di (vull, vull);
>  CMPLE_U2DI vector_ngtuv2di {}
>  
> -  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
> +  const vui __builtin_vsx_cmple_u4si (vui, vui);
>  CMPLE_U4SI vector_ngtuv4si {}
>  
> -  const vss __builtin_vsx_cmple_u8hi (vss, vss);
> +  const vus __builtin_vsx_cmple_u8hi (vus, vus);
>  CMPLE_U8HI vector_ngtuv8hi {}
>  
>const vd __builtin_vsx_concat_2df (double, double);
> @@ -1769,7 +1769,7 @@
>const vf __builtin_vsx_xvcvuxdsp (vull);
>  XVCVUXDSP vsx_xvcvuxdsp {}
>  
> -  const vd __builtin_vsx_xvcvuxwdp (vsi);
> +  const vd __builtin_vsx_xvcvuxwdp (vui);
>  XVCVUXWDP vsx_xvcvuxwdp {}
>  
>const vf __builtin_vsx_xvcvuxwsp (vsi);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 2b8ba1949bf..4d8610f6aa8 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22522,6 +22522,29 @@ if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
>  @samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
>  @samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
>  
> +
> +@smallexample
> +vector signed char __builtin_vsx_cmple_16qi (vector signed char,
> + vector signed char);
> +vector signed short __builtin_vsx_cmple_8hi (vector signed short,
> + vector signed short);
> +vector signed int __builtin_vsx_cmple_4si (vector signed int,
> + vector signed int);
> +vector unsigned char __builtin_vsx_cmple_u16qi (vector unsigned char,
> +vector unsigned char);
> +vector unsigned short __builtin_vsx_cmple_u8hi (vector unsigned short,
> +vector unsigned short);
> +vector unsigned i

Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-03 Thread Carl Love



On 7/3/24 2:36 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:

GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, 
vec_st and vec_ste built-ins to make it clearer that there are data alignment 
requirements for these built-ins.  If the data alignment requirements are not 
followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins require that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.

gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld
vec_st, vec_lde and vec_ste built-ins.
---
  gcc/doc/extend.texi | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
  @end smallexample
  
  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always

-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available."; I think
that's not intentional?  vec_ld and vec_st are well defined in PVIPR, so this
paragraph is not meant to document them IMHO.  Since we document vec_vsx_ld and
vec_vsx_st here, it aims to note the difference between these two pairs.  But
I'm not opposed to adding more words to emphasize the special masking off; I
prefer to use the same words as PVIPR, "ignoring the four low-order bits of the
calculated address".  And IMHO we should not say "it requires the data to be
16-byte aligned to work properly", in case users are well aware of this
behavior, have some non-16-byte-aligned data and expect it to behave like that;
it's arguable to define that as not working properly.


Yea, probably should have left "even if the VSX instruction set is 
available."


I was looking to make it clear that if the data is not 16-byte aligned
you may not get the expected data loaded/stored.


So how about the following instead:

   Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
   generate the AltiVec @samp{LVX} and @samp{STVX} instructions even if the
   VSX instruction set is available.  The instructions mask off the lower
   4 bits of the calculated address.  The use of these instructions on data
   that is not 16-byte aligned may result in unexpected bytes being loaded
   or stored.
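
To make the masking concrete, here is a small sketch in plain C of the address arithmetic described above.  It does not use the built-ins; the helper names masked_address and ignored_offset are hypothetical, and the sketch only models "clear the low four bits of the calculated address".

```c
#include <stdint.h>

/* Hypothetical model (not a GCC API): the address actually accessed when
   the low four bits of the calculated address are ignored.  */
static uintptr_t
masked_address (uintptr_t addr)
{
  return addr & ~(uintptr_t) 0xF;
}

/* Number of bytes by which the access starts before the requested
   address; zero when the address is already 16-byte aligned.  */
static unsigned int
ignored_offset (uintptr_t addr)
{
  return (unsigned int) (addr & 0xF);
}
```

Under this model a request at address 0x1003 touches the 16 bytes starting at 0x1000, so the first three bytes involved are not the ones the caller asked for.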


+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here; users
should read the description in PVIPR instead (it's more recommended).


The goal of mentioning the vec_lde and vec_ste built-ins was to give the
user a pointer to built-ins that will work as expected on unaligned
data.  It will probably save them a lot of time and frustration if they
are given a hint of what built-ins they should look at.  So, how about
the following:


   See the PVIPR description of vec_lde and vec_ste for loading and storing
   data that is not 16-byte aligned.

   Carl


[PATCH 0/13 ver5] rs6000, built-in cleanup patch series

2024-07-03 Thread Carl Love

GCC maintainers:

The following are the updates to the three patches that have yet to be approved.

Patches 1, 3, 5, 6, 8, 9, 10, and 12 were approved in version 3 or earlier.

Patches 7 and 11 from version 4 were approved with minor nits fixed.

This leaves patches 2, 4 and 13 still to be approved. Only these unapproved 
patches are posted in the version 5 series.

The goal is to commit the entire series all at once as they are all related.
So I am holding off committing the approved patches.

Thank you for your time and feedback on these patches.  The entire patch series
has been tested on Power 10 LE as the changes are fairly minor.

Please let me know if the remaining patches are acceptable for mainline.  
Thanks.

 Carl



Re: [PATCH 2/13 ver5] rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

2024-07-03 Thread Carl Love

GCC maintainers:

Per the comments on patch 2 from version 4, I have moved the removal of 
built-ins __builtin_vsx_xvcvdpsxws and __builtin_vsx_xvcvdpuxws from patch 4 to 
this patch.

Please let me know if this patch is acceptable.  Thanks.

Carl



rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

The built-in __builtin_vsx_xvcvspsxws is covered by the vec_signed
built-in, which is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
    __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws,
    __builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Remove
    built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..60ccc5542be 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,36 +1688,21 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}

-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-    XVCVDPSXWS vsx_xvcvdpsxws {}
-
   const vsll __builtin_vsx_xvcvdpuxds (vd);
 XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}

   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}

-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-    XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-    XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}

   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}

-  const vsi __builtin_vsx_xvcvspsxws (vf);
-    XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}

-  const vsi __builtin_vsx_xvcvspuxws (vf);
-    XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
-
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}

--
2.45.0




Re: [PATCH 4/13 ver5] rs6000, extend the current vec_{un, }signed{e, o} built-ins

2024-07-03 Thread Carl Love



GCC maintainers:

I moved the removal of built-ins __builtin_vsx_xvcvdpsxws and
__builtin_vsx_xvcvdpuxws from patch 4 to patch 2.


I fixed various issues with the ChangeLog wording, spaces and descriptions.

Fixed the comments in file gcc/config/rs6000/vsx.md.

Updated the built-in description in gcc/doc/extend.texi.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl



 rs6000, extend the current vec_{un,}signed{e,o}  built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to a vector of signed/unsigned long long ints.
Extend the existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return a vector of even/odd signed/unsigned integers.
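
As a rough scalar sketch of the intended behavior, the helpers below convert the even or odd elements of a four-float array to signed long long values.  The names signede_ref and signedo_ref are hypothetical, and the sketch assumes that "even" means elements 0 and 2 and "odd" means elements 1 and 3, with ordinary C truncation toward zero; out-of-range inputs are not modeled.

```c
/* Hypothetical scalar reference (not from the patch): convert the even
   elements (assumed to be 0 and 2) of a 4-float vector to long long.  */
static void
signede_ref (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Same for the odd elements (assumed to be 1 and 3).  */
static void
signedo_ref (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}
```

The unsigned variants would follow the same shape with unsigned long long results.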

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf and
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have test cases.

Add testcases and update documentation.

gcc/ChangeLog:
    (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Rename to
    __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
    (XVCVSPSXDS, XVCVSPUXDS): Rename to VEC_VSIGNEDE_V4SF,
    VEC_VUNSIGNEDE_V4SF respectively.
    (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
    built-in definitions.
    * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
    vec_unsignede, vec_unsignedo): Add new overloaded specifications.
    * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
    vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
    * doc/extend.texi (vec_signedo, vec_signede, vec_unsignedo,
    vec_unsignede): Add documentation for new overloaded built-ins to
    convert vector float to vector {un,}signed long long.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-3-runnable.c
    (test_unsigned_int_result, test_ll_unsigned_int_result): Add
    new argument.
    (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
    tests for the overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 14 +++-
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 84 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
 5 files changed, 154 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 43d5c229dc3..29a9deb3410 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1697,11 +1697,17 @@
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}

-  const vsll __builtin_vsx_xvcvspsxds (vf);
-    XVCVSPSXDS vsx_xvcvspsxds {}
+  const vsll __builtin_vsignede_v4sf (vf);
+    VEC_VSIGNEDE_V4SF vsignede_v4sf {}

-  const vsll __builtin_vsx_xvcvspuxds (vf);
-    XVCVSPUXDS vsx_xvcvspuxds {}
+  const vsll __builtin_vsignedo_v4sf (vf);
+    VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
+
+  const vull __builtin_vunsignede_v4sf (vf);
+    VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
+
+  const vull __builtin_vunsignedo_v4sf (vf);
+    VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}

   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+    VEC_VSIGNEDE_V4SF

 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+    VEC_VSIGNEDO_V4SF

 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+    VEC_VUNSIGNEDE_V4SF

 [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
   vui __builtin_vec_vunsignedo (vd);
 VEC_VUNSIGNEDO_V2DF
+  vull __builtin_vec_vunsignedo (vf);
+    VEC_VUNSIGNEDO_V4SF

 [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
   vui __builtin_vec_extract_exp (vf);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 48ba262f7e4..0f0837a1d43 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2704,6 +2704,90 @@
   DONE;
 })

+;; Convert float vector even elements to signed long long vector
+(define_expand "vsignede_v4sf"
+  [(match_operand:V2DI 0 "vsx_register_operand")
+   (match_

Re: [PATCH 13/13 ver5] rs6000, remove vector set and vector init built-ins.

2024-07-03 Thread Carl Love

 GCC maintainers:

The patch has been updated to remove the customized vec_init built-in
code.  Specifically, the init identifier, the related generated code for
the init built-in attribute bit, the function
altivec_expand_vec_init_builtin and the calls to that function.


Please let me know if the patch is acceptable for mainline. Thanks.

  Carl

---

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code. For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3 optimization.
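
For illustration, here is a portable sketch of the plain C form using the generic GCC vector extension rather than the PowerPC vector keyword.  The typedef v4si and the helper make_v4si are assumptions made for this example and are not part of the patch.

```c
/* Generic GCC vector extension: four 32-bit ints in a 16-byte vector.
   The typedef name is chosen for this example only.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: build a vector with an ordinary C initializer,
   which is the construct the commit message says replaces
   __builtin_vec_init_v4si.  */
static v4si
make_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}
```

Calling make_v4si (1, 2, 3, 4) yields a vector whose elements read back as 1, 2, 3 and 4 through ordinary subscripting.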

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization the built-in actually generates more instructions than
the inline C code, but the generated code is identical with -O3 optimization.
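
The element-set form can be sketched the same way with the generic GCC vector extension.  The typedef v4si and the helper set_elem are hypothetical names for this example; the index is masked to the range 0..3 here purely to keep the sketch well defined.

```c
/* Generic GCC vector extension: four 32-bit ints in a 16-byte vector.
   The typedef name is chosen for this example only.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: set one element with plain subscripting, the C
   construct the commit message says replaces __builtin_vec_set_v4si.  */
static v4si
set_elem (v4si v, int val, int index)
{
  v[index & 3] = val;   /* keep the index in range for this sketch */
  return v;
}
```

Only the selected element changes; the other three elements keep their previous values.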

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

The code to define the bif_init_bit, bif_is_init, as well as their uses
is removed.  The function altivec_expand_vec_init_builtin is also removed.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtin.cc (altivec_expand_vec_init_builtin):
    Removed the function.
    (rs6000_expand_builtin): Removed the if bif_is_init check to call
    the altivec_expand_vec_init_builtin function.
    * config/rs6000/rs6000-builtins.def: Removed the attribute string
    comment for init.
    (__builtin_vec_init_v16qi,
    __builtin_vec_init_v4sf, __builtin_vec_init_v4si,
    __builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
    __builtin_vec_init_v2df, __builtin_vec_init_v2di,
    __builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
    __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
    built-in definitions.
    * config/rs6000/rs6000-gen-builtins.cc: Removed comment for init attribute
    string.
    (struct attrinfo): Removed isinit entry.
    (parse_bif_attrs): Removed the if statement to check for attribute
    init.
    (ifdef DEBUG): Removed print for init attribute string.
    (write_decls): Removed print for define bif_init_bit and
    define for bif_is_init.
    (write_bif_static_init): Removed if bifp->attrs.isinit statement.
---
 gcc/config/rs6000/rs6000-builtin.cc  | 40 -
 gcc/config/rs6000/rs6000-builtins.def    | 45 +++-
 gcc/config/rs6000/rs6000-gen-builtins.cc | 16 +++--
 3 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 646e740774e..0a24d20a58c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,43 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code icode, tree exp, rtx target)

   return target;
 }

-/* Expand vec_init builtin.  */
-static rtx
-altivec_expand_vec_init_builtin (tree type, tree exp, rtx target)
-{
-  machine_mode tmode = TYPE_MODE (type);
-  machine_mode inner_mode = GET_MODE_INNER (tmode);
-  int i, n_elt = GET_MODE_NUNITS (tmode);
-
-  gcc_assert (VECTOR_MODE_P (tmode));
-  gcc_assert (n_elt == call_expr_nargs (exp));
-
-  if (!target || !register_operand (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  /* If we have a vector compromised of a single element, such as V1TImode, do
- the initialization directly.  */
-  if (n_elt == 1 && GET_MODE_SIZE (tmode) == GET_MODE_SIZE (inner_mode))
-    {
-  rtx x = expand_normal (CALL_EXPR_ARG (exp, 0));
-  emit_move_insn (target, gen_lowpart (tmode, x));
-    }
-  else
-    {
-  rtvec v = rtvec_alloc (n_elt);
-
-  for (i = 0; i < n_elt; ++i)
-    {
-      rtx x = expand_normal (CALL_EXPR_ARG (exp, i));
-      RTVEC_ELT (v, i) = gen_lowpart (inner_mode, x);
-    }
-
-  rs6000_expand_vector_init (target, gen_rtx_PARALLEL (tmode, v));
-    }
-
-  return target;
-}
-
 /* Return the integer constant in ARG.  Constrain it to be in the range
    of the subparts of VEC_TYPE; issue an error if not.  */

@@ -3401,9 +3364,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
/

Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-07-03 Thread Carl Love

Kewen:

On 6/18/24 20:04, Kewen.Lin wrote:


Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:

GCC maintainers:

The patch has been updated per the feedback from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

   Carl

--

rs6000, remove vector set and vector init built-ins

The vector init built-ins:

   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
   __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
   result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified they generate identical
assembly instructions with no optimization and -O3 optimization.

The vector set built-ins:

   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
   __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
   __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
   src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization but is identical with -O3 optimizations.

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
---
  gcc/config/rs6000/rs6000-builtins.def | 44 +++
  1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 02aa04e5698..053dc0115d2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
const signed short __builtin_vec_ext_v8hi (vss, signed int);
  VEC_EXT_V8HI nothing {extract}
  
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \

-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}

I just realized this {init} is customized for vec_init only, these removed
vec_init bifs are the only users of it, so we should remove this attribute
as well.  Sorry that I should have found and pointed out this in the
previous review.  I think it means some removals are needed on:

 1) comments in rs6000-builtins.def
;   init Process as a vec_init function

 2) related gen code for this attribute bit, like:

   fprintf (header_file, "#define bif_init_bit\t\t(0x0001)\n");
   fprintf (header_file,
   "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
   if (bifp->attrs.isinit)
fprintf (init_file, " | bif_init_bit");


OK, Yes, we can remove the attribute string for the vec_init built-in.  In 
addition to the code you mentioned, we will need to remove the uses of 
bif_init_bit, bif_is_init and the function altivec_expand_vec_init_builtin.

  Carl



[PATCH] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-15 Thread Carl Love

GCC maintainers:

The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  The test for int128 and lp64 was added to the test cases to 
disable the test on 32-bit systems and systems that do not support the 
__int128 type.  The three tests now report "# of unsupported tests 1".
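
For reference, the same availability question can be asked at compile time
in C.  This is only a sketch: it relies on the predefined macro
__SIZEOF_INT128__, which GCC defines when __int128 is supported, and the
helper names have_int128 and high_half are made up for this example.  It is
not how the DejaGnu int128 effective-target check is implemented.

```c
#include <stdint.h>

/* Return 1 when the compiler provides __int128 for this target, else 0.
   GCC predefines __SIZEOF_INT128__ only when the type is available.  */
static int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

#ifdef __SIZEOF_INT128__
/* Return the upper 64 bits of a 128-bit value.  Only compiled when the
   type exists, which mirrors why the tests must be skipped elsewhere.  */
static uint64_t
high_half (unsigned __int128 x)
{
  return (uint64_t) (x >> 64);
}
#endif
```

On a 32-bit Power multilib the macro is not defined, so any code that names
__int128 unconditionally fails to compile, which matches the excess errors
reported above.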


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl

--
rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Need to check that
the platform is 64-bits and support the __int128 type.  Add the int128 and
lp64 flags to the target test.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128 and lp64.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
index dede08358e1..da3011d4c00 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
index b00f53cfc62..bc3cdb69305 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { lp64 } && { int128 } } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
index 0e0d77bcb84..c9b8a2053b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




Re: [PATCH] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-16 Thread Carl Love

Peter:

On 7/15/24 4:14 PM, Peter Bergner wrote:

On 7/15/24 5:43 PM, Carl Love wrote:

-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */

Why isn't this just:

   /* { dg-do run { target int128 } } */

???   The int128 test should disable this on 32-bit systems just fine.


I agree it seems like that should work.  I had tried just the int128 
initially but was still getting errors so I added the { lp64 } and that 
fixed it.

That said, I went back and tried dg-do run { target int128 } again on one of 
the files.  Now it seems to work?  Hmm, I guess I must have had a typo or 
something when I first tried it.  I will try fixing the patch for all of the 
test files and retest to see if just int128 works.

Carl



[PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-16 Thread Carl Love



GCC maintainers:

Version 2, removed the lp64 from the target per discussion.  Tested and 
it is not needed.  The int128 qualifier is sufficient for the test to 
report as unsupported on a 32-bit Power system.


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  The test for int128 and lp64 was added to the test cases to 
disable the test on 32-bit systems and systems that do not support the 
__int128 type.  The three tests now report "# of unsupported tests 1".


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl



[PATCH] rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Need to check that
the platform is 64-bits and support the __int128 type.  Add the int128 and
lp64 flags to the target test.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128 and lp64.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
index dede08358e1..e2d3c990852 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
index b00f53cfc62..007892e2731 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
index 0e0d77bcb84..df1bf873cfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target  int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




[PATCH] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-17 Thread Carl Love

GCC maintainers:

The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp 
built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins.  
The difference is that the overloaded built-ins return a vector of booleans 
or a vector of long long booleans, whereas the removed built-ins returned a 
vector of floats or vector of doubles.


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
vec_cmp[eq|ge|gt] built-in with the required changes for the return 
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp.

which are similar to the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge
built-ins.

The difference is that the overloaded built-ins return a vector of
booleans or a vector of long long booleans depending on whether the inputs
are a vector of floats or a vector of doubles.  The removed built-ins
returned a vector of floats or vector of doubles for the vector float and
vector double inputs respectively.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
boolean of the appropriate size or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.
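
The return-type difference can be sketched with the generic GCC vector
extension, where a lane-wise == on float vectors yields an integer mask
vector (-1 for true, 0 for false).  This is an analogy on generic vectors,
not the rs6000 vec_cmpeq implementation, and the names v4sf_t, v4bool_t,
cmpeq_v4sf and mask_as_v4sf are assumptions made for this example.

```c
#include <stdint.h>

/* Four floats, and a same-sized integer vector used as the boolean mask.
   Both type names are illustrative.  */
typedef float v4sf_t __attribute__ ((vector_size (16)));
typedef int32_t v4bool_t __attribute__ ((vector_size (16)));

/* Lane-wise equality.  With GCC vector extensions the result is an
   integer vector: all ones (-1) where equal, 0 elsewhere.  */
static v4bool_t
cmpeq_v4sf (v4sf_t a, v4sf_t b)
{
  return a == b;
}

/* Reinterpret the mask bits as floats, analogous to casting the boolean
   result back to the type the old built-ins returned.  */
static v4sf_t
mask_as_v4sf (v4bool_t mask)
{
  return (v4sf_t) mask;
}
```

The cast only reinterprets the bits; it does not convert -1 to -1.0f.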
---
 gcc/config/rs6000/rs6000-builtins.def | 10 ---
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..896d9686ac6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,30 +1579,20 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

   const vd __builtin_vsx_xvcmpgtdp (vd, vd);
 XVCMPGTDP vector_gtv2df {}
-
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp
+ have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
+ vec_cmpge built-ins.  The __builtin_vsx_xvcmp* builtins returned a vector
+ result of the same type as the arguments.  The vec_cmp* built-ins return
+ a vector of booleans of the same size as the arguments. Thus the result
+ assignment must be to a boolean or cast to a boolean.  Test both cases.
+  */
+
+  d[i][0] = (vector double) vec_cmpeq (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpgt (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpge (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpeq (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpgt (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpge (d[i][1], d[i][2]); i++;
+

[PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-17 Thread Carl Love

GCC maintainers:

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  The users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  The code generated with normal C-code is as good or better than 
the code generated with these built-ins.
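
As a rough model of what "normal C-code" means for the two-element case,
the sketch below inserts a scalar into a vector of two doubles.  The
argument order (scalar, vector, index) follows vec_insert, and the index is
reduced modulo 2 as the removed resolve_vec_insert code did for constant
selectors.  It uses the generic GCC vector extension; the names v2df_t and
insert_v2df are hypothetical.

```c
/* Generic GCC/Clang vector of two doubles; the type name is an
   assumption made for this example.  */
typedef double v2df_t __attribute__ ((vector_size (16)));

/* Insert val into vec at position idx modulo 2, using an ordinary
   subscripted store instead of __builtin_vec_set_v2df.  */
static v2df_t
insert_v2df (double val, v2df_t vec, unsigned idx)
{
  vec[idx % 2] = val;
  return vec;
}
```

With this model an index of 3 updates element 1 and leaves element 0 alone.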


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

---
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
    statements for mode == V2DFmode, mode == V2DImode and
    mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
    RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 896d9686ac6..0ebc940f395 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 6229c503bd0..c288acc200b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1522,46 +1522,6 @@ resolve_vec_insert (resolution *res, vec<tree, va_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 with
  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2




Re: [PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love




On 7/16/24 6:01 PM, Peter Bergner wrote:

On 7/16/24 6:19 PM, Carl Love wrote:

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Need to check that
the platform is 64-bits and support the __int128 type.  Add the int128 and
lp64 flags to the target test.

The test cases themselves look good, but you need to update your git log entry
to not mention the lp64/64-bits since you removed them.

Yea, I didn't get the lp64 references cleaned up properly.  Sorry about that.

  Yes, currently, only
64-bit targets support __int128, but our hope is that one day, even 32-bit
targets will as well.  So how about the following text instead?


...
use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.


OK, changed.

 Carl




[PATCH ver 3] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love

GCC maintainers:

Version 3, in version 2, the ChangeLog didn't get updated to remove the 
LP64 references.  Fixed that and updated the patch description per the 
feedback from Peter.


Version 2, removed the lp64 from the target per discussion.  Tested and 
it is not needed.  The int128 qualifier is sufficient for the test to 
report as unsupported on a 32-bit Power system.


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  The int128 effective-target check was added to the test cases 
to disable the tests on systems that do not support the __int128 type.  
The three tests now report "# of unsupported tests 1".


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl
--
rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
index dede08358e1..e2d3c990852 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
index b00f53cfc62..007892e2731 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
index 0e0d77bcb84..df1bf873cfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target  int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




[PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-19 Thread Carl Love

GCC developers:

The following patch adds the int128 variants to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, 
vec_srl, vec_sro.  These variants were requested by Steve Munroe.


The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

   Carl


---
 rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.
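
For readers unfamiliar with the "shift left double" family, the sketch
below models the byte-granular form on a single 128-bit element: the two
inputs are concatenated, shifted left by a number of bytes, and the upper
16 bytes are kept.  It is a portable byte-array model written under the
assumption that byte 0 is the most significant byte.  The name sld_model is
made up here, and the sketch makes no claim about how the built-ins are
expanded or about little-endian element ordering.

```c
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit "shift left double by octets": concatenate a and b
   (byte 0 is the most significant byte), shift left by nbytes, and keep
   the upper 16 bytes.  nbytes is reduced to the range 0..15.  */
static void
sld_model (const uint8_t a[16], const uint8_t b[16], unsigned nbytes,
	   uint8_t out[16])
{
  uint8_t cat[32];

  memcpy (cat, a, 16);
  memcpy (cat + 16, b, 16);
  memcpy (out, cat + (nbytes & 15), 16);
}
```

With a shift of 3, the result holds bytes 3..15 of the first input followed
by bytes 0..2 of the second.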

Add the missing internal names for the float and double variants of the
overloaded built-in vec_sld.

gcc/ChangeLog:
    * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
    define_insn iterator to VEC_IC.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
    __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
    __builtin_altivec_vsrdb_v1ti): New builtin definitions.
    * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
    vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
    definitions.
    (vec_sld): Add missing internal names.
    * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
    vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
    built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
    file.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 gcc/config/rs6000/rs6000-builtins.def |  12 +
 gcc/config/rs6000/rs6000-overload.def |  44 ++-
 gcc/doc/extend.texi   |  42 +++
 .../vec-shift-double-runnable-int128.c    | 349 ++
 5 files changed, 448 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
 (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vs<SLDB_lr>db_<mode>"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-       (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+       (match_operand:VEC_IC 2 "register_operand" "v")
    (match_operand:QI 3 "const_0_to_12_operand" "n")]
   VSHIFT_DBL_LR))]
   "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..fbb6e1ddf85 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
   const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
 VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
   const vss __builtin_altivec_vslh (vss, vus);
 VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
   const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
 XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_Q vsx_xxsldwi_v1ti {}
+
   const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
 XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@
   const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
 VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
   const vsq __builtin_altivec_vslq (vsq, vuq);
 VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@
   const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
 VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
   const vsq __builtin_altivec_vsrq (vsq, vuq);
 VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def

index c4ecafc6f7e..302e0232533 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3396,9 +3396,13 @@
   vull __builtin_vec_sld (vull, vull, const int);
 VSLDOI_2DI  VSLDOI_VULL
   vf __builtin_vec_sld (vf, vf, const int);
-    VSLDOI_4SF
+    VSLDOI_4SF VSLDOI_VF
   vd __builtin_vec_sld (vd, vd, const int);
-    VSLDOI_2DF
+    VSLDOI_2DF VSLDOI_VD
+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld

Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-17 Thread Carl Love
Kewen:

I am working through the patches.  I made the changes requested for this patch
but have a question about one of your comments.

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added



> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed:

  XVCVSPUXDS  -> VEC_VUNSIGNEDE_V4SF
  XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
  XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
  XVCVSPSXDSO -> VEC_VSIGNEDO_V4SF

QUESTION:
I am not sure what you want changed to v{un,}signed{e,o}_v4sf.  The overloaded
instance entry names for vd and vf have to match the first line of the
definition, so the name can't be type specific, i.e. v4sf.  Where do you want
the v{un,}signed{e,o}_v4sf name used?

For example, file rs6000-overload.def now looks like:

[VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
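
For reference, here is a minimal scalar sketch of what the new vf overloads
are meant to compute.  It is only a model under stated assumptions: the struct
types and function names are made up for illustration, elements are numbered
in array order (which hardware lanes count as "even" can depend on
endianness), and inputs are assumed to be in range for a 64-bit integer.

```c
#include <stdint.h>

/* Hypothetical scalar model, not GCC code: a "vector float" is four
   floats and a "vector signed long long" is two 64-bit integers.  */
typedef struct { float e[4]; } vf_model;
typedef struct { int64_t e[2]; } vsll_model;

/* Convert the even-indexed floats (elements 0 and 2), truncating
   toward zero.  */
static vsll_model model_signede (vf_model a)
{
  vsll_model r = { { (int64_t) a.e[0], (int64_t) a.e[2] } };
  return r;
}

/* Convert the odd-indexed floats (elements 1 and 3), truncating
   toward zero.  */
static vsll_model model_signedo (vf_model a)
{
  vsll_model r = { { (int64_t) a.e[1], (int64_t) a.e[3] } };
  return r;
}
```

The point of the model is only that four float inputs yield two 64-bit
results per call, so the even and odd variants together cover the whole
input vector.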
 




 Carl 


Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-21 Thread Carl Love
Kewen:

On 5/13/24 19:54, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, add overloaded vec_sel with int128 arguments
>>
>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>> and return a signed/unsigned int128 result.
>>
>> Extending the vec_sel built-in makes the existing buit-ins
>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>> patch removes these built-ins.
>>
>> The patch adds documentation and test cases for the new overloaded vec_sel
>> built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>  __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>  definitions.
>>  * doc/extend.texi: Add documentation for new vec_sel arguments.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |  6 --
>>  gcc/config/rs6000/rs6000-overload.def |  4 +
>>  gcc/doc/extend.texi   | 14 
>>  .../powerpc/vec-sel-runnable-i128.c   | 84 +++
>>  4 files changed, 102 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index d09e21a9151..46d2ae7b7cb 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1931,12 +1931,6 @@
>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>  
>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>> -XXSEL_1TI vector_select_v1ti {}
>> -
>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>> -
>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>  XXSEL_2DF vector_select_v2df {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 68501c05289..5912c9452f4 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3274,6 +3274,10 @@
>>  VSEL_2DF  VSEL_2DF_B
>>vd __builtin_vec_sel (vd, vd, vull);
>>  VSEL_2DF  VSEL_2DF_U
>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>> +VSEL_1TI  VSEL_1TI_S
>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>> +VSEL_1TI_UNS  VSEL_1TI_U
>>  ; The following variants are deprecated.
>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>  VSEL_2DI_B  VSEL_2DI_S
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 64a43b55e2d..86b8e536dbe 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the 
>> endianness issues involved
>>  with the first argument and the result.
>>  @findex vec_replace_unaligned
>>  
>> +Vector select
>> +
>> +@smallexample
>> +vector signed __int128 vec_sel (vector signed __int128,
>> +   vector signed __int128, vector signed __int128);
>> +vector unsigned __int128 vec_sel (vector unsigned __int128,
>> +   vector unsigned __int128, vector unsigned __int128);
>> +@end smallexample
>> +
>> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128
>> +arguments and returns a vector selecting bits from the two source vectors 
>> based
>> +on the values of the third input vector.  This built-in is an extension of 
>> the
>> +@code{vec_sel} built-in documented in the PVIPR.
>> +
> 
> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't really
> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are placed
> in altivec stanza, so it looks that we should put it under the section
> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an extension
> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
> extension of the @code{vec_sel} built-in documented in the PVIPR" and omitting
> the description to avoid possible slightly different wording.

Honestly, at this point in time I don't remember why I put it there.  It has 
been too long since I created the patch.  That said, the test case requires 
Power 10

Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-22 Thread Carl Love
Kewen:

On 5/13/24 22:44, Kewen.Lin wrote:
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
> I think we can replace these calls with the equivalent gimple codes
> (early expanding it) and then we can get rid of these instances.

Hmm, going to need a little coaching here.  I am not sure how to do this.
Looks like I get to learn something new.
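
For context, the source-level equivalence described in the commit message can
be sketched with GCC's generic vector extension.  This is not the gimple
expansion suggested above, only the plain C form that the built-in duplicates;
the typedef and function names are made up for this sketch.

```c
/* GCC generic vector extension; the name v4si_t is hypothetical.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Store VAL into element INDEX of V using an ordinary subscript,
   which is what the commit message says the built-in duplicates.  */
static v4si_t set_elem (v4si_t v, int val, int index)
{
  v[index & 3] = val;   /* mask keeps the index in range 0..3 */
  return v;
}
```

Since the subscript store is already available in C, the built-in adds
nothing for the user.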

   Carl 


Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-23 Thread Carl Love



On 5/13/24 22:37, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>
>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>> the test cases are removed.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>  Remove built-in definition.
>>
> 
> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
> 
> 
>> gcc/testsuite/ChangeLog:
>>  * vsx-builtin-3.c (do_cmp): Remove test case for
>>  __builtin_vsx_xvcmpeqsp.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>  2 files changed, 5 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 2f6149edd5f..19d05b8043a 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1613,9 +1613,6 @@
>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>  
>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>> -XVCMPEQSP vector_eqv4sf {}
>> -
>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>  XVCMPGEDP vector_gev2df {}
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> index 35ea31b2616..245893dc0e3 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> @@ -27,7 +27,6 @@
>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>  
>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>return i;
> 
> As the other in this patch series, I prefer to change it with
> vec_cmpeq here, OK for trunk with this tweaked (also keep the
> scan there), thanks!

When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp and
vec_cmpeq both return a vector where each element is all ones if the
comparison is true and all zeros if it is false.  However, the return type of
__builtin_vsx_xvcmpeqsp is vector float, whereas vec_cmpeq returns vector bool.

The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
vector element is a 1 if the comparison is equal and 0 otherwise.  However, the 
documented result is a vector bool int for the floating point comparison.  The 
return value for __builtin_vsx_xvcmpeqsp was vector float.  

So, the "bit values" returned are the same but not of the same type.  So
technically vec_cmpeq is not a drop-in replacement for __builtin_vsx_xvcmpeqsp.
Given that, perhaps we should not be removing __builtin_vsx_xvcmpeqsp?

The testcase has to be changed from:
 f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
to:
 bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++;

I am thinking we should drop this patch from the series, i.e. don't remove 
__builtin_vsx_xvcmpeqsp.  Thoughts?
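
To illustrate the type issue, here is a scalar sketch of a single lane.  The
function names are hypothetical and this only models the bit patterns; it is
not how either built-in is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Model of one lane of a single-precision compare-equal: all ones
   when the inputs are equal, all zeros otherwise.  */
static uint32_t cmpeq_lane (float a, float b)
{
  return (a == b) ? UINT32_C (0xFFFFFFFF) : UINT32_C (0);
}

/* Reinterpret the same 32 bits as a float, which is in effect what a
   vector float return type does with the mask.  */
static float lane_as_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Viewed as a float, the all-ones "true" lane is a NaN bit pattern, so the
result is only meaningful as a mask even when its declared type is float.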

 Carl 
 

> 
> BR,
> Kewen
> 


Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/24/24 03:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/24 02:21, Carl Love wrote:
>>
>>
>> On 5/13/24 22:37, Kewen.Lin wrote:
>>> Hi,
>>>
>>> on 2024/4/20 05:18, Carl Love wrote:
>>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>>>
>>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>>>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>>>> the test cases are removed.
>>>>
>>>> gcc/ChangeLog:
>>>>* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>>>Remove built-in definition.
>>>>
>>>
>>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
>>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
>>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
>>>
>>>
>>>> gcc/testsuite/ChangeLog:
>>>>* vsx-builtin-3.c (do_cmp): Remove test case for
>>>>__builtin_vsx_xvcmpeqsp.
>>>> ---
>>>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>>>  2 files changed, 5 deletions(-)
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>>> b/gcc/config/rs6000/rs6000-builtins.def
>>>> index 2f6149edd5f..19d05b8043a 100644
>>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>>> @@ -1613,9 +1613,6 @@
>>>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>>>  
>>>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>>> -XVCMPEQSP vector_eqv4sf {}
>>>> -
>>>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>>>  XVCMPGEDP vector_gev2df {}
>>>>  
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> index 35ea31b2616..245893dc0e3 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> @@ -27,7 +27,6 @@
>>>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>>>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>>>  
>>>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>>>return i;
>>>
>>> As the other in this patch series, I prefer to change it with
>>> vec_cmpeq here, OK for trunk with this tweaked (also keep the
>>> scan there), thanks!
>>
>> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp 
>> and vec_cmpeq both return a vector where the element is all ones if the 
>> comparison is True and zeros if False.  However, the return type for 
>> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.
>>
> 
> Ah, so they are not equivalent from prototype perspective.
> 
>> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
>> vector element is a 1 if the comparison is equal and 0 otherwise.  However, 
>> the documented result is a vector bool int for the floating point 
>> comparison.  The return value for __builtin_vsx_xvcmpeqsp was vector float.
> 
> IMHO PVIPR prototype (returning vector bool) makes more sense,
> it does match better with what the result holds.

Yes, I tend to agree.  I think the user would most likely be using the test to
create a mask to selectively replace vector elements.  A bool type makes more
sense in that case.
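
A minimal scalar sketch of that use, assuming 32-bit lanes held in plain
arrays (the function name is made up, and this is a model of a bitwise
select rather than the built-in itself):

```c
#include <stddef.h>
#include <stdint.h>

/* Per-lane bitwise select: each result bit comes from b where the
   mask bit is 1 and from a where the mask bit is 0.  */
static void select_lanes (uint32_t *out, const uint32_t *a,
                          const uint32_t *b, const uint32_t *mask,
                          size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
}
```

With an all-ones or all-zeros mask per lane, as a compare produces, this
replaces whole elements of a with the corresponding elements of b.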

> 
>>
>> So, the "bit values" returned are the same but not of the same type. S

Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 01:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, Remove __builtin_vsx_xvcvspsxws built-in
>>
>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>> built-in is not documented and there are no test cases for it.
>>
>> This patch removes the redundant built-in.
> 
> By revisiting the comments on the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html

The comments from the previous version:
-
   I think we should recommend users to adopt the recommended built-ins in
   PVIPR, by checking the corresponding mnemonic in PVIPR, I got:

   __builtin_vsx_xvcvspsxws -> vec_signed
   __builtin_vsx_xvcvspsxds -> N/A
   __builtin_vsx_xvcvspuxds -> N/A
   __builtin_vsx_xvcvdpsxws -> vec_signed{e,o}
   __builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o}
   __builtin_vsx_xvcvdpuxds_uns -> vec_unsigned
   __builtin_vsx_xvcvspdp   -> vec_double{e,o}
   __builtin_vsx_xvcvdpsp   -> vec_float{e,o}
   __builtin_vsx_xvcvspuxws -> vec_unsigned
   __builtin_vsx_xvcvsxwdp  -> vec_double{e,o}
   __builtin_vsx_xvcvuxddp_uns -> vec_double

   For __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds which don't have
   the according PVIPR built-ins, we can extend the current vec_{un,}signed{e,o}
   to cover them and document them following the section mentioning PVIPR.

are handled by multiple patches in the new series.  The main comment on the
previous patch series was to remove most of the built-ins as they were
redundant.  So basically most of the patches in the previous series were
thrown out and replaced by the current series, which removes the built-ins.


That all said, I distinctly remember addressing each of the above built-ins.
The work on the series got interrupted a couple of times and it looks like
some of the patches to address the above got lost.  My bad.  The following is
a list of which patch takes care of removing the duplicate built-ins.

__builtin_vsx_xvcvspsxws          patch 2 removes this built-in.
__builtin_vsx_xvcvspsxds -> N/A   patch 4 extends vec_{un,}signede to cover
                                  this built-in.  Built-in used in
                                  rs6000-overload.def.  Built-in now for
                                  internal use only.
__builtin_vsx_xvcvspuxds -> N/A   patch 4 extends vec_{un,}signedo to cover
                                  this built-in.  Built-in used in
                                  rs6000-overload.def.  Built-in now for
                                  internal use only.


__builtin_vsx_xvcvdpsxws     -> vec_signed{e,o}     removed in patch 4
__builtin_vsx_xvcvdpuxws     -> vec_unsigned{e,o}   removed in patch 4

__builtin_vsx_xvcvdpuxds_uns -> vec_unsigned        removed in patch 4
__builtin_vsx_xvcvspuxws     -> vec_unsigned        removed in patch 4

The following changes will be put into a new patch when the series is
reposted.  It appears they got lost in the current series.  My bad.

__builtin_vsx_xvcvspdp      -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvdpsp      -> vec_float{e,o}    remove in new patch number 5

__builtin_vsx_xvcvsxwdp     -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvuxddp_uns -> vec_double        remove in new patch number 5

> 
> I wonder if it's intentional to keep the others, at least bifs
> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and
> __builtin_vsx_xvcvuxddp_uns looks removable, users can just uses the
> equivalent ones in PVIPR.  And for the others, users can still use
> the PVIPR ones by considering endianness (controlling with endianness
> macros).
> 

Hopefully that makes it clearer where the various changes are.   

The next series will add a new patch 5.  The remaining patches in this series,
patches 5, 6, ..., will become patches 6, 7, ... in the next posting of the
built-in cleanup patch series.

Carl 


Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added
>> overloaded built-ins.
> 
> This part is missing, there are no test case changes in this patch.

Yes, the new tests are missing.  Not sure what happened to them.  Fixed.

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |  6 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 
>>  gcc/config/rs6000/vsx.md  | 23 +++
>>  gcc/doc/extend.texi   | 13 +
>>  4 files changed, 50 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..5b7237a2327 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1709,9 +1709,15 @@
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>>  XVCVSPSXDS vsx_xvcvspsxds {}
>>  
>> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);
>> +XVCVSPSXDSO vsx_xvcvspsxds_low {}
>> +
>>const vsll __builtin_vsx_xvcvspuxds (vf);
>>  XVCVSPUXDS vsx_xvcvspuxds {}
> 
> This existing should return with type vull, ...

Fixed.

> 
>>  
>> +  const vsll __builtin_vsx_xvcvspuxds_low (vf);
>> +XVCVSPUXDSO vsx_xvcvspuxds_low {}
> 
> ... so this copied one should be vull too.

Fixed.

> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed:

  XVCVSPUXDS  -> VEC_VUNSIGNEDE_V4SF
  XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
  XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
  XVCVSPSXDSO -> VEC_VSIGNEDO_V4SF

NEED TO ADDRESS RESPONSE TO QUESTION I ASKED.

> 
>>const vsi __builtin_vsx_xvcvspuxws (vf);
>>  XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
>>  > diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 84bd9ae6554..68501c05289 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3307,10 +3307,14 @@
>>  [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>>vsi __builtin_vec_vsignede (vd);
>>  VEC_VSIGNEDE_V2DF
>> +  vsll __builtin_vec_vsignede (vf);
>> +XVCVSPSXDS
>>  
>>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>>vsi __builtin_vec_vsignedo (vd);
>>  VEC_VSIGNEDO_V2DF
>> +  vsll __builtin_vec_vsignedo (vf);
>> +XVCVSPSXDSO
>>  
>>  [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
>>vsi __builtin_vec_signexti (vsc);
>> @@ -4433,10 +4437,14 @@
>>  [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
>>vui __builtin_vec_vunsignede (vd);
>>  VEC_VUNSIGNEDE_V2DF
>> +  vull __builtin_vec_vunsignede (vf);
>> +XVCVSPUXDS
>>  
>>  [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
>>vui __builtin_vec_vunsignedo (vd);
>>  VEC_VUNSIGNEDO_V2DF
>> +  vull __builtin_vec_vunsignedo (vf);
>> +XVCVSPUXDSO
>>  
> As above, the name can be tweaked.

Fixed.

> 
>>  [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
>>vui __builtin_vec_extract_exp (vf);
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index f135fa079bd..3d39ae7995f 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -2704,6

Re: [PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:00, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, fix error in unsigned vector float to unsigned  int built-in 
>> definitions
>>
>> The built-ins __builtin_vsx_vunsigned_v2df and__builtin_vsx_vunsigned_v4sf
>> are supposed to take a vector of floats and return a vector of unsigned
>> long long ints.  The definitions are using the signed version of the
> 
> Sorry for nitpicking, here __builtin_vsx_vunsigned_v2df takes vector of 
> doubles
> and returns vector of unsigned long long ints while 
> __builtin_vsx_vunsigned_v4sf
> takes vector of floats and returns vector of unsigned ints.

That is not nitpicking, the description is wrong.  Changed float to double.
> 
>> instructions not the unsigned version of the instruction.  The results
>> should also be unsigned.  The builtins are used by the overloaded
>> vec_unsigned builtin which has an unsigned result.
>>
>> Similarly the built-ins __builtin_vsx_vunsignede_v2df and
>> __builtin_vsx_vunsignedo_v2df are supposed to retun an unsigned result.
> 
> Nit: s/retun/return/

Fixed.

> 
>> If the floating point argument is negative, the unsigned result is zero.
>> The built-ins are used in the overloaded built-in vec_unsignede and
>> vec_unsignedo respectively.
>>
>> Add a test cases for a negative floating point arguments for each of the
>> above built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
>>  __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
>>  __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: Add tests for
>>  vec_unsignede and vec_unsignedo with negative arguments.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 12 +-
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 23 ---
>>  2 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index c6d2ea1bc39..bf9a0ae22fc 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1580,16 +1580,16 @@
>>const vsi __builtin_vsx_vsignedo_v2df (vd);
>>  VEC_VSIGNEDO_V2DF vsignedo_v2df {}
>>  
>> -  const vsll __builtin_vsx_vunsigned_v2df (vd);
>> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
>> +  const vull __builtin_vsx_vunsigned_v2df (vd);
>> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
>>  
>> -  const vsi __builtin_vsx_vunsigned_v4sf (vf);
>> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
>> +  const vui __builtin_vsx_vunsigned_v4sf (vf);
>> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
>>  
>> -  const vsi __builtin_vsx_vunsignede_v2df (vd);
>> +  const vui __builtin_vsx_vunsignede_v2df (vd);
>>  VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
>>  
>> -  const vsi __builtin_vsx_vunsignedo_v2df (vd);
>> +  const vui __builtin_vsx_vunsignedo_v2df (vd);
>>  VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
>>  
>>const vf __builtin_vsx_xscvdpsp (double);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> index 0231a1fd086..6d4fe84c8a1 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> @@ -313,6 +313,15 @@ int main()
>>  test_unsigned_int_result (ALL, vec_uns_int_result,
>>vec_uns_int_expected);
>>  
>> +/* Convert single precision float to  unsigned int.  Negative
>> +   arguments
>> + */
>> +vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
>> +vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
>> +vec_uns_int_result = vec_unsigned (vec_flt0);
>> +test_unsigned_int_result (ALL, vec_uns_int_result,
>> +  vec_uns_int_expected);
>> +
>>  /* Convert double precision float to long long unsigned int */
>>  vec_dble0 = (vector double){124.930, 8134.49};
>>  vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
>> @@ -321,9 +330,9 @@ int main()
>>   vec_ll_uns_int_expected);
> 
> Nit: Similar coverage on negative for vector double can be added here.

Added.
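
For reference, the behavior the new negative-argument tests check can be
modeled per element as below.  This is a hedged scalar sketch with a made-up
function name; the zero result for negative inputs is what the commit message
states, while the treatment of NaN and of values too large for 32 bits is an
assumption of the model.

```c
#include <stdint.h>

/* Model of one element of a float to unsigned int conversion.  */
static uint32_t model_float_to_u32 (float x)
{
  if (!(x > 0.0f))          /* negative, zero, or NaN: result is 0 */
    return 0;
  if (x >= 4294967296.0f)   /* assumed: 2^32 and above saturate */
    return UINT32_MAX;
  return (uint32_t) x;      /* in range: truncate toward zero */
}
```

With this model the inputs used in the new test, -14.930, -834.49, -3.3 and
-5.4, all map to zero, matching the expected vector in the patch.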

  Carl


Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:55, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove the vec_xxsel built-ins, they are duplicates


>> -int do_sel(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
  ^ changed to ui
>> -  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
  ^ changed to us
>> -  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
   ^ changed to uc
>> -  f[i][0] = __builtin_vsx_xxsel_4sf (f[i][1], f[i][2], f[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel_2df (d[i][1], d[i][2], d[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], bi[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], bs[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], bc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], bi[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], bl[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], ui[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], us[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], ui[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], ul[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to keep them but just replacing the call with vec_sel.
> 
> OK with the above nits tweaked, thanks.

OK, changed __builtin_vsx_xxsel_4si_* to vec_sel, changed __builtin_vsx_xxsel
to vec_sel.
Had to add #include .

Finally, changed the third argument for the first three calls, as noted above, 
to be compatible with the vec_sel built-in specification.

   Carl

> 
> BR,
> Kewen
> 
>>  int do_perm(void)
>>  {
>>int i = 0;
> 


Re: [PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-24 Thread Carl Love



On 5/13/24 22:26, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
>>
>> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
>> __builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
>> there are no test cases for it.  The patch removes built-in
>> __builtin_vsx_xvcmpeqsp_p.
> As the previous review comments in the v1 (this is actually v2):
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646728.html
> , both __builtin_vsx_xvcmpeqsp_p and __builtin_vsx_xvcmpeqsp can be
> dropped, so please consider __builtin_vsx_xvcmpeqsp as well.

Yes, as you noted, __builtin_vsx_xvcmpeqsp is removed in the next patch.
> 
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT):
>>  Remove case statement.
> 
> It seems you mixed this with some other patch, this line doesn't
> belong to this patch, ...

Took that out of this patch.  Didn't get the changes separated cleanly.

> 
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
>>  Remove built-in definition.
>> ---
>>  gcc/config/rs6000/rs6000-builtin.cc   | 6 --
>>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>>  2 files changed, 12 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
>> b/gcc/config/rs6000/rs6000-builtin.cc
>> index f83d65b06ef..74ed8fc1805 100644
>> --- a/gcc/config/rs6000/rs6000-builtin.cc
>> +++ b/gcc/config/rs6000/rs6000-builtin.cc
>> @@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, 
>> tree type_out,
>>  = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>>switch (fn)
>>  {
>> -case RS6000_BIF_RSQRTF:
>> -  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
>> -  && out_mode == SFmode && out_n == 4
>> -  && in_mode == SFmode && in_n == 4)
>> -return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP];
>> -  break;
> 
> ... and this ...

Ditto

> 
>>  case RS6000_BIF_RSQRT:
>>if (VECTOR_UNIT_VSX_P (V2DFmode)
>>&& out_mode == DFmode && out_n == 2
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index d65c858ac0c..2f6149edd5f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -917,9 +917,6 @@
>>fpmath vf __builtin_altivec_vrsqrtefp (vf);
>>  VRSQRTEFP rsqrtev4sf2 {}
>>  
>> -  fpmath vf __builtin_altivec_vrsqrtfp (vf);
>> -VRSQRTFP rsqrtv4sf2 {}
>> -
> 
> ..., also this.

Ditto

> 
> BR,
> Kewen
> 
>>const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc);
>>  VSEL_16QI vector_select_v16qi {}
>>  
>> @@ -1619,9 +1616,6 @@
>>const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>  XVCMPEQSP vector_eqv4sf {}
>>  
>> -  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
>> -XVCMPEQSP_P vector_eq_v4sf_p {pred}
>> -
>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>  XVCMPGEDP vector_gev2df {}
>>  


Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-24 Thread Carl Love
Kewen:

On 5/21/24 20:05, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/5/22 08:13, Carl Love wrote:
>> Kewen:



>>> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't 
>>> really
>>> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are 
>>> placed
>>> in altivec stanza, so it looks that we should put it under the section
>>> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an 
>>> extension
>>> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
>>> extension of the @code{vec_sel} built-in documented in the PVIPR" and 
>>> omitting
>>> the description to avoid possible slightly different wording.
>>
>> Honestly, at this point in time I don't remember why I put it there.  It has 
>> been too long since I created the patch.  That said, the test case requires 
>> Power 10 due to the comparison check using built-in vec_all_eq but that is 
>> another issue.  
>> The built-in generates the xxsel instruction that is an ISA 2.06 
>> instruction.  So, I would say it should go into the ISA 2.06 section.  I 
>> moved it to the ISA 2.06 section.
> 
> But the underlying implementation is:
> 
>   const vsq __builtin_altivec_vsel_1ti (vsq, vsq, vuq);
> VSEL_1TI vector_select_v1ti {}
> 
>   const vuq __builtin_altivec_vsel_1ti_uns (vuq, vuq, vuq);
> VSEL_1TI_UNS vector_select_v1ti_uns {}
> 
> , it's under altivec stanza and can result with insn vsel (so not xxsel),
> vsel is ISA 2.03, so I think ISA 2.05 better matches the implementation.

OK, moved to ISA 2.05

> 



>>
>> Sounds like there was some issue that you noticed on 
>> r14-10011-g6e62ede7aaccc6.  The new version of
>> print_i128 should be functionally equivalent but perhaps is "safer"?
> 
> Thanks for checking!  Looking into this more closely, I realized you didn't 
> apply the previously
> adopted way for printing (the way used in 
> gcc.target/powerpc/builtins-6-p9-runnable.c), sorry for
> the false alarm!  So your supposed print_i128 is fine to me.

OK, no problem.  Will go with the original print_i128 function.

Carl 
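
A minimal sketch of the two-halves approach to printing a 128-bit value, for reference.  It assumes GCC's unsigned __int128 extension is available; format_i128 is a hypothetical name, and it writes into a caller-supplied buffer rather than calling printf directly so the result can be checked.  The low half is obtained by the truncating cast alone, so no mask constant is needed.

```c
#include <stdio.h>

/* Format VAL as "0x" followed by 32 hex digits by printing the upper and
   lower 64-bit halves separately.  BUF needs room for at least 35 bytes.
   Returns what snprintf returns (34 on success).  Hypothetical helper,
   assuming GCC's unsigned __int128 extension.  */
static int
format_i128 (char *buf, size_t len, unsigned __int128 val)
{
  return snprintf (buf, len, "0x%016llx%016llx",
		   (unsigned long long) (val >> 64),
		   (unsigned long long) val);
}
```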


Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:



>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> index 01f35dad713..35ea31b2616 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> @@ -2,7 +2,6 @@
>>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>>  /* { dg-require-effective-target powerpc_vsx_ok } */
>>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
>> -/* { dg-final { scan-assembler "vperm" } } */
>>  /* { dg-final { scan-assembler "xvrdpi" } } */
>>  /* { dg-final { scan-assembler "xvrdpic" } } */
>>  /* { dg-final { scan-assembler "xvrdpim" } } */
>> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
>>  extern __vector __bool long bl[][4];
>>  #endif
>>  
>> -int do_perm(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to just replace these __builtin_vsx_vperm with vec_perm,
> OK with this tweaked (also keep the above removed vperm scan), thanks!

OK, sounds good.  Updated the patch to change built-in calls to vec_perm.  
Updated ChangeLog message to match change.
   
 Carl 
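
As background on what the replacement calls compute, here is a scalar sketch of a byte permute in portable C.  It assumes the element-order view of vec_perm: byte i of the result is picked from the 32-byte concatenation of the two inputs using the low five bits of selector byte i.  perm_bytes is a hypothetical helper and the arrays stand in for vector registers; this is a model, not the rs6000 implementation.

```c
#include <stdint.h>

/* Scalar model of a byte permute (hypothetical helper): selector values
   0..15 pick from a, 16..31 pick from b; only the low five bits of each
   selector byte are used.  */
static void
perm_bytes (uint8_t r[16], const uint8_t a[16], const uint8_t b[16],
	    const uint8_t sel[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned int idx = sel[i] & 0x1f;
      r[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```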


Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-24 Thread Carl Love



On 5/13/24 22:14, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add a new overloaded instance for vec_xxpermdi
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instance.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new
>>  overloaded built-in instance.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def | 2 ++
>>  gcc/doc/extend.texi   | 1 +
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 5912c9452f4..49962e2f2a2 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,8 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
> 
> This actually introduces the signed __int128, considering the other
> existing ones, I think we want both signed and unsigned.

Added unsigned as well.

> 
>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 86b8e536dbe..47cf2f3bc8b 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool 
>> char *);
>>  void vec_vsx_st (vector bool char, int, unsigned char *);
>>  void vec_vsx_st (vector bool char, int, signed char *);
>>  
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
> 
> Nit: Considering the existing ones sorted by element size descending, I guess
> it's better to move the above here (and with the explicit signed and 
> unsigned).

OK, moved the new prototype down below the float prototype and added the 
unsigned prototype.
> 
> And we need a test case for it as well?
Yes, we need a test case for both.  Added a new runnable test file.

   Carl 
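
For reference, the doubleword selection done by xxpermdi can be sketched in portable C.  This model assumes the big-endian doubleword numbering used in the instruction description: the high bit of the two-bit immediate picks a doubleword of the first operand and the low bit picks a doubleword of the second.  How GCC maps vec_xxpermdi onto the instruction for little-endian targets is not modelled here.  The type v2u64 and the function permdi are hypothetical names.

```c
#include <stdint.h>

/* Two 64-bit doublewords standing in for one 128-bit vector register
   (hypothetical type).  dw[0] is the high doubleword in big-endian
   numbering.  */
typedef struct
{
  uint64_t dw[2];
} v2u64;

/* Scalar model of the doubleword permute: bit 1 of IMM selects the
   doubleword taken from a, bit 0 selects the doubleword taken from b.  */
static v2u64
permdi (v2u64 a, v2u64 b, int imm)
{
  v2u64 r;
  r.dw[0] = a.dw[(imm >> 1) & 1];
  r.dw[1] = b.dw[imm & 1];
  return r;
}
```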


[PATCH 0/13 ver 3] rs6000, built-in cleanup patch series

2024-05-29 Thread Carl Love


GCC maintainers:

The following is an updated patch series to remove duplicate built-ins.  

There are patches to extend an existing overloaded built-in to cover additional 
input types. 

A new patch, 0005-rs6000-Remove-redundant-float-double-type-conversion.patch, 
was added to remove built-ins that were inadvertently missing in the last 
version.  

Patch 12 in the previous series was dropped as the built-in 
__builtin_vsx_xvcmpeqsp is not a duplicate of the overloaded vec_cmpeq 
built-in.  Specifically, the return values are different.  The goal in this 
series is to remove built-ins that are functionally equivalent.  Patch 12 from 
the previous series will be reworked and submitted later.

Some of the patches in the previous series were approved, but everything is 
being reposted for completeness.  The following gives the mapping of the 
patches from the previous version to the current version of the series with 
notes on the patches.

Version 2   Version 3   Notes
patch 1     patch 1     Approved, no changes
patch 2     patch 2     Responded to comments, no changes to the patch
patch 3     patch 3     Updated changelog, no functional changes
patch 4     patch 4     Updated patch
            patch 5     New patch to remove built-ins missed in the
                        series.
patch 5     patch 6     Updated patch
patch 6     patch 7     Updated patch
patch 7     patch 8     Updated patch
patch 8     patch 9     Approved, no changes to this patch
patch 9     patch 10    Approved, no changes to this patch
patch 10    patch 11    Updated, added test file.
patch 11    patch 12    Updated
patch 12                Patch from previous series removed
patch 13    patch 13    Comments said built-ins __builtin_vec_set_v1ti,
                        __builtin_vec_set_v2di, __builtin_vec_set_v2df
                        can also get removed with equivalent gimple
                        codes.  This is somewhat more involved than a
                        simple removal of redundant built-ins.  The
                        built-ins will be removed in a separate future
                        patch.

The patch series has been tested on Power 10 LE and Power 9 BE with no 
regression failures.

The patches have all been tested on Power 10 LE.  The last patch was also 
tested on Power 8 BE.

No regression failures were seen.

Please let me know if the patches are acceptable for mainline.  Thanks.

   Carl 




Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins

2024-05-29 Thread Carl Love
This patch was approved in the previous series.  There are no changes to this 
patch.  Reposting for completeness. 

 Carl 
---

rs6000, Remove __builtin_vsx_cmple* builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  The current definitions
take signed arguments and return signed results which is incorrect.

The signed and unsigned versions of __builtin_vsx_cmple* are not
documented in extend.texi.  Also there are no test cases for the
built-ins.

Users can use the existing vec_cmple as PVIPR defines instead of
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
__builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
__builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.

Hence these built-ins are redundant and are removed by this patch.
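
To illustrate why the signedness of the arguments matters for these comparisons, here is a scalar sketch in portable C.  It assumes the usual vector-compare convention of an all-ones element for true and all-zeros for false, and computes "less than or equal" as "not greater than", in the spirit of the vector_ngt* pattern names in the removed definitions.  The functions cmple_s32 and cmple_u32 are hypothetical helpers, not the rs6000 implementation.

```c
#include <stdint.h>

/* Signed element-wise a <= b, computed as NOT (a > b).  Each result
   element is all ones for true and zero for false.  */
static void
cmple_s32 (uint32_t r[4], const int32_t a[4], const int32_t b[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = (a[i] > b[i]) ? 0u : 0xFFFFFFFFu;
}

/* Unsigned variant: the same bit patterns can compare differently.  */
static void
cmple_u32 (uint32_t r[4], const uint32_t a[4], const uint32_t b[4])
{
  for (int i = 0; i < 4; i++)
    r[i] = (a[i] > b[i]) ? 0u : 0xFFFFFFFFu;
}
```

With the same bit pattern 0xFFFFFFFF in the first element, the signed comparison sees -1 and the unsigned one sees the largest 32-bit value, so the two results differ.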

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
RS6000_BIF_CMPLE_U1TI): Remove case statements.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 13 
 gcc/config/rs6000/rs6000-builtins.def | 30 ---
 2 files changed, 43 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 320affd79e3..ac9f16fe51a 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   fold_compare_helper (gsi, GT_EXPR, stmt);
   return true;
 
-case RS6000_BIF_CMPLE_16QI:
-case RS6000_BIF_CMPLE_U16QI:
-case RS6000_BIF_CMPLE_8HI:
-case RS6000_BIF_CMPLE_U8HI:
-case RS6000_BIF_CMPLE_4SI:
-case RS6000_BIF_CMPLE_U4SI:
-case RS6000_BIF_CMPLE_2DI:
-case RS6000_BIF_CMPLE_U2DI:
-case RS6000_BIF_CMPLE_1TI:
-case RS6000_BIF_CMPLE_U1TI:
-  fold_compare_helper (gsi, LE_EXPR, stmt);
-  return true;
-
 /* flavors of vec_splat_[us]{8,16,32}.  */
 case RS6000_BIF_VSPLTISB:
 case RS6000_BIF_VSPLTISH:
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..7c36976a089 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1337,30 +1337,6 @@
   const vss __builtin_vsx_cmpge_u8hi (vus, vus);
 CMPGE_U8HI vector_nltuv8hi {}
 
-  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
-CMPLE_16QI vector_ngtv16qi {}
-
-  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
-CMPLE_2DI vector_ngtv2di {}
-
-  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
-CMPLE_4SI vector_ngtv4si {}
-
-  const vss __builtin_vsx_cmple_8hi (vss, vss);
-CMPLE_8HI vector_ngtv8hi {}
-
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
-CMPLE_U16QI vector_ngtuv16qi {}
-
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
-CMPLE_U2DI vector_ngtuv2di {}
-
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
-CMPLE_U4SI vector_ngtuv4si {}
-
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
-CMPLE_U8HI vector_ngtuv8hi {}
-
   const vd __builtin_vsx_concat_2df (double, double);
 CONCAT_2DF vsx_concat_v2df {}
 
@@ -3117,12 +3093,6 @@
   const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
 CMPGE_U1TI vector_nltuv1ti {}
 
-  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
-CMPLE_1TI vector_ngtv1ti {}
-
-  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
-CMPLE_U1TI vector_ngtuv1ti {}
-
   const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
 VCNTMBB vec_cntmb_v16qi {}
 
-- 
2.45.0



Re: [PATCH 2/13 ver 3] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-29 Thread Carl Love
I responded to comments about the patch from the previous patch series.  No 
functional changes were made to this patch.

Carl 
-- 

rs6000, Remove __builtin_vsx_xvcvspsxws built-in.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

This patch removes the redundant built-in.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..c6d2ea1bc39 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1709,9 +1709,6 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-- 
2.45.0



Re: [PATCH 3/13 ver 3] rs6000, fix error in unsigned vector float to unsigned int built-in definition

2024-05-29 Thread Carl Love
This patch was updated per the feedback comment from the previous version in 
series 2.

 Carl 
---

rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of
doubles and return a vector of unsigned long long ints.  Similarly
__builtin_vsx_vunsigned_v4sf takes a vector of floats and is supposed to
return a vector of unsigned ints.  The definitions are using the signed
versions of the instructions, not the unsigned versions.
The results should also be unsigned.  The builtins are used by the
overloaded vec_unsigned builtin which has an unsigned result.

Similarly the built-ins __builtin_vsx_vunsignede_v2df and
__builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
If the floating point argument is negative, the unsigned result is zero.
The built-ins are used in the overloaded built-in vec_unsignede and
vec_unsignedo respectively.

Add test cases for negative floating point arguments for each of the
above built-ins.
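
The expected values in the new negative-argument tests follow from a saturating conversion, which can be sketched in portable C.  Only the "negative input gives zero" behaviour is stated above; the handling of NaN and of values too large for 32 bits in this sketch is an assumption.  float_to_u32_sat is a hypothetical helper, not a GCC built-in.

```c
#include <stdint.h>

/* Scalar model of a saturating float to unsigned int conversion
   (hypothetical helper).  Negative inputs give 0 as described above;
   NaN giving 0 and large values clamping to UINT32_MAX are assumptions
   of this sketch.  */
static uint32_t
float_to_u32_sat (float x)
{
  if (x != x || x <= 0.0f)	/* NaN or not positive.  */
    return 0;
  if (x >= 4294967296.0f)	/* 2^32 and above do not fit.  */
    return UINT32_MAX;
  return (uint32_t) x;		/* In range: truncate toward zero.  */
}
```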

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
__builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
__builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: Add tests for
vec_unsignede and vec_unsignedo with negative arguments.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 .../gcc.target/powerpc/builtins-3-runnable.c  | 30 +--
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index c6d2ea1bc39..bf9a0ae22fc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1580,16 +1580,16 @@
   const vsi __builtin_vsx_vsignedo_v2df (vd);
 VEC_VSIGNEDO_V2DF vsignedo_v2df {}
 
-  const vsll __builtin_vsx_vunsigned_v2df (vd);
-VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
+  const vull __builtin_vsx_vunsigned_v2df (vd);
+VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
 
-  const vsi __builtin_vsx_vunsigned_v4sf (vf);
-VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
+  const vui __builtin_vsx_vunsigned_v4sf (vf);
+VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
 
-  const vsi __builtin_vsx_vunsignede_v2df (vd);
+  const vui __builtin_vsx_vunsignede_v2df (vd);
 VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
 
-  const vsi __builtin_vsx_vunsignedo_v2df (vd);
+  const vui __builtin_vsx_vunsignedo_v2df (vd);
 VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
 
   const vf __builtin_vsx_xscvdpsp (double);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 0231a1fd086..5dcdfbee791 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -313,6 +313,14 @@ int main()
test_unsigned_int_result (ALL, vec_uns_int_result,
  vec_uns_int_expected);
 
+   /* Convert single precision float to  unsigned int.  Negative
+  arguments.  */
+   vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
+   vec_uns_int_result = vec_unsigned (vec_flt0);
+   test_unsigned_int_result (ALL, vec_uns_int_result,
+ vec_uns_int_expected);
+
/* Convert double precision float to long long unsigned int */
vec_dble0 = (vector double){124.930, 8134.49};
vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
@@ -320,10 +328,18 @@ int main()
test_ll_unsigned_int_result (vec_ll_uns_int_result,
 vec_ll_uns_int_expected);
 
+   /* Convert double precision float to long long unsigned int. Negative
+  arguments.  */
+   vec_dble0 = (vector double){-24.93, -134.9};
+   vec_ll_uns_int_expected = (vector long long unsigned int){0, 0};
+   vec_ll_uns_int_result = vec_unsigned (vec_dble0);
+   test_ll_unsigned_int_result (vec_ll_uns_int_result,
+vec_ll_uns_int_expected);
+
/* Convert double precision vector float to vector unsigned int,
-  even words */
-   vec_dble0 = (vector double){3124.930, 8234.49};
-   vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
+  even words.  Negative arguments */
+   vec_dble0 = (vector double){-124.930, -234.49};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
vec_uns_int_result = vec_unsignede (vec_dble0);
test_unsigned_int_result (EVEN, vec_uns_int_result,
  vec_uns_int_expected);
@@ -335,5 +351,13 @@ int main()
vec_uns_int_resul

Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-29 Thread Carl Love
Updated the patch per the feedback comments from the previous version.

 Carl 
---

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return the even/odd signed/unsigned integers.

The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.
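
A rough scalar picture of the even/odd float variants, in portable C.  This sketch assumes "even" means elements 0 and 2 and "odd" means elements 1 and 3 in array order, and it does not model saturation of out-of-range inputs or any endian-specific handling.  The names signede_v4sf_model and signedo_v4sf_model are hypothetical and are not the define_expands added by the patch.

```c
/* Scalar model: convert the even-numbered floats (elements 0 and 2) to
   signed long long, truncating toward zero.  Hypothetical helper;
   inputs are assumed to be in range.  */
static void
signede_v4sf_model (long long r[2], const float a[4])
{
  r[0] = (long long) a[0];
  r[1] = (long long) a[2];
}

/* Same for the odd-numbered floats (elements 1 and 3).  */
static void
signedo_v4sf_model (long long r[2], const float a[4])
{
  r[0] = (long long) a[1];
  r[1] = (long long) a[3];
}
```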

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The built-in __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

Add testcases and update documentation.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
__builtin_vsx_xvcvspuxds_low): New built-in definitions.
(__builtin_vsx_xvcvspuxds): Fix return type.
(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
vunsignede_v4sf respectively.
(__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 25 ++
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 88 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
 5 files changed, 157 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index bf9a0ae22fc..cea2649b86c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,32 +1688,23 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
-  const vsll __builtin_vsx_xvcvdpuxds (vd);
-XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
   const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsx_xvcvspsxds_low (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vsx_xvcvspuxds (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
+  const vull __builtin_vsx_xvcvspuxds_low (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede 

Re: [PATCH 5/13 ver 3] rs6000, Remove redundant float/double type conversions

2024-05-29 Thread Carl Love
This is a new patch to remove the built-ins that were inadvertently missing in 
the previous series.

  Carl 
--

rs6000, Remove redundant float/double type conversions

The following built-ins are redundant as they are covered by another
overloaded built-in.

  __builtin_vsx_xvcvspdp covered by vec_double{e,o}
  __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
  __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
  __builtin_vsx_xvcvuxddp_uns covered by  vec_double

Remove the redundant built-ins. They are not documented nor do they have
test cases.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp,
__builtin_vsx_xvcvuxddp_uns): Remove.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 1 file changed, 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index cea2649b86c..6049f3a4599 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1679,9 +1679,6 @@
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}
 
-  const vf __builtin_vsx_xvcvdpsp (vd);
-XVCVDPSP vsx_xvcvdpsp {}
-
   const vsll __builtin_vsx_xvcvdpsxds (vd);
 XVCVDPSXDS vsx_fix_truncv2dfv2di2 {}
 
@@ -1691,9 +1688,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vd __builtin_vsx_xvcvspdp (vf);
-XVCVSPDP vsx_xvcvspdp {}
-
   const vsll __builtin_vsx_xvcvspsxds (vf);
 VEC_VSIGNEDE_V4SF vsignede_v4sf {}
 
@@ -1715,9 +1709,6 @@
   const vf __builtin_vsx_xvcvsxdsp (vsll);
 XVCVSXDSP vsx_xvcvsxdsp {}
 
-  const vd __builtin_vsx_xvcvsxwdp (vsi);
-XVCVSXWDP vsx_xvcvsxwdp {}
-
   const vf __builtin_vsx_xvcvsxwsp (vsi);
 XVCVSXWSP vsx_floatv4siv4sf2 {}
 
@@ -1727,9 +1718,6 @@
   const vd __builtin_vsx_xvcvuxddp_scale (vsll, const int<5>);
 XVCVUXDDP_SCALE vsx_xvcvuxddp_scale {}
 
-  const vd __builtin_vsx_xvcvuxddp_uns (vull);
-XVCVUXDDP_UNS vsx_floatunsv2div2df2 {}
-
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-- 
2.45.0



Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-05-29 Thread Carl Love
This was patch 6 in the previous series.  Updated the documentation file per 
the comments.  No functional changes to the patch.

  Carl 


rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned int128 arguments
and return a signed/unsigned int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded vec_sel
built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |  12 ++
 .../powerpc/vec-sel-runnable-i128.c   | 129 ++
 4 files changed, 145 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 13e36df008d..ea0da77f13e 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1904,12 +1904,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..a210c5ad10d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,10 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vsq);
+VSEL_1TI  VSEL_1TI_S
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_U
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b88e61641a2..0756230b19e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector signed __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
new file mode 100644
index 000..d82225cc847
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+/* { dg-final { scan-assembler-times "xxsel" 2 } } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will run
+ with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;

Re: [PATCH 6/13 ver 3] rs6000, remove duplicated built-ins of vecmergl and, vec_mergeh

2024-05-29 Thread Carl Love
This was patch 5 in the previous series.  It was previously approved.  No 
changes in this version.  Being posted for completeness.

 Carl 


rs6000, remove duplicated built-ins of vec_mergel and
 vec_mergeh

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
patch removes the now unused define_expands for vsx_xxmrghw_ and
vsx_xxmrglw_.
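
For readers less familiar with the merge operations, here is a minimal
portable C sketch of what the overloaded vec_mergeh and vec_mergel compute
for four-element vectors.  It is a scalar model only, not the rs6000
implementation.  The names model_mergeh and model_mergel are hypothetical,
and element 0 is assumed to be the first element in vector element order.

```c
#include <stdint.h>

/* Scalar model of vec_mergeh for four 32-bit elements (hypothetical
   helper, not part of GCC).  Interleaves the first two elements of
   each input: {a0, b0, a1, b1}.  */
void model_mergeh (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* Scalar model of vec_mergel for four 32-bit elements (hypothetical
   helper, not part of GCC).  Interleaves the last two elements of
   each input: {a2, b2, a3, b3}.  */
void model_mergel (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  out[0] = a[2];
  out[1] = b[2];
  out[2] = a[3];
  out[3] = b[3];
}
```

The element order matches the selectors (0 4 1 5) used by the removed
vsx_xxmrghw_ define_expand on the concatenation of the two inputs.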

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.
* config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_):
Remove unused define_expands.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 gcc/config/rs6000/vsx.md  | 41 ---
 3 files changed, 57 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index ac9f16fe51a..f83d65b06ef 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 6049f3a4599..13e36df008d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1877,18 +1877,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 
-  const vf __builtin_vsx_xxmrghw (vf, vf);
-XXMRGHW_4SF vsx_xxmrghw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
-XXMRGHW_4SI vsx_xxmrghw_v4si {}
-
-  const vf __builtin_vsx_xxmrglw (vf, vf);
-XXMRGLW_4SF vsx_xxmrglw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
-XXMRGLW_4SI vsx_xxmrglw_v4si {}
-
   const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
 XXPERMDI_16QI vsx_xxpermdi_v16qi {}
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index a8f3d459232..4402b8b01d5 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4875,47 +4875,6 @@ (define_insn "vsx_xxspltd_"
 }
   [(set_attr "type" "vecperm")])
 
-;; V4SF/V4SI interleave
-(define_expand "vsx_xxmrghw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-(vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 0) (const_int 4)
-(const_int 1) (const_int 5)])))]
-  "VECTOR_MEM_VSX_P (mode)"
-{
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
-: gen_altivec_vmrglw_direct_;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  [(set_attr "type" "vecperm")])
-
-(define_expand "vsx_xxmrglw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-   (vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 2) (const_int 6)
-(cons

Re: [PATCH 8/13 ver 3] rs6000, remove the vec_xxsel built-ins, they are, duplicates

2024-05-29 Thread Carl Love
This was patch 7 in the previous series.  Patch was updated to address the 
feedback comments.

Carl 


rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded
vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so users will only
use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
16qi, 4sf, 2df] tests are also removed.
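
As a rough illustration of why the element type does not change the
operation, here is a minimal scalar sketch of the bitwise select, assuming
the usual vec_sel semantics (take the bit from the second operand where the
mask bit is 1, otherwise from the first).  The name model_sel is
hypothetical and the sketch works on one 32-bit lane only.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of vec_sel (hypothetical helper, not
   part of GCC).  For every bit position, the result bit comes from B
   when the MASK bit is 1 and from A when the MASK bit is 0.  */
uint32_t model_sel (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & ~mask) | (b & mask);
}
```

Because the selection is purely bitwise, the signed, unsigned and floating
point variants all perform the same bit operation on their operands.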

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel): Change built-in call to overloaded built-in
call vec_sel.
---
 gcc/config/rs6000/rs6000-builtins.def | 30 
 .../gcc.target/powerpc/vsx-builtin-3.c| 36 ++-
 2 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index ea0da77f13e..a78c52183bc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1898,36 +1898,6 @@
   const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
 XXPERMDI_8HI vsx_xxpermdi_v8hi {}
 
-  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
-XXSEL_16QI vector_select_v16qi {}
-
-  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
-XXSEL_16QI_UNS vector_select_v16qi_uns {}
-
-  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
-XXSEL_2DF vector_select_v2df {}
-
-  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
-XXSEL_2DI vector_select_v2di {}
-
-  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
-XXSEL_2DI_UNS vector_select_v2di_uns {}
-
-  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
-XXSEL_4SF vector_select_v4sf {}
-
-  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
-XXSEL_4SI vector_select_v4si {}
-
-  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
-XXSEL_4SI_UNS vector_select_v4si_uns {}
-
-  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
-XXSEL_8HI vector_select_v8hi {}
-
-  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
-XXSEL_8HI_UNS vector_select_v8hi_uns {}
-
   const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
 XXSLDWI_16QI vsx_xxsldwi_v16qi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index ff875c55304..e20d3f03c86 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -37,6 +37,8 @@
 /* { dg-final { scan-assembler "xvcvsxdsp" } } */
 /* { dg-final { scan-assembler "xvcvuxdsp" } } */
 
+#include 
+
 extern __vector int si[][4];
 extern __vector short ss[][4];
 extern __vector signed char sc[][4];
@@ -61,23 +63,23 @@ int do_sel(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
-  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
-  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
-  f[i][0] = __built

Re: [PATCH 10/13 ver 3] rs6000, remove __builtin_vsx_xvnegdp and, __builtin_vsx_xvnegsp built-ins

2024-05-29 Thread Carl Love
 This was patch 9 in the previous series.  It was previously approved.  
Reposting for completeness.

 Carl
-

rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
redundant.  The overloaded vec_neg built-in provides the same
functionality.  The two built-ins are not documented, nor are there any
test cases for them.

Remove the definitions so users will use the overloaded vec_neg built-in
which is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
__builtin_vsx_xvnegsp): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f02a8c4de45..64690b9b9b5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1736,12 +1736,6 @@
   const vf __builtin_vsx_xvnabssp (vf);
 XVNABSSP vsx_nabsv4sf2 {}
 
-  const vd __builtin_vsx_xvnegdp (vd);
-XVNEGDP negv2df2 {}
-
-  const vf __builtin_vsx_xvnegsp (vf);
-XVNEGSP negv4sf2 {}
-
   const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
 XVNMADDDP nfmav2df4 {}
 
-- 
2.45.0



Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-29 Thread Carl Love
 This was patch 10 from the previous series.  The patch was updated to address 
feedback comments.

Carl 
---

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances for vec_xxpermdi

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.
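
For reference, the doubleword permute can be modelled in portable C as
below.  This is a sketch under stated assumptions, not the GCC
implementation: doubleword 0 is taken to be the most significant half
(big-endian numbering), little-endian adjustments are not modelled, and the
names model_v128 and model_xxpermdi are hypothetical.

```c
#include <stdint.h>

/* A 128-bit value split into two doublewords.  In this sketch dw[0] is
   assumed to be the most significant doubleword.  */
typedef struct
{
  uint64_t dw[2];
} model_v128;

/* Scalar model of the doubleword permute (hypothetical helper, not part
   of GCC).  The high bit of the 2-bit selector picks the doubleword taken
   from A, the low bit picks the doubleword taken from B.  */
model_v128 model_xxpermdi (model_v128 a, model_v128 b, int sel)
{
  model_v128 r;
  r.dw[0] = a.dw[(sel >> 1) & 1];
  r.dw[1] = b.dw[sel & 1];
  return r;
}
```

With a single __int128 element per vector, the result is therefore built
from one half of each input.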

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 235 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index a210c5ad10d..45000f161e4 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4932,6 +4932,10 @@
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
 XXPERMDI_2DF  XXPERMDI_VD
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1TI
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1TUI
 
 [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
   vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0756230b19e..edfef1bdab7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char *);
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
+vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
+vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
 vector int vec_xxpermdi (vector int, vector int, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..2d5dce09404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = src_va_s128 << 64; 
+  src_va_s128

Re: [PATCH 9/13 ver 3] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-29 Thread Carl Love
This was patch 8 in the previous series.  Updated patch per the feedback 
comments.

Carl 


rs6000, remove __builtin_vsx_vperm_* built-ins

The undocumented built-ins:
  __builtin_vsx_vperm_16qi_uns,
  __builtin_vsx_vperm_1ti,
  __builtin_vsx_vperm_1ti_uns,
  __builtin_vsx_vperm_2df,
  __builtin_vsx_vperm_2di,
  __builtin_vsx_vperm_2di_uns,
  __builtin_vsx_vperm_4sf,
  __builtin_vsx_vperm_4si,
  __builtin_vsx_vperm_4si_uns

are duplicates of the __builtin_altivec_* built-ins that are used by
the overloaded vec_perm built-in that is documented in the PVIPR.
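
As a reminder of what the overloaded vec_perm does, here is a minimal
byte-level sketch in portable C.  It assumes the usual semantics (each
selector byte indexes into the 32-byte concatenation of the two inputs,
using only its low five bits) and ignores endian-specific details.  The
name model_perm is hypothetical.

```c
#include <stdint.h>

/* Scalar model of a byte permute (hypothetical helper, not part of GCC).
   Each output byte is picked from the 32-byte concatenation A:B using
   the low five bits of the matching selector byte.  */
void model_perm (const uint8_t a[16], const uint8_t b[16],
                 const uint8_t sel[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned int idx = sel[i] & 0x1F;   /* Only five bits are used.  */
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

The operation only moves bytes, so it is the same for every element type,
which is why the per-type __builtin_vsx_vperm_* variants add nothing.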

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
built-in definitions and comments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
__builtin_vsx_vperm): Change call to built-in to the  overloaded
built-in vec_perm.
---
 gcc/config/rs6000/rs6000-builtins.def | 33 ---
 .../gcc.target/powerpc/vsx-builtin-3.c| 22 ++---
 2 files changed, 11 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index a78c52183bc..f02a8c4de45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1529,39 +1529,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}
 
-; These are duplicates of __builtin_altivec_* counterparts, and are being
-; kept for backwards compatibility.  The reason for their existence is
-; unclear.  TODO: Consider deprecation/removal at some point.
-  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
-VPERM_16QI_X altivec_vperm_v16qi {}
-
-  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
-VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
-
-  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
-VPERM_1TI_X altivec_vperm_v1ti {}
-
-  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
-VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
-
-  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
-VPERM_2DF_X altivec_vperm_v2df {}
-
-  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
-VPERM_2DI_X altivec_vperm_v2di {}
-
-  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
-VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
-
-  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
-VPERM_4SF_X altivec_vperm_v4sf {}
-
-  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
-VPERM_4SI_X altivec_vperm_v4si {}
-
-  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
-VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
-
   const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
 VPERM_8HI_X altivec_vperm_v8hi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index e20d3f03c86..f06d871b6b1 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -88,17 +88,17 @@ int do_perm(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
-
-  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
+
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
 
   return i;
 }
-- 
2.45.0



Re: [PATCH 12/13 ver 3] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-29 Thread Carl Love
This was patch 11 from the previous series.  Patch was updated to address 
feedback comments.

   Carl 
--

rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
__builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
there are no test cases for it.  The patch removes built-in
__builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 64690b9b9b5..48ebc018a8d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1619,9 +1619,6 @@
   const vf __builtin_vsx_xvcmpeqsp (vf, vf);
 XVCMPEQSP vector_eqv4sf {}
 
-  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
-XVCMPEQSP_P vector_eq_v4sf_p {pred}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}
 
-- 
2.45.0



Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-05-29 Thread Carl Love
This was patch 13 from the previous series.  Note that patch 12 of the 
previous series was dropped.  This patch is the same as the previous version.  
Per the feedback comments, the additional work to replace 
__builtin_vec_set_v1ti, __builtin_vec_set_v2di and __builtin_vec_set_v2df 
with equivalent gimple code is being deferred to a future patch.  The goal of 
this series was simply to remove duplicated built-ins, extending overloaded 
built-ins as needed.  Adding the gimple code needed to remove the additional 
built-ins is beyond the scope of this patch series.

 Carl 
---

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions, both with no optimization and with -O3.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization the built-in actually generates more instructions than
the inline C code; with -O3 the generated code is identical.
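
The plain C replacements can be sketched with the generic GCC vector
extension, which does not need altivec.h.  This is only an illustration:
the type name v4si_t and the helpers make_v4si and set_elem are
hypothetical, and the index is masked here just to keep the sketch in
range.

```c
/* Generic GCC vector extension type holding four ints.  The name is
   chosen for this sketch only.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Plain C initialization, used in place of __builtin_vec_init_v4si.  */
v4si_t make_v4si (int a, int b, int c, int d)
{
  v4si_t v = { a, b, c, d };
  return v;
}

/* Plain C element assignment, used in place of __builtin_vec_set_v4si.
   The index is masked to 0..3 so the sketch never writes out of range.  */
v4si_t set_elem (v4si_t v, int val, int index)
{
  v[index & 3] = val;
  return v;
}
```

Both forms rely only on the vector initializer and subscript syntax that
GCC already accepts for vector types.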

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v8hi, __builtin_vec_init_v4si,
__builtin_vec_init_v4sf, __builtin_vec_init_v2di,
__builtin_vec_init_v2df, __builtin_vec_set_v1ti,
__builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
__builtin_vec_set_v4si, __builtin_vec_set_v4sf,
__builtin_vec_set_v2di, __builtin_vec_set_v2df,
__builtin_vec_set_v1ti): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 42 ++-
 1 file changed, 2 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 48ebc018a8d..8349d45169f 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,8 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



[PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-09 Thread Carl Love



Gcc maintainers:

Version 2: based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed char 
and vector bool char.  The patch has been tested on Power 10 LE and BE 
with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds the missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support vector unsigned char arguments.  Extend the
built-ins to also support vector signed char and vector bool char arguments.

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
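
The description above can be sketched as a scalar model in portable C.
This is an illustration of the documented behaviour only, not the Power 10
instruction sequence.  The names model_lsbb_all_ones and
model_lsbb_all_zeros are hypothetical.

```c
#include <stdint.h>

/* Scalar model of vec_test_lsbb_all_ones (hypothetical helper, not part
   of GCC).  Returns 1 if bit 0 of every byte is set, 0 otherwise.  */
int model_lsbb_all_ones (const uint8_t v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) == 0)
      return 0;
  return 1;
}

/* Scalar model of vec_test_lsbb_all_zeros (hypothetical helper, not part
   of GCC).  Returns 1 if bit 0 of every byte is clear, 0 otherwise.  */
int model_lsbb_all_zeros (const uint8_t v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that for a vector with mixed least significant bits both functions
return 0, so the two built-ins are not simply negations of each other.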

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overloaded.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  24 +++-
 4 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def

index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, 
__builtin_vec_xvtlsbb_all_ones]

+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, 
__builtin_vec_xvtlsbb_all_zeros]

+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost 
byte of each doubleword.

 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 
(@option{-mcpu=power10}):


+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least 
significant

+bit in each byte is equal to 1.  It returns a 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least 
significant

+bit in each byte is equal to zero.  It returns a 0 otherwise.

 @smallexample
 @exdent vector unsigned long long int
diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c b/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
index 2e97cc17b60..3e4f71bed12 100644
-

Re: [PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float

2024-08-16 Thread Carl Love

Kewen:

Ping.

  Carl

On 8/7/24 10:15 AM, Carl Love wrote:



 GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be 
unsigned but are listed as signed.
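
As a side note on why the signed/unsigned distinction matters for these conversions, here is a minimal scalar sketch in portable C.  It does not call the built-ins; the helper names (lane_to_float_signed, lane_to_float_unsigned, check_lane_conversions) are hypothetical and only model one 64-bit lane.  The prototype fix itself concerns the declared argument types; the sketch just shows how the same bit pattern converts differently under the two interpretations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for one 64-bit lane; not GCC built-ins.  */

/* Treat the lane as signed before converting.  The uint64_t -> int64_t
   cast relies on the usual two's-complement (modular) conversion that
   GCC implements for out-of-range values.  */
float lane_to_float_signed (uint64_t bits)
{
  return (float) (int64_t) bits;
}

/* Treat the lane as unsigned before converting.  */
float lane_to_float_unsigned (uint64_t bits)
{
  return (float) bits;
}

/* Small values agree; values with the top bit set do not.  */
void check_lane_conversions (void)
{
  assert (lane_to_float_signed (42) == lane_to_float_unsigned (42));
  assert (lane_to_float_signed (UINT64_MAX) == -1.0f);
  assert (lane_to_float_unsigned (UINT64_MAX) > 1.0e19f);
}
```

Calling check_lane_conversions () from a test's main exercises both interpretations on the same inputs.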


Additionally, a number of test cases are missing for the various 
instances of the built-ins, and the documentation for the built-ins 
is missing as well.


This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

    Carl
- 

rs6000, Add tests and documentation for vector conversions between 
integer and float


The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
  __builtin_altivec_float_sisf (vsi);
  __builtin_altivec_uns_float_sisf (vui);
  __builtin_vsx_floate_v2di (vsll);
  __builtin_vsx_uns_floate_v2di (vull);
  __builtin_vsx_floato_v2di (vsll);
  __builtin_vsx_uns_floato_v2di (vull);
  __builtin_vsx_float2_v2di (vsll, vsll);
  __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
  __builtin_altivec_fix_sfsi
  __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed above.


gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
    __builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di): Change
    argument from signed to unsigned.
    * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   |  37 +++
 .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
 3 files changed, 300 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c


diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
   const vd __builtin_vsx_uns_doubleo_v4si (vsi);
 UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
 UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
 UNS_FLOATO_V2DI unsfloatov2di {}

   const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
   const vss __builtin_vsx_revb_v8hi (vss);
 REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
 UNS_FLOAT2_V2DI uns_float2_v2di {}

   const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

 Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);
+vector float __builtin_vsx_floate_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floate_v2di (vector unsigned long long int);
+vector float __builtin_vsx_floato_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floato_v2di (vector unsigned long long int);
+vector float __builtin_vsx_float2_v2di (vector signed long long int,
+    vector signed long long int);
+vector float __builtin_vsx_uns_float2_v2di (vector unsigned long long int,
+    vector unsigned long long int);
+@end smallexample
+
+The @code{__builtin_altivec_float_sisf} and
+@code{__builtin_altivec_uns_float_sisf} built-ins convert signed and
+unsigned vectors of 32-bit integers to a vector of 32-bit floating point
+values.  The @code{__builtin_vsx_floate_v2di} and
+@code{__builtin_vsx_uns_floate_v2di} built-ins convert a vector of
+long long ints to 32-bit floating point values

Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-16 Thread Carl Love

Ping.

 Carl

On 8/9/24 8:57 AM, Carl Love wrote:


Gcc maintainers:

Version 2: based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed 
char and vector bool char.  The patch has been tested on Power 10 LE 
and BE with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds the missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline. Thanks.

  Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support unsigned char arguments.  Extend the
built-ins to also support vector signed char and vector bool char 
aruments.


Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  24 +++-
 4 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.

 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns a 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns a 0 otherwise.

 @smallexample
 @exdent vector unsigned long long int
diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c 
b/gcc/testsuite

Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-22 Thread Carl Love



Kewen:

On 8/20/24 12:56 AM, Kewen.Lin wrote:

> Hi Carl,
>
> on 2024/8/9 23:57, Carl Love wrote:
>> Gcc maintainers:
>>
>> Version 2, based on discussion additional overloaded instances of the
>> vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros built-ins has been added.
>> The additional instances are for arguments of vector signed char and vector
>> bool char.  The patch has been tested on Power 10 LE and BE with no regressions.
>>
>> Per a report from a user, the existing vec_test_lsbb_all_ones and,
>> vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation
>> file.
>>
>> The following patch adds missing documentation for the vec_test_lsbb_all_ones
>> and, vec_test_lsbb_all_zeros built-ins.
>>
>> Please let me know if the patch is acceptable for mainline.  Thanks.
>>
>>    Carl
>>
>> rs6000,extend and document built-ins vec_test_lsbb_all_ones and
>> vec_test_lsbb_all_zeros
>>
>> The built-ins currently support unsigned char arguments.  Extend the
>
> Nit: /unsigned char/vector unsigned char/

Fixed.

>> built-ins to also support vector signed char and vector bool char aruments.
>
> Nit: /aruments/arguments/

Fixed

>> index 89fe5db7aed..5ca87889831 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.
>>   The following additional built-in functions are also available for the
>>   PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):
>>
>> +@smallexample
>> +@exdent int vec_test_lsbb_all_ones (vector signed char);
>> +@exdent int vec_test_lsbb_all_ones (vector unsigned char);
>> +@exdent int vec_test_lsbb_all_ones (vector bool char);
>> +@end smallexample
>> +@findex vec_test_lsbb_all_ones
>> +
>> +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
>> +bit in each byte is equal to 1.  It returns a 0 otherwise.
>
> Nit: s/a 0/0/

Fixed

>> +
>> +@smallexample
>> +@exdent int vec_test_lsbb_all_zeros (vector signed char);
>> +@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
>> +@exdent int vec_test_lsbb_all_zeros (vector bool char);
>> +@end smallexample
>> +@findex vec_test_lsbb_all_zeros
>> +
>> +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
>> +bit in each byte is equal to zero.  It returns a 0 otherwise.
>
> Nit: s/a 0/0/

Fixed

>> diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb.c b/gcc/testsuite/gcc.target/powerpc/lsbb.c
>> index b5c037094a5..650e944e082 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/lsbb.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/lsbb.c
>> @@ -9,16 +9,32 @@
>>   /* { dg-require-effective-target power10_ok } */
>
> Nit: This power10_ok isn't needed, could you also remove it together?

OK, removed.

>>   /* { dg-options "-fno-inline -mdejagnu-cpu=power10 -O2" } */
>
> ... and this "-fno-inline".

Removed

>> -/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */
>> -/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */
>> +/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 3 } } */
>> +/* { dg-final { scan-assembler-times {\msetbc\M} 3 } } */
>
> I would expect the times are changed to 6 rather than 3, was this test
> case really tested?  Or am I missing something?
>
> BR,
> Kewen


I retested and yes it fails.  Should be 6.  Not sure why my original 
testing didn't catch that.  Perhaps I looked at the wrong output file???

Changed to

-/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */
-/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msetbc\M} 6 } } */

and retested.  It now passes.

 Carl



[PATCH ver 3] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-22 Thread Carl Love

Gcc maintainers:

Version 3, fixed a few typos per Kewen's review.  Fixed the expected 
number of scan-assembler-times for xvtlsbb and setbc.  Retested on Power 
10 LE.


Version 2: based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed char 
and vector bool char.  The patch has been tested on Power 10 LE and BE 
with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds the missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl



rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support vector unsigned char arguments. Extend the
built-ins to also support vector signed char and vector bool char
arguments.

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
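
To make the description above concrete, here is a minimal scalar model in portable C.  It is only a sketch of the documented semantics, not the built-ins themselves; the names model_lsbb_all_ones and model_lsbb_all_zeros are hypothetical, and the 16-byte array stands in for the vector argument.  Only bit 0 of each byte is examined, so the result does not depend on whether the bytes are viewed as signed, unsigned or bool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model: returns 1 if the least significant bit of
   every byte is 1, otherwise 0.  */
int model_lsbb_all_ones (const unsigned char bytes[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Hypothetical scalar model: returns 1 if the least significant bit of
   every byte is 0, otherwise 0.  */
int model_lsbb_all_zeros (const unsigned char bytes[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that for a vector with mixed low bits both models return 0, so the two tests are not simply complements of each other.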

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  28 +++-
 4 files changed, 158 insertions(+), 32 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..8971d9fbf3c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.

 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns 0 otherwise.

 @smallexample
 @exdent vector unsi

Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins

2024-06-05 Thread Carl Love
Kewen:

On 6/3/24 23:00, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/5/29 23:52, Carl Love wrote:
>> This patch was approved in the previous series.  There are no changes to 
>> this patch.  Reposting for completeness. 
> I guess you can just push the approved ones, as there is no dependency
> between any two of them?  It can help to reduce the size of this series.

The patches do touch some similar files so they are not completely independent 
from a patch standpoint.  Functionally they are all independent.

I tried applying the approved patches only to the current mainline tree.  The 
approved patches were: 1,3,5 (with tweak), 6, 8, 9, 10, 12.  Patch 5 requires a 
little rebasing due to a little fuzz in the lines.  Not a big deal.  Patch 8 
also doesn't apply cleanly with git.  The patch command gets a little confused 
when I tried to use it, so I had to manually "recreate" the patch.  The changes 
are straightforward so that is fairly easy.  The rest of the patches applied 
cleanly with git. I am guessing there will be some rebasing needed for the 
non-approved patches to apply them after the approved patches.

The main reason that I reposted everything was that the patch numbers changed 
and I wanted it to be fairly clear what was going on.  

I toyed with the idea of committing the 8 approved patches and then working on 
the additional 5, but I think that is hard: I would either have to manually adjust 
the patch numbers to keep them lined up with version 3, or give version 4 a new 
numbering of patches 1 to 5 (i.e. a remapping of the version 3 patch numbers).  Either 
way I think it would be hard/confusing. 

Given that separating out the approved and non-approved patches causes some 
re-basing issues, it is probably best to just update the 5 patches, posting 
them as version 4 and not re-post the whole series. I will just note in the 
header patch 0/13 the patches that have already been approved.  I hope that is 
ok?

 Carl 


Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-13 Thread Carl Love
Kewen:

On 6/4/24 00:19, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/29 23:58, Carl Love wrote:
>> Updated the patch per the feedback comments from the previous version.
>>
>>  Carl 
>> ---
>>
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
>> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
>> built-ins.
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
>> now for internal use only. They are not documented and they do not
>> have testcases.
>>
>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
>> vec_signed{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
>> vec_unsigned{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
>> vec_unsigned, remove.
>>
>> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
>> vec_unsigned, remove.
> 
> I prefer to move these removals into sub-patch 2/13 or split them out into
> a new patch, since they don't match the subject of this patch.  Moving it
> to sub-patch 2/13 looks good as they are all about vec_{un,}signed{,e,o}.

Yes, we need to have all of the vec_unsigned in the same patch.  Moved 
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws to patch 2.
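
For readers following the vec_{un,}signed{e,o} extension discussed above, here is a rough scalar sketch in portable C of what "convert the even/odd float elements to long long" means.  This is an assumption-laden model, not the built-in: model_signed_eo is a hypothetical name, elements are numbered 0..3 in array order (the endian-dependent element numbering GCC uses is not modeled), and out-of-range inputs are not handled.

```c
#include <assert.h>

/* Hypothetical scalar model.  which = 0 picks elements 0 and 2 ("even"),
   which = 1 picks elements 1 and 3 ("odd").  The C cast truncates toward
   zero; inputs are assumed to fit in a long long.  */
void model_signed_eo (const float in[4], int which, long long out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (long long) in[2 * i + which];
}
```

With the input {1.5f, -2.5f, 3.0f, 4.0f}, the even selection gives {1, 3} and the odd selection gives {-2, 4}.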
> 
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>>  __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>>  (__builtin_vsx_xvcvspuxds): Fix return type.
>>  (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>>  VEC_VUNSIGNEDE_V4SF respectively.
>>  (vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
>>  vunsignede_v4sf respectively.
>>  (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
>>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
>>  * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>>  vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
>>  * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>>  vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
>>  * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
>>  overloaded built-ins.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 25 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>>  gcc/config/rs6000/vsx.md  | 88 +++
>>  gcc/doc/extend.texi   | 10 +++
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
>>  5 files changed, 157 insertions(+), 25 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..cea2649b86c 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1688,32 +1688,23 @@
>>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
>> -XVCVDPSXWS vsx_xvcvdpsxws {}
>> -
>> -  const vsll __builtin_vsx_xvcvdpuxds (vd);
>> -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>> -
>>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>>  
>> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
>> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
>> -
>> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
>> -XVCVDPUXWS vsx_xvcvdpuxws {}
>> -
>>const vd __builtin_vsx_xvcvspdp (vf);
>>  XVCVSPDP vsx_xvcvspdp {}
>>  
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>> -XVCVSPSXDS vsx_xvcvspsxds {}
>> +VEC_VSIGNEDE_V4SF vsignede_v4sf {}
> 
> We should rename __builtin_vsx_xvcvspsxds to
> __builtin_vsx_vsignede_v4sf, one reason is to align with

Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:03, Carl Love wrote:
>> This was patch 6 in the previous series.  Updated the documentation file per 
>> the comments.  No functional changes to the patch.
>>
>>   Carl 
>> 
>>
>> rs6000, add overloaded vec_sel with int128 arguments
>>
>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>> and return a signed/unsigned int128 result.
>>
>> Extending the vec_sel built-in makes the existing built-ins
>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>> patch removes these built-ins.
>>
>> The patch adds documentation and test cases for the new overloaded vec_sel
>> built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>  __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>  definitions.
>>  * doc/extend.texi: Add documentation for new vec_sel instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |   6 -
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |  12 ++
>>  .../powerpc/vec-sel-runnable-i128.c   | 129 ++
>>  4 files changed, 145 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 13e36df008d..ea0da77f13e 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1904,12 +1904,6 @@
>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>  
>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>> -XXSEL_1TI vector_select_v1ti {}
>> -
>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>> -
>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>  XXSEL_2DF vector_select_v2df {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 4d857bb1af3..a210c5ad10d 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3274,6 +3274,10 @@
>>  VSEL_2DF  VSEL_2DF_B
>>vd __builtin_vec_sel (vd, vd, vull);
>>  VSEL_2DF  VSEL_2DF_U
>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>> +VSEL_1TI  VSEL_1TI_S
>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>> +VSEL_1TI_UNS  VSEL_1TI_U
> 
> I just noticed that for integral types, such as: signed/unsigned int, we have 
> six instances:
> 
>   vsi __builtin_vec_sel (vsi, vsi, vbi);
> VSEL_4SI  VSEL_4SI_B
>   vsi __builtin_vec_sel (vsi, vsi, vui);
> VSEL_4SI  VSEL_4SI_U
>   vui __builtin_vec_sel (vui, vui, vbi);
> VSEL_4SI_UNS  VSEL_4SI_UB
>   vui __builtin_vec_sel (vui, vui, vui);
> VSEL_4SI_UNS  VSEL_4SI_UU
>   vbi __builtin_vec_sel (vbi, vbi, vbi);
> VSEL_4SI_UNS  VSEL_4SI_BB
>   vbi __builtin_vec_sel (vbi, vbi, vui);
> 
> It considers that the control vector can only have unsigned and bool types, and
> that the return type can be bool.  It aligns with what PVIPR defines, so here we
> should have:
> 
> vsq __builtin_vec_sel (vsq, vsq, vbq);
> vsq __builtin_vec_sel (vsq, vsq, vuq);
> vuq __builtin_vec_sel (vuq, vuq, vbq);
> vuq __builtin_vec_sel (vuq, vuq, vuq);
> vbq __builtin_vec_sel (vbq, vbq, vbq);
> vbq __builtin_vec_sel (vbq, vbq, vuq);
> 
> Sorry that I didn't find this in the previous review.

Yea, my bad I missed that as well.  Fixed to add all six instances.
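
For reference, the bitwise select that all six instances share can be sketched in portable C as follows.  This is only a scalar model under stated assumptions: u128_model and model_vec_sel are hypothetical names, and the 128-bit value is represented as two 64-bit halves so the sketch does not depend on __int128 support.  Each result bit comes from the first operand where the control bit is 0 and from the second operand where the control bit is 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector element.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Bitwise select: result bit = a bit where mask bit is 0,
   b bit where mask bit is 1.  */
u128_model model_vec_sel (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;
  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}
```

Because the operation is purely bitwise, the signed, unsigned and bool instances differ only in their declared types, which is consistent with the overloads mapping to the same select pattern.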
> 
> 
>>  ; The following variants are deprecated.
>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>  VSEL_2DI_B  VSEL_2DI_S
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index b88e61641a2..0756230b19e 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
>> 64-bit PowerPC
>>  family of processors, for efficient use of 128-bit floating point
>>  (@code{__float128}) values.
>>  
>> +Vector select
>> +
>>

Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:10, Carl Love wrote:
>>  This was patch 10 from the previous series.  The patch was updated to 
>> address feedback comments.
>>
>> Carl 
>> ---
>>
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add new signed and unsigned overloaded instances for vec_xxpermdi:
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>__uint128 vec_xxpermdi (__uint128, __uint128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instances.
>>
>> Add test cases for the new overloaded instances.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>>  overloaded built-in instances.
>>  * doc/extend.texi:  Add documentation for new overloaded built-in
>>  instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |   2 +
>>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>>  3 files changed, 235 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index a210c5ad10d..45000f161e4 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,10 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
>> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TUI
> 
> Nits:
>   - Move them before "vf __builtin_vsx_xxpermdi (vf, vf, const int);" so
> they are close to instances for other integral types.
>   - Per the existing naming convention, _{SQ,UQ} are better.
> 
> vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>XXPERMDI_1TI  XXPERMDI_1SQ
> vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>XXPERMDI_1TI  XXPERMDI_1UQ
> 

OK, moved the definitions up and changed the names.
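
As background for the new instances, here is a small scalar sketch in portable C of the doubleword permute that the xxpermdi instruction performs.  It is a model under assumptions, not the built-in: v2dw_model and model_xxpermdi are hypothetical names, doublewords are numbered as in the ISA description (dw[0] is the most significant), and any little-endian element reordering done by GCC is not modeled.  The high bit of the 2-bit selector picks the doubleword taken from the first operand, the low bit picks the one taken from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register viewed as two doublewords.
   dw[0] is doubleword 0 in ISA (big-endian) numbering.  */
typedef struct
{
  uint64_t dw[2];
} v2dw_model;

/* Result doubleword 0 comes from a, selected by the high bit of dm;
   result doubleword 1 comes from b, selected by the low bit of dm.  */
v2dw_model model_xxpermdi (v2dw_model a, v2dw_model b, int dm)
{
  v2dw_model r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}
```

For a single 128-bit argument this means the permute recombines the high and low halves of the two inputs; for example, selector 2 with both operands equal swaps the two halves of that value.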

>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 0756230b19e..edfef1bdab7 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char 
>> *);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
>>  vector long long vec_xxpermdi (vector long long, vector long long, const 
>> int);
> 
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>> +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const 
>> int);
> 
> Nit: These two lines break the long long and unsigned long long lines, can 
> you move
> them one line upward?  Also using the explicit "signed" and "unsigned" would 
> be
> better than "__{u,}int128".
> 

Yup, I didn't get them in the right place.  Fixed.

>>  vector unsigned long long vec_xxpermdi (vector unsigned long long,
>>  vector unsigned long long, const 
>> int);
>>  vector int vec_xxpermdi (vector int, vector int, const int);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
>> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> new file mode 100644
>> index 000..2d5dce09404
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> @@ -0,0 +1,229 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-options "-save-temps" } */
> 
> Nit: dg-options line isn't needed as it doesn't check assembly.

Removed the save-temps.

> 
> BR,
> Kewen
> 
>> +
>> +#include 
>> +
>> +#define DEBUG 0
>> +
>> +#if DEBUG
>> +#include 
>> +void print_i128 (unsigned __int128 val)
>> +{
>> +  printf(" 0x%016llx%016llx",
>> +   

[PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love


GCC maintainers:

The test gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c is supposed to
be a runnable test that verifies the execution of the vec_unpackl and
vec_unpackh built-ins.  However, its dg-do directive is set to compile rather
than run.  This patch fixes the dg-do argument.

The patch has been verified on a P10.  The test runs without errors.

Please let me know if the patch is acceptable.  Thanks.

Carl 

-

rs6000, altivec-2-runnable.c should be a runnable test

The test case has "dg-do compile" set not "dg-do run" for a runnable
test.  This patch changes the dg-do command argument to run.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
argument to run.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 6975ea57e65..3e66435d0d2 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run { target powerpc*-*-* } } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
 /* { dg-require-effective-target powerpc_vsx } */
-- 
2.45.0



[PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-13 Thread Carl Love
GCC maintainers:

Per the comments on patch 0004 from version 3, the removal of the built-ins
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to this
patch.  The rest of the patch is unchanged from version 3.  There were no
comments on this patch for version 3.

Please let me know if this patch is acceptable.  Thanks.

Carl 


-

rs6000, Remove __builtin_vsx_xvcvspsxws,
 __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws):
Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..8cf0b715898 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1697,9 +1697,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsi __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
@@ -1709,15 +1706,9 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
-
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
 
-- 
2.45.0



[PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

 Carl 

-

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances for vec_xxpermdi:

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.
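
For readers who want a feel for what the permute does to a 128-bit value, here
is a minimal scalar sketch.  It models the ISA-level xxpermdi doubleword
selection in big-endian numbering (doubleword 0 is the most significant half),
assuming the two-bit immediate picks one doubleword from each source.  The
names make_u128, dword and model_xxpermdi are hypothetical and are not part of
the patch; GCC's vec_xxpermdi may additionally adjust for target endianness,
which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical helper: build a 128-bit value from two 64-bit halves.  */
static u128 make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}

/* Hypothetical helper: doubleword 0 is the most significant half,
   doubleword 1 the least significant half.  */
static uint64_t dword (u128 v, int idx)
{
  return idx == 0 ? (uint64_t) (v >> 64) : (uint64_t) v;
}

/* Scalar model of the selection: the high bit of dm picks the doubleword
   taken from a, the low bit of dm picks the doubleword taken from b.  */
static u128 model_xxpermdi (u128 a, u128 b, int dm)
{
  assert (dm >= 0 && dm <= 3);
  return make_u128 (dword (a, (dm >> 1) & 1), dword (b, dm & 1));
}
```

With dm equal to 0 the model returns the high halves of both inputs; with dm
equal to 3 it returns the low halves of both.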

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   4 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 237 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 6cec1ad4f1a..354f8fabe0f 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4936,6 +4936,10 @@
 XXPERMDI_2DI  XXPERMDI_VSLL
   vull __builtin_vsx_xxpermdi (vull, vull, const int);
 XXPERMDI_2DI  XXPERMDI_VULL
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1SQ
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1UQ
   vf __builtin_vsx_xxpermdi (vf, vf, const int);
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d7d8d149a43..9e45976436b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char *);
 
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
+vector __int128 vec_xxpermdi (vector signed __int128,
+  vector signed __int128, const int);
+vector __int128 vec_xxpermdi (vector unsigned __int128,
+  vector unsigned __int128, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..0e0d77bcb84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_

Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-06-13 Thread Carl Love
GCC maintainers:

The patch has been updated per the feedback from version 3.  Please let me know
if the patch is acceptable for mainline.

Thanks.

  Carl 

--

rs6000, remove vector set and vector init built-ins

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization, the built-in actually generates more instructions than
the inline C code; with -O3 optimization the generated code is identical.
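
As an illustration of the plain C replacements, here is a minimal sketch.  It
uses the generic GCC vector_size attribute so that it builds on any target;
that is an assumption made for portability, since rs6000 code would normally
use "vector signed int" from altivec.h.  The function names init_v4si and
set_v4si are hypothetical stand-ins for the removed built-ins.

```c
#include <assert.h>

/* Generic GCC vector type: four 32-bit ints in 16 bytes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Stand-in for __builtin_vec_init_v4si: a brace initializer.  */
static v4si init_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Stand-in for __builtin_vec_set_v4si: subscript assignment.  */
static v4si set_v4si (v4si v, int val, int index)
{
  assert (index >= 0 && index < 4);
  v[index] = val;
  return v;
}
```

Calling set_v4si (init_v4si (1, 2, 3, 4), 9, 2) yields a vector whose element
2 is 9 and whose other elements are unchanged.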

All of the above built-ins that are removed do not have test cases and
are not documented.

The built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 44 +++
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 02aa04e5698..053dc0115d2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,10 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
+;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
+;; in resolve_vec_insert are replaced by the equivalent gimple statements.
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



[PATCH 0/13 ver4] rs6000, built-in cleanup patch series

2024-06-13 Thread Carl Love
GCC maintainers:

I have addressed the comments on the five patches in the series that have not
yet been approved.  The patches that have already been approved are 1, 3, 5,
6, 8, 9, 10, and 12.

The remaining patches all have fairly minor fixes requested.  I will just post
version 4 of these patches here.  The goal is to commit the entire series all
at once as they are all related, so I am holding off committing the approved
patches.

Thank you for your time and feedback on these patches.  The entire patch series
has been tested on Power 10 LE and Power 9 BE with no regression failures.

   Carl 


Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:16, Carl Love wrote:
>> This was patch 13 from the previous series.  Note the previous series patch 
>> 12 was dropped.  This patch is the same as the previous version.  The 
>> additional work to remove  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,  
>> __builtin_vec_set_v2d per the feedback comments with equivalent gimple code 
>> is being deferred to a future patch.  The goal of this series was simply to 
>> remove duplicated built-ins, extending overloaded built-ins as needed.  
>> Adding the needed gimple code to remove the additional built-ins is beyond 
>> the goal of this patch series.
>>
>>  Carl 
>> ---
>>
>> rs6000, remove vector set and vector init built-ins.
>>
>> The vector init built-ins:
>>
>>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>>   __builtin_vec_set_v1ti
> 
> Typo here, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed.

> 
>>
>> perform the same operation as initializing the vector in C code.  For
>> example:
>>
>>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>>   result_v4si = {1, 2, 3, 4};
>>
>> These two constructs were tested and verified they generate identical
>> assembly instructions with no optimization and -O3 optimization.
>>
>> The vector set built-ins:
>>
>>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf
> 
> Please also add the reserved ones (...v1ti/v2di/v2df), as they are the 
> same too, temporarily reserving them for the uses in resolve_vec_insert()
> doesn't affect this.

Added the three additional built-ins to the list.

> 
>>
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
>>
>> The built-ins are removed as they don't provide any benefit over just
>> using C code.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>>  __builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>>  __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>>  __builtin_vec_init_v2df, __builtin_vec_set_v1ti,
> 
> Typo, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed

> 
>>  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>>  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>>  __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>>  __builtin_vec_set_v1ti): Remove built-in definitions.
> 
> The last three ones are not actually removed.

OK, fixed.

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>>  1 file changed, 2 insertions(+), 40 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 48ebc018a8d..8349d45169f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1118,37 +1118,6 @@
>>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>>  VEC_EXT_V8HI nothing {extract}
>>  
>> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char);
>> -VEC_INIT_V16QI nothing {init}
>> -
>> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
>> -VEC_INIT_V4SF nothing {init}
>> -
>> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
>> - signed int);
>> -VEC_INIT_V4SI nothing {init}
>> -
>> -  const vss __bu

[PATCH 7/13 ver4] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

 Carl 

-

rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned/bool int128
arguments and return a signed/unsigned/bool int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded
vec_sel built-ins.
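
For reference, the bitwise select that vec_sel performs can be sketched on a
scalar 128-bit value: each result bit comes from the second argument where the
mask bit is 1 and from the first argument where it is 0.  The name model_sel
is hypothetical and the sketch uses a plain unsigned __int128 rather than a
vector type, so it only illustrates the semantics and is not the rs6000
implementation.

```c
typedef unsigned __int128 u128;

/* Scalar model of a bitwise select: take bits of b where mask is 1,
   bits of a where mask is 0.  */
static u128 model_sel (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}
```

An all-zero mask returns the first argument unchanged and an all-ones mask
returns the second.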

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new
overloaded  definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-10-runnable.c: New runnable test
file.
* gcc.target/powerpc/builtins-10.c: New compile only test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |  12 +
 gcc/doc/extend.texi   |  20 ++
 .../gcc.target/powerpc/builtins-10-runnable.c | 220 ++
 .../gcc.target/powerpc/builtins-10.c  |  63 +
 5 files changed, 315 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index b90b3f34167..c969cd0f3f6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1907,12 +1907,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..6cec1ad4f1a 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,18 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vbq);
+VSEL_1TI  VSEL_1TI_B
+  vsq __builtin_vec_sel (vsq, vsq, vuq);
+VSEL_1TI  VSEL_1TI_U
+  vuq __builtin_vec_sel (vuq, vuq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_UB
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_UU
+  vbq __builtin_vec_sel (vbq, vbq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_BB
+  vbq __builtin_vec_sel (vbq, vbq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_BU
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1620274285..d7d8d149a43 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21420,6 +21420,26 @@ Additional built-in functions are available for the 
64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector bool __int128);
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector unsigned __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector bool __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector bool __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
new file mode 100644
index 000..b7b4a95ea0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -0,0 +1,220 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+vo

[PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-13 Thread Carl Love


GCC maintainers:

As noted, the removal of __builtin_vsx_xvcvdpuxds_uns and
__builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has been
updated per the comments from version 3.

Please let me know if this patch is acceptable for mainline.  

 Carl 

--

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return the even/odd signed/unsigned integers.
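
As a rough illustration of the even/odd conversion for a float vector, here is
a scalar sketch.  It assumes "even" means elements 0 and 2 and "odd" means
elements 1 and 3 in the source's element order, and it uses the C cast, which
truncates toward zero.  The names model_signede and model_signedo are
hypothetical, and the sketch does not model how the hardware handles
out-of-range values or NaN, nor any endian adjustment.

```c
/* Scalar model: convert the even-numbered floats (elements 0 and 2)
   to signed long long.  Only valid for in-range, non-NaN inputs.  */
static void model_signede (const float src[4], long long dst[2])
{
  dst[0] = (long long) src[0];
  dst[1] = (long long) src[2];
}

/* Scalar model: convert the odd-numbered floats (elements 1 and 3)
   to signed long long.  Only valid for in-range, non-NaN inputs.  */
static void model_signedo (const float src[4], long long dst[2])
{
  dst[0] = (long long) src[1];
  dst[1] = (long long) src[3];
}
```

The unsigned variants would follow the same pattern with unsigned long long
results.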

The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

Add testcases and update documentation.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws,
__builtin_vsx_xvcvdpuxws): Removed.
(__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed
__builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
built-in definitions.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede, vec_unsignedo): Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New  define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation
for new overloaded built-ins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c
(test_unsigned_int_result, test_ll_unsigned_int_result): Add
new argument.
(vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
tests for the overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 20 ++---
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 84 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
 5 files changed, 154 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 322d27b7a0d..29a9deb3410 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,26 +1688,26 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
   const vsll __builtin_vsx_xvcvdpuxds (vd);
 XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
 
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
-  const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+  const vsll __builtin_vsignede_v4sf (vf);
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsignedo_v4sf (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
+
+  const vull __builtin_vunsignede_v4sf (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vunsignedo_v4sf (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+VEC_VUNSIGNEDE

Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love
Segher:

On 6/13/24 12:51, Segher Boessenkool wrote:



> 
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> +/* { dg-do run { target powerpc*-*-* } } */
>>  /* { dg-options "-mvsx" } */
>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>> has_arch_pwr8 } } } */
>>  /* { dg-require-effective-target powerpc_vsx } */
> 
> Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*"
> already, so you could remove that target clause even (after testing of
> course :-) )
> 
> Okay for trunk with or without that extra tweak.  Thank you!

I updated the patch by removing the target clause as suggested:

-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
 /* { dg-require-effective-target powerpc_vsx } */
 
Retested on Power 10.  Reports 2 passes and no failures.  I will go ahead and 
commit.

Thanks. 

   Carl 


[PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-14 Thread Carl Love
GCC maintainers:

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a 
machine that has a minimum support of Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--

rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and an optimization level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..04c7d1ac70e 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target p8vector_hw } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-18 Thread Carl Love
Kewen, Peter, Segher:

On 6/17/24 19:56, Kewen.Lin wrote:
> Hi,
> 
> on 2024/6/18 00:08, Peter Bergner wrote:
>> On 6/14/24 1:37 PM, Carl Love wrote:
>>> Per the additional feedback after patch: 
>>>
>>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>>   Author: Carl Love 
>>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>>
>>>   rs6000, altivec-2-runnable.c should be a runnable test
>>> 
>>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>>   test.  This patch changes the dg-do command argument to run.
>>> 
>>>   gcc/testsuite/ChangeLog:
>>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>>   argument to run.
>>
>> Test case altivec-1-runnable.c seems to have the same issue, in that it
>> is currently a dg-do compile test case rather than the intended dg-do run.
> 
> Good catch!

OK, will update that as well.  I think it will need the same header as
altivec-2-runnable.c, so once we have a final change for altivec-2-runnable.c,
I will make the header for altivec-1-runnable.c the same.

> 
>> Can you have a look at changing that to dg-do run too?  My guess it that
>> this one will want something similar to some other altivec test cases, ala:
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -maltivec -mabi=altivec" } */
> 
> I'd expect the "-runnable" test case focuses on testing for run.  Normally,
> the one without "-runnable" would focus on testing for compiling (scan some
> desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test
> for different things, maybe we should separate them into different names
> if they don't test for a same test point.

The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various
built-ins that didn't have any test cases.  There was no intended connection
to the existing altivec-*.c test files.  I started creating runnable tests
when I started adding support for built-ins that we claimed to support but
that had never actually been implemented.  I created runnable tests to make
sure my implementation actually worked.  I continued to add runnable tests for
built-ins that existed but didn't have a test case.  Adding runnable tests did
find a couple of issues where the existing implementation had a bug.
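
As a rough illustration of what a runnable check adds over a compile-only
test, here is a minimal sketch.  It is not code from altivec-1-runnable.c or
altivec-2-runnable.c; the helper names count_add_mismatches and check_add are
made up for this example, and the scalar fallback is only there so the sketch
builds on a compiler without AltiVec enabled.

```c
#include <assert.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper, not from the GCC testsuite: add two 4-element
   integer vectors and count the elements that differ from the
   expected sums.  */
int
count_add_mismatches (void)
{
  const int expected[4] = { 11, 22, 33, 44 };
  int result[4];
  int failures = 0;
  int i;

#ifdef __ALTIVEC__
  /* AltiVec path: used only when the compiler enables AltiVec.  */
  vector signed int a = { 1, 2, 3, 4 };
  vector signed int b = { 10, 20, 30, 40 };
  vector signed int sum = vec_add (a, b);

  for (i = 0; i < 4; i++)
    result[i] = sum[i];
#else
  /* Scalar fallback so the sketch also builds without AltiVec.  */
  const int a[4] = { 1, 2, 3, 4 };
  const int b[4] = { 10, 20, 30, 40 };

  for (i = 0; i < 4; i++)
    result[i] = a[i] + b[i];
#endif

  /* Compare every element against the expected value.  */
  for (i = 0; i < 4; i++)
    if (result[i] != expected[i])
      failures++;

  return failures;
}

/* A compile-only test stops once the code has been generated; a
   runnable test executes a check like this one and fails on any
   mismatch.  */
void
check_add (void)
{
  assert (count_add_mismatches () == 0);
}
```

In a real testsuite file the check would be called from main, which
conventionally calls abort () on a mismatch rather than using assert.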

That all said, if we want to change altivec-1-runnable.c and
altivec-2-runnable.c to a different naming scheme, that is fine with me.
Perhaps we should finish fixing the header for this test file, then do
altivec-1-runnable.c, and then do a final patch that does all the file
renaming?

> 
>>
>> That said, I don't like not having a -mdejagnu-cpu=... here.
>> I think for our server cpus, this is fine, but on an embedded system
>> with a old ISA default for -mcpu=... (so we be doing a dg-do compile),
>> just adding -maltivec to that default may not make much sense for that
>> default and probably should be an error.  Maybe something like:
> 
> Yes, for some embedded cpus, there will be some error messages, but since
> we have powerpc_altivec_ok effective target, the error would make that
> effective target checking fail so I'd expect it'll stop it being tested
> (unsupported).
> 
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
>>
>> ...makes more sense?   Ke Wen & Segher, thoughts on that?
>> Ke Wen, should powerpc_altivec_ok be powerpc_altivec here???
> 
> Yes, I just pushed r15-1390 for this change.
> 
> BR,
> Kewen
> 

We had -mdejagnu-cpu=power8 before, but it looks like we want to go to power7 now.

It sounds like we want the following:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
/* { dg-require-effective-target powerpc_altivec } */

 Carl 


[PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
GCC maintainers:

version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, Peter
suggested setting -mdejagnu-cpu= to power7, but the test fails with power7;
it needs to be power8.  The patch has been retested on a Power 10 box, where
it succeeds with 2 passes and no fails.

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a
machine that supports at least Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 


rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..9e7ef89327b 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,7 +1,7 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-do run { target vsx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include <altivec.h>
 
-- 
2.45.0



Re: [PATCH ver3] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
Everyone: Oops, this should be version 3, not 2.  Sorry.

  Carl 

On 6/19/24 09:13, Carl Love wrote:
> GCC maintainers:
> 
> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
> Peter suggested the -mdejagnu-cpu= value must be power7.  
> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  Patch 
> has been retested on a Power 10 box, it succeeds
> with 2 passes and no fails.
> 
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.
> 
> was approved and committed, I have updated the dg-require-effective-target
> and dg-options as requested so the test will compile with -O2 on a 
> machine that has a minimum support of Power 8 vector hardware.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> rs6000, altivec-2-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> index 17b23eb9d50..9e7ef89327b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,7 +1,7 @@
> -/* { dg-do run } */
> -/* { dg-options "-mvsx" } */
> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 
> } } } */
> -/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-do run { target vsx_hw } } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_altivec } */
>  
>  #include 
>  


[PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
GCC maintainers:

The dg options for this test should be the same as for altivec-2-runnable.c.
This patch updates the dg options to match the settings in
altivec-2-runnable.c.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
From 289e15d215161ad45ae1aae7a5dedd2374737ec4
rs6000, altivec-1-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..c113089c13a 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vsx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include <altivec.h>
 
-- 
2.45.0



Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:36, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:13, Carl Love wrote:
>> GCC maintainers:
>>
>> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
>> Peter suggested the -mdejagnu-cpu= value must be power7.  
>> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  
>> Patch has been retested on a Power 10 box, it succeeds
>> with 2 passes and no fails.
> 
> IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for
> altivec-1-runnable.c.  Both your testing and the comments in the test
> case show this altivec-2-runnable.c requires at least power8.

OK.  Per the other thread, I changed altivec-1-runnable.c to power7.

> 
>>
>> Per the additional feedback after patch: 
>>
>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>   Author: Carl Love 
>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>
>>   rs6000, altivec-2-runnable.c should be a runnable test
>> 
>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>   test.  This patch changes the dg-do command argument to run.
>> 
>>   gcc/testsuite/ChangeLog:
>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>   argument to run.
>>
>> was approved and committed, I have updated the dg-require-effective-target
>> and dg-options as requested so the test will compile with -O2 on a 
>> machine that has a minimum support of Power 8 vector hardware.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> 
>> rs6000, altivec-2-runnable.c update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-2-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> index 17b23eb9d50..9e7ef89327b 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,7 +1,7 @@
>> -/* { dg-do run } */
>> -/* { dg-options "-mvsx" } */
>> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! 
>> has_arch_pwr8 } } } */
>> -/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> As this test case requires power8 and up, and dg-options specifies
> -mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here,
> otherwise it will fail on power7 env.

Changed to p8vector_hw

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
> 
> This condition should be ! , so ! p8vector_hw.

Changed. 

> 
>> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
>> +/* { dg-require-effective-target powerpc_altivec } */
> 
> This should be powerpc_vsx instead, otherwise this case can still be
> tested with -mno-vsx -maltivec, then this test case would fail.

OK
> 
> Besides, as the discussion on the name of this test case, could you also
> rename this to p8vector-builtin-9.c instead?

I put the name change in a separate patch that renames both test files.
 
  Carl 


[PATCH] rs6000, change altivec*-runnable.c test file names

2024-06-21 Thread Carl Love
GCC maintainers:

Per the discussion of the dg header changes for test files altivec-1-runnable.c
and altivec-2-runnable.c, it was decided it would be best to rename the two
tests so that their names line up with the existing test files they belong
with.

This patch is dependent on the two patches to update the dg arguments for test 
files altivec-1-runnable.c and altivec-2-runnable.c being accepted and 
committed before this patch.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
rs6000, change altivec*-runnable.c test file names

Changed the names of the test files.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.
---
 .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0
 .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c}| 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} (100%)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c} (100%)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
-- 
2.45.0



Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:37, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:18, Carl Love wrote:
>> GCC maintainers:
>>
>> The dg options for this test should be the same as for altivec-2-runnable.c. 
>>  This patch updates the dg options to match 
>> the settings in altivec-2-runnable.c.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> --From
>>  289e15d215161ad45ae1aae7a5dedd2374737ec4 rs6000, altivec-1-runnable.c 
>> update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
> 
> This is not true, vec_unpackh and vec_unpackl doesn't require power8,
> vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03.
> 
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-1-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> index da8ebbc30ba..c113089c13a 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> @@ -1,6 +1,7 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> -/* { dg-require-effective-target powerpc_altivec_ok } */
>> -/* { dg-options "-maltivec" } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> So this line should check for vmx_hw.

OK, my fingers are used to typing vsx.  Fixed.

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> With more thinking, I think it's better to use
> "-O2 -maltivec" to be consistent with the others.

OK, changed it back.  We now have:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -maltivec" } */
/* { dg-require-effective-target powerpc_altivec } */

The regression test runs fine with the above.  Two passes, no failures.


> 
> As mentioned in the other thread, powerpc_altivec
> effective target check should guarantee the altivec
> feature support, if any default cpu type or user
> specified option disable altivec, this test case
> will not be tested.  If we specify one cpu type
> specially here, it may cause confusion why it's
> different from the other existing ones.  So let's
> go without no specified cpu type.
> 
> Besides, similar to the request for altivec-1-runnable.c,
> could you also rename this to altivec-38.c?

OK, I will change the names of the two test cases at the same time in a
separate patch.
 
 Carl 


[PATCH version 2] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
GCC maintainers:

version 2, update the dg options per the feedback.  Retested the patch on Power 
10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

-- 
rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..3f084c91798 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include <altivec.h>
 
-- 
2.45.0



[PATCH version 4] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
GCC maintainers:

version 4:  Additional dg option updates per the feedback.  Retested the patch 
on Power 10, no regressions.

version 3:  Updated per the feedback from Peter, Kewen and Segher.  Note, Peter
suggested setting -mdejagnu-cpu= to power7, but the test fails with power7;
it needs to be power8.  The patch has been retested on a Power 10 box, where
it succeeds with 2 passes and no fails.

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a
machine that supports at least Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.  Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..660669f69fd 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,6 +1,6 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
+/* { dg-do run { target p8vector_hw } } */
+/* { dg-do compile { target { ! p8vector_hw } } } */
+/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
 /* { dg-require-effective-target powerpc_vsx } */
 
 #include <altivec.h>
-- 
2.45.0



Re: [PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-24 Thread Carl Love



On 6/18/24 20:03, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/14 03:40, Carl Love wrote:
>>
>> GCC maintainers:
>>
>> As noted the removal of __builtin_vsx_xvcvdpuxds_uns and 
>> __builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has 
>> been updated per the comments from version 3.
>>
>> Please let me know if this patch is acceptable for mainline.  
>>
>>  Carl 
>>
>> --
>>
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
> 
> Nit: s/signed/a vector of signed/

Fixed.

> 
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
> 
> Likewise.

Fixed.

> 
>> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
>> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
>> built-ins.
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
>> now for internal use only. They are not documented and they do not
>> have testcases.
>>
> 
> 
>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
>> vec_signed{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
>> vec_unsigned{e,o}, remove.
> 
> As the comments in 2/13 v4 and the previous review comments, I preferred
> these two are moved to 2/13 as well (this patch should focus on extending).
> 

Moved to patch 2.

>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws,
>>  __builtin_vsx_xvcvdpuxws): Removed.
>>  (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed
> 
> Nit: s/Renamed/Rename to/

OK, fixed.

> 
>>  __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
>>  (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>>  VEC_VUNSIGNEDE_V4SF respectively.
> 
> Likewise.

OK, fixed. 

> 
>>  (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
>>  built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>>  vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
> 
> Formatting nits: "..,.." -> ".., ..", "  " -> " "

OK, I fixed the various spacing issues.
> 
>>  * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>>  vunsignede_v4sf, vunsignedo_v4sf): New  define_expands.
> 
> Likewise.

Ditto.

> 
>>  * doc/extend.texi (vec_signedo, vec_signede): Add documentation
>>  for new overloaded built-ins.
> 
> Missing vec_unsignedo and vec_unsignede, may be also mention for which
> types, like "converting vector float to vector {un,}signed long long".
> 

OK, fixed.

>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c
>>  (test_unsigned_int_result, test_ll_unsigned_int_result): Add
>>  new argument.
>>  (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
>>  tests for the overloaded built-ins.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 20 ++---
>>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>>  gcc/config/rs6000/vsx.md  | 84 +++
>>  gcc/doc/extend.texi   | 10 +++
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
>>  5 files changed, 154 insertions(+), 17 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 322d27b7a0d..29a9deb3410 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1688,26 +1688,26 @@
>>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
>> -XVCVDPSXWS vsx_xvcvdpsxws {}
>> -
>>const vsll __builtin_vsx_xvcvdpuxds (vd);
>>  XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>>  
>>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
>> -XVCVDPUXWS

Re: [PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-24 Thread Carl Love
Kewen:

On 6/18/24 20:03, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/14 03:40, Carl Love wrote:
>> GCC maintainers:
>>
>> Per the comments on patch 0004 from version 3, the removal of the
>> built-ins __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was
>> moved to this patch.  The rest of the patch is unchanged from version 3.  
>> There were no comments on this patch for version 3.
>>
>> Please let me know if this patch is acceptable.  Thanks.
>>
>> Carl 
>>
>>
>> -
>>
>> rs6000, Remove __builtin_vsx_xvcvspsxws,
>>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.
> 
> Nit: Maybe make it shorter like: Remove built-ins 
> __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}
> 
>>
>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
> 
> Nit: Strictly speaking, not a duplicate of vec_signed but covered by it.
> 
>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>> built-in is not documented and there are no test cases for it.
>>
>> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
>> vec_unsigned, remove.
>>
>> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
>> vec_unsigned, remove.
> 
> As mentioned in the previous review, I'd expect patch 4/13 only focuses on
> extending vec_{un,}signed{e,o} for vector float (aka. __builtin_vsx_xvcvspsxds
> and __builtin_vsx_xvcvspuxds related), and this patch focuses on some built-in
> removals which have been covered by the existing vec_{un,}signed{,e,o}, so
> it can also drop the built-ins:
> 
> "The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
> vec_signed{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
> vec_unsigned{e,o}, remove."
> 
> // copied from 4/13.

Not sure why I didn't move these two with the other two???  Sorry.

Moved them from patch 4.

  Carl 


[PATCH ver3] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-25 Thread Carl Love
GCC maintainers:

version 3, rebased on current mainline tree.  Version 2 of the patch was out of 
sync. Retested the patch on 
Power 10 with no regressions.

version 2, update the dg options per the feedback.  Retested the patch on Power 
10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 



rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index 4e32860a169..6763ff3ff8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,7 +1,9 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 
+
 #include <altivec.h>
 
 #ifdef DEBUG
-- 
2.45.0



Re: [PATCH version 2] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-25 Thread Carl Love
Kewen:

On 6/23/24 19:41, Kewen.Lin wrote:
> Hi,
> 
> on 2024/6/22 00:15, Carl Love wrote:
>> GCC maintainers:
>>
>> version 2, update the dg options per the feedback.  Retested the patch on 
>> Power 10 with no regressions.
>>
>> This patch updates the dg options.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> -- 
>> rs6000, altivec-1-runnable.c update the require-effective-target
>>
>> Update the dg test directives.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-1-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> index da8ebbc30ba..3f084c91798 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> @@ -1,6 +1,7 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> -/* { dg-require-effective-target powerpc_altivec_ok } */
>> -/* { dg-options "-maltivec" } */
>> +/* { dg-do run { target vmx_hw } } */
>> +/* { dg-do compile { target { ! vmx_hw } } } */
>> +/* { dg-options "-O2 -maltivec" } */
>> +/* { dg-require-effective-target powerpc_altivec } */
> 
> This one needs rebasing, "powerpc_altivec" has been adjusted on trunk.

Yes, this seems to be out of sync.  I will rebase on the current upstream tree 
and re-post.

 Carl  


[PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

2024-06-26 Thread Carl Love
GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, 
vec_st and vec_ste built-ins to make it clearer that there are data alignment 
requirements for these built-ins.  If the data alignment requirements are not 
followed, the data loaded or stored by these built-ins will be wrong.
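
As a rough illustration of why misaligned data comes back wrong, the sketch 
below models the address masking in plain C.  It does not call the real 
built-ins; the helper names lvx_effective_address and lve_effective_address 
are hypothetical, and the masks simply follow the description in the 
documentation update (low 4 bits ignored for vec_ld/vec_st, low bits ignored 
according to the element size for vec_lde/vec_ste).

```c
#include <stdint.h>

/* Hypothetical model of the address used by LVX/STVX: the low 4 bits of
   the effective address are ignored, so the access always starts on a
   16-byte boundary.  */
static uintptr_t
lvx_effective_address (uintptr_t ea)
{
  return ea & ~(uintptr_t) 0xF;
}

/* Hypothetical model for LVEBX/LVEHX/LVEWX and the matching stores:
   the low bits are cleared according to the element size in bytes
   (1, 2 or 4), so the element must be naturally aligned.  */
static uintptr_t
lve_effective_address (uintptr_t ea, unsigned elem_size)
{
  return ea & ~(uintptr_t) (elem_size - 1);
}
```

Under this model a vec_ld from address 0x1008 would actually access the 16 
bytes starting at 0x1000, which is why unaligned data appears to be wrong.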

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.

gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld,
vec_st, vec_lde and vec_ste built-ins.
---
 gcc/doc/extend.texi | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
 @end smallexample
 
 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX} and @samp{STVX} instructions.  The
+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
-- 
2.45.0



Re: [PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-22 Thread Carl Love



Kewen:

On 7/22/24 2:09 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/7/18 00:01, Carl Love wrote:

GCC maintainers:

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
__builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
update the various vector elements.  This change was originally intended to be 
part of the earlier series of cleanup patches.  It was initially thought that 
some additional work would be needed to do some gimple generation instead of 
these built-ins.  However, the existing default code generation does produce 
the needed code.  The code generated with normal C-code is as good or better 
than the code generated with these built-ins.

I think we need to expand this a bit:
   - For vec_set bif, the equivalent C code is as good as or better than it.
   - For vec_insert bif whose resolving makes use of vec_set bif previously 
(now get removed),
 it's as good as before with optimization.

The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

    Carl

---
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di

Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
     * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
     __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
     definitions.
     * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
     statemnts for mode == V2DFmode, mode == V2DImode and

Nit: s/statemnts/statements/


OK, fixed

Maybe a bit more meaningful like: Remove the handling for constant vec_insert 
position
with VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.

OK, changed




     mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
     RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
---
  gcc/config/rs6000/rs6000-builtins.def | 13 -
  gcc/config/rs6000/rs6000-c.cc | 40 ---
  2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 896d9686ac6..0ebc940f395 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
  VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-

Unexpected empty line removed.
??  I don't remove the blank line before the removed comment, so there 
is still a single blank line before the next entry. Specifically, the 
code with the above removed now looks like:


...
  const signed long long __builtin_vec_ext_v2di (vsll, signed int);
    VEC_EXT_V2DI nothing {extract}

  const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
    CMPGE_16QI vector_nltv16qi {}

  const vsll __builtin_vsx_cmpge_2di (vsll, vsll);
    CMPGE_2DI vector_nltv2di {}


Which looks OK to me?


Similar to vec_init removal, we should also get rid of set bif attribute,
bif_is_set and altivec_expand_vec_set_builtin etc.

That will also require removing:

 const vsq __builtin_vsx_set_1ti (vsq, signed __int128, const int<0,0>);
   SET_1TI vsx_set_v1ti {set}

  const vd __builtin_vsx_set_2df (vd, double, const int<0,1>);
    SET_2DF vsx_set_v2df {set}

 const vsll __builtin_vsx_set_2di (vsll, signed long long, const int<0,1>);
    SET_2DI vsx_set_v2di {set}

I would assume the C-code generation for the above will be as good as or 
better than the code generation for the built-ins, but I will need to 
verify that.  I haven't looked at them specifically.


  Carl


[PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-23 Thread Carl Love

GCC maintainers:

version 2, Updated patch comments, added missing ChangeLog.  Fixed 
unintended line removal.


The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp 
built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins.  
The difference is that the overloaded built-ins return a vector of booleans 
or a vector of long long booleans, whereas the removed built-ins returned a 
vector of floats or a vector of doubles.
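
The return type difference can be sketched with GCC generic vector types.  
This is only an illustration under stated assumptions: it uses the generic 
vector extension rather than <altivec.h>, so the type names v4sf and v4si 
and the helpers cmpeq_mask and cmpeq_as_float are hypothetical stand-ins 
for vector float, vector bool int and the vec_cmpeq call with a cast.

```c
/* Generic GCC vector types standing in for "vector float" and a
   32-bit boolean vector; assumptions made for portability.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise equality.  Each lane of the result is all ones (-1) when
   the lanes are equal and 0 otherwise, i.e. a boolean-style mask.  */
static v4si
cmpeq_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* Reinterpret the mask bits as floats, mirroring the cast needed to get
   the return type used by the removed built-ins.  No value conversion
   takes place; the bit pattern is unchanged.  */
static v4sf
cmpeq_as_float (v4sf a, v4sf b)
{
  return (v4sf) cmpeq_mask (a, b);
}
```

Storing the result in the boolean-style vector is the more natural form; 
the cast is only needed where existing code expects a float vector.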


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
vec_cmp[eq|ge|gt] built-in with the required changes for the return 
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp,

which are similar to the recommended PVIPR documented overloaded
vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The difference is that the overloaded built-ins return a vector of
32-bit booleans.  The removed built-ins returned a vector of floats.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
boolean of the appropriate size or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
    __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
    definitions.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp,
    __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp,
    __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp,
    __builtin_vsx_xvcmpgesp): Remove.
    (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float
    arguments that store result in boolean and cast result to
    store result in float.  Add tests for double arguments that
    store the result in long long boolean and cast result to
    double.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 --
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..47830b7dcb0 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,18 +1579,12 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

@@ -1600,9 +1594,6 @@
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c

index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and 
__builtin_vsx_xvcmp[gt|ge|eq]sp

+ have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
+ vec_cmpge built-ins.  The __builtin_vsx_xvcmp* builtins returned a 
vector
+ result of the same type as the 

[PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-23 Thread Carl Love

GCC maintainers:

The code generated by using C-code to set a vector element versus using 
a built-in has been investigated.  The assembly code generated from the 
C-code is as good as or better than the assembly code generated for the 
built-ins at both the -O0 and -O3 levels of optimization.
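
For reference, "C-code to set a vector element" means ordinary 
subscripting of the vector.  A minimal sketch, assuming a GCC generic 
vector type in place of vector double so that it does not depend on 
<altivec.h>; the names v2df and set_element are hypothetical:

```c
/* Generic GCC vector type standing in for "vector double"; an
   assumption made so the sketch does not need <altivec.h>.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Replace element IDX of V with X using plain subscripting, which is
   the C-code alternative to a set built-in.  IDX is reduced modulo 2
   so that it always selects a valid element.  */
static v2df
set_element (v2df v, double x, int idx)
{
  v[idx & 1] = x;
  return v;
}
```

The compiler is left to choose the instruction sequence for the element 
update, which is the code generation that was compared against the 
built-ins.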


For the vec_insert built-in, whose resolution previously made use of the 
vec_set built-ins that are now removed, the generated code is as good as 
before with optimization.


This two-patch series removes the __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins and the 
__builtin_vsx_set_1ti, __builtin_vsx_set_2df and __builtin_vsx_set_2di 
built-ins in favor of using C-code instead.  The built-ins use the 
built-in set attribute in their definitions.  With the removal of these 
6 built-ins, the set built-in attribute is no longer used and the related 
code for the attribute is removed.


The first patch in this series, which removes the 
__builtin_vec_set_v1ti, __builtin_vec_set_v2df and __builtin_vec_set_v2di 
built-ins, was previously posted.  The feedback on the patch was that we 
could also remove the set bif attribute.  Removal of the set bif attribute 
also requires removing the __builtin_vsx_set_1ti, __builtin_vsx_set_2df 
and __builtin_vsx_set_2di built-ins.  The second patch removes the vsx set 
built-ins and the now unused set built-in attribute and its associated 
code.


The patches have been tested on a Power 10 LE system with no regressions.

Carl


Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-23 Thread Carl Love

GCC maintainers:

This patch removes the vsx set built-ins: __builtin_vsx_set_1ti, 
__builtin_vsx_set_2df, __builtin_vsx_set_2di.  With the removal of 
these built-ins, the built-in attribute "set", used in the built-in 
definition file, is no longer needed.  The "set" attribute and its 
associated code are removed.


The assembly code generated by using C code to set an element of a 
vector versus using the vsx set built-in to set an element was 
investigated.  With -O0 optimization the generated assembly code is 
comparable in terms of the generated assembly instructions and the number 
of instructions.  At the -O3 optimization level, for the 2DI and 2DF cases 
the built-ins and the C code generate identical assembly code.  The 
assembly code generated for the 1TI case for the C code has one less 
instruction.  The built-in generates an extra load instruction.  Hence, 
the C code is better as it has fewer load instructions.


The testcase for the __builtin_vsx_set_2df is removed.  The other 
built-ins do not have testcases.


The patch has been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, 
__builtin_vsx_set_2di


The built-ins set a value in a vector.  The same operation can be done
in C-code.  The assembly code generated from the C-code is as good as or
better than the code generated by the built-ins.  With default
optimization the number of assembly instructions generated by the two
methods is similar.  With -O3 optimization, the assembly generated for
the two approaches is identical for the 2DF and 2DI types.  The assembly
for the C-code version of the 1TI case requires one less assembly
instruction.  It also only uses one load versus two loads for the
built-in.

With the removal of the built-ins, there are no other uses of the
set built-in attribute.  The code associated with the set built-in
attribute is removed.

Finally, the testcase for the __builtin_vsx_set_2df is removed.  The
other built-ins do not have testcases.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtin.cc (get_element_number,
    altivec_expand_vec_set_builtin): Remove functions.
    (rs6000_expand_builtin): Remove the if statement to call
    altivec_expand_vec_set_builtin.
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
    __builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
    built-in definitions.
    * config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
    Remove the isset variable from the structure.
    (parse_bif_attrs): Remove the uses of the isset variable.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
    __builtin_vsx_set_2df built-in.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 53 ---
 gcc/config/rs6000/rs6000-builtins.def | 10 
 gcc/config/rs6000/rs6000-gen-builtins.cc  | 29 --
 .../gcc.target/powerpc/vsx-builtin-3.c    |  6 ---
 4 files changed, 11 insertions(+), 87 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc

index 117cf0125f8..099cbc82245 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
icode, tree exp, rtx target)

   return target;
 }

-/* Return the integer constant in ARG.  Constrain it to be in the range
-   of the subparts of VEC_TYPE; issue an error if not.  */
-
-static int
-get_element_number (tree vec_type, tree arg)
-{
-  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
-
-  if (!tree_fits_uhwi_p (arg)
-  || (elt = tree_to_uhwi (arg), elt > max))
-    {
-  error ("selector must be an integer constant in the range [0, 
%wi]", max);

-  return 0;
-    }
-
-  return elt;
-}
-
-/* Expand vec_set builtin.  */
-static rtx
-altivec_expand_vec_set_builtin (tree exp)
-{
-  machine_mode tmode, mode1;
-  tree arg0, arg1, arg2;
-  int elt;
-  rtx op0, op1;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-  arg2 = CALL_EXPR_ARG (exp, 2);
-
-  tmode = TYPE_MODE (TREE_TYPE (arg0));
-  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
-  gcc_assert (VECTOR_MODE_P (tmode));
-
-  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
-  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
-  elt = get_element_number (TREE_TYPE (arg0), arg2);
-
-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
-    op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
-
-  op0 = force_reg (tmode, op0);
-  op1 = force_reg (mode1, op1);
-
-  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
-
-  return op0;
-}
-
 /* Expand vec_ext builtin.  */
 static rtx
 altivec_expan

Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-23 Thread Carl Love



GCC maintainers:

This patch was previously posted.  Per the feedback, it is now the first 
of two patches to remove the set built-ins.


This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  The users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  For the vec_set bif, the equivalent C code is as good as or better 
than the built-in.  For the vec_insert bif whose resolving previously 
made use of the vec_set bif, the assembly code generation is as good as 
before with the -O3 optimization.


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
    handling for constant vec_insert position with
    VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 47830b7dcb0..75c33aa9ffc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 68519e1397f..04882c396bf 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vec<tree, va_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 
with

  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2



