[PATCH] MATCH: add abs support for half float

2024-07-14 Thread Kugan Vivekanandarajah
This patch extends abs detection in match.pd to half float.

Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?

gcc/ChangeLog:

* match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) for 
half float.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/absfloat16.c: New test.

Signed-off-by: Kugan Vivekanandarajah 



0001-abs-for-half-float.patch
Description: 0001-abs-for-half-float.patch


[PATCH] gimple-fold: consistent dump of builtin call simplifications

2024-07-14 Thread rubin.gerritsen
Previously only simplifications of the `__st[xrp]cpy_chk`
were dumped. Now all call replacement simplifications are
dumped.

Examples of statements with corresponding dumpfile entries:

`printf("mystr\n");`:
  optimized: simplified printf to __builtin_puts
`printf("%c", 'a');`:
  optimized: simplified printf to __builtin_putchar
`printf("%s\n", "mystr");`:
  optimized: simplified printf to __builtin_puts

2024-07-13  Rubin Gerritsen  

gcc/ChangeLog:

   * gimple-fold.cc (dump_transformation): Moved definition.
   (replace_call_with_call_and_fold): Calls dump_transformation.
   (gimple_fold_builtin_stxcpy_chk): Removes call to
   dump_transformation, now in replace_call_with_call_and_fold.
   (gimple_fold_builtin_stxncpy_chk): Removes call to
   dump_transformation, now in replace_call_with_call_and_fold.
---
 gcc/gimple-fold.cc | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 7c534d56bf1..b20d3a2ff9a 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
   gsi_replace_with_seq_vops (si_p, stmts);
 }

+/* Print a message in the dump file recording transformation of FROM to
+   TO.  */
+
+static void
+dump_transformation (gcall *from, gcall *to)
+{
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
+                     gimple_call_fn (from), gimple_call_fn (to));
+}

 /* Replace the call at *GSI with the gimple value VAL.  */

@@ -835,6 +844,7 @@ static void
 replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  dump_transformation (as_a <gcall *> (stmt), as_a <gcall *> (repl));
   gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
   gimple_set_location (repl, gimple_location (stmt));
   gimple_move_vops (repl, stmt);
@@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator *gsi,
   return true;
 }

-/* Print a message in the dump file recording transformation of FROM to
-   TO.  */
-
-static void
-dump_transformation (gcall *from, gcall *to)
-{
-  if (dump_enabled_p ())
-    dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
-                     gimple_call_fn (from), gimple_call_fn (to));
-}
-
 /* Fold a call to the __st[rp]cpy_chk builtin.
DEST, SRC, and SIZE are the arguments to the call.
IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
@@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator *gsi,
 return false;

   gcall *repl = gimple_build_call (fn, 2, dest, src);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }
@@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator *gsi,
 return false;

   gcall *repl = gimple_build_call (fn, 3, dest, src, len);
-  dump_transformation (stmt, repl);
   replace_call_with_call_and_fold (gsi, repl);
   return true;
 }
-- 
2.34.1
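A minimal C++ sketch of what the refactoring achieves, under heavy simplification. `call_stmt`, `dump_log` and the string message are hypothetical stand-ins for `gcall`, the dump file and `dump_printf_loc`. Only the control flow mirrors the patch: logging now lives in the shared replacement helper, so every folder that goes through it gets a dump entry.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a gcall: just the name of the callee.
struct call_stmt
{
  std::string fn;
};

// Hypothetical stand-in for the dump file: one line per simplification.
static std::vector<std::string> dump_log;

static void
dump_transformation (const call_stmt &from, const call_stmt &to)
{
  dump_log.push_back ("simplified " + from.fn + " to " + to.fn);
}

// All call-to-call folds funnel through this helper, so logging here
// covers printf -> puts/putchar as well as the __st[xrp]cpy_chk cases,
// and the individual folders no longer need their own dump calls.
static void
replace_call_with_call_and_fold (call_stmt &stmt, const call_stmt &repl)
{
  dump_transformation (stmt, repl);
  stmt = repl;
}
```

Folding a `printf` into `__builtin_puts` through this helper records `simplified printf to __builtin_puts`, matching the first example dump entry in the commit message.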


[pushed] wwwdocs: gcc-*: Tweak links to testing instructions to use https

2024-07-14 Thread Gerald Pfeifer
Business as usual; pushed.

Gerald

---
 htdocs/gcc-5/buildstat.html | 2 +-
 htdocs/gcc-6/buildstat.html | 2 +-
 htdocs/gcc-7/buildstat.html | 2 +-
 htdocs/gcc-8/buildstat.html | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/gcc-5/buildstat.html b/htdocs/gcc-5/buildstat.html
index 59c9a5a6..03cbb03e 100644
--- a/htdocs/gcc-5/buildstat.html
+++ b/htdocs/gcc-5/buildstat.html
@@ -16,7 +16,7 @@ summaries.
 
 Instructions for running the testsuite and for submitting test results
 are part of
-http://gcc.gnu.org/install/test.html";>
+https://gcc.gnu.org/install/test.html";>
 Installing GCC: Testing.
 
 
diff --git a/htdocs/gcc-6/buildstat.html b/htdocs/gcc-6/buildstat.html
index a4609405..06a87da7 100644
--- a/htdocs/gcc-6/buildstat.html
+++ b/htdocs/gcc-6/buildstat.html
@@ -16,7 +16,7 @@ summaries.
 
 Instructions for running the testsuite and for submitting test results
 are part of
-http://gcc.gnu.org/install/test.html";>
+https://gcc.gnu.org/install/test.html";>
 Installing GCC: Testing.
 
 
diff --git a/htdocs/gcc-7/buildstat.html b/htdocs/gcc-7/buildstat.html
index fb9524d1..62659059 100644
--- a/htdocs/gcc-7/buildstat.html
+++ b/htdocs/gcc-7/buildstat.html
@@ -16,7 +16,7 @@ summaries.
 
 Instructions for running the test suite and for submitting test results
 are part of
-http://gcc.gnu.org/install/test.html";>
+https://gcc.gnu.org/install/test.html";>
 Installing GCC: Testing.
 
 
diff --git a/htdocs/gcc-8/buildstat.html b/htdocs/gcc-8/buildstat.html
index 0e7a808e..ad0ec217 100644
--- a/htdocs/gcc-8/buildstat.html
+++ b/htdocs/gcc-8/buildstat.html
@@ -16,7 +16,7 @@ summaries.
 
 Instructions for running the test suite and for submitting test results
 are part of
-http://gcc.gnu.org/install/test.html";>
+https://gcc.gnu.org/install/test.html";>
 Installing GCC: Testing.
 
 
-- 
2.45.2


[match.pd PATCH] PR tree-optimization/114661: Generalize MULT_EXPR recognition (take #2)

2024-07-14 Thread Roger Sayle

Hi Richard,
Many thanks for the review and recommendation to use nop_convert?.
This revised patch implements that suggestion, which required a little
experimentation/tweaking as ranger/EVRP records the ranges on the
useless type conversions rather than the multiplications.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-07-14  Roger Sayle  
Richard Biener  

gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplications, such as those inserted
by this transformation.

gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.


Thanks again,
Roger
--

> -----Original Message-----
> From: Richard Biener 
> Sent: 10 July 2024 12:34
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [match.pd PATCH] PR tree-optimization/114661: Generalize
> MULT_EXPR recognition.
> 
> On Wed, Jul 10, 2024 at 12:28 AM Roger Sayle 
> wrote:
> >
> > This patch resolves PR tree-optimization/114661, by generalizing the
> > set of expressions that we canonicalize to multiplication.  This
> > extends the
> > optimization(s) contributed (by me) back in July 2021.
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html
> >
> > The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
> > allowed.  A subtlety is that for non-wrapping integer types, we
> > actually fold this into (int)((unsigned)X*C3) so that we don't
> > introduce an undefined overflow that wasn't in the original.
> > Unfortunately, this transformation confuses itself, as the type-safe
> > multiplication isn't recognized when further combining bit operations.
> > Fixed here by adding transforms to turn (int)((unsigned)X*C1)^(X<<C2)
> > into (int)((unsigned)X*C3) so that match.pd and EVRP can continue to
> > construct multiplications.
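A small C++ illustration of the `(int)((unsigned)X*C3)` form mentioned above, assuming 32-bit types. The product is computed in the unsigned type of the same precision, where overflow wraps, and the result is then converted back. `mul_without_ub` is a hypothetical name. The final conversion relies on GCC's documented modulo behavior; it is implementation-defined before C++20.

```cpp
#include <cstdint>

// Multiply in the unsigned type of the same precision (wrapping is
// well defined there), then convert the result back to the signed type.
static std::int32_t
mul_without_ub (std::int32_t x, std::int32_t c)
{
  return static_cast<std::int32_t> (static_cast<std::uint32_t> (x)
                                    * static_cast<std::uint32_t> (c));
}
```

Writing `x * c` directly in `int32_t` would be undefined behavior for, say, `INT32_MAX * 2`. The unsigned detour yields -2 instead, so a fold of this shape does not introduce an overflow that was not in the original code.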
> >
> > For the example given in the PR:
> >
> > unsigned mul(unsigned char c) {
> >   if (c > 3) __builtin_unreachable();
> >   return c << 18 | c << 15 |
> >          c << 12 | c << 9 |
> >          c << 6 | c << 3 | c;
> > }
> >
> > GCC on x86_64 with -O2 previously generated:
> >
> > mul:    movzbl  %dil, %edi
> >         leal    (%rdi,%rdi,8), %edx
> >         leal    0(,%rdx,8), %eax
> >         movl    %edx, %ecx
> >         sall    $15, %edx
> >         orl     %edi, %eax
> >         sall    $9, %ecx
> >         orl     %ecx, %eax
> >         orl     %edx, %eax
> >         ret
> >
> > with this patch we now generate:
> >
> > mul:    movzbl  %dil, %eax
> >         imull   $299593, %eax, %eax
> >         ret
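The equivalence behind the PR example can be checked directly. In this sketch the function names are illustrative. The shifted copies of `c` are 3 bits apart, so for `c <= 3` they share no set bits. The ORs therefore behave like additions, and the whole expression equals `c * 299593`. The `tree_nonzero_bits` test in the match.pd pattern guards the same property. For a value such as 9 the copies overlap, and the two results differ.

```cpp
#include <cstdint>

// 299593 has exactly one set bit per shifted copy of c.
static_assert (299593 == (1 << 18) + (1 << 15) + (1 << 12) + (1 << 9)
                         + (1 << 6) + (1 << 3) + 1,
               "one bit per shift amount");

// The expression from the PR, minus the __builtin_unreachable hint.
static std::uint32_t
mul_shifts (unsigned char c)
{
  return c << 18 | c << 15 | c << 12 | c << 9 | c << 6 | c << 3 | c;
}

// The single multiplication GCC now emits.
static std::uint32_t
mul_const (unsigned char c)
{
  return c * 299593u;
}
```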
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> I'm looking at the difference between the existing
> 
>  (simplify
>   (op:c (mult:s@0 @1 INTEGER_CST@2)
> (lshift:s@3 @1 INTEGER_CST@4))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
>&& tree_int_cst_sgn (@4) > 0
>&& (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
>(with { wide_int wone = wi::one (TYPE_PRECISION (type));
>wide_int c = wi::add (wi::to_wide (@2),
>  wi::lshift (wone, wi::to_wide (@4))); }
> (mult @1 { wide_int_to_tree (type, c); }
> 
> and
> 
> + (simplify
> +  (op:c (convert:s@0 (mult:s@1 (convert @2) INTEGER_CST@3))
> +   (lshift:s@4 @2 INTEGER_CST@5))
> +  (if (INTEGRAL_TYPE_P (type)
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +   && TREE_TYPE (@2) == type
> +   && TYPE_UNSIGNED (TREE_TYPE (@1))
> +   && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@1))
> +   && tree_int_cst_sgn (@5) > 0
> +   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> +   (with { tree t = TREE_TYPE (@1);
> +  wide_int wone = wi::one (TYPE_PRECISION (t));
> +  wide_int c = wi::add (wi::to_wide (@3),
> +wi::lshift (wone, wi::to_wide (@5))); }
> +(convert (mult:t (convert:t @2) { wide_int_to_tree (t, c); })
> 
> and wonder whether wrapping of the multiplication is required for correctness,
> specifically the former seems to allow signed types with -fwrapv while the
> latter won't.  It also looks like the patterns could be merged doing
> 
>  (simplify
>   (op:c (nop_convert:s? (mult:s@0 (nop_convert? @1) INTEGER_CST@2)
> (lshift:s@3 @1 INTEGER_CST@4))
> 
> and by using nop_convert instead of convert simplify the condition?
> 
> Richard.
> 
> >
> > 2024-07-09  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR tree-optimization/114661
> > * match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Additionally recognize
> > multiplications surrounded by casts to an unsigned type and back
> > such as those generated by

[x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.

2024-07-14 Thread Roger Sayle

This is a minor change to restore bootstrap on systems using gcc 4.8
as a host compiler.  The fatal error is:

In file included from gcc/gcc/coretypes.h:471:0,
 from gcc/gcc/config/i386/i386-expand.cc:23:
gcc/gcc/config/i386/i386-expand.cc: In function 'void
ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)':
./insn-modes.h:315:75: error: temporary of non-literal type
'scalar_float_mode' in a constant expression
 #define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode))
   ^
gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro
'HFmode'
   case HFmode:
^


The solution is to use the E_?Fmode enumeration constants as case values
in switch statements.
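A simplified C++ illustration of the fix. These are not GCC's actual definitions: the enum and the `scalar_float_mode` class below are hypothetical, cut-down stand-ins. A case label must be a constant expression. A temporary of a class type without a constexpr constructor is not one, whereas a plain enumerator always is. The wrapper can still be passed at run time, because it converts implicitly to the enum.

```cpp
#include <stdexcept>

// Hypothetical, cut-down stand-ins for enumerators generated in insn-modes.h.
enum machine_mode
{
  E_HFmode, E_BFmode, E_SFmode, E_DFmode, E_TFmode,
  E_V8HFmode, E_V8BFmode, E_V4SFmode, E_V2DFmode
};

// Hypothetical wrapper loosely modeled on scalar_float_mode.  Its
// constructor is not constexpr, so an expression such as
//   case scalar_float_mode (E_HFmode):
// is not a constant expression and is rejected as a case label.
class scalar_float_mode
{
public:
  explicit scalar_float_mode (machine_mode m) : m_mode (m) {}
  operator machine_mode () const { return m_mode; }
private:
  machine_mode m_mode;
};

// Same mapping as the switch in ix86_expand_copysign, using the plain
// E_* enumerators, which are always integral constant expressions.
static machine_mode
copysign_vector_mode (machine_mode mode)
{
  switch (mode)
    {
    case E_HFmode: return E_V8HFmode;
    case E_BFmode: return E_V8BFmode;
    case E_SFmode: return E_V4SFmode;
    case E_DFmode: return E_V2DFmode;
    case E_TFmode: return mode;
    default: throw std::invalid_argument ("unsupported mode");
    }
}
```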

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures (from this change).  Ok for mainline?


2024-07-14  Roger Sayle  

* config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator):
Use E_?Fmode enumeration constants in switch statement.
(ix86_expand_copysign): Likewise.
(ix86_expand_xorsign): Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index cfcfdd9..9a31e6d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2176,19 +2176,19 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, machine_mode mode,
 
   switch (mode)
   {
-  case HFmode:
+  case E_HFmode:
 use_sse = true;
 vmode = V8HFmode;
 break;
-  case BFmode:
+  case E_BFmode:
 use_sse = true;
 vmode = V8BFmode;
 break;
-  case SFmode:
+  case E_SFmode:
 use_sse = TARGET_SSE_MATH && TARGET_SSE;
 vmode = V4SFmode;
 break;
-  case DFmode:
+  case E_DFmode:
 use_sse = TARGET_SSE_MATH && TARGET_SSE2;
 vmode = V2DFmode;
 break;
@@ -2330,19 +2330,19 @@ ix86_expand_copysign (rtx operands[])
 
   switch (mode)
   {
-  case HFmode:
+  case E_HFmode:
 vmode = V8HFmode;
 break;
-  case BFmode:
+  case E_BFmode:
 vmode = V8BFmode;
 break;
-  case SFmode:
+  case E_SFmode:
 vmode = V4SFmode;
 break;
-  case DFmode:
+  case E_DFmode:
 vmode = V2DFmode;
 break;
-  case TFmode:
+  case E_TFmode:
 vmode = mode;
 break;
   default:
@@ -2410,16 +2410,16 @@ ix86_expand_xorsign (rtx operands[])
 
   switch (mode)
   {
-  case HFmode:
+  case E_HFmode:
 vmode = V8HFmode;
 break;
-  case BFmode:
+  case E_BFmode:
 vmode = V8BFmode;
 break;
-  case SFmode:
+  case E_SFmode:
 vmode = V4SFmode;
 break;
-  case DFmode:
+  case E_DFmode:
 vmode = V2DFmode;
 break;
   default:


Re: [pushed] Add function filtering to gcov

2024-07-14 Thread Roger Sayle


I’m seeing (dejagnu) testsuite problems from this (recent) patch.

Running /home/roger/GCC/patchem/gcc/testsuite/gcc.misc-tests/gcov.exp ...
ERROR: (DejaGnu) proc "lmap key { snd } {
if { $key in $seen } continue
set key
}" does not exist.
The error code is NONE
The info on the error is:
invalid command name "lmap"
while executing
"::tcl_unknown lmap key { snd } {
if { $key in $seen } continue
set key
}"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"


I guess (but I’m not sure) that lmap requires Tcl 8.6, and my RHEL-based
system has Tcl 8.5.  Is there a simple workaround to avoid the use of lmap?
Admittedly the systems that I use are a bit "long in the tooth" (obsolete?)
[but that's also why they're available for use with my gcc "hobby"].

Thoughts?
Roger
--




Re: [x86 PATCH] Tweak i386-expand.cc to restore bootstrap on RHEL.

2024-07-14 Thread Uros Bizjak
On Sun, Jul 14, 2024 at 3:42 PM Roger Sayle  wrote:
>
>
> This is a minor change to restore bootstrap on systems using gcc 4.8
> as a host compiler.  The fatal error is:
>
> In file included from gcc/gcc/coretypes.h:471:0,
>  from gcc/gcc/config/i386/i386-expand.cc:23:
> gcc/gcc/config/i386/i386-expand.cc: In function 'void
> ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)':
> ./insn-modes.h:315:75: error: temporary of non-literal type
> 'scalar_float_mode' in a constant expression
>  #define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode))
>^
> gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro
> 'HFmode'
>case HFmode:
> ^
>
>
> The solution is to use the E_?Fmode enumeration constants as case values
> in switch statements.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures (from this change).  Ok for mainline?
>
>
> 2024-07-14  Roger Sayle  
>
> * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator):
> Use E_?Fmode enumeration constants in switch statement.
> (ix86_expand_copysign): Likewise.
> (ix86_expand_xorsign): Likewise.

OK, also for backports.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: [Patch, fortran] PR84868 - [11/12/13/14/15 Regression] ICE in gfc_conv_descriptor_offset, at fortran/trans-array.c:208

2024-07-14 Thread Harald Anlauf

Hi Paul,

at first sight the patch seems to be the right approach, but
it breaks for the following two variations:

(1) LEN_TRIM is elemental, but the following is erroneously rejected:

  function g(n) result(z)
    integer,  intent(in) :: n
    character, parameter :: d(3,3) = 'x'
    character(len_trim(d(n,n))) :: z
    z = d(n,n)
  end

This is fixed here by commenting/removing the line

  expr->rank = 1;

as the result shall have the same shape as the argument.
Can you check?

(2) The handling of namespaces is problematic: using the same name
for a parameter within procedures in the same scope generates another
ICE.  The following testcase demonstrates this:

module m
  implicit none
  integer :: c
contains
  function f(n) result(z)
    integer,  intent(in) :: n
    character, parameter :: c(3) = ['x', 'y', 'z']
    character(len_trim(c(n)))  :: z
    z = c(n)
  end
  function h(n) result(z)
    integer,  intent(in) :: n
    character, parameter :: c(3,3) = 'x'
    character(len_trim(c(n,n)))  :: z
    z = c(n,n)
  end
end
program p
  use m
  implicit none
  print *, f(2)
  print *, h(1)
end

I get:

pr84868-z0.f90:22:15:

   22 |   print *, h(1)
  |   1
internal compiler error: in gfc_conv_descriptor_stride_get, at
fortran/trans-array.cc:483
0x243e156 internal_error(char const*, ...)
../../gcc-trunk/gcc/diagnostic-global-context.cc:491
0x96dd70 fancy_abort(char const*, int, char const*)
../../gcc-trunk/gcc/diagnostic.cc:1725
0x749d68 gfc_conv_descriptor_stride_get(tree_node*, tree_node*)
../../gcc-trunk/gcc/fortran/trans-array.cc:483
[rest of traceback elided]

Renaming the parameter array in h solves the problem.

On 13.07.24 at 17:57, Paul Richard Thomas wrote:

Hi All,

Harald has pointed out that I attached the ChangeLog twice and the patch
not at all :-(

Please find the patch duly attached.

Paul


On Sat, 13 Jul 2024 at 10:58, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:


Hi All,

After messing around with argument mapping, where I found and fixed
another bug, I realised that the problem lay with simplification of
len_trim with an argument that is the element of a parameter array. The fix
was then a straightforward lift of existing code in expr.cc. The mapping
bug is also fixed by supplying the se string length when building character
typespecs.

Regtests just fine. OK for mainline? I believe that this is safe for
backporting to 14-branch before the 14.2 release - thoughts?


If you manage to correct/fix the above issues, I am fine with
backporting, as this appears a very reasonable fix.

Thanks,
Harald


Regards

Paul







Re: [pushed] Add function filtering to gcov

2024-07-14 Thread Jørgen Kvalsvik

Certainly, I can rewrite from lmap. I'll send a patch shortly.

On 7/14/24 16:27, Roger Sayle wrote:


I’m seeing (dejagnu) testsuite problems from this (recent) patch.

Running /home/roger/GCC/patchem/gcc/testsuite/gcc.misc-tests/gcov.exp ...
ERROR: (DejaGnu) proc "lmap key { snd } {
 if { $key in $seen } continue
 set key
 }" does not exist.
The error code is NONE
The info on the error is:
invalid command name "lmap"
 while executing
"::tcl_unknown lmap key { snd } {
 if { $key in $seen } continue
 set key
 }"
 ("uplevel" body line 1)
 invoked from within
"uplevel 1 ::tcl_unknown $args"


I guess (but I’m not sure) that lmap requires Tcl 8.6, and my RHEL-based
system has Tcl 8.5.  Is there a simple workaround to avoid the use of lmap?
Admittedly the systems that I use are a bit "long in the tooth" (obsolete?)
[but that's also why they're available for use with my gcc "hobby"].

Thoughts?
Roger
--




Re: [PATCH] MATCH: add abs support for half float

2024-07-14 Thread Andrew Pinski
On Sun, Jul 14, 2024 at 1:12 AM Kugan Vivekanandarajah
 wrote:
>
> This patch extends abs detection in matched for half float.
>
> Bootstrapped and regression test on aarch64-linux-gnu. Is this OK for trunk?

This is basically this pattern:
```
 /* A >=/> 0 ? A : -A    same as abs (A) */
 (for cmp (ge gt)
  (simplify
   (cnd (cmp @0 zerop) @1 (negate @1))
(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
 && !TYPE_UNSIGNED (TREE_TYPE(@0))
 && bitwise_equal_p (@0, @1))
 (if (TYPE_UNSIGNED (type))
  (absu:type @0)
  (abs @0)
```

except extended to handle an optional convert. Why didn't you just
extend the above pattern to handle the convert instead? Also I think
you have an issue with unsigned types with the comparison.
Also you should extend the -abs(A) pattern right below it in a similar fashion.
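A short C++ check of why the pattern is guarded by `!HONOR_SIGNED_ZEROS`. When signed zeros are honored, `A >= 0 ? A : -A` and `fabs (A)` disagree on negative zero. Because `-0.0 >= 0` is true, the idiom returns `-0.0` unchanged, whereas `fabs` clears the sign bit. The `>` form instead mishandles `+0.0` and returns `-0.0`. The function name is illustrative. The assertions assume compilation without `-ffast-math`/`-fno-signed-zeros`, which would let GCC perform exactly this fold.

```cpp
#include <cmath>

// The source-level idiom matched by "A >=/> 0 ? A : -A  same as abs (A)".
static double
cond_abs (double a)
{
  return a >= 0 ? a : -a;
}
```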

Thanks,
Andrew Pinski


>
> gcc/ChangeLog:
>
> * match.pd: Add pattern to convert (type)A >=/> 0 ? A : -A into abs (A) for 
> half float.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/absfloat16.c: New test.
>
> Signed-off-by: Kugan Vivekanandarajah 
>


[PATCH] Use foreach, not lmap, for tcl <= 8.5 compat

2024-07-14 Thread Jørgen Kvalsvik
lmap was introduced in Tcl 8.6, and while that was released in 2012, lmap
does not make enough of a difference to warrant the friction on
conservative (and relevant) systems.

gcc/testsuite/ChangeLog:

* lib/gcov.exp: Use foreach for tcl <= 8.5.
---
 gcc/testsuite/lib/gcov.exp | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/lib/gcov.exp b/gcc/testsuite/lib/gcov.exp
index 3fc7b65bee5..68696c9aa50 100644
--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -512,25 +512,29 @@ proc verify-filters { testname testcase file expected unexpected } {
 
 set seen [lsort -unique $seen]
 
-set expected [lmap key $expected {
-   if { $key in $seen } continue
-   set key
-}]
-set unexpected [lmap key $unexpected {
-   if { $key ni $seen } continue
-   set key
-}]
-
-foreach sym $expected {
+set ex {}
+foreach key $expected {
+   if { $key ni $seen } {
+   lappend ex $key
+   }
+}
+set unex {}
+foreach key $unexpected {
+   if { $key in $seen } {
+   lappend unex $key
+   }
+}
+
+foreach sym $ex {
fail "Did not see expected symbol '$sym'"
 }
 
-foreach sym $unexpected {
+foreach sym $unex {
fail "Found unexpected symbol '$sym'"
 }
 
 close $fd
-return [expr [llength $expected] + [llength $unexpected]]
+return [expr [llength $ex] + [llength $unex]]
 }
 
 proc verify-prime-paths { testname testcase file } {
-- 
2.39.2
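The filtering that the new `foreach` loops perform can be sketched in C++ as follows. This is a hypothetical rendering, not part of the testsuite. An expected symbol counts as a failure when it was not seen, an unexpected symbol counts as a failure when it was seen, and the total is returned, as the proc does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical C++ counterpart of the list filtering in verify-filters.
static int
count_filter_failures (const std::vector<std::string> &seen,
                       const std::vector<std::string> &expected,
                       const std::vector<std::string> &unexpected)
{
  auto in_seen = [&seen] (const std::string &key) {
    return std::find (seen.begin (), seen.end (), key) != seen.end ();
  };

  int failures = 0;
  for (const std::string &key : expected)
    if (!in_seen (key))   // Tcl: $key ni $seen
      ++failures;
  for (const std::string &key : unexpected)
    if (in_seen (key))    // Tcl: $key in $seen
      ++failures;
  return failures;
}
```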



[PR middle-end/114635] Set OMP safelen handling to INT_MAX when the pragma didn’t provide one.

2024-07-14 Thread Kugan Vivekanandarajah
OMP safelen handling assigns the backend-provided maximum as an int even when
the pragma didn't provide one. As a result, the vectoriser rejects SVE modes
when comparing the poly_int with the safelen.

That is, for the attached test case, omp_max_vf gets [16, 16] from the
backend. This then becomes 16 because the OMP safelen is an integer. When the
vectoriser compares the potential vector mode with maybe_lt (max_vf, min_vf),
the check fails, so no SVE vector mode is selected.

One suggestion there was to set safelen to INT_MAX when the OMP pragma does
not provide safelen explicitly.
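A toy C++ model of the problem described above. It assumes a two-coefficient polynomial `c0 + c1*x` with a runtime `x >= 0` that grows with the SVE vector length. GCC's real `poly_int` is more general, and `poly2`, `truncate_to_int` and this `maybe_lt` are hypothetical stand-ins. Keeping only the constant coefficient turns `[16, 16]` into 16. That 16 is then "maybe less than" `[16, 16]`, since the latter exceeds 16 for any `x >= 1`.

```cpp
#include <cstdint>

// Hypothetical model of a degree-1 poly_int: value = c0 + c1 * x, x >= 0.
struct poly2
{
  std::int64_t c0, c1;
  std::int64_t eval (std::int64_t x) const { return c0 + c1 * x; }
};

// True if a < b for SOME x >= 0.  For a constant a this holds when
// a < b.c0 (take x = 0) or when b grows without bound (b.c1 > 0).
static bool
maybe_lt (std::int64_t a, const poly2 &b)
{
  return a < b.c0 || b.c1 > 0;
}

// Storing the value in a plain int keeps only the constant coefficient.
static std::int64_t
truncate_to_int (const poly2 &p)
{
  return p.c0;
}
```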

Bootstrapped and regression tested on aarch64-linux-gnu. Is this OK for trunk?

Thanks,
Kugan



PR middle-end/114635
PR 114635

gcc/ChangeLog:

* omp-low.cc (lower_rec_input_clauses): Set INT_MAX
when safelen is not provided instead of using backend
provided safelen.

gcc/testsuite/ChangeLog:

* c-c++-common/pr114635-1.cpp: New test.
* c-c++-common/pr114635-2.cpp: New test.

Signed-off-by: Kugan Vivekanandarajah 

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 4d003f42098..69feedbde54 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6980,6 +6980,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
  || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
  && maybe_gt (safe_len, sctx.max_vf)))
{
+ if (!sctx.is_simt && maybe_ne (sctx.max_vf, 1U))
+   sctx.max_vf = INT_MAX;
  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
  OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
   sctx.max_vf);
diff --git a/gcc/testsuite/c-c++-common/pr114635-1.cpp b/gcc/testsuite/c-c++-common/pr114635-1.cpp
new file mode 100644
index 000..9bf52ba85b0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr114635-1.cpp
@@ -0,0 +1,60 @@
+
+/* PR middle-end/114635 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O3 -fdump-tree-omplower" } */
+namespace std {
+  inline constexpr float
+  sqrt(float __x)
+  { return __builtin_sqrtf(__x); }
+}
+extern const float PolyCoefficients4[] = {
+  0.263729f, -0.0686285f, 0.00882248f, -0.000592487f, 0.164622f
+};
+
+template 
+static void GravityForceKernel(int n, float *__restrict__ x, float *__restrict__ y,
+   float *__restrict__ z, float *__restrict__ mass,
+   float x0, float y0, float z0,
+   float MaxSepSqrd, float SofteningLenSqrd,
+   float &__restrict__ ax, float &__restrict__ ay,
+   float &__restrict__ az) {
+  float lax = 0.0f, lay = 0.0f, laz = 0.0f;
+
+#pragma omp simd reduction(+:lax,lay,laz)
+
+  for (int i = 0; i < n; ++i) {
+float dx = x[i] - x0, dy = y[i] - y0, dz = z[i] - z0;
+float r2 = dx * dx + dy * dy + dz * dz;
+
+if (r2 >= MaxSepSqrd || r2 == 0.0f)
+  continue;
+
+float r2s = r2 + SofteningLenSqrd;
+float f = PolyCoefficients[PolyOrder];
+for (int p = 1; p <= PolyOrder; ++p)
+  f = PolyCoefficients[PolyOrder-p] + r2*f;
+
+f = (1.0f / (r2s * std::sqrt(r2s)) - f) * mass[i];
+
+lax += f * dx;
+lay += f * dy;
+laz += f * dz;
+  }
+
+  ax += lax;
+  ay += lay;
+  az += laz;
+}
+
+void GravityForceKernel4(int n, float *__restrict__ x, float *__restrict__ y,
+ float *__restrict__ z, float *__restrict__ mass,
+ float x0, float y0, float z0,
+ float MaxSepSqrd, float SofteningLenSqrd,
+ float &__restrict__ ax, float &__restrict__ ay,
+ float &__restrict__ az) {
+  GravityForceKernel<4, PolyCoefficients4>(n, x, y, z, mass, x0, y0, z0,
+   MaxSepSqrd, SofteningLenSqrd,
+   ax, ay, az);
+}
+
+/* { dg-final { scan-tree-dump "safelen\\(2147483647\\)" "omplower" } } */
diff --git a/gcc/testsuite/c-c++-common/pr114635-2.cpp b/gcc/testsuite/c-c++-common/pr114635-2.cpp
new file mode 100644
index 000..7de2c8eea73
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr114635-2.cpp
@@ -0,0 +1,61 @@
+
+/* PR middle-end/114635 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -O3 -fdump-tree-omplower" } */
+
+namespace std {
+  inline constexpr float
+  sqrt(float __x)
+  { return __builtin_sqrtf(__x); }
+}
+extern const float PolyCoefficients4[] = {
+  0.263729f, -0.0686285f, 0.00882248f, -0.000592487f, 0.164622f
+};
+
+template 
+static void GravityForceKernel(int n, float *__restrict__ x, float *__restrict__ y,
+   float *__restrict__ z, float *__restrict__ mass,
+   float x0, float y0, float z0,
+   float MaxSepSqrd, float SofteningLenSqrd,
+   float &__

Re: Re: [PATCH 3/3 v3] RISC-V: Add md files for vector BFloat16

2024-07-14 Thread wangf...@eswincomputing.com
On 2024-07-12 06:19  Jeff Law  wrote:
>
>
>
>On 7/11/24 1:10 AM, Feng Wang wrote:
>> V3: Add Bfloat16 vector insn in generic-vector-ooo.md
>> v2: Rebase
>> Accroding to the BFloat16 spec, some vector iterators and new pattern
>> are added in md files.
>>
>> Signed-off-by: Feng Wang 
>> gcc/ChangeLog:
>>
>> * config/riscv/generic-vector-ooo.md: Add def_insn_reservation for vector 
>> BFloat16.
>> * config/riscv/riscv.md: Add new insn name for vector BFloat16.
>> * config/riscv/vector-iterators.md: Add some iterators for vector BFloat16.
>> * config/riscv/vector.md: Add some attribute for vector BFloat16.
>> * config/riscv/vector-bfloat16.md: New file. Add insn pattern vector 
>> BFloat16.
>Note the spaces vs tabs issue pointed out by the lint phase.  Those
>should be fixed.  I don't think the rest of the lint issues need to be
>fixed. 
>jeff
Thanks, will fix this type of lint error according to the CI log and then
commit it.

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang  wrote:
>
> Hi,
>
> According to the instruction spec of AVX512BF16, the conversion from float
> to BF16 is not a simple truncation. It has special handling for
> denormals/NaNs, and even for normal floats it adds an extra rounding bias
> that depends on the least significant bit of the BF16 result. This means we
> cannot use vcvtne2ps2bf16 for any bf16 vector shuffle.
> The optimization introduced in r15-1368 adds a specific split to implement
> HImode permutations with this instruction, so remove it and treat BFmode
> permutations the same as HFmode.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu. OK for trunk?
Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?
>
> gcc/ChangeLog:
>
> PR target/115889
> * config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove.
> * config/i386/sse.md (hi_cvt_bf): Remove.
> (HI_CVT_BF): Likewise.
> (vpermt2_sepcial_bf16_shuffle_):Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR target/115889
> * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option
> and output scan.
> ---
>  gcc/config/i386/predicates.md | 11 --
>  gcc/config/i386/sse.md| 35 ---
>  .../i386/vpermt2-special-bf16-shufflue.c  |  5 ++-
>  3 files changed, 2 insertions(+), 49 deletions(-)
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index a894847adaf..5d0bb1e0f54 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand"
>
>return true;
>  })
> -
> -;; Check that each element is odd and incrementally increasing from 1
> -(define_predicate "vcvtne2ps2bf_parallel"
> -  (and (match_code "const_vector")
> -   (match_code "const_int" "a"))
> -{
> -  for (int i = 0; i < XVECLEN (op, 0); ++i)
> -if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1))
> -  return false;
> -  return true;
> -})
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b3b4697924b..c134494cd20 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -31460,38 +31460,3 @@ (define_insn "vpdp_"
>"TARGET_AVXVNNIINT16"
>"vpdp\t{%3, %2, %0|%0, %2, %3}"
> [(set_attr "prefix" "vex")])
> -
> -(define_mode_attr hi_cvt_bf
> -  [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")])
> -
> -(define_mode_attr HI_CVT_BF
> -  [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")])
> -
> -(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_"
> -  [(set (match_operand:VI2_AVX512F 0 "register_operand")
> -   (unspec:VI2_AVX512F
> - [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel")
> -  (match_operand:VI2_AVX512F 2 "register_operand")
> -  (match_operand:VI2_AVX512F 3 "nonimmediate_operand")]
> -  UNSPEC_VPERMT2))]
> -  "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()"
> -  "#"
> -  "&& 1"
> -  [(const_int 0)]
> -{
> -  rtx op0 = gen_reg_rtx (mode);
> -  operands[2] = lowpart_subreg (mode,
> -   force_reg (mode, operands[2]),
> -   mode);
> -  operands[3] = lowpart_subreg (mode,
> -   force_reg (mode, operands[3]),
> -   mode);
> -
> -  emit_insn (gen_avx512f_cvtne2ps2bf16_(op0,
> -  operands[3],
> -  operands[2]));
> -  emit_move_insn (operands[0], lowpart_subreg (mode, op0,
> -  mode));
> -  DONE;
> -}
> -[(set_attr "mode" "")])
> diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> index 5c65f2a9884..4cbc85735de 100755
> --- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> +++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> @@ -1,7 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
> -/* { dg-final { scan-assembler-not "vpermi2b" } } */
> -/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */
> +/* { dg-options "-O2 -mavx512vbmi -mavx512vl" } */
> +/* { dg-final { scan-assembler-times "vpermi2w" 3 } } */
>
>  typedef __bf16 v8bf __attribute__((vector_size(16)));
>  typedef __bf16 v16bf __attribute__((vector_size(32)));
> --
> 2.34.1
>


-- 
BR,
Hongtao
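To illustrate Hongyu's point that float-to-BF16 conversion is not a plain truncation, the sketch below sets a generic round-to-nearest-even conversion in C++ beside simple truncation. Truncation corresponds to selecting the odd (high) 16-bit halves, which is the selection the removed `vcvtne2ps2bf16_parallel` predicate matched. This is not the exact `vcvtne2ps2bf16` semantics: NaNs are kept quiet, but the instruction's denormal handling is not modeled. The helper names are illustrative.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float, and back.
static float
float_from_bits (std::uint32_t bits)
{
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

static std::uint32_t
bits_of (float f)
{
  std::uint32_t u;
  std::memcpy (&u, &f, sizeof u);
  return u;
}

// What a permutation picking the high 16-bit half amounts to.
static std::uint16_t
bf16_truncate (float f)
{
  return static_cast<std::uint16_t> (bits_of (f) >> 16);
}

// Generic round-to-nearest-even conversion to BF16 bits.
static std::uint16_t
bf16_round_nearest_even (float f)
{
  std::uint32_t u = bits_of (f);
  if ((u & 0x7fffffffu) > 0x7f800000u)          // NaN: force a quiet NaN
    return static_cast<std::uint16_t> ((u >> 16) | 0x0040u);
  std::uint32_t lsb = (u >> 16) & 1u;           // ties round to even
  u += 0x7fffu + lsb;
  return static_cast<std::uint16_t> (u >> 16);
}
```

For the pattern `0x3F80C000` (just above 1.0), truncation gives `0x3F80` while rounding gives `0x3F81`. The results therefore differ from what a pure shuffle would produce.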


[PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-14 Thread HAO CHEN GUI
Hi,
  This patch adds TARGET_FLOAT128_HW to the pattern conditions for
quad-precision insns. Some qp patterns were originally guarded by
TARGET_P9_VECTOR; this patch replaces that guard with TARGET_FLOAT128_HW.

  The test case float128-cmp2-runnable.c should be guarded with
ppc_float128_hw as it uses qp insns. ppc_float128_hw already covers
p9vector_hw, so the latter is removed.

  Compared to the previous version, the main change is to split the
redundant FLOAT128_IEEE_P removal into another patch.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns

gcc/
* config/rs6000/rs6000.md (floatti2, floatunsti2,
fix_truncti2): Add guard TARGET_FLOAT128_HW.
* config/rs6000/vsx.md (xsxexpqp__,
xsxsigqp__, xsiexpqpf_,
xsiexpqp__, xscmpexpqp__,
*xscmpexpqp, xststdcnegqp_): Replace guard TARGET_P9_VECTOR
with TARGET_FLOAT128_HW.
(xststdc_, *xststdc_, isinf2): Add guard
TARGET_FLOAT128_HW for the IEEE128 modes.

gcc/testsuite/
* gcc.target/powerpc/float128-cmp2-runnable.c: Replace
ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index deffc4b601c..c0f6599c08b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6928,7 +6928,7 @@ (define_insn "floatdidf2"
 (define_insn "floatti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
(float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvsqqp %0,%1";
 }
@@ -6937,7 +6937,7 @@ (define_insn "floatti2"
 (define_insn "floatunsti2"
   [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
(unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" 
"v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvuqqp %0,%1";
 }
@@ -6946,7 +6946,7 @@ (define_insn "floatunsti2"
 (define_insn "fix_truncti2"
   [(set (match_operand:TI 0 "vsx_register_operand" "=v")
(fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
-  "TARGET_POWER10"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   return  "xscvqpsqz %0,%1";
 }
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 1272f8b2080..7dd08895bec 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__"
(unspec:V2DI_DI
  [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
 UNSPEC_VSX_SXEXPDP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsxexpqp %0,%1"
   [(set_attr "type" "vecmove")])

@@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__"
(unspec:VEC_TI [(match_operand:IEEE128 1
"altivec_register_operand" "v")]
 UNSPEC_VSX_SXSIG))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsxsigqp %0,%1"
   [(set_attr "type" "vecmove")])

@@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_"
 [(match_operand:IEEE128 1 "altivec_register_operand" "v")
  (match_operand:DI 2 "altivec_register_operand" "v")]
 UNSPEC_VSX_SIEXPQP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsiexpqp %0,%1,%2"
   [(set_attr "type" "vecmove")])

@@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__"
 (match_operand:V2DI_DI 2
  "altivec_register_operand" "v")]
 UNSPEC_VSX_SIEXPQP))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xsiexpqp %0,%1,%2"
   [(set_attr "type" "vecmove")])

@@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__"
(set (match_operand:SI 0 "register_operand" "=r")
(CMP_TEST:SI (match_dup 3)
 (const_int 0)))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
 {
   if ( == UNORDERED && !HONOR_NANS (mode))
 {
@@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp"
  (match_operand:IEEE128 2 "altivec_register_operand" 
"v")]
  UNSPEC_VSX_SCMPEXPQP)
 (match_operand:SI 3 "zero_constant" "j")))]
-  "TARGET_P9_VECTOR"
+  "TARGET_FLOAT128_HW"
   "xscmpexpqp %0,%1,%2"
   [(set_attr "type" "fpcompare")])

@@ -5315,7 +5315,8 @@ (define_expand "xststdc_"
(set (match_operand:SI 0 "register_operand" "=r")
(eq:SI (match_dup 3)
   (const_int 0)))]
-  "TARGET_P9_VECTOR"
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
 {
   operands[3] = gen_reg_rtx (CCFPmode);
   operands[4] = CONST0_RTX (SImode);
@@ -5324,7 +5325,8 @@ (define_expand "xststdc_"
 (define_expand "isinf2"
   [(use (match_operand:SI 0 "gpc_reg_operand"))
(use (match_operand:IEEE_FP 1 ""))]
-  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
+  "TARGET_P9_VECTOR
+   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
 {
   int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_

[PATCH, rs6000] Remove redundant guard for float128 mode patterns

2024-07-14 Thread HAO CHEN GUI
Hi,
  This patch removes the FLOAT128_IEEE_P guard from patterns whose mode
iterator is IEEE128, and the FLOAT128_IBM_P guard from patterns whose mode
iterator is IBM128. The mode iterators already perform these checks, so
the guards are redundant.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Remove redundant guard for float128 mode patterns

gcc/
* config/rs6000/rs6000.md (movcc, *movcc_p10,
*movcc_invert_p10, *fpmask, *xxsel,
@ieee_128bit_vsx_abs2, *ieee_128bit_vsx_nabs2,
add3, sub3, mul3, div3, sqrt2,
copysign3, copysign3_hard, copysign3_soft,
@neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw,
*fms4_hw, *nfma4_hw, *nfms4_hw,
extend2_hw, truncdf2_hw,
truncsf2_hw, fix_2_hw,
fix_trunc2,
*fix_trunc2_mem,
float_di2_hw, float_si2_hw,
float2, floatuns_di2_hw,
floatuns_si2_hw, floatuns2,
floor2, ceil2, btrunc2, round2,
add3_odd, sub3_odd, mul3_odd, div3_odd,
sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd,
*nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
Remove guard FLOAT128_IEEE_P.
(@extenddf2_fprs, @extenddf2_vsx,
truncdf2_internal1, truncdf2_internal2,
fix_trunc_helper, neg2, *cmp_internal1,
*cmp_internal2 for IBM128): Remove guard FLOAT128_IBM_P.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c0f6599c08b..f22b7ed6256 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5736,7 +5736,7 @@ (define_expand "movcc"
 (if_then_else:IEEE128 (match_operand 1 "comparison_operator")
   (match_operand:IEEE128 2 "gpc_reg_operand")
   (match_operand:IEEE128 3 "gpc_reg_operand")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
 {
   if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
 DONE;
@@ -5753,7 +5753,7 @@ (define_insn_and_split "*movcc_p10"
 (match_operand:IEEE128 4 "altivec_register_operand" "v,v")
 (match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
(clobber (match_scratch:V2DI 6 "=0,&v"))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(set (match_dup 6)
@@ -5785,7 +5785,7 @@ (define_insn_and_split "*movcc_invert_p10"
 (match_operand:IEEE128 4 "altivec_register_operand" "v,v")
 (match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
(clobber (match_scratch:V2DI 6 "=0,&v"))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "#"
   "&& 1"
   [(set (match_dup 6)
@@ -5820,7 +5820,7 @@ (define_insn "*fpmask"
 (match_operand:IEEE128 3 "altivec_register_operand" "v")])
 (match_operand:V2DI 4 "all_ones_constant" "")
 (match_operand:V2DI 5 "zero_constant" "")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "xscmp%V1qp %0,%2,%3"
   [(set_attr "type" "fpcompare")])

@@ -5831,7 +5831,7 @@ (define_insn "*xxsel"
 (match_operand:V2DI 2 "zero_constant" ""))
 (match_operand:IEEE128 3 "altivec_register_operand" "v")
 (match_operand:IEEE128 4 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
+  "TARGET_POWER10 && TARGET_FLOAT128_HW"
   "xxsel %x0,%x4,%x3,%x1"
   [(set_attr "type" "vecmove")])

@@ -8904,7 +8904,7 @@ (define_insn_and_split "@extenddf2_fprs"
 (match_operand:DF 1 "nonimmediate_operand" "d,m,d")))
(use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))]
   "!TARGET_VSX && TARGET_HARD_FLOAT
-   && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)"
+   && TARGET_LONG_DOUBLE_128"
   "#"
   "&& reload_completed"
   [(set (match_dup 3) (match_dup 1))
@@ -8921,7 +8921,7 @@ (define_insn_and_split "@extenddf2_vsx"
   [(set (match_operand:IBM128 0 "gpc_reg_operand" "=d,d")
(float_extend:IBM128
 (match_operand:DF 1 "nonimmediate_operand" "wa,m")))]
-  "TARGET_LONG_DOUBLE_128 && TARGET_VSX && FLOAT128_IBM_P (mode)"
+  "TARGET_LONG_DOUBLE_128 && TARGET_VSX"
   "#"
   "&& reload_completed"
   [(set (match_dup 2) (match_dup 1))
@@ -8967,7 +8967,7 @@ (define_insn_and_split "truncdf2_internal1"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d")
(float_truncate:DF
 (match_operand:IBM128 1 "gpc_reg_operand" "0,d")))]
-  "FLOAT128_IBM_P (mode) && !TARGET_XL_COMPAT
+  "!TARGET_XL_COMPAT
&& TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
   "@
#
@@ -8983,7 +8983,7 @@ (define_insn_and_split "truncdf2_internal1"
 (define_insn "truncdf2_internal2"
   [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
(float_truncate:DF (match_operand:IBM128 1 "gpc_reg_operand" "d")))]
-  "FLOAT12

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongyu Wang
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?

We can still deal with BFmode permutation the same way as HFmode, so
the change in ix86_vectorize_vec_perm_const can be preserved.

On Mon, Jul 15, 2024 at 09:40, Hongtao Liu wrote:
>
> On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > According to the instruction spec of AVX512BF16, the conversion from
> > float to BF16 is not a simple truncation. It has special handling for
> > denormals and NaNs, and even for normal floats it adds a rounding bias
> > that depends on the least significant bit of the BF16 result. This
> > means we cannot use vcvtne2ps2bf16 for arbitrary bf16 vector shuffles.
> > The optimization introduced in r15-1368 added a specific split that
> > converts HImode permutations into this instruction, so remove it and
> > treat BFmode permutations the same as HFmode.
> >
> > Bootstrapped & regtested on x86_64-pc-linux-gnu. OK for trunk?
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?
> >
> > gcc/ChangeLog:
> >
> > PR target/115889
> > * config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove.
> > * config/i386/sse.md (hi_cvt_bf): Remove.
> > (HI_CVT_BF): Likewise.
> > (vpermt2_sepcial_bf16_shuffle_):Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/115889
> > * gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option
> > and output scan.
> > ---
> >  gcc/config/i386/predicates.md | 11 --
> >  gcc/config/i386/sse.md| 35 ---
> >  .../i386/vpermt2-special-bf16-shufflue.c  |  5 ++-
> >  3 files changed, 2 insertions(+), 49 deletions(-)
> >
> > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> > index a894847adaf..5d0bb1e0f54 100644
> > --- a/gcc/config/i386/predicates.md
> > +++ b/gcc/config/i386/predicates.md
> > @@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand"
> >
> >return true;
> >  })
> > -
> > -;; Check that each element is odd and incrementally increasing from 1
> > -(define_predicate "vcvtne2ps2bf_parallel"
> > -  (and (match_code "const_vector")
> > -   (match_code "const_int" "a"))
> > -{
> > -  for (int i = 0; i < XVECLEN (op, 0); ++i)
> > -if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1))
> > -  return false;
> > -  return true;
> > -})
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index b3b4697924b..c134494cd20 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -31460,38 +31460,3 @@ (define_insn "vpdp_"
> >"TARGET_AVXVNNIINT16"
> >"vpdp\t{%3, %2, %0|%0, %2, %3}"
> > [(set_attr "prefix" "vex")])
> > -
> > -(define_mode_attr hi_cvt_bf
> > -  [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")])
> > -
> > -(define_mode_attr HI_CVT_BF
> > -  [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")])
> > -
> > -(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_"
> > -  [(set (match_operand:VI2_AVX512F 0 "register_operand")
> > -   (unspec:VI2_AVX512F
> > - [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel")
> > -  (match_operand:VI2_AVX512F 2 "register_operand")
> > -  (match_operand:VI2_AVX512F 3 "nonimmediate_operand")]
> > -  UNSPEC_VPERMT2))]
> > -  "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()"
> > -  "#"
> > -  "&& 1"
> > -  [(const_int 0)]
> > -{
> > -  rtx op0 = gen_reg_rtx (mode);
> > -  operands[2] = lowpart_subreg (mode,
> > -   force_reg (mode, operands[2]),
> > -   mode);
> > -  operands[3] = lowpart_subreg (mode,
> > -   force_reg (mode, operands[3]),
> > -   mode);
> > -
> > -  emit_insn (gen_avx512f_cvtne2ps2bf16_(op0,
> > -  operands[3],
> > -  operands[2]));
> > -  emit_move_insn (operands[0], lowpart_subreg (mode, op0,
> > -  mode));
> > -  DONE;
> > -}
> > -[(set_attr "mode" "")])
> > diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c 
> > b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> > index 5c65f2a9884..4cbc85735de 100755
> > --- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> > +++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
> > @@ -1,7 +1,6 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
> > -/* { dg-final { scan-assembler-not "vpermi2b" } } */
> > -/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */
> > +/* { dg-options "-O2 -mavx512vbmi -mavx512vl" } */
> > +/* { dg-final { scan-assembler-times "vpermi2w" 3 } } */
> >
> >  typedef __bf16 v8bf __attribute__((vector_size(16)));
> >  typedef __bf16 v16bf __attribute__((vector_size(32)));
> > --
> > 2.34.1
> >
>
>
> --
> BR,
> Hongtao


Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 10:21 AM Hongyu Wang  wrote:
>
> > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?
>
> We can still deal with BFmode permutation the same way as HFmode, so
> the change in ix86_vectorize_vec_perm_const can be preserved.
>
> On Mon, Jul 15, 2024 at 09:40, Hongtao Liu wrote:
> >
> > On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > According to the instruction spec of AVX512BF16, the conversion from
> > > float to BF16 is not a simple truncation. It has special handling for
> > > denormals and NaNs, and even for normal floats it adds a rounding bias
> > > that depends on the least significant bit of the BF16 result. This
> > > means we cannot use vcvtne2ps2bf16 for arbitrary bf16 vector shuffles.
> > > The optimization introduced in r15-1368 added a specific split that
> > > converts HImode permutations into this instruction, so remove it and
> > > treat BFmode permutations the same as HFmode.
I see, patch LGTM.

Re: [committed] Fix previously latent bug in reorg affecting cris port

2024-07-14 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Fri, 12 Jul 2024 02:11:45 +0200
> 
> > Date: Wed, 3 Jul 2024 12:46:46 -0600
> > From: Jeff Law 
> 
> > The late-combine patch has triggered a previously latent bug in reorg.
> > 
> > Basically we have a sequence like this in the middle of reorg before we 
> > start relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)
> 
> [...]
> 
> > Pushing to the trunk momentarily.

JFTR, for cris-elf, this can't be blamed on (to have been
exposed by) late-combine, because this appeared with
r15-1619-g3b9b8d6cfdf593 "ira: Scale save/restore costs of
callee save registers with block frequency", even with
-fno-late-combine-instructions.

I noticed this while chasing another regression, an XPASS
for gcc.dg/tree-ssa/loop-1.c, which was also caused by that
revision.

Regarding that commit, checking the generated code for
loop-1.c, that XPASS was reflecting a *regression*, not an
improvement.  To wit, it looks like _foo is no longer stored
in a register for cris-elf, but there's no change in the
number of saved registers.  As coremark results are the same
before/after that commit for cris-elf, I'm not going to make
a fuss; IOW, not open a PR for the regression.  (Phew, one
less rabbit-hole.  I see that patch exposed as many problems
as late-combine!)

Still, a heads-up to the author of that patch.  Maybe the
frequencies are miscalculated for that test-case.  I tried
to look at regs.h:REG_FREQ_FROM_BB, but it's a mystery to
me: its value seems to vary between 1 and 1000, which doesn't
seem right, yet that macro is used all over the place.  It's
not documented very much, though. :(  FAOD, I'm not blaming
the author of r15-1619-g3b9b8d6cfdf593 here.

Also FTR, I had to search a bit to find the patch submission
and review.  It's in the archives of last October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html
as mentioned in another message.

brgds, H-P


Re: [PATCH] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-14 Thread Hongtao Liu
On Wed, Jul 10, 2024 at 2:46 PM Hongyu Wang  wrote:
>
> Hi,
>
> For APX ccmp, the current infrastructure always generates a cstore for
> the ccmp flag user, like
>
> cmpe%rcx, %r8
> ccmpnel %rax, %rbx
> seta%dil
> add %rcx, %r9
> add %r9, %rdx
> testb   %dil, %dil
> je  .L2
>
> In such cases, the legacy add clobbers FLAGS_REG, so an extra cstore
> is needed to keep the flag from being overwritten before it is used.
> If the instructions between the flag producer and the user are NF
> insns, the setcc/test sequence is not required.
>
> Add a pass to convert legacy flag-clobbering insns to their NF
> counterparts. The conversion only happens when:
> 1. APX_NF is enabled.
> 2. In a BB, a cstore is found, and there are insns between that cstore
> and the next insn that explicitly sets FLAGS_REG (test or cmp).
> 3. All the insns in between have an NF counterpart.
>
> The pass is added after rtl-ifcvt, because if-conversion eliminates
> some branches when profitable, which can place flag-clobbering insns
> between the cstore and the jcc.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu and SDE. Also passed
> spec2017 simulation run on SDE.
>
> Ok for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (has_nf): New define_attr, add to all
> nf related patterns.
> * config/i386/i386-features.cc (apx_nf_convert): New function
> to convert Non-NF insns to their NF counterparts.
> (class pass_apx_nf_convert): New pass class.
> (make_pass_apx_nf_convert): New.
> * config/i386/i386-passes.def: Add pass_apx_nf_convert after
> rtl_ifcvt.
> * config/i386/i386-protos.h (make_pass_apx_nf_convert): Declare.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-nf-2.c: New test.
> ---
>  gcc/config/i386/i386-features.cc | 163 +++
>  gcc/config/i386/i386-passes.def  |   1 +
>  gcc/config/i386/i386-protos.h|   1 +
>  gcc/config/i386/i386.md  |  67 +-
>  gcc/testsuite/gcc.target/i386/apx-nf-2.c |  32 +
>  5 files changed, 259 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf-2.c
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index fc224ed06b0..3da56ddbdcc 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -3259,6 +3259,169 @@ make_pass_remove_partial_avx_dependency (gcc::context 
> *ctxt)
>return new pass_remove_partial_avx_dependency (ctxt);
>  }
>
> +/* Convert legacy instructions that clobbers EFLAGS to APX_NF
> +   instructions when there are no flag set between a flag
> +   producer and user.  */
> +
> +static unsigned int
> +ix86_apx_nf_convert (void)
> +{
> +  timevar_push (TV_MACH_DEP);
> +
> +  basic_block bb;
> +  rtx_insn *insn;
> +  hash_map  converting_map;
> +  auto_vec  current_convert_list;
> +
> +  bool converting_seq = false;
> +  rtx cc = gen_rtx_REG (CCmode, FLAGS_REG);
> +
> +  FOR_EACH_BB_FN (bb, cfun)
> +{
> +  /* Reset conversion for each bb.  */
> +  converting_seq = false;
> +  FOR_BB_INSNS (bb, insn)
> +   {
> + if (!NONDEBUG_INSN_P (insn))
> +   continue;
> +
> + if (recog_memoized (insn) < 0)
> +   continue;
> +
> + /* Convert candidate insns after cstore, which should
> +satisify the two conditions:
> +1. Is not flag user or producer, only clobbers
> +FLAGS_REG.
> +2. Have corresponding nf pattern.  */
> +
> + rtx pat = PATTERN (insn);
> +
> + /* Starting convertion at first cstorecc.  */
> + rtx set = NULL_RTX;
> + if (!converting_seq
> + && (set = single_set (insn))
> + && ix86_comparison_operator (SET_SRC (set), VOIDmode)
> + && reg_overlap_mentioned_p (cc, SET_SRC (set))
> + && !reg_overlap_mentioned_p (cc, SET_DEST (set)))
> +   {
> + converting_seq = true;
> + current_convert_list.truncate (0);
> +   }
> + /* Terminate at the next explicit flag set.  */
> + else if (reg_set_p (cc, pat)
> +  && GET_CODE (set_of (cc, pat)) != CLOBBER)
> +   converting_seq = false;
> +
> + if (!converting_seq)
> +   continue;
> +
> + if (get_attr_has_nf (insn)
> + && GET_CODE (pat) == PARALLEL)
> +   {
> + /* Record the insn to candidate map.  */
> + current_convert_list.safe_push (insn);
> + converting_map.put (insn, pat);
> +   }
> + /* If the insn clobbers flags but has no nf_attr,
> +revoke all previous candidates.  */
> + else if (!get_attr_has_nf (insn)
> +  && reg_set_p (cc, pat)
> +  && GET_CODE (set_of (cc, pat)) == CLOBBER)
> +   {
> + for (auto item : current_conv

[COMMITTED] CRIS: Adjust gcc.dg/tree-ssa/loop-1.c

2024-07-14 Thread Hans-Peter Nilsson
Committed.
-- >8 --
With r15-1619-g3b9b8d6cfdf593, there's an XPASS and a FAIL
for this test-case for cris-elf.  Looking at the generated
code, _foo is indeed no longer saved in a register for CRIS.
While that looks like a regression, coremark results are the
same around this revision, so simply adjust the test-case:
remove the target-specific exceptions for cris-*-*.

* gcc.dg/tree-ssa/loop-1.c: Remove target-specific test
and xfail to adjust for recent changes in register allocation.
---
 gcc/testsuite/gcc.dg/tree-ssa/loop-1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
index a531b7584a64..a8f2c3bbfdb4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-1.c
@@ -43,16 +43,15 @@ int xxx(void)
 /* The SH targets always use separate instructions to load the address
and to do the actual call - bsr is only generated by link time
relaxation.  */
-/* CRIS and MSP430 keep the address in a register.  */
+/* MSP430 keeps the address in a register.  */
 /* m68k sometimes puts the address in a register, depending on CPU and PIC.  */
 
-/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* 
sh*-*-* cris-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* 
visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */
+/* { dg-final { scan-assembler-times "foo" 5 { xfail hppa*-*-* ia64*-*-* 
sh*-*-* fido-*-* m68k-*-* i?86-*-mingw* i?86-*-cygwin* x86_64-*-mingw* 
visium-*-* nvptx*-*-* pdp11*-*-* msp430-*-* amdgcn*-*-* } } } */
 /* { dg-final { scan-assembler-times "foo,%r" 5 { target hppa*-*-* } } } */
 /* { dg-final { scan-assembler-times "= foo"  5 { target ia64*-*-* } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*_foo" 5 { target 
i?86-*-mingw* i?86-*-cygwin* } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*foo" 5 { target 
x86_64-*-mingw* } } } */
 /* { dg-final { scan-assembler-times "jsr|bsrf|blink\ttr?,r18"  5 { target 
sh*-*-* } } } */
-/* { dg-final { scan-assembler-times "Jsr \\\$r" 5 { target cris-*-* } } } */
 /* { dg-final { scan-assembler-times "\[jb\]sr" 5 { target fido-*-* m68k-*-* 
pdp11-*-* } } } */
 /* { dg-final { scan-assembler-times "bra *tr,r\[1-9\]*,r21" 5 { target 
visium-*-* } } } */
 /* { dg-final { scan-assembler-times "(?n)\[ \t\]call\[ \t\].*\[ \t\]foo," 5 { 
target nvptx*-*-* } } } */
-- 
2.30.2



Re: [i386] adjust flag_omit_frame_pointer in a single function [PR113719] (was: Re: [PATCH] [i386] restore recompute to override opts after change [PR113719])

2024-07-14 Thread Hongtao Liu
On Thu, Jul 11, 2024 at 9:07 PM Alexandre Oliva  wrote:
>
> On Jul  4, 2024, Alexandre Oliva  wrote:
>
> > On Jul  3, 2024, Rainer Orth  wrote:
>
> > Hmm, I wonder if leaf frame pointer has to do with that.
>
> It did, in a way.
>
> 
>
> The first two patches for PR113719 have each regressed
> gcc.dg/ipa/iinline-attr.c on a different target.  The reason for this
> instability is that there are competing flag_omit_frame_pointer
> overriders on x86:
>
> - ix86_recompute_optlev_based_flags computes and sets a
>   -f[no-]omit-frame-pointer default depending on
>   USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size
>
> - ix86_option_override_internal enables flag_omit_frame_pointer for
>   -momit-leaf-frame-pointer to take effect
>
> ix86_option_override[_internal] calls
> ix86_recompute_optlev_based_flags before setting
> flag_omit_frame_pointer.  It is called during global process_options.
>
> But ix86_recompute_optlev_based_flags is also called by
> parse_optimize_options, during attribute processing, and at that
> point, ix86_option_override is not called, so the final overrider for
> global options is not applied to the optimize attributes.  If they
> differ, the testcase fails.
>
> In order to fix this, we need to process all overriders of this option
> whenever we process any of them.  Since this setting is affected by
> optimization options, it makes sense to compute it in
> parse_optimize_options, rather than in process_options.
>
> Regstrapped on x86_64-linux-gnu.  Also verified that the regression is
> cured with an i686-solaris cross compiler.  Ok to install?
Ok. thanks.
>
>
> for  gcc/ChangeLog
>
> PR target/113719
> * config/i386/i386-options.cc (ix86_option_override_internal):
> Move flag_omit_frame_pointer final overrider...
> (ix86_recompute_optlev_based_flags): ... here.
> ---
>  gcc/config/i386/i386-options.cc |   12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 5824c0cb072eb..059ef3ae6ad44 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1911,6 +1911,12 @@ ix86_recompute_optlev_based_flags (struct gcc_options 
> *opts,
> opts->x_flag_pcc_struct_return = DEFAULT_PCC_STRUCT_RETURN;
> }
>  }
> +
> +  /* Keep nonleaf frame pointers.  */
> +  if (opts->x_flag_omit_frame_pointer)
> +opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER;
> +  else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags))
> +opts->x_flag_omit_frame_pointer = 1;
>  }
>
>  /* Implement part of TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook.  */
> @@ -2590,12 +2596,6 @@ ix86_option_override_internal (bool main_args_p,
>  opts->x_target_flags |= MASK_NO_RED_ZONE;
>  }
>
> -  /* Keep nonleaf frame pointers.  */
> -  if (opts->x_flag_omit_frame_pointer)
> -opts->x_target_flags &= ~MASK_OMIT_LEAF_FRAME_POINTER;
> -  else if (TARGET_OMIT_LEAF_FRAME_POINTER_P (opts->x_target_flags))
> -opts->x_flag_omit_frame_pointer = 1;
> -
>/* If we're doing fast math, we don't care about comparison order
>   wrt NaNs.  This lets us use a shorter comparison sequence.  */
>if (opts->x_flag_finite_math_only)
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive



-- 
BR,
Hongtao


Re: [COMMITTED] CRIS: Adjust gcc.dg/tree-ssa/loop-1.c

2024-07-14 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Mon, 15 Jul 2024 05:06:43 +0200

> With r15-1619-g3b9b8d6cfdf593, there's an XPASS and a FAIL
> for this test-case for cris-elf.  Looking at the generated
> code, _foo is indeed no longer saved in a register for CRIS.
> While that looks like a regression, coremark results are the
> same around this revision, so simply adjust the test-case:
> remove the target-specific exceptions for cris-*-*.

Oh my...  That "sameness" was due to fumblefingers on my
part.  Sorry about that.  There is indeed a performance
regression at "-O2 -march=v10" for cris-elf for coremark.
Not a big one; going from 5179918 to 5181696 cycles is a
0.034% slowdown, but still.  Maybe there are other targets affected
negatively by r15-1619-g3b9b8d6cfdf593, so I opened PR115932
to keep track.

brgds, H-P


[PATCH] i386: extend trunc{128}2{16,32,64}'s scope.

2024-07-14 Thread Hu, Lin1
Hi, all

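Based on actual usage, the trunc{128}2{16,32,64} expanders only use
instructions from SSE/SSSE3, so relax their conditions from TARGET_AVX2 to
TARGET_SSSE3 (TARGET_SSE for truncv2div2si2) to widen the scope of the
optimization.  The 256-bit source modes V4DI and V8SI still require
TARGET_AVX2.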

Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/107432
* config/i386/sse.md
(PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
(PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
(trunc2): Change condition from TARGET_AVX2 to
TARGET_SSSE3.
(trunc2): Ditto.
(truncv2div2si2): Change condition from TARGET_AVX2 to TARGET_SSE.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-10.c: New test.
---
 gcc/config/i386/sse.md  | 11 +++---
 gcc/testsuite/gcc.target/i386/pr107432-10.c | 41 +
 2 files changed, 47 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-10.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b3b4697924b..72f3c7df297 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15000,7 +15000,8 @@ (define_expand "_2_mask_store"
   "TARGET_AVX512VL")
 
 (define_mode_iterator PMOV_SRC_MODE_3 [V4DI V2DI V8SI V4SI (V8HI "TARGET_AVX512BW")])
-(define_mode_iterator PMOV_SRC_MODE_3_AVX2 [V4DI V2DI V8SI V4SI V8HI])
+(define_mode_iterator PMOV_SRC_MODE_3_AVX2
+ [(V4DI "TARGET_AVX2") V2DI (V8SI "TARGET_AVX2") V4SI V8HI])
 (define_mode_attr pmov_dst_3_lower
   [(V4DI "v4qi") (V2DI "v2qi") (V8SI "v8qi") (V4SI "v4qi") (V8HI "v8qi")])
 (define_mode_attr pmov_dst_3
@@ -15014,7 +15015,7 @@ (define_expand "trunc2"
   [(set (match_operand: 0 "register_operand")
(truncate:
  (match_operand:PMOV_SRC_MODE_3_AVX2 1 "register_operand")))]
-  "TARGET_AVX2"
+  "TARGET_SSSE3"
 {
   if (TARGET_AVX512VL
   && (mode != V8HImode || TARGET_AVX512BW))
@@ -15390,7 +15391,7 @@ (define_insn_and_split "avx512vl_v8qi2_mask_store_2"
  (match_dup 2)))]
   "operands[0] = adjust_address_nv (operands[0], V8QImode, 0);")
 
-(define_mode_iterator PMOV_SRC_MODE_4 [V4DI V2DI V4SI])
+(define_mode_iterator PMOV_SRC_MODE_4 [(V4DI "TARGET_AVX2") V2DI V4SI])
 (define_mode_attr pmov_dst_4
   [(V4DI "V4HI") (V2DI "V2HI") (V4SI "V4HI")])
 (define_mode_attr pmov_dst_4_lower
@@ -15404,7 +15405,7 @@ (define_expand "trunc2"
   [(set (match_operand: 0 "register_operand")
(truncate:
  (match_operand:PMOV_SRC_MODE_4 1 "register_operand")))]
-  "TARGET_AVX2"
+  "TARGET_SSSE3"
 {
   if (TARGET_AVX512VL)
 {
@@ -15659,7 +15660,7 @@ (define_expand "truncv2div2si2"
   [(set (match_operand:V2SI 0 "register_operand")
(truncate:V2SI
  (match_operand:V2DI 1 "register_operand")))]
-  "TARGET_AVX2"
+  "TARGET_SSE"
 {
   if (TARGET_AVX512VL)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr107432-10.c b/gcc/testsuite/gcc.target/i386/pr107432-10.c
new file mode 100644
index 000..57edf7cfc78
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107432-10.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v2 -O2" } */
+/* { dg-final { scan-assembler-times "shufps" 1 } } */
+/* { dg-final { scan-assembler-times "pshufb" 5 } } */
+
+#include 
+
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef char __v2qi __attribute__ ((__vector_size__ (2)));
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+__v2si mm_cvtepi64_epi32_builtin_convertvector(__v2di a)
+{
+  return __builtin_convertvector((__v2di)a, __v2si);
+}
+
+__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2hi);
+}
+
+__v4hi mm_cvtepi32_epi16_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v4si)a, __v4hi);
+}
+
+__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2qi);
+}
+
+__v4qi mm_cvtepi32_epi8_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v4si)a, __v4qi);
+}
+
+__v8qi mm_cvtepi16_epi8_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v8hi)a, __v8qi);
+}
-- 
2.31.1
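Here is one plausible reading of why SSSE3 is enough (a sketch, not taken from the patch). x86 lanes are little-endian, so truncating eight 16-bit lanes to bytes just gathers byte 2*i of each lane. A single PSHUFB with a constant control vector can do that, and the test's five pshufb matches are consistent with this. The scalar model below is hypothetical, including the helper names pshufb_model and v8hi_to_v8qi_ctrl. Zeroing the upper eight result bytes is a choice made here for clarity; the control vector the compiler actually uses may differ.

```c
#include <stdint.h>

/* Scalar model of SSSE3 PSHUFB on one 16-byte vector: a control byte
   with its top bit set produces 0, otherwise its low four bits select
   a byte of SRC.  DST must not alias SRC.  */
static void
pshufb_model (uint8_t dst[16], const uint8_t src[16], const uint8_t ctrl[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}

/* Control vector truncating eight little-endian 16-bit lanes to bytes:
   result byte i takes the low byte (byte 2*i) of lane i; the upper
   eight result bytes are zeroed.  */
static void
v8hi_to_v8qi_ctrl (uint8_t ctrl[16])
{
  for (int i = 0; i < 8; i++)
    ctrl[i] = (uint8_t) (2 * i);
  for (int i = 8; i < 16; i++)
    ctrl[i] = 0x80;
}
```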



[committed] RISC-V: Implement locality for __builtin_prefetch

2024-07-14 Thread Monk Chiang
This patch adds the Zihintntl instructions to the prefetch pattern.
Zicbop provides the prefetch instructions and Zihintntl provides the
NTL (non-temporal locality) hint instructions. If the target has the
Zihintntl extension, an NTL instruction is emitted before the prefetch
instruction.
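The locality-to-hint mapping chosen by the new 'L' operand case, and the instruction length it implies, can be summarised in plain C. This is only a sketch that mirrors the switch in riscv_print_operand and the new "length" attribute. The function names are invented for illustration and do not exist in GCC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the new 'L' case in riscv_print_operand: map
   the __builtin_prefetch locality argument (0..3) to the Zihintntl hint
   placed before the prefetch.  Locality 3 (high temporal locality) gets
   no hint, so NULL is returned.  */
static const char *
ntl_hint_for_locality (int locality)
{
  switch (locality)
    {
    case 0:
      return "ntl.all";
    case 1:
      return "ntl.pall";
    case 2:
      return "ntl.p1";
    default:
      return NULL;
    }
}

/* Hypothetical mirror of the new "length" attribute: 8 bytes when an
   NTL hint precedes the 4-byte prefetch, otherwise 4.  */
static int
prefetch_length (bool have_zihintntl, int locality)
{
  return (have_zihintntl && ntl_hint_for_locality (locality) != NULL) ? 8 : 4;
}
```

With this mapping, each of ntl.all, ntl.pall and ntl.p1 appears twice in the zihintntl test (once for a read and once for a write prefetch), and locality 3 gets no hint.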

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.
---
 gcc/config/riscv/riscv.cc | 22 +++
 gcc/config/riscv/riscv.md | 10 ++---
 .../gcc.target/riscv/prefetch-zicbop.c| 20 +
 .../gcc.target/riscv/prefetch-zihintntl.c | 20 +
 4 files changed, 69 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 53ab2f1a881..084a592a313 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6488,6 +6488,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
'A' Print the atomic operation suffix for memory model OP.
'I' Print the LR suffix for memory model OP.
'J' Print the SC suffix for memory model OP.
+   'L' Print a non-temporal locality hints instruction.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -6682,6 +6683,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 }
 
+case 'L':
+  {
+   const char *ntl_hint = NULL;
+   switch (INTVAL (op))
+ {
+ case 0:
+   ntl_hint = "ntl.all";
+   break;
+ case 1:
+   ntl_hint = "ntl.pall";
+   break;
+ case 2:
+   ntl_hint = "ntl.p1";
+   break;
+ }
+
+  if (ntl_hint)
+   asm_fprintf (file, "%s\n\t", ntl_hint);
+  break;
+  }
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 379015c60de..46c46039c33 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4113,12 +4113,16 @@
 {
   switch (INTVAL (operands[1]))
   {
-case 0: return "prefetch.r\t%a0";
-case 1: return "prefetch.w\t%a0";
+case 0: return TARGET_ZIHINTNTL ? "%L2prefetch.r\t%a0" : "prefetch.r\t%a0";
+case 1: return TARGET_ZIHINTNTL ? "%L2prefetch.w\t%a0" : "prefetch.w\t%a0";
 default: gcc_unreachable ();
   }
 }
-  [(set_attr "type" "store")])
+  [(set_attr "type" "store")
+   (set (attr "length") (if_then_else (and (match_test "TARGET_ZIHINTNTL")
+  (match_test "IN_RANGE (INTVAL (operands[2]), 0, 2)"))
+ (const_string "8")
+ (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
   [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
new file mode 100644
index 000..0faa120f1f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { rv64-*-* } } } */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-not "ntl.all\t" } } */
+/* { dg-final { scan-assembler-not "ntl.pall\t" } } */
+/* { dg-final { scan-assembler-not "ntl.p1\t" } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
new file mode 100644
index 000..78a3afe6833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { rv64-*-* } } } */
+/* { dg-options "-march=rv64gc_zicbop_zihintntl -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-times "ntl.all" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.pall" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.p1" 2 } } *

Re: [PATCH] i386: extend trunc{128}2{16,32,64}'s scope.

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 1:39 PM Hu, Lin1  wrote:
>
> Hi, all
>
> Based on actual usage, trunc{128}2{16,32,64} use some instructions from
> sse/sse3, so extend their scope to extend the scope of optimization.
>
> Bootstraped and regtest on x86-64-linux-gnu, OK for trunk?
Ok.


-- 
BR,
Hongtao


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-14 Thread Tejas Belagod

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char elements.
Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element vectors.
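The enumeration above boils down to one condition: padding is needed when element width times element count is less than one byte. A small C check follows. It is a hypothetical helper that assumes power-of-two element counts and 1-, 2- or 4-bit elements.

```c
#include <stdbool.h>

/* A packed vector of NELTS elements, each ELT_BITS wide, holds
   ELT_BITS * NELTS bits of payload; anything under 8 bits would need
   padding to fill a whole byte.  */
static bool
needs_padding (unsigned elt_bits, unsigned nelts)
{
  return elt_bits * nelts < 8;
}
```

This flags exactly the cases listed: 1-bit elements with 4, 2 or 1 elements, 2-bit elements with 2 or 1 elements, and 4-bit elements with 1 element.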


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
look favorably to _BitInt support in C++, so at least until something like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here, so while bool is then natural
for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but, like
vector_mask, its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, _BitInt seemed like it was best placed for this, but not having C++
support is definitely a blocker. But as you say, in the absence of
_BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.
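To make "packed" concrete, here is one possible layout for such masks, sketched in C. It is purely an assumption for illustration and not what any GCC or Clang extension defines: lane i of a mask with elt_bits-wide elements (1, 2 or 4) occupies bits [i*elt_bits, (i+1)*elt_bits), counting from the least significant bit of byte 0. Because these widths divide 8, no lane straddles a byte. The helper names are hypothetical.

```c
#include <stdint.h>

/* Read lane I of a packed mask whose elements are ELT_BITS wide.  */
static unsigned
packed_lane (const uint8_t *bytes, unsigned elt_bits, unsigned i)
{
  unsigned bit = i * elt_bits;
  return (bytes[bit / 8] >> (bit % 8)) & ((1u << elt_bits) - 1u);
}

/* Write VAL (truncated to ELT_BITS) into lane I, leaving other lanes
   untouched.  */
static void
set_packed_lane (uint8_t *bytes, unsigned elt_bits, unsigned i, unsigned val)
{
  unsigned bit = i * elt_bits;
  uint8_t mask = (uint8_t) (((1u << elt_bits) - 1u) << (bit % 8));
  bytes[bit / 8] = (uint8_t) ((bytes[bit / 8] & ~mask)
                              | ((val << (bit % 8)) & mask));
}
```

For example, with 2-bit elements, setting lane 0 to 1 and lane 3 to 3 gives byte 0 the value 0xC1.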


Also, I think overloading GIMPLE's vector_mask takes us into the 
earlier-discussed territory of what it should actually mean - it meaning 
the target truth type in GIMPLE and a generic vector extension in the FE 
will probably confuse gcc developers more than users.



That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the benefits I
see in supporting 1-, 2- and 4-bit sizes are: 1) a first step towards
possibly supporting _BitInt(N) vectors in the future, 2) a way for
targets to define their intrinsics' bool vector types using GNU
extensions, and 3) feature parity with Clang's ext_vector_type?


I believe the primary motivation for Clang to support ext_vector_type 
was to have a way to define target intrinsics' vector bool type using 
vector extensions.


Thanks,
Tejas.


Richard.


 Jakub