[patch, wwwdocs, committed] Fix link to unsigned integers

2024-11-28 Thread Thomas Koenig

Hello world,

a change in the section heading of the documentation broke a link
in gcc-15/changes.html, fixed with this patch.

Best regards

Thomas

Author: Thomas Koenig 
Date:   Fri Nov 29 07:19:36 2024 +0100

Correct link to unsigned integers for Fortran.

* htdocs/gcc-15/changes.html: Correct link.

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 6c9ebaac..23866bde 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -141,7 +141,7 @@ a work-in-progress.
   logical{8,16,32,64} and real16 were added.

   Experimental support for unsigned integers, enabled by
   -funsigned; see <a
-  href="https://gcc.gnu.org/onlinedocs/gfortran/Experimental-features-for-Fortran-202Y.html"
+  href="https://gcc.gnu.org/onlinedocs/gfortran/Unsigned-integers.html"
   >gfortran documentation</a> for details. These have been proposed
   (<a href="https://j3-fortran.org/doc/year/24/24-116.txt">J3/24-116</a>)
   for inclusion in the next Fortran standard.
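The feature being linked can be illustrated with a short sketch. This is a hypothetical example based on the linked gfortran documentation, not part of the patch; the exact syntax of the experimental feature may still change:

```fortran
! Hypothetical sketch: experimental UNSIGNED integers in GCC 15.
! Compile with: gfortran -funsigned demo.f90
program demo
  unsigned :: u          ! experimental unsigned type, needs -funsigned
  u = 42u                ! unsigned literals carry a 'u' suffix
  print *, u, huge(u)    ! huge(u) gives the largest representable value
end program demo
```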


Re: gimplify: Handle void BIND_EXPR as asm input [PR100501]

2024-11-28 Thread Richard Biener
On Fri, Nov 29, 2024 at 3:04 AM Joseph Myers  wrote:
>
> As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
> tests involving a statement expression not returning a value as an asm
> input.
>
> The expected diagnostic for this case (as seen for C++ input) is one
> coming from the gimplifier and so it seems reasonable to fix the
> gimplifier to handle the GENERIC generated for this case by the C
> front end, rather than trying to make the C front end detect it
> earlier.  Thus, adjust two places in the gimplifier to work with
> gimplifying a BIND_EXPR changing *expr_p to NULL_TREE.
>
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?
>
> PR c/100501
>
> gcc/
> * gimplify.cc (gimplify_expr): Do not call gimple_test_f on
> *expr_p when it has become null.
> (gimplify_asm_expr): Handle TREE_VALUE (link) becoming null.
>
> gcc/testsuite/
> * gcc.dg/pr100501-1.c: New test.
>
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index fb0ca23bfb6c..090f8987d5d3 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -7457,6 +7457,13 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
> TREE_VALUE (link) = error_mark_node;
>   tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
> is_gimple_lvalue, fb_lvalue | fb_mayfail);
> + if (TREE_VALUE (link) == NULL_TREE)

I think we're trying to handle erroneous cases by setting TREE_VALUE
to error_mark_node before this, so how about the following instead?

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index fb0ca23bfb6..aa99c0a98f7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7453,7 +7453,8 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
  || TREE_CODE (inputv) == PREINCREMENT_EXPR
  || TREE_CODE (inputv) == POSTDECREMENT_EXPR
  || TREE_CODE (inputv) == POSTINCREMENT_EXPR
- || TREE_CODE (inputv) == MODIFY_EXPR)
+ || TREE_CODE (inputv) == MODIFY_EXPR
+ || VOID_TYPE_P (TREE_TYPE (inputv)))
TREE_VALUE (link) = error_mark_node;
  tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
is_gimple_lvalue, fb_lvalue | fb_mayfail);


> +   {
> + /* This can occur when an asm input is a BIND_EXPR for a
> +statement expression not returning a value.  */
> + tret = GS_ERROR;
> + TREE_VALUE (link) = error_mark_node;
> +   }
>   if (tret != GS_ERROR)
> {
>   /* Unlike output operands, memory inputs are not guaranteed
> @@ -19662,10 +19669,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>
>/* Otherwise we're gimplifying a subexpression, so the resulting
>   value is interesting.  If it's a valid operand that matches
> - GIMPLE_TEST_F, we're done. Unless we are handling some
> - post-effects internally; if that's the case, we need to copy into
> - a temporary before adding the post-effects to POST_P.  */
> -  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
> + GIMPLE_TEST_F, or it's now NULL_TREE, we're done.  Unless we are
> + handling some post-effects internally; if that's the case, we need
> + to copy into a temporary before adding the post-effects to POST_P.  */
> +  if (gimple_seq_empty_p (internal_post)
> +  && (!*expr_p || (*gimple_test_f) (*expr_p)))
>  goto out;
>
>/* Otherwise, we need to create a new temporary for the gimplified
> diff --git a/gcc/testsuite/gcc.dg/pr100501-1.c b/gcc/testsuite/gcc.dg/pr100501-1.c
> new file mode 100644
> index ..b5b3781a9c2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr100501-1.c
> @@ -0,0 +1,26 @@
> +/* Test ICE for statement expression returning no value as asm input (bug
> +   100501).  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int x;
> +int g ();
> +
> +void
> +f ()
> +{
> +  __asm__ ("" : : "m" (({}))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ ; }))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ (void) 0; }))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ f (); }))); /* { dg-error "memory input 0 is not directly addressable|using result of function returning 'void'" } */
> +  __asm__ ("" : : "m" (({ f (); f (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ x = g (); f (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ if (1) g (); }))); /* { dg-error "memory input 0 is not directly addressable" } */
> +  __asm__ ("" : : "m" (({ if (1) g (); else g (); }))); /* { dg-error "memory input 0 is not directly addressable" } */

[PATCH] arm: Add CDE options for star-mc1 cpu

2024-11-28 Thread Arvin Zhong
Hi GCC reviewers,

The star-mc1 CPU is an Armv8-M Mainline CPU supporting the Arm CDE feature.
Attached is a patch to add the CDE options for -mcpu=star-mc1.
The patch has been built and tested on GCC upstream with arm-none-eabi.

Is it OK for trunk?

Thanks.

Best Regards,
Arvin Zhong



0001-arm-Add-CDE-options-for-star-mc1-cpu.patch


Re: [PATCH 11/15] Support for 64-bit location_t: RTL parts

2024-11-28 Thread Alexandre Oliva
On Nov 20, 2024, Richard Biener  wrote:

> On Sun, Nov 3, 2024 at 11:27 PM Lewis Hyatt  wrote:

>> While testing this with --enable-checking=rtl, I came across one place in
>> final.cc that seems to be a (currently) harmless misuse of RTL:
>> 
>> set_cur_block_to_this_block:
>> if (! this_block)
>> {
>> if (INSN_LOCATION (insn) == UNKNOWN_LOCATION)
>> continue;
>> else
>> this_block = DECL_INITIAL (cfun->decl);
>> }
>> 
>> In this part of reemit_insn_block_notes(), the insn variable could actually
>> be a NOTE and not an INSN. In that case, INSN_LOCATION() shouldn't be
>> called on it. It works fine currently because the field is properly accessed
>> by XINT() either way. (For an INSN, it is a location, but for a NOTE, it is
>> the note type enum). Currently, if insn is a NOTE, the comparison must
>> always be false because the note type is not equal to
>> 0==UNKNOWN_LOCATION. Once locations and ints are differentiated, this line
>> leads to a checking failure, which I resolved by checking for the NOTE_P
>> case before calling INSN_LOCATION.

>> if (! this_block)
>> {
>> - if (INSN_LOCATION (insn) == UNKNOWN_LOCATION)
>> + if (!NOTE_P (insn) && INSN_LOCATION (insn) == UNKNOWN_LOCATION)
>> continue;
>> else

> I think you instead want

>if (NOTE_P (insn)
>|| INSN_LOCATION (insn) == UNKNOWN_LOCATION)
>  continue;

> but the whole if (! this_block) block doesn't make sense to me ... I think
> we only get here for NOTE_P via

>   case NOTE_INSN_BEGIN_STMT:
>   case NOTE_INSN_INLINE_ENTRY:
> this_block = LOCATION_BLOCK (NOTE_MARKER_LOCATION (insn));
> goto set_cur_block_to_this_block;

> so possibly a !this_block case should be made explicit there, by checking
> NOTE_MARKER_LOCATION for UNKNOWN_LOCATION.  CCing Alex who
> might know.

I don't recall for sure, but I suspect my assumption back then was that
this_block taken from the marker would never or hardly ever be NULL, but
that it would be good to check before attempting to change the scope to
a NULL block.

I may have missed the distinction between INSN_LOCATION and RTL_LOCATION
there (my notes from back then don't even mention this, only the taking
of a location from the marker, which the case does), but your proposed
change looks reasonable.  Leaving that bit alone, moving the label down
and adding a test before the goto would also be fine.


I'm not entirely sure what the best thing to do in case the note doesn't
carry location information, or the referenced block is missing or
however else it could be NULL.  Staying in the previous scope is
somewhat sensible, but it amounts to silently dropping debug
information; a visible change of scope might be preferred, even if it's
to the whole-function scope, lacking a more sensible one.

-- 
Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/
   Free Software Activist                         GNU Toolchain Engineer
Learn the truth about Richard Stallman at https://stallmansupport.org/


[PATCH] ifcombine: avoid unsound forwarder-enabled combinations [PR117723]

2024-11-28 Thread Alexandre Oliva


When ifcombining contiguous blocks, we can follow forwarder blocks and
reverse conditions to enable combinations, but when there are
intervening blocks, we have to constrain ourselves to paths to the
exit that share the PHI args with all intervening blocks.

Avoiding forwarders altogether when intervening blocks are present would
match the preexisting test, but we can do better: record whether a
forwarded path corresponds to the outer block's exit path, and refuse to
combine through any path other than the one verified as corresponding.
The latter is what this patch implements.

While at that, I've fixed some typos and introduced early tests that skip
computing the exit path when computing it would be wasteful, or when
avoiding it can enable other sound combinations.
Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/117723
* tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record
forwarder blocks in path to exit, and stick to them.  Avoid
computing the exit if obviously not needed, and if that
enables additional optimizations.
(tree_ssa_ifcombine_bb_1): Fix typos.

for  gcc/testsuite/ChangeLog

PR tree-optimization/117723
* gcc.dg/torture/ifcmb-1.c: New.
---
 gcc/testsuite/gcc.dg/torture/ifcmb-1.c |   63 +
 gcc/tree-ssa-ifcombine.cc  |  116 +++-
 2 files changed, 161 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/ifcmb-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/ifcmb-1.c b/gcc/testsuite/gcc.dg/torture/ifcmb-1.c
new file mode 100644
index 0..2431a548598fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/ifcmb-1.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+/* Test that we do NOT perform unsound transformations for any of these cases.
+   Forwarding blocks to the exit block used to enable some of them.  */
+
+[[gnu::noinline]]
+int f0 (int a, int b) {
+  if ((a & 1))
+return 0;
+  if (b)
+return 1;
+  if (!(a & 2))
+return 0;
+  else
+return 1;
+}
+
+[[gnu::noinline]]
+int f1 (int a, int b) {
+  if (!(a & 1))
+return 0;
+  if (b)
+return 1;
+  if ((a & 2))
+return 1;
+  else
+return 0;
+}
+
+[[gnu::noinline]]
+int f2 (int a, int b) {
+  if ((a & 1))
+return 0;
+  if (b)
+return 1;
+  if (!(a & 2))
+return 0;
+  else
+return 1;
+}
+
+[[gnu::noinline]]
+int f3 (int a, int b) {
+  if (!(a & 1))
+return 0;
+  if (b)
+return 1;
+  if ((a & 2))
+return 1;
+  else
+return 0;
+}
+
+int main() {
+  if (f0 (0, 1) != 1)
+__builtin_abort();
+  if (f1 (1, 1) != 1)
+__builtin_abort();
+  if (f2 (2, 1) != 1)
+__builtin_abort();
+  if (f3 (3, 1) != 1)
+__builtin_abort();
+}
diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index e389b12aa37db..a87bf1210776f 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -1077,7 +1077,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
 }
 
   /* The || form is characterized by a common then_bb with the
- two edges leading to it mergable.  The latter is guaranteed
+ two edges leading to it mergeable.  The latter is guaranteed
  by matching PHI arguments in the then_bb and the inner cond_bb
  having no side-effects.  */
   if (phi_pred_bb != then_bb
@@ -1088,7 +1088,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 if (q) goto then_bb; else goto inner_cond_bb;
   
-if (q) goto then_bb; else goto ...;
+if (p) goto then_bb; else goto ...;
   
 ...
*/
@@ -1104,7 +1104,7 @@ tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
   
 if (q) goto inner_cond_bb; else goto then_bb;
   
-if (q) goto then_bb; else goto ...;
+if (p) goto then_bb; else goto ...;
   
 ...
*/
@@ -1139,13 +1139,18 @@ tree_ssa_ifcombine_bb (basic_block inner_cond_bb)
  Look for an OUTER_COND_BBs to combine with INNER_COND_BB.  They need not
  be contiguous, as long as inner and intervening blocks have no side
  effects, and are either single-entry-single-exit or conditionals choosing
- between the same EXIT_BB with the same PHI args, and the path leading to
- INNER_COND_BB.  ??? We could potentially handle multi-block
- single-entry-single-exit regions, but the loop below only deals with
- single-entry-single-exit individual intervening blocks.  Larger regions
- without side effects are presumably rare, so it's probably not worth the
- effort.  */
-  for (basic_block bb = inner_cond_bb, outer_cond_bb, exit_bb = NULL;
+ between the same EXIT_BB with the same PHI args, possibly through an
+ EXIT_PRED, and the path leading to INNER_COND_BB.  EXIT_PRED wi

Middle-end patch ping

2024-11-28 Thread Jakub Jelinek
Hi!

According to my notes, from the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669367.html
patch ping the following patches are awaiting middle-end patch review
and nothing else:

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669774.html
  expr, c, gimplify: Don't clear whole unions [PR116416]
  C FE/testsuite approved, middle-end remains

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html
  inline-asm, i386: Add "redzone" clobber support
  i386 part approved, there is no C/C++ FE part, middle-end remains

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668554.html
  Add support for nonnull_if_nonzero attribute [PR117023]
  C family approved, middle-end remains

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668798.html
  Introduce feeble_inline attribute [PR93008]
  With the suggested s/DECL_OPTIMIZABLE_INLINE_P/DECL_AGGRESSIBLE_INLINE_P/
  C approved, C++ wasn't but I'd even think the gcc/cp/ changes are obvious,
  middle-end remains

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668303.html
  c++, dyninit: Optimize C++ dynamic initialization by constants into DECL_INITIAL adjustment [PR102876]
  middle-end

At least the first 3 patches will allow commit of dependent patches
which were already fully approved.

Thanks

Jakub



Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2024-11-28 Thread Dimitri John Ledkov
Did bootstrap with gcc-14 (clean cherrypick, minor offsets).
Built and tested on arm64 & x86_64.
It resolved the reported problem.
Thank you for this patch.


On Tue, 26 Nov 2024, 22:37 Marek Polacek,  wrote:

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> -- >8 --
> As the manual states, using "-fhardened -fstack-protector" will produce
> a warning because -fhardened wants to enable -fstack-protector-strong,
> but it can't since it's been overridden by the weaker -fstack-protector.
>
> -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> produce the same warning.  But we don't detect this combination, so
> this patch fixes it.  I also renamed a variable to better reflect its
> purpose.
>
> Also don't check warn_hardened in process_command, since it's always
> true there.
>
> Also tweak wording in the manual as Jon Wakely suggested on IRC.
>
> PR driver/117739
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Tweak wording for -Whardened.
> * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
> specified, don't enable linker hardening.
> (process_command): Don't check warn_hardened.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/fhardened-16.c: New test.
> * c-c++-common/fhardened-17.c: New test.
> * c-c++-common/fhardened-18.c: New test.
> * c-c++-common/fhardened-19.c: New test.
> * c-c++-common/fhardened-20.c: New test.
> * c-c++-common/fhardened-21.c: New test.
> ---
>  gcc/doc/invoke.texi   |  4 ++--
>  gcc/gcc.cc| 20 ++--
>  gcc/testsuite/c-c++-common/fhardened-16.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-17.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-18.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-19.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-20.c |  5 +
>  gcc/testsuite/c-c++-common/fhardened-21.c |  5 +
>  8 files changed, 46 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c
>  create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 346ac1369b8..371f723539c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}.
>  Warn when @option{-fhardened} did not enable an option from its set (for
>  which see @option{-fhardened}).  For instance, using @option{-fhardened}
>  and @option{-fstack-protector} at the same time on the command line causes
> -@option{-Whardened} to warn because @option{-fstack-protector-strong} is
> -not enabled by @option{-fhardened}.
> +@option{-Whardened} to warn because @option{-fstack-protector-strong} will
> +not be enabled by @option{-fhardened}.
>
>  This warning is enabled by default and has effect only when
> @option{-fhardened}
>  is enabled.
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 92c92996401..d2718d263bb 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> driver added to dumpdir after dumpbase or linker output name.  */
>  static bool dumpdir_trailing_dash_added = false;
>
> -/* True if -r, -shared, -pie, or -no-pie were specified on the command
> -   line.  */
> -static bool any_link_options_p;
> +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> +   specified on the command line, and therefore -fhardened should not
> +   add -z now/relro.  */
> +static bool avoid_linker_hardening_p;
>
>  /* True if -static was specified on the command line.  */
>  static bool static_p;
> @@ -4434,10 +4435,17 @@ driver_handle_option (struct gcc_options *opts,
> }
> /* Record the part after the last comma.  */
> add_infile (arg + prev, "*");
> +   if (strcmp (arg, "-z,lazy") == 0 || strcmp (arg, "-z,norelro") == 0)
> + avoid_linker_hardening_p = true;
>}
>do_save = false;
>break;
>
> +case OPT_z:
> +  if (strcmp (arg, "lazy") == 0 || strcmp (arg, "norelro") == 0)
> +   avoid_linker_hardening_p = true;
> +  break;
> +
>  case OPT_Xlinker:
>add_infile (arg, "*");
>do_save = false;
> @@ -4642,7 +4650,7 @@ driver_handle_option (struct gcc_options *opts,
>  case OPT_r:
>  case OPT_shared:
>  case OPT_no_pie:
> -  any_link_options_p = true;
> +  avoid_linker_hardening_p = true;
>break;
>
>  case OPT_static:
> @@ -5026,7 +5034,7 @@ process_command (unsigned int decoded_options_count,
>/* TODO: check if -static -pie works and maybe u

Re: [PATCH][PR117704] testsuite: Fix test failure on x86_32 by adding -mfpmath=sse+387

2024-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2024 at 11:20:31AM +, Jennifer Schmitz wrote:
> The test gcc.dg/tree-ssa/pow_fold_1.c was failing for 32-bit x86 due to
> incompatibility of '-fexcess-precision=16' with '-mfpmath=387'.
> In order to resolve this, this patch adds -msse -mfpmath=sse+387 for i?86-*-*.
> 
> We tested this by running the test on an x86_64 machine with
> --target_board={unix/-m32}.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/testsuite/
>   PR testsuite/117704
>   * gcc.dg/tree-ssa/pow_fold_1.c: Add -msse -mfpmath=sse+387
>   for i?86-*-*.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> index d98bcb0827e..cb9d52e9653 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */
>  /* { dg-add-options float16 } */
> +/* { dg-additional-options "-msse -mfpmath=sse+387" { target { i?86-*-* } } } */

i?86-*-* shouldn't be used alone in target selectors; it doesn't mean much.
One can also use -m32 on x86_64-*-*, and i?86-*-* can be a multilib compiler.
So it should either be i?86-*-* x86_64-*-*, or, if one wants to limit it just
to 32-bit compilation on that target (but why in this case?), then
ia32 or { i?86-*-* x86_64-*-* } && ia32, etc.
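An illustrative sketch of the first form suggested above, written as a DejaGnu directive (this is only a possible shape of the fix, not the committed change):

```c
/* Cover both 32-bit and 64-bit x86 targets, per the review comment.  */
/* { dg-additional-options "-msse -mfpmath=sse+387" { target { i?86-*-* x86_64-*-* } } } */
```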

>  /* { dg-require-effective-target float16_runtime } */
>  /* { dg-require-effective-target c99_runtime } */

Jakub



Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2024 at 11:23:02AM +0100, Richard Biener wrote:
> Sorry for chiming in only late - to me this shows that the desire to inline
> a function more than another function, currently identified as
> DECL_DECLARED_INLINE_P overlaps with frontend semantic differences.
> But don't we reflect those semantic differences into the IL (via linkage,
> symtab details) already?

After handling it in the FE, yes, we do.  We still want
DECL_DECLARED_INLINE_P also e.g. for debug info generation.

>  So what would remain is a way for the user
> to distinguish between auto-inline (we have the corresponding -auto
> set of --params) and inline-inline.  The middle-end interface after your
> change, where DECL_DECLARED_INLINE_P means inline-inline
> unless !DECL_OPTIMIZABLE_INLINE_P looks a bit awkward.
> 
> Rather than clearing DECL_DECLARED_INLINE_P I'd suggest to
> split both completely and turn DECL_DISREGARD_INLINE_LIMITS,
> DECL_UNINLINABLE and auto-inline vs. inline-inline into a
> multi-bit enum and only use that for inlining decisions (ignoring
> DECL_DECLARED_INLINE_P for that purpose, but use that
> and feeble_inline to compute the enum value).

I think a 4 state flag { never_inline, default, auto_inline, always_inline }
would be fine.  The question is how to call the macro(s) and values
and how to merge those from different decls and what we do currently
e.g. for noinline, always_inline, on the same or on different decls
of the same function.

> Note I've had to lookup what 'feeble' means - given we use -auto

feeble was used in the meaning of synonym to weak, as I wrote,
weak_inline could be confusing.
Another possibility would be weaker_inline though, that one can't
confuse with weak attribute.

> for --params I'd have chosen __attribute__((auto_inline)), possibly
> "completed" by __attribute__((inline)) to mark a function as
> wanting 'inline' heuristics but not 'inline' semantics.

I think auto_inline and inline would be just confusing, even in the negative
forms.  We actually "auto-inline" even functions not declared inline, just
with different heuristics.

Jakub



[PATCH v2] RISC-V: Minimal support for ssdbltrp and smdbltrp extension.

2024-11-28 Thread Dongyan Chen
This patch adds minimal support for the ssdbltrp[1] and smdbltrp[2]
extensions, so that GCC recognizes and processes them correctly at
compile time.

[1] https://github.com/riscv/riscv-isa-manual/blob/main/src/ssdbltrp.adoc
[2] https://github.com/riscv/riscv-isa-manual/blob/main/src/smdbltrp.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc   | 6 ++
 gcc/common/config/riscv/riscv-ext-bitmask.def | 2 ++
 gcc/config/riscv/riscv.opt| 4 
 gcc/testsuite/gcc.target/riscv/arch-45.c  | 5 +
 gcc/testsuite/gcc.target/riscv/arch-46.c  | 5 +
 5 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-46.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 4c9a72d1180..608f0950f0f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -222,6 +222,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"sscofpmf", "zicsr"},
   {"ssstateen", "zicsr"},
   {"sstc", "zicsr"},
+  {"ssdbltrp", "zicsr"},
+  {"smdbltrp", "zicsr"},
 
   {"xsfvcp", "zve32x"},
 
@@ -401,6 +403,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"sscofpmf",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
   {"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1725,6 +1729,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
   RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
+  RISCV_EXT_FLAG_ENTRY ("ssdbltrp", x_riscv_sv_subext, MASK_SSDBLTRP),
+  RISCV_EXT_FLAG_ENTRY ("smdbltrp", x_riscv_sv_subext, MASK_SMDBLTRP),
 
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
 
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def b/gcc/common/config/riscv/riscv-ext-bitmask.def
index a733533df98..9814b887b2d 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -80,5 +80,7 @@ RISCV_EXT_BITMASK ("zcf", 1,  5)
 RISCV_EXT_BITMASK ("zcmop",1,  6)
 RISCV_EXT_BITMASK ("zawrs",1,  7)
 RISCV_EXT_BITMASK ("svvptc",   1,  8)
+RISCV_EXT_BITMASK ("ssdbltrp", 1,  9)
+RISCV_EXT_BITMASK ("smdbltrp", 1,  10)
 
 #undef RISCV_EXT_BITMASK
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index a6a61a83db1..5900da57ca2 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -468,6 +468,10 @@ Mask(SVNAPOT) Var(riscv_sv_subext)
 
 Mask(SVVPTC) Var(riscv_sv_subext)
 
+Mask(SSDBLTRP) Var(riscv_sv_subext)
+
+Mask(SMDBLTRP) Var(riscv_sv_subext)
+
 TargetVariable
 int riscv_ztso_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-45.c b/gcc/testsuite/gcc.target/riscv/arch-45.c
new file mode 100644
index 000..85e2510b40a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-45.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssdbltrp -mabi=lp64" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-46.c b/gcc/testsuite/gcc.target/riscv/arch-46.c
new file mode 100644
index 000..c95cc729cce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-46.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smdbltrp -mabi=lp64" } */
+int foo()
+{
+}
-- 
2.43.0



Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-28 Thread Paul Richard Thomas
Hi Harald and Jerry,

I can now see why the segfault is occurring, of course:
  _gfortran_transfer_character_write (&dt_parm.9, &"line 4:"[1]{lb: 1 sz: 1}, 7);
  {
struct array01_integer(kind=4) parm.10;
integer(kind=8) D.4841;
struct array01_integer(kind=4) parm.11;
integer(kind=8) D.4848;
struct array01_integer(kind=4) parm.12;

D.4841 = (integer(kind=8)) sort_2 ((integer(kind=4)[0:] *) parm.10.data);  // parm.10 not set.

I am going to see if stopping the call to
'add_check_section_in_array_bounds' when an inner loop is evaluating an
argument for a function call in the outer loop does the trick.

That said, while your patch seems to be a bit hacky, it does fix the
problem in a sensible way. I just worry about potential corner cases since
it is the call to gfc_conv_expr_descriptor that causes the problem.
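For reference, the pattern under discussion can be sketched with a minimal example; this is a hypothetical reproducer in the spirit of the thread, not the actual PR 117791 testcase:

```fortran
! Hypothetical sketch of the problematic shape: writing an array section
! whose second index is a function result, with -fcheck=bounds enabled.
! The bug was that f() could be evaluated twice, once with wrong arguments.
program p
  integer :: a(3,3), i
  a = reshape([(i, i = 1, 9)], [3, 3])
  write (*,*) a(:, f())   ! f() must be evaluated exactly once
contains
  integer function f()
    f = 2
  end function f
end program p
```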

Cheers

Paul


On Wed, 27 Nov 2024 at 21:35, Harald Anlauf  wrote:

> Am 27.11.24 um 21:56 schrieb Jerry D:
> > On 11/27/24 12:31 PM, Harald Anlauf wrote:
> >> Dear all,
> >>
> >> the attached patch fixes a wrong-code issue with bounds-checking
> >> enabled when doing I/O of an array section and an index is either
> >> an expression or a function result.  The problem does not occur
> >> without bounds-checking.
> >>
> >> When looking at the original testcase, the function occuring in
> >> the affected index was evaluated twice, once with wrong arguments.
> >>
> >> The most simple solution appears to fall back to scalarization
> >> with bounds-checking enabled.  If someone has a quick idea to
> >> handle this better, please speak up!
> >>
> >> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> >>
> >> This seems to be a 14/15 regression, so a backport is advisable.
> >>
> >> Thanks,
> >> Harald
> >>
> >
> > The patch looks OK to me.
> >
> > I wonder if this fall back to the scalarizer should be done everywhere
> > if a a user has specified bounds checking, what is the point of
> > optimizing array references?
>
> If an array reference is of the type A(:,f()), there is no need to
> do bounds-checking for the first array index (we don't, so OK),
> and we also could pass the array slice to a library function that
> handles the section in one go, without generating a loop with calls.
> Scalarization is then sort of a missed-optimization.
>
> The problem is that the second argument is somehow evaluated twice
> with bounds-checking, but only with the I/O optimization.  I did not
> see such an issue when assigning A(:,f()) to a temporary rank-1 array
> and passing that array to the write().  It did create the right bounds
> check, and called f() correctly just once.
>
> Instead of creating a temporary, just passing to the scalarizer was
> the simpler solution.  Maybe Paul has an idea to solve this in a
> better way.
>
> > If the code works in 13 maybe we need to isolate to what broke it and
> > intervene at that place.
>
> Looking at the tree-dump, no bounds-check was generated in 13.
> I did some work to extend bounds-checking during 14-development,
> and the testcase may have just uncovered a latent issue?
>
> (And we sometimes evaluate functions way too often, see e.g. pr114021,
> so there's no lack of possibly related issues...)
>
> > Also go ahead with back porting if no other ideas pop up.  I just fear
> > we are covering up something else.
>
> I'll wait until tomorrow to see if Paul intervenes.  Otherwise I will
> proceed and push.
>
> Thanks for the review and discussion!
>
> Harald
>
> > Jerry
> >
> >
>
>


Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Richard Biener
On Thu, 28 Nov 2024, Jakub Jelinek wrote:

> On Thu, Nov 28, 2024 at 11:23:02AM +0100, Richard Biener wrote:
> > Sorry for chiming in only late - to me this shows that the desire to inline
> > a function more than another function, currently identified as
> > DECL_DECLARED_INLINE_P overlaps with frontend semantic differences.
> > But don't we reflect those semantic differences into the IL (via linkage,
> > symtab details) already?
> 
> After handling it in the FE, yes, we do.  We still want
> DECL_DECLARED_INLINE_P also e.g. for debug info generation.
> 
> >  So what would remain is a way for the user
> > to distinguish between auto-inline (we have the corresponding -auto
> > set of --params) and inline-inline.  The middle-end interface after your
> > change, where DECL_DECLARED_INLINE_P means inline-inline
> > unless !DECL_OPTIMIZABLE_INLINE_P looks a bit awkward.
> > 
> > Rather than clearing DECL_DECLARED_INLINE_P I'd suggest to
> > split both completely and turn DECL_DISREGARD_INLINE_LIMITS,
> > DECL_UNINLINABLE and auto-inline vs. inline-inline into a
> > multi-bit enum and only use that for inlining decisions (ignoring
> > DECL_DECLARED_INLINE_P for that purpose, but use that
> > and feeble_inline to compute the enum value).
> 
> I think a 4 state flag { never_inline, default, auto_inline, always_inline }
> would be fine.  The question is how to call the macro(s) and values
> and how to merge those from different decls and what we do currently
> e.g. for noinline, always_inline, on the same or on different decls
> of the same function.

Well, the same question stands now, just that we can easily end up
with non-sensical flag combos like DECL_UNINLINABLE and
DECL_DISREGARD_INLINE_LIMITS set.  The enum would get rid of such
nonsense and instead require to define how conflicting attributes
would merge (I'd say last wins, just like with command-line flags,
with possibly diagnosing the earlier ignored one).

DECL_INLINE_SETTING would be my proposed name for the enum,
the enum flags you proposed look good besides that I think
'default' is actually 'auto_inline' and 'inline' is the
auto-inline with stronger inline preference (previously
DECL_DECLARED_INLINE_P).

> > Note I've had to lookup what 'feeble' means - given we use -auto
> 
> feeble was used in the meaning of synonym to weak, as I wrote,
> weak_inline could be confusing.
> Another possibility would be weaker_inline though, that one can't
> confuse with weak attribute.

Joseph already approved the feeble_inline name, so I can live with it.
From the implementation side I'd have preferred to more easily
associate it with inline vs. auto-inline.

> > for --params I'd have chosen __attribute__((auto_inline)), possibly
> > "completed" by __attribute__((inline)) to mark a function as
> > wanting 'inline' heuristics but not 'inline' semantics.
> 
> I think auto_inline and inline would be just confusing, even in the negative
> forms.  We actually "auto-inline" even functions not declared inline, just
> with different heuristics.

But inline __attribute__((feeble_inline)) is exactly 'auto-inline', no?
I'm confused.

Richard.


Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2024 at 01:03:01PM +0100, Richard Biener wrote:
> > I think auto_inline and inline would be just confusing, even in the negative
> > forms.  We actually "auto-inline" even functions not declared inline, just
> > with different heuristics.
> 
> But inline __attribute__((feeble_inline)) is exactly 'auto-inline', no?

No.  inline is that 'auto-inline' (with the meaning, do what IPA does with
DECL_DECLARED_INLINE_P right now, use it as a hint to inline stuff, higher
limits and the like).
inline __attribute__((feeble_inline)) is that 'default', i.e. for IPA
purposes handle it as if it wasn't explicitly inline.  Except that the FEs
do what they should do with any inline, e.g. make it comdat (and, for
constexpr, constexpr), handle static variables in those specially, ...

Jakub



[PATCH] builtins: Handle BITINT_TYPE in __builtin_iseqsig folding [PR117802]

2024-11-28 Thread Jakub Jelinek
Hi!

In check_builtin_function_arguments in the _BitInt patchset I've changed
INTEGER_TYPE tests to INTEGER_TYPE or BITINT_TYPE, but haven't done the
same in fold_builtin_iseqsig, which now ICEs because of that.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, that TYPE_PRECISION (type0) >= TYPE_PRECISION (type1) test
for REAL_TYPE vs. REAL_TYPE looks pretty random and dangerous, I think
it would be useful to handle this builtin also in the C and C++ FEs,
if both arguments have REAL_TYPE, use the FE specific routine to decide
which types to use and error if a comparison between types would be
erroneous (e.g. complain about _Decimal* vs. float/double/long
double/_Float*, pick up the preferred type, complain about
__ibm128 vs. _Float128 in C++, etc.).
But the FEs can just promote one argument to the other in that case
and keep fold_builtin_iseqsig as is for say Fortran and other FEs.

2024-11-28  Jakub Jelinek  

PR c/117802
* builtins.cc (fold_builtin_iseqsig): Handle BITINT_TYPE like
INTEGER_TYPE.

* gcc.dg/builtin-iseqsig-1.c: New test.
* gcc.dg/bitint-118.c: New test.

--- gcc/builtins.cc.jj  2024-11-27 14:33:07.522815405 +0100
+++ gcc/builtins.cc 2024-11-27 16:36:41.111547052 +0100
@@ -9946,9 +9946,11 @@ fold_builtin_iseqsig (location_t loc, tr
 /* Choose the wider of two real types.  */
 cmp_type = TYPE_PRECISION (type0) >= TYPE_PRECISION (type1)
   ? type0 : type1;
-  else if (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
+  else if (code0 == REAL_TYPE
+  && (code1 == INTEGER_TYPE || code1 == BITINT_TYPE))
 cmp_type = type0;
-  else if (code0 == INTEGER_TYPE && code1 == REAL_TYPE)
+  else if ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE)
+  && code1 == REAL_TYPE)
 cmp_type = type1;
 
   arg0 = builtin_save_expr (fold_convert_loc (loc, cmp_type, arg0));
--- gcc/testsuite/gcc.dg/builtin-iseqsig-1.c.jj 2024-11-27 16:45:00.951518847 +0100
+++ gcc/testsuite/gcc.dg/builtin-iseqsig-1.c    2024-11-27 17:03:48.02966 +0100
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int
+foo (float x, int y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}
+
+int
+bar (double x, unsigned long y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}
+
+int
+baz (long double x, long long y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}
--- gcc/testsuite/gcc.dg/bitint-118.c.jj    2024-11-27 16:45:21.457230486 +0100
+++ gcc/testsuite/gcc.dg/bitint-118.c   2024-11-27 17:01:55.968241400 +0100
@@ -0,0 +1,21 @@
+/* PR c/117802 */
+/* { dg-do compile { target bitint575 } } */
+/* { dg-options "-std=c23" } */
+
+int
+foo (float x, _BitInt(8) y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}
+
+int
+bar (double x, unsigned _BitInt(162) y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}
+
+int
+baz (long double x, _BitInt(574) y)
+{
+  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
+}

Jakub



Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-28 Thread Richard Biener
On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu  wrote:
>
> On Wed, Nov 27, 2024 at 9:43 PM Richard Biener
>  wrote:
> >
> > On Wed, Nov 27, 2024 at 4:26 AM liuhongt  wrote:
> > >
> > > When loop requires any kind of versioning which could increase register
> > > pressure too much, and it's in a deeply nest big loop, don't do
> > > vectorization.
> > >
> > > I tested the patch with both Ofast and O2 for SPEC2017, besides 
> > > 548.exchange_r,
> > > other benchmarks are same binary.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > Any comments?
> >
> > The vectorizer tries to version an outer loop when vectorizing a loop nest
> > and the versioning condition is invariant.  See vect_loop_versioning.  This
> > tries to handle such cases.  Often the generated runtime alias checks are
> > not invariant because we do not consider the outer evolutions.  I think we
> > should instead fix this there.
> >
> > Question below ...
> >
> > > gcc/ChangeLog:
> > >
> > > pr target/117088
> > > * config/i386/i386.cc
> > > (ix86_vector_costs::ix86_vect_in_deep_nested_loop_p): New 
> > > function.
> > > (ix86_vector_costs::finish_cost): Prevent loop vectorization
> > > if it's in a deeply nested loop and require versioning.
> > > * config/i386/i386.opt (--param=vect-max-loop-depth=): New
> > > param.
> > > ---
> > >  gcc/config/i386/i386.cc  | 89 
> > >  gcc/config/i386/i386.opt |  4 ++
> > >  2 files changed, 93 insertions(+)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 526c9df7618..608f40413d2 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -25019,6 +25019,8 @@ private:
> > >
> > >/* Estimate register pressure of the vectorized code.  */
> > >void ix86_vect_estimate_reg_pressure ();
> > > +  /* Check if vect_loop is in a deeply-nested loop.  */
> > > +  bool ix86_vect_in_deep_nested_loop_p (class loop *vect_loop);
> > >/* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used 
> > > for
> > >   estimation of register pressure.
> > >   ??? Currently it's only used by vec_construct/scalar_to_vec
> > > @@ -25324,6 +25326,84 @@ 
> > > ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
> > >  }
> > >  }
> > >
> > > +/* Return true if vect_loop is in a deeply-nested loop.
> > > +   .i.e vect_loop_n in below loop structure.
> > > +loop1
> > > +{
> > > + loop2
> > > + {
> > > +  loop3
> > > +  {
> > > +   vect_loop_1;
> > > +   loop4
> > > +   {
> > > +vect_loop_2;
> > > +loop5
> > > +{
> > > + vect_loop_3;
> > > + loop6
> > > + {
> > > +  vect_loop_4;
> > > +  loop7
> > > +  {
> > > +   vect_loop_5;
> > > +   loop8
> > > +   {
> > > +   loop9
> > > +   }
> > > +  vect_loop_6;
> > > +  }
> > > + vect_loop_7;
> > > + }
> > > +}
> > > +   }
> > > + }
> > > + It's a big hammer to fix O2 regression for 548.exchange_r after 
> > > vectorization
> > > + is enhanced by (r15-4225-g70c3db511ba14f)  */
> > > +bool
> > > +ix86_vector_costs::ix86_vect_in_deep_nested_loop_p (class loop 
> > > *vect_loop)
> > > +{
> > > +  if (loop_depth (vect_loop) > (unsigned) ix86_vect_max_loop_depth)
> > > +return true;
> > > +
> > > +  if (loop_depth (vect_loop) < 2)
> > > +return false;
> > > +
> >
> > while the above two are "obvious", what you check below isn't clear to me.
> > Is this trying to compute whether 'vect_loop' is inside of a loop nest which
> > at any sibling of vect_loop (or even sibling of an outer loop of vect_loop,
> > recursively) is a sub-nest with a loop depth (relative to what?) exceeds
> > ix86_vect_max_loop_depth?
> Yes, the function tries to find if the vect_loop is in a "big outer
> loop" which contains an innermost loop with loop_depth >
> ix86_vect_max_loop_depth.
> If yes, then prevent vectorization for the loop if its tripcount is
> not a constant multiple of VF. ("Requires any kind of versioning" is not
> accurate, and yes, it's a big hammer.)

I'll note it also doesn't seem to look at register pressure at all or limit
the cut-off to the very-cheap cost model?

That said, it feels like a hack specifically for 548.exchange_r, in particular
vectorization by itself shouldn't increase register pressure (much), but
exchange is known to operate on the bounds of "awful" with regard to
register pressure.  If you'd enable APX would exchange benefit from
vectorizing?

That said, I think we have to live with the regression, the change feels
odd and a strcmp (main_file_name, "exchange") would be similar.  So
we either need to make the pattern matching more precise, like counting
live IVs from the loop nest plus applying heuristics on how vectorization
increases register pressure (maybe it's an IVOPTs issue in the end?),
or defer a solution to GCC 16.

Richard.

> >
> > > +  class loop* outer_loop = loop_outer (vect_loop);
> > > +

[PATCH] gimple-fold: Avoid ICEs with bogus declarations like const attribute no snprintf [PR117358]

2024-11-28 Thread Jakub Jelinek
Hi!

When one puts incorrect const or pure attributes on declarations of various
C APIs which have corresponding builtins (vs. what they actually do), we can
get tons of ICEs in gimple-fold.cc.

The following patch fixes it by giving up gimple_fold_builtin_* folding
if the functions don't have gimple_vdef (or for pure functions like
bcmp/strchr/strstr gimple_vuse) when in SSA form (during gimplification
they will surely have both of those NULL even when declared correctly,
yet it is highly desirable to fold them).

Or shall I replace
!gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
tests with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
and
!gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk as
is or go with the above tests instead?

2024-11-28  Jakub Jelinek  

PR tree-optimization/117358
* gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
vdef in ssa form.
(gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_bzero): Likewise.
(gimple_fold_builtin_memset): Likewise.  Use return false instead of
return NULL_TREE.
(gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strncpy): Likewise.
(gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_strstr): Likewise.
(gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strcat_chk): Likewise.
(gimple_fold_builtin_strncat): Likewise.
(gimple_fold_builtin_strncat_chk): Likewise.
(gimple_fold_builtin_string_compare): Likewise.
(gimple_fold_builtin_fputs): Likewise.
(gimple_fold_builtin_memory_chk): Likewise.
(gimple_fold_builtin_stxcpy_chk): Likewise.
(gimple_fold_builtin_stxncpy_chk): Likewise.
(gimple_fold_builtin_stpcpy): Likewise.
(gimple_fold_builtin_snprintf_chk): Likewise.
(gimple_fold_builtin_sprintf_chk): Likewise.
(gimple_fold_builtin_sprintf): Likewise.
(gimple_fold_builtin_snprintf): Likewise.
(gimple_fold_builtin_fprintf): Likewise.
(gimple_fold_builtin_printf): Likewise.
(gimple_fold_builtin_realloc): Likewise.

* gcc.c-torture/compile/pr117358.c: New test.

--- gcc/gimple-fold.cc.jj   2024-11-23 13:00:29.566010380 +0100
+++ gcc/gimple-fold.cc  2024-11-28 09:09:31.184314115 +0100
@@ -1061,6 +1061,8 @@ gimple_fold_builtin_memory_op (gimple_st
}
   goto done;
 }
+  else if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
+return false;
   else
 {
   /* We cannot (easily) change the type of the copy if it is a storage
@@ -1511,6 +1513,8 @@ gimple_fold_builtin_bcmp (gimple_stmt_it
   /* Transform bcmp (a, b, len) into memcmp (a, b, len).  */
 
   gimple *stmt = gsi_stmt (*gsi);
+  if (!gimple_vuse (stmt) && gimple_in_ssa_p (cfun))
+return false;
   tree a = gimple_call_arg (stmt, 0);
   tree b = gimple_call_arg (stmt, 1);
   tree len = gimple_call_arg (stmt, 2);
@@ -1537,6 +1541,8 @@ gimple_fold_builtin_bcopy (gimple_stmt_i
  len) into memmove (dest, src, len).  */
 
   gimple *stmt = gsi_stmt (*gsi);
+  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
+return false;
   tree src = gimple_call_arg (stmt, 0);
   tree dest = gimple_call_arg (stmt, 1);
   tree len = gimple_call_arg (stmt, 2);
@@ -1562,6 +1568,8 @@ gimple_fold_builtin_bzero (gimple_stmt_i
   /* Transform bzero (dest, len) into memset (dest, 0, len).  */
 
   gimple *stmt = gsi_stmt (*gsi);
+  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
+return false;
   tree dest = gimple_call_arg (stmt, 0);
   tree len = gimple_call_arg (stmt, 1);
 
@@ -1591,6 +1599,9 @@ gimple_fold_builtin_memset (gimple_stmt_
   return true;
 }
 
+  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
+return false;
+
   if (! tree_fits_uhwi_p (len))
 return false;
 
@@ -1613,20 +1624,20 @@ gimple_fold_builtin_memset (gimple_stmt_
   if ((!INTEGRAL_TYPE_P (etype)
&& !POINTER_TYPE_P (etype))
   || TREE_CODE (etype) == BITINT_TYPE)
-return NULL_TREE;
+return false;
 
   if (! var_decl_component_p (var))
-return NULL_TREE;
+return false;
 
   length = tree_to_uhwi (len);
   if (GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (etype)) != length
   || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (etype))
  != GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (etype)))
   || get_pointer_alignment (dest) / BITS_PER_UNIT < length)
-return NULL_TREE;
+return false;
 
   if (length > HOST_BITS_PER_WIDE_INT / BITS_PER_UNIT)
-return NULL_TREE;
+return false;
 
   if (!type_has_mode_precision_p (etype))
 etype = lang_hooks.types.ty

Re: [PATCH] gimple-fold: Avoid ICEs with bogus declarations like const attribute no snprintf [PR117358]

2024-11-28 Thread Richard Biener
On Thu, 28 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> When one puts incorrect const or pure attributes on declarations of various
> C APIs which have corresponding builtins (vs. what they actually do), we can
> get tons of ICEs in gimple-fold.cc.
> 
> The following patch fixes it by giving up gimple_fold_builtin_* folding
> if the functions don't have gimple_vdef (or for pure functions like
> bcmp/strchr/strstr gimple_vuse) when in SSA form (during gimplification
> they will surely have both of those NULL even when declared correctly,
> yet it is highly desirable to fold them).
> 
> Or shall I replace
> !gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
> tests with
> (gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
> and
> !gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
> with
> (gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
> ?

I think this doesn't work since a wrong pure/const attribute will
override this.  We'd have to check the flags against the
builtins.def attributes only.

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk as
> is or go with the above tests instead?

The patch is OK.

Richard.

> 2024-11-28  Jakub Jelinek  
> 
>   PR tree-optimization/117358
>   * gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
>   vdef in ssa form.
>   (gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
>   (gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
>   (gimple_fold_builtin_bzero): Likewise.
>   (gimple_fold_builtin_memset): Likewise.  Use return false instead of
>   return NULL_TREE.
>   (gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
>   (gimple_fold_builtin_strncpy): Likewise.
>   (gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
>   (gimple_fold_builtin_strstr): Likewise.
>   (gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
>   (gimple_fold_builtin_strcat_chk): Likewise.
>   (gimple_fold_builtin_strncat): Likewise.
>   (gimple_fold_builtin_strncat_chk): Likewise.
>   (gimple_fold_builtin_string_compare): Likewise.
>   (gimple_fold_builtin_fputs): Likewise.
>   (gimple_fold_builtin_memory_chk): Likewise.
>   (gimple_fold_builtin_stxcpy_chk): Likewise.
>   (gimple_fold_builtin_stxncpy_chk): Likewise.
>   (gimple_fold_builtin_stpcpy): Likewise.
>   (gimple_fold_builtin_snprintf_chk): Likewise.
>   (gimple_fold_builtin_sprintf_chk): Likewise.
>   (gimple_fold_builtin_sprintf): Likewise.
>   (gimple_fold_builtin_snprintf): Likewise.
>   (gimple_fold_builtin_fprintf): Likewise.
>   (gimple_fold_builtin_printf): Likewise.
>   (gimple_fold_builtin_realloc): Likewise.
> 
>   * gcc.c-torture/compile/pr117358.c: New test.
> 
> --- gcc/gimple-fold.cc.jj 2024-11-23 13:00:29.566010380 +0100
> +++ gcc/gimple-fold.cc2024-11-28 09:09:31.184314115 +0100
> @@ -1061,6 +1061,8 @@ gimple_fold_builtin_memory_op (gimple_st
>   }
>goto done;
>  }
> +  else if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
> +return false;
>else
>  {
>/* We cannot (easily) change the type of the copy if it is a storage
> @@ -1511,6 +1513,8 @@ gimple_fold_builtin_bcmp (gimple_stmt_it
>/* Transform bcmp (a, b, len) into memcmp (a, b, len).  */
>  
>gimple *stmt = gsi_stmt (*gsi);
> +  if (!gimple_vuse (stmt) && gimple_in_ssa_p (cfun))
> +return false;
>tree a = gimple_call_arg (stmt, 0);
>tree b = gimple_call_arg (stmt, 1);
>tree len = gimple_call_arg (stmt, 2);
> @@ -1537,6 +1541,8 @@ gimple_fold_builtin_bcopy (gimple_stmt_i
>   len) into memmove (dest, src, len).  */
>  
>gimple *stmt = gsi_stmt (*gsi);
> +  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
> +return false;
>tree src = gimple_call_arg (stmt, 0);
>tree dest = gimple_call_arg (stmt, 1);
>tree len = gimple_call_arg (stmt, 2);
> @@ -1562,6 +1568,8 @@ gimple_fold_builtin_bzero (gimple_stmt_i
>/* Transform bzero (dest, len) into memset (dest, 0, len).  */
>  
>gimple *stmt = gsi_stmt (*gsi);
> +  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
> +return false;
>tree dest = gimple_call_arg (stmt, 0);
>tree len = gimple_call_arg (stmt, 1);
>  
> @@ -1591,6 +1599,9 @@ gimple_fold_builtin_memset (gimple_stmt_
>return true;
>  }
>  
> +  if (!gimple_vdef (stmt) && gimple_in_ssa_p (cfun))
> +return false;
> +
>if (! tree_fits_uhwi_p (len))
>  return false;
>  
> @@ -1613,20 +1624,20 @@ gimple_fold_builtin_memset (gimple_stmt_
>if ((!INTEGRAL_TYPE_P (etype)
> && !POINTER_TYPE_P (etype))
>|| TREE_CODE (etype) == BITINT_TYPE)
> -return NULL_TREE;
> +return false;
>  
>if (! var_decl_component_p (var))
> -return NULL_TREE;
> +return false;
>  
>length = tree_to_uhwi (len);
>if (GET_MODE_SIZE (SCALAR_INT_TYPE_MODE (etype)) != length
>

Ping [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic

2024-11-28 Thread Akram Ahmad

Just pinging v2 of this patch series

On 14/11/2024 15:53, Akram Ahmad wrote:

Hi all,

This patch series introduces standard names for scalar, Adv. SIMD, and
SVE saturating arithmetic instructions in the aarch64 backend.

Additional tests are added for scalar saturating arithmetic, as well
as to test that the auto-vectorizer correctly inserts NEON instructions
or scalar instructions where necessary, such as in 32 and 64-bit scalar
unsigned arithmetic. There are also tests for the auto-vectorized SVE
code.

The biggest change from V1-V2 of this series is the optimisation for
signed scalar arithmetic (32 and 64-bit) to avoid the use of FMOV in
the case of a constant and non-constant operand (immediate or GP reg
values respectively). This is only exhibited if early-ra is disabled
due to an early-ra bug which is assigning FP registers for operands
even if this would unnecessarily result in FMOV being used. This new
optimisation is tested by means of check-function-bodies as well as
an execution test.

As with v1 of this patch, the only new regression failures on aarch64
are to do with unsigned scalar intrinsics (32 and 64-bit) not using
the NEON instructions any more. Otherwise, there are no regressions.

SVE currently uses the unpredicated version of the instruction in the
backend.

v1 -> v2:
- Add new split for signed saturating arithmetic
- New test for signed saturating arithmetic
- Make addition tests accept commutative operands, other test fixes

Only the first patch in this series is updated in v2. The other
patch is already approved. If this is ok, could this be committed
for me please? I do not have commit rights.

Many thanks,

Akram

---

Akram Ahmad (2):
   aarch64: Use standard names for saturating arithmetic
   aarch64: Use standard names for SVE saturating arithmetic

  gcc/config/aarch64/aarch64-builtins.cc|  13 +
  gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
  gcc/config/aarch64/aarch64-simd.md| 209 ++-
  gcc/config/aarch64/aarch64-sve.md |   4 +-
  gcc/config/aarch64/arm_neon.h |  96 +++
  gcc/config/aarch64/iterators.md   |   4 +
  .../saturating_arithmetic_autovect.inc|  58 +
  .../saturating_arithmetic_autovect_1.c|  79 ++
  .../saturating_arithmetic_autovect_2.c|  79 ++
  .../saturating_arithmetic_autovect_3.c|  75 ++
  .../saturating_arithmetic_autovect_4.c|  77 ++
  .../aarch64/saturating-arithmetic-signed.c| 244 ++
  .../aarch64/saturating_arithmetic.inc |  39 +++
  .../aarch64/saturating_arithmetic_1.c |  36 +++
  .../aarch64/saturating_arithmetic_2.c |  36 +++
  .../aarch64/saturating_arithmetic_3.c |  30 +++
  .../aarch64/saturating_arithmetic_4.c |  30 +++
  .../aarch64/sve/saturating_arithmetic.inc |  68 +
  .../aarch64/sve/saturating_arithmetic_1.c |  60 +
  .../aarch64/sve/saturating_arithmetic_2.c |  60 +
  .../aarch64/sve/saturating_arithmetic_3.c |  62 +
  .../aarch64/sve/saturating_arithmetic_4.c |  62 +
  22 files changed, 1371 insertions(+), 58 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c



[PATCH] arm: [MVE intrinsics] fix vctpq intrinsic implementation [PR target/117814]

2024-11-28 Thread Christophe Lyon
The VCTP instruction creates a Vector Tail Predicate in VPR.P0, based
on the input value, but also constrained by a VPT block (if present),
or if used within a DLSTP/LETP loop.

Therefore we need to inform the compiler that this intrinsic reads the
FPCXT register, otherwise it could make incorrect assumptions: for
instance in test7() from gcc.target/arm/mve/dlstp-compile-asm-2.c it
would hoist p1 = vctp32q (g) outside of the loop.

The patch introduces a new flag CP_READ_FPCXT, which is handled
similarly to CP_READ_MEMORY.

gcc/ChangeLog:

PR target/117814
* config/arm/arm-mve-builtins-base.cc (vctpq_impl): Implement
call_properties.
* config/arm/arm-mve-builtins.cc
(function_instance::reads_global_state_p): Handle CP_READ_FPCXT.
* config/arm/arm-mve-builtins.h (CP_READ_FPCXT): New flag.
---
 gcc/config/arm/arm-mve-builtins-base.cc | 6 ++
 gcc/config/arm/arm-mve-builtins.cc  | 3 ++-
 gcc/config/arm/arm-mve-builtins.h   | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 723004b53d7..bc9dcc77515 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -541,6 +541,12 @@ public:
   /* Mode this intrinsic operates on.  */
   machine_mode m_mode;
 
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+return CP_READ_FPCXT;
+  }
+
   rtx
   expand (function_expander &e) const override
   {
diff --git a/gcc/config/arm/arm-mve-builtins.cc 
b/gcc/config/arm/arm-mve-builtins.cc
index 30b103ec086..8bbcedd2f15 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -785,7 +785,8 @@ function_instance::reads_global_state_p () const
   if (flags & CP_READ_FPCR)
 return true;
 
-  return flags & CP_READ_MEMORY;
+  /* Handle direct reads of global state.  */
+  return flags & (CP_READ_MEMORY | CP_READ_FPCXT);
 }
 
 /* Return true if calls to the function could modify some form of
diff --git a/gcc/config/arm/arm-mve-builtins.h 
b/gcc/config/arm/arm-mve-builtins.h
index cdc07b4e51f..d76a10516ba 100644
--- a/gcc/config/arm/arm-mve-builtins.h
+++ b/gcc/config/arm/arm-mve-builtins.h
@@ -93,6 +93,7 @@ const unsigned int CP_READ_FPCR = 1U << 0;
 const unsigned int CP_RAISE_FP_EXCEPTIONS = 1U << 1;
 const unsigned int CP_READ_MEMORY = 1U << 2;
 const unsigned int CP_WRITE_MEMORY = 1U << 3;
+const unsigned int CP_READ_FPCXT = 1U << 4;
 
 /* Enumerates the MVE predicate and (data) vector types, together called
"vector types" for brevity.  */
-- 
2.34.1



Re: [PATCH] Introduce -flto-partition=locality

2024-11-28 Thread Kyrylo Tkachov
Ping.

> On 15 Nov 2024, at 17:04, Kyrylo Tkachov  wrote:
> 
> Hi all,
> 
> This is a patch submission following-up from the RFC at:
> https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
> The patch is rebased and retested against current trunk, some debugging code
> removed, comments improved and some fixes added as we've done more
> testing.
> 
> >8-
> Implement partitioning and cloning in the callgraph to help locality.
> A new -flto-partition=locality flag is used to enable this.
> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc
> The optimization has two components:
> * Partitioning the callgraph so as to group callers and callees that 
> frequently
> call each other in the same partition
> * Cloning functions that straddle multiple callchains and allowing each clone
> to be local to the partition of its callchain.
> 
> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
> It creates a partitioning plan and does the prerequisite cloning.
> The partitioning is then implemented during the existing LTO partitioning 
> pass.
> 
> To guide these locality heuristics we use PGO data.
> In the absence of PGO data we use a static heuristic that uses the accumulated
> estimated edge frequencies of the callees for each function to guide the
> reordering.
> We are investigating some more elaborate static heuristics, in particular 
> using
> the demangled C++ names to group template instantiations together.
> This is promising but we are working out some kinks in the implementation
> currently and want to send that out as a follow-up once we're more confident
> in it.
> 
> A new bootstrap-lto-locality bootstrap config is added that allows us to test
> this on GCC itself with either static or PGO heuristics.
> GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
> 
> With this optimization we are seeing good performance gains on some large
> internal workloads that stress the parts of the processor that are sensitive
> to code locality, but we'd appreciate wider performance evaluation.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
> Thanks,
> Kyrill
> 
> Signed-off-by: Prachi Godbole 
> Co-authored-by: Kyrylo Tkachov 
> 
>config/ChangeLog:
> * bootstrap-lto-locality.mk: New file.
> 
> gcc/ChangeLog:
>* Makefile.in (OBJS): Add ipa-locality-cloning.o
>(GTFILES): Add ipa-locality-cloning.cc dependency.
>* common.opt (lto_partition_model): Add locality value.
>* flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY 
> value.
>(enum lto_locality_cloning_model): Define.
>* lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping 
> of node
>and index.
>* params.opt (lto_locality_cloning_model): New enum.
>(lto-partition-locality-cloning): New param.
>(lto-partition-locality-frequency-cutoff): Likewise.
>(lto-partition-locality-size-cutoff): Likewise.
>(lto-max-locality-partition): Likewise.
>* passes.def: Add pass_ipa_locality_cloning.
>* timevar.def (TV_IPA_LC): New timevar.
>* tree-pass.h (make_pass_ipa_locality_cloning): Declare.
>* ipa-locality-cloning.cc: New file.
>* ipa-locality-cloning.h: New file.
> 
>  gcc/lto/ChangeLog:
> * lto-partition.cc: Include ipa-locality-cloning.h
>(add_node_references_to_partition): Define.
>(create_partition): Likewise.
>(lto_locality_map): Likewise.
>(lto_promote_cross_file_statics): Add extra dumping.
>* lto-partition.h (lto_locality_map): Declare.
>* lto.cc (do_whole_program_analysis): Handle 
> LTO_PARTITION_LOCALITY.
> 
> <0001-Introduce-flto-partition-locality.patch>



Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Richard Biener
On Thu, Nov 14, 2024 at 5:03 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The inlining heuristics uses DECL_DECLARED_INLINE_P (whether a function
> has been explicitly marked inline; that can be inline keyword, or for C++
> also constexpr keyword or defining a function inside of a class definition)
> heavily to increase desirability of inlining a function etc.
> In most cases it is desirable, people usually mark functions inline with
> the intent that they are actually inlined.
> But as PR93008 shows, that isn't always the case.
> One can mark (usually large or cold) function constexpr just because the
> standard requires it to be constexpr or that it is useful to users to allow
> evaluating the function in constant expression evaluation, and doesn't mind
> if the compiler chooses to inline it if it is really worth it, but it might
> not be that good idea to do so.  Especially with recent versions of C++
> where pretty much everything has been or is going to be constexpr.
> Or one might e.g. use inline keyword to get the C++ comdat behavior, again
> with no particular intent that such function is a good idea to be inlined.
>
> This patch introduces a new attribute for weaker inline semantics (basically
> it behaves as inline for the FE/debug info purposes, just for the
> optimization decisions acts as if it wasn't explicitly inline); I haven't
> used weak_inline for the attribute name because one could confuse that with
> weak attribute and this has nothing to do with that.
>
> So far smoke tested on x86_64-linux, ok for trunk if it passes full
> bootstrap/regtest?

Sorry for chiming in only late - to me this shows that the desire to inline
a function more than another function, currently identified as
DECL_DECLARED_INLINE_P overlaps with frontend semantic differences.
But don't we reflect those semantic differences into the IL (via linkage,
symtab details) already?  So what would remain is a way for the user
to distinguish between auto-inline (we have the corresponding -auto
set of --params) and inline-inline.  The middle-end interface after your
change, where DECL_DECLARED_INLINE_P means inline-inline
unless !DECL_OPTIMIZABLE_INLINE_P looks a bit awkward.

Rather than clearing DECL_DECLARED_INLINE_P I'd suggest to
split both completely and turn DECL_DISREGARD_INLINE_LIMITS,
DECL_UNINLINABLE and auto-inline vs. inline-inline into a
multi-bit enum and only use that for inlining decisions (ignoring
DECL_DECLARED_INLINE_P for that purpose, but use that
and feeble_inline to compute the enum value).

Note I've had to look up what 'feeble' means - given we use -auto
for --params I'd have chosen __attribute__((auto_inline)), possibly
"completed" by __attribute__((inline)) to mark a function as
wanting 'inline' heuristics but not 'inline' semantics.

Again, sorry for chiming in late.

Thanks,
Richard.

> 2024-11-14  Jakub Jelinek  
>
> PR c++/93008
> gcc/
> * tree-core.h (struct tree_function_decl): Add feeble_inline_flag
> bitfield.
> * tree.h (DECL_FEEBLE_INLINE_P, DECL_OPTIMIZABLE_INLINE_P): Define.
> * cgraphunit.cc (process_function_and_variable_attributes): Warn
> on feeble_inline attribute on !DECL_DECLARED_INLINE_P function.
> * symtab.cc (symtab_node::fixup_same_cpp_alias_visibility): Copy
> over DECL_FEEBLE_INLINE_P as well.  Formatting fixes.
> * tree-inline.cc (tree_inlinable_function_p, expand_call_inline): Use
> DECL_OPTIMIZABLE_INLINE_P instead of DECL_DECLARED_INLINE_P.
> * ipa-cp.cc (devirtualization_time_bonus): Likewise.
> * ipa-fnsummary.cc (ipa_call_context::estimate_size_and_time):
> Likewise.
> * ipa-icf.cc (sem_item::compare_referenced_symbol_properties): Punt
> on DECL_FEEBLE_INLINE_P differences.
> (sem_item::hash_referenced_symbol_properties): Hash also
> DECL_FEEBLE_INLINE_P and DECL_IS_REPLACEABLE_OPERATOR.
> * ipa-inline.cc (can_inline_edge_by_limits_p,
> want_early_inline_function_p, want_inline_small_function_p,
> want_inline_self_recursive_call_p, wrapper_heuristics_may_apply,
> edge_badness, recursive_inlining, early_inline_small_functions): Use
> DECL_OPTIMIZABLE_INLINE_P instead of DECL_DECLARED_INLINE_P.
> * ipa-split.cc (consider_split, execute_split_functions): Likewise.
> * lto-streamer-out.cc (hash_tree): Hash DECL_FEEBLE_INLINE_P and
> DECL_IS_REPLACEABLE_OPERATOR.
> * tree-streamer-in.cc (unpack_ts_function_decl_value_fields): Unpack
> DECL_FEEBLE_INLINE_P.
> * tree-streamer-out.cc (pack_ts_function_decl_value_fields): Pack
> DECL_FEEBLE_INLINE_P.
> * doc/invoke.texi (Winline): Document feeble_inline functions aren't
> warned about.
> * doc/extend.texi (feeble_inline function attribute): Document.
> gcc/c-family/
> * c-attribs.cc (attr_always_inline_exclusions,
> attr_noinline_exclusions): Add feeble_inline.
>   

Re: [PATCH v2] s390: Add expander for uaddc/usubc optabs

2024-11-28 Thread Andreas Krebbel

On 11/21/24 09:38, Stefan Schulze Frielinghaus wrote:

Bootstrap and regtest are still running.  If those are successful and
there are no further comments I will push this one in the coming days.

-- >8 --

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Dito.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add3_carry1_cc): Renamed to ...
(add3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub3_borrow_cc): Renamed to ...
(sub3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add3_alc_carry1_cc): Renamed to ...
(add3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub3_slb_borrow1_cc): New.
(uaddc5): New.
(usubc5): New.


Looks good to me. Thanks!


Andreas




gcc/testsuite/ChangeLog:

* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.
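
For readers less familiar with these optabs: uaddc/usubc are matched by the middle end from chained __builtin_add_overflow/__builtin_sub_overflow patterns; the expanders above let s390 emit its carry-propagating instructions for them. A portable sketch of the source form (the function name is illustrative):

```c
#include <assert.h>

/* Add with carry-in and carry-out, written the way GCC's generic code
   recognizes and can map onto the uaddc optab.  With carry_in in {0,1},
   at most one of the two additions can overflow, so the carries may
   simply be summed.  */
unsigned long
addc (unsigned long a, unsigned long b, unsigned long carry_in,
      unsigned long *carry_out)
{
  unsigned long r;
  unsigned long c1 = __builtin_add_overflow (a, b, &r);
  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
  *carry_out = c1 + c2;
  return r;
}
```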
---
  gcc/config/s390/s390-protos.h   |   2 +-
  gcc/config/s390/s390.cc |  20 +--
  gcc/config/s390/s390.md | 115 +
  gcc/testsuite/gcc.target/s390/uaddc-1.c | 156 
  gcc/testsuite/gcc.target/s390/uaddc-2.c |  25 
  gcc/testsuite/gcc.target/s390/uaddc-3.c |  27 
  gcc/testsuite/gcc.target/s390/usubc-1.c | 156 
  gcc/testsuite/gcc.target/s390/usubc-2.c |  25 
  gcc/testsuite/gcc.target/s390/usubc-3.c |  29 +
  9 files changed, 519 insertions(+), 36 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/uaddc-3.c
  create mode 100644 gcc/testsuite/gcc.target/s390/usubc-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/usubc-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/usubc-3.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index e7ac59d17da..b8604394391 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -86,7 +86,7 @@ extern int tls_symbolic_operand (rtx);
  extern bool s390_match_ccmode (rtx_insn *, machine_mode);
  extern machine_mode s390_tm_ccmode (rtx, rtx, bool);
  extern machine_mode s390_select_ccmode (enum rtx_code, rtx, rtx);
-extern rtx s390_emit_compare (enum rtx_code, rtx, rtx);
+extern rtx s390_emit_compare (machine_mode, enum rtx_code, rtx, rtx);
  extern rtx_insn *s390_emit_jump (rtx, rtx);
  extern bool symbolic_reference_mentioned_p (rtx);
  extern bool tls_symbolic_reference_mentioned_p (rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c9172d1153a..4c8bf21539c 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -2029,9 +2029,9 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx 
*op1,
 the IF_THEN_ELSE of the conditional branch testing the result.  */
  
  rtx

-s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
+s390_emit_compare (machine_mode mode, enum rtx_code code, rtx op0, rtx op1)
  {
-  machine_mode mode = s390_select_ccmode (code, op0, op1);
+  machine_mode cc_mode = s390_select_ccmode (code, op0, op1);
rtx cc;
  
/* Force OP1 into register in order to satisfy VXE TFmode patterns.  */

@@ -2043,17 +2043,17 @@ s390_emit_compare (enum rtx_code code, rtx op0, rtx op1)
/* Do not output a redundant compare instruction if a
 compare_and_swap pattern already computed the result and the
 machine modes are compatible.  */
-  gcc_assert (s390_cc_modes_compatible (GET_MODE (op0), mode)
+  gcc_assert (s390_cc_modes_compatible (GET_MODE (op0), cc_mode)
  == GET_MODE (op0));
cc = op0;
  }
else
  {
-  cc = gen_rtx_REG (mode, CC_REGNUM);
-  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (mode, op0, op1)));
+  cc = gen_rtx_REG (cc_mode, CC_REGNUM);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (cc_mode, op0, op1)));
  }
  
-  return gen_rtx_fmt_ee (code, VOIDmode, cc, const0_rtx);

+  return gen_rtx_fmt_ee (code, mode, cc, const0_rtx);
  }
  
  /* If MEM is not a legitimate compare-and-swap memory operand, return a new

@@ -2103,7 +2103,7 @@ s390_emit_compare_and_swap (enum rtx_code code, rtx old, 
rtx mem,
  default:
gcc_unreachable ();
  }
-  return s390_emit_compare (code, cc, const0_rtx);
+  return s390_emit_compare (VOI

Re: [PATCH] builtins: Handle BITINT_TYPE in __builtin_iseqsig folding [PR117802]

2024-11-28 Thread Richard Biener
On Thu, 28 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> In check_builtin_function_arguments in the _BitInt patchset I've changed
> INTEGER_TYPE tests to INTEGER_TYPE or BITINT_TYPE, but haven't done the
> same in fold_builtin_iseqsig, which now ICEs because of that.
> 
> The following patch fixes that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> BTW, that TYPE_PRECISION (type0) >= TYPE_PRECISION (type1) test
> for REAL_TYPE vs. REAL_TYPE looks pretty random and dangerous, I think
> it would be useful to handle this builtin also in the C and C++ FEs,
> if both arguments have REAL_TYPE, use the FE specific routine to decide
> which types to use and error if a comparison between types would be
> erroneous (e.g. complain about _Decimal* vs. float/double/long
> double/_Float*, pick up the preferred type, complain about
> __ibm128 vs. _Float128 in C++, etc.).
> But the FEs can just promote one argument to the other in that case
> and keep fold_builtin_iseqsig as is for say Fortran and other FEs.
> 
> 2024-11-28  Jakub Jelinek  
> 
>   PR c/117802
>   * builtins.cc (fold_builtin_iseqsig): Handle BITINT_TYPE like
>   INTEGER_TYPE.
> 
>   * gcc.dg/builtin-iseqsig-1.c: New test.
>   * gcc.dg/bitint-118.c: New test.
> 
> --- gcc/builtins.cc.jj2024-11-27 14:33:07.522815405 +0100
> +++ gcc/builtins.cc   2024-11-27 16:36:41.111547052 +0100
> @@ -9946,9 +9946,11 @@ fold_builtin_iseqsig (location_t loc, tr
>  /* Choose the wider of two real types.  */
>  cmp_type = TYPE_PRECISION (type0) >= TYPE_PRECISION (type1)
>? type0 : type1;
> -  else if (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
> +  else if (code0 == REAL_TYPE
> +&& (code1 == INTEGER_TYPE || code1 == BITINT_TYPE))
>  cmp_type = type0;
> -  else if (code0 == INTEGER_TYPE && code1 == REAL_TYPE)
> +  else if ((code0 == INTEGER_TYPE || code0 == BITINT_TYPE)
> +&& code1 == REAL_TYPE)
>  cmp_type = type1;
>  
>arg0 = builtin_save_expr (fold_convert_loc (loc, cmp_type, arg0));
> --- gcc/testsuite/gcc.dg/builtin-iseqsig-1.c.jj   2024-11-27 
> 16:45:00.951518847 +0100
> +++ gcc/testsuite/gcc.dg/builtin-iseqsig-1.c  2024-11-27 17:03:48.02966 
> +0100
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int
> +foo (float x, int y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> +
> +int
> +bar (double x, unsigned long y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> +
> +int
> +baz (long double x, long long y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> --- gcc/testsuite/gcc.dg/bitint-118.c.jj  2024-11-27 16:45:21.457230486 
> +0100
> +++ gcc/testsuite/gcc.dg/bitint-118.c 2024-11-27 17:01:55.968241400 +0100
> @@ -0,0 +1,21 @@
> +/* PR c/117802 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-std=c23" } */
> +
> +int
> +foo (float x, _BitInt(8) y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> +
> +int
> +bar (double x, unsigned _BitInt(162) y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> +
> +int
> +baz (long double x, _BitInt(574) y)
> +{
> +  return __builtin_iseqsig (x, y) * 2 + __builtin_iseqsig (y, x);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] expr, c, gimplify, v3: Don't clear whole unions [PR116416]

2024-11-28 Thread Richard Biener
On Tue, Oct 15, 2024 at 7:59 PM Jakub Jelinek  wrote:
>
> Hi!
>
> Here is an updated version of the patch.
>
> My reading of C23 is that if some aggregate field is initialized with
> {} (which is supposed to newly clear padding bits) and then its
> subobjects are overridden with designated initializers, then the
> "padding bits are cleared" property should be kept unless one overwrites
> that field completely with a different initializer, which was something
> the previous patch didn't implement, all it did was set
> CONSTRUCTOR_ZERO_PADDING_BITS flag on {} CONSTRUCTORs for flag_isoc23.
>
> This adjusted patch propagates it through the initializer handling
> and adds testcase coverage with my comments on what I think is well
> defined and what isn't (please correct me where I'm wrong).
>
> Haven't touched the C++ FE, I'm feeling lost there where zero initialization
> happens and where other forms of initialization happen, and where it should
> be propagated from one CONSTRUCTOR to another and where it shouldn't.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The middle-end changes are OK.

Richard.

> 2024-10-15  Jakub Jelinek  
>
> PR c++/116416
> gcc/
> * flag-types.h (enum zero_init_padding_bits_kind): New type.
> * tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
> * common.opt (fzero-init-padding-bits=): New option.
> * expr.cc (categorize_ctor_elements_1): Handle
> CONSTRUCTOR_ZERO_PADDING_BITS or
> flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL.  Fix up
> *p_complete = -1; setting for unions.
> (complete_ctor_at_level_p): Handle unions differently for
> flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
> * gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
> handling, return also true for UNION_TYPE with no FIELD_DECLs
> and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
> * doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
> gcc/c/
> * c-parser.cc (c_parser_braced_init): Set 
> CONSTRUCTOR_ZERO_PADDING_BITS
> for flag_isoc23 empty initializers.
> * c-typeck.cc (constructor_zero_padding_bits): New variable.
> (struct constructor_stack): Add zero_padding_bits member.
> (really_start_incremental_init): Save and clear
> constructor_zero_padding_bits.
> (push_init_level): Save constructor_zero_padding_bits.  Or into it
> CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
> (pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
> constructor_zero_padding_bits and restore
> constructor_zero_padding_bits.
> gcc/testsuite/
> * gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
> diagnostics.
> * gcc.dg/c23-empty-init-4.c: New test.
> * gcc.dg/gnu11-empty-init-1.c: New test.
> * gcc.dg/gnu11-empty-init-2.c: New test.
> * gcc.dg/gnu11-empty-init-3.c: New test.
> * gcc.dg/gnu11-empty-init-4.c: New test.
>
> --- gcc/flag-types.h.jj 2024-10-07 11:40:04.518038504 +0200
> +++ gcc/flag-types.h2024-10-15 13:50:34.800660119 +0200
> @@ -291,6 +291,13 @@ enum auto_init_type {
>AUTO_INIT_ZERO = 2
>  };
>
> +/* Initialization of padding bits with zeros.  */
> +enum zero_init_padding_bits_kind {
> +  ZERO_INIT_PADDING_BITS_STANDARD = 0,
> +  ZERO_INIT_PADDING_BITS_UNIONS = 1,
> +  ZERO_INIT_PADDING_BITS_ALL = 2
> +};
> +
>  /* Different instrumentation modes.  */
>  enum sanitize_code {
>/* AddressSanitizer.  */
> --- gcc/tree.h.jj   2024-10-07 11:40:04.521038462 +0200
> +++ gcc/tree.h  2024-10-15 13:50:34.801660105 +0200
> @@ -1225,6 +1225,9 @@ extern void omp_clause_range_check_faile
>(vec_safe_length (CONSTRUCTOR_ELTS (NODE)))
>  #define CONSTRUCTOR_NO_CLEARING(NODE) \
>(CONSTRUCTOR_CHECK (NODE)->base.public_flag)
> +/* True if even padding bits should be zeroed during initialization.  */
> +#define CONSTRUCTOR_ZERO_PADDING_BITS(NODE) \
> +  (CONSTRUCTOR_CHECK (NODE)->base.default_def_flag)
>
>  /* Iterate through the vector V of CONSTRUCTOR_ELT elements, yielding the
> value of each element (stored within VAL). IX must be a scratch variable
> --- gcc/common.opt.jj   2024-10-07 11:40:04.510038616 +0200
> +++ gcc/common.opt  2024-10-15 13:50:35.227654223 +0200
> @@ -3505,6 +3505,22 @@ fzero-call-used-regs=
>  Common RejectNegative Joined
>  Clear call-used registers upon function return.
>
> +fzero-init-padding-bits=
> +Common Joined RejectNegative Enum(zero_init_padding_bits_kind) 
> Var(flag_zero_init_padding_bits) Init(ZERO_INIT_PADDING_BITS_STANDARD)
> +-fzero-init-padding-bits=[standard|unions|all] Zero padding bits in 
> initializers.
> +
> +Enum
> +Name(zero_init_padding_bits_kind) Type(enum zero_init_padding_bits_kind) 
> UnknownError(unrecognized zero init padding bits kind %qs)
> +
> +EnumValue
> +Enum(zero_init_padd

[committed] libstdc++: Include in os_defines.h for FreeBSD [PR117210]

2024-11-28 Thread Jonathan Wakely
This is needed so that __LONG_LONG_SUPPORTED is defined before we depend
on it.

libstdc++-v3/ChangeLog:

PR libstdc++/117210
* config/os/bsd/dragonfly/os_defines.h: Include .
* config/os/bsd/freebsd/os_defines.h: Likewise.
---

Bootstrapped x86_64-freebsd14, pushed to trunk.

I tried to test on dragonflybsd but it was too painful to even install
bash or vim so I gave up.

 libstdc++-v3/config/os/bsd/dragonfly/os_defines.h | 2 ++
 libstdc++-v3/config/os/bsd/freebsd/os_defines.h   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h 
b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
index e030fa3dc87..9c5aaabc90f 100644
--- a/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/dragonfly/os_defines.h
@@ -29,6 +29,8 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
+#include  // For __LONG_LONG_SUPPORTED
+
 #define _GLIBCXX_USE_C99 1
 #define _GLIBCXX_USE_C99_STDIO 1
 #define _GLIBCXX_USE_C99_STDLIB 1
diff --git a/libstdc++-v3/config/os/bsd/freebsd/os_defines.h 
b/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
index 0d63ae6cec4..125dfdc1888 100644
--- a/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
@@ -29,6 +29,8 @@
 // System-specific #define, typedefs, corrections, etc, go here.  This
 // file will come before all others.
 
+#include  // For __LONG_LONG_SUPPORTED
+
 #define _GLIBCXX_USE_C99_STDIO 1
 #define _GLIBCXX_USE_C99_STDLIB 1
 #define _GLIBCXX_USE_C99_WCHAR 1
-- 
2.47.0



[PATCH] arm, testsuite: Adjust Arm tests after c23 changes

2024-11-28 Thread Christophe Lyon
After the recent c23 changes, GCC complains because the testcase calls f()
with a parameter whereas the prototype has none.
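
Background, for readers who missed the change: before C23 an empty parameter list in a declaration meant "unspecified parameters", so `int f();` followed by a call `f(j)` was accepted; in C23 it means "no parameters" (as in C++), hence the error. Minimal sketch of the fixed form:

```c
#include <assert.h>

/* C23 treats "int f();" like "int f(void)", so a declaration that is
   meant to be called with an argument has to say so explicitly.  */
int f(int);

int
f (int x)
{
  return x * 2;
}
```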

gcc/testsuite/ChangeLog
* gcc.target/arm/mve/dlstp-loop-form.c: Fix f() prototype.
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c 
b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
index 08811cef568..3039ee8f686 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-loop-form.c
@@ -6,7 +6,7 @@
 #pragma GCC arm "arm_mve.h" false
 typedef __attribute__((aligned(2))) float16x8_t e;
 mve_pred16_t c(long d) { return __builtin_mve_vctp16qv8bi(d); }
-int f();
+int f(e);
 void n() {
   int g, h, *i, j;
   mve_pred16_t k;
-- 
2.34.1



Re: [PATCH] v2: Add support for nonnull_if_nonzero attribute [PR117023]

2024-11-28 Thread Richard Biener
On Wed, Nov 13, 2024 at 9:57 AM Jakub Jelinek  wrote:
>
> On Tue, Nov 12, 2024 at 06:34:39PM +0100, Jakub Jelinek wrote:
> > What do you think about this?  So far lightly tested.
>
> Unfortunately bootstrap/regtest revealed some issues in the patch,
> the tree-ssa-ccp.cc changes break bootstrap because fntype in there
> may be NULL and that is what get_nonnull_args handles by just returning
> NULL, but obviously TYPE_ATTRIBUTES (fntype) can't be accessed, so I've
> added if (!fntype) continue;
> And the ubsan tests worked in C but not C++ due to extra warning, so I've
> adjusted them.
>
> This has been successfully bootstrapped/regtested on x86_64-linux and
> i686-linux.

The middle-end changes are OK.

Richard.

> 2024-11-13  Jakub Jelinek  
>
> PR c/117023
> gcc/
> * gimple.h (infer_nonnull_range_by_attribute): Add a tree *
> argument defaulted to NULL.
> * gimple.cc (infer_nonnull_range_by_attribute): Add op2 argument.
> Handle also nonnull_if_nonzero attributes.
> * tree.cc (get_nonnull_args): Fix comment typo.
> * builtins.cc (validate_arglist): Handle nonnull_if_nonzero attribute.
> * tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Handle
> nonnull_if_nonzero attributes.
> * ubsan.cc (instrument_nonnull_arg): Adjust
> infer_nonnull_range_by_attribute caller.  If it returned true and
> filed in non-NULL arg2, check that arg2 is non-zero as another
> condition next to checking that arg is zero.
> * doc/extend.texi (nonnull_if_nonzero): Document new attribute.
> gcc/c-family/
> * c-attribs.cc (handle_nonnull_if_nonzero_attribute): New
> function.
> (c_common_gnu_attributes): Add nonnull_if_nonzero attribute.
> (handle_nonnull_attribute): Fix comment typo.
> * c-common.cc (struct nonnull_arg_ctx): Add other member.
> (check_function_nonnull): Also check nonnull_if_nonzero attributes.
> (check_nonnull_arg): Use different warning wording if pctx->other
> is non-zero.
> (check_function_arguments): Initialize ctx.other.
> gcc/testsuite/
> * gcc.dg/nonnull-8.c: New test.
> * gcc.dg/nonnull-9.c: New test.
> * gcc.dg/nonnull-10.c: New test.
> * c-c++-common/ubsan/nonnull-6.c: New test.
> * c-c++-common/ubsan/nonnull-7.c: New test.
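
For context, a usage sketch of the attribute this patch adds. The function below is illustrative only; the semantics follow the patch's comments: pointer argument 1 must be non-null whenever integer argument 2 is non-zero, and compilers without the patch will just warn about the unknown attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a null pointer is fine for a zero length, which
   plain nonnull cannot express but nonnull_if_nonzero (1, 2) can.  */
__attribute__((nonnull_if_nonzero (1, 2)))
int
sum_bytes (const unsigned char *p, size_t n)
{
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += p[i];
  return s;
}
```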
>
> --- gcc/gimple.h.jj 2024-09-23 16:01:12.393215457 +0200
> +++ gcc/gimple.h2024-11-12 12:24:06.544215672 +0100
> @@ -1661,7 +1661,7 @@ extern bool nonfreeing_call_p (gimple *)
>  extern bool nonbarrier_call_p (gimple *);
>  extern bool infer_nonnull_range (gimple *, tree);
>  extern bool infer_nonnull_range_by_dereference (gimple *, tree);
> -extern bool infer_nonnull_range_by_attribute (gimple *, tree);
> +extern bool infer_nonnull_range_by_attribute (gimple *, tree, tree * = NULL);
>  extern void sort_case_labels (vec &);
>  extern void preprocess_case_label_vec_for_gimple (vec &, tree, tree *);
>  extern void gimple_seq_set_location (gimple_seq, location_t);
> --- gcc/gimple.cc.jj2024-10-31 08:45:38.241824084 +0100
> +++ gcc/gimple.cc   2024-11-12 14:30:29.104618853 +0100
> @@ -3089,10 +3089,16 @@ infer_nonnull_range_by_dereference (gimp
>  }
>
>  /* Return true if OP can be inferred to be a non-NULL after STMT
> -   executes by using attributes.  */
> +   executes by using attributes.  If OP2 is non-NULL and nonnull_if_nonzero
> +   is the only attribute implying OP being non-NULL and the corresponding
> +   argument isn't non-zero INTEGER_CST, set *OP2 to the corresponding
> +   argument.  */
>  bool
> -infer_nonnull_range_by_attribute (gimple *stmt, tree op)
> +infer_nonnull_range_by_attribute (gimple *stmt, tree op, tree *op2)
>  {
> +  if (op2)
> +*op2 = NULL_TREE;
> +
>/* We can only assume that a pointer dereference will yield
>   non-NULL if -fdelete-null-pointer-checks is enabled.  */
>if (!flag_delete_null_pointer_checks
> @@ -3109,9 +3115,10 @@ infer_nonnull_range_by_attribute (gimple
>   attrs = lookup_attribute ("nonnull", attrs);
>
>   /* If "nonnull" wasn't specified, we know nothing about
> -the argument.  */
> +the argument, unless "nonnull_if_nonzero" attribute is
> +present.  */
>   if (attrs == NULL_TREE)
> -   return false;
> +   break;
>
>   /* If "nonnull" applies to all the arguments, then ARG
>  is non-null if it's in the argument list.  */
> @@ -3138,6 +3145,37 @@ infer_nonnull_range_by_attribute (gimple
> }
> }
> }
> +
> +  for (attrs = TYPE_ATTRIBUTES (fntype);
> +  (attrs = lookup_attribute ("nonnull_if_nonzero", attrs));
> +  attrs = TREE_CHAIN (attrs))
> +   {
> + tree args = TREE_VALUE (attrs);
> + unsigned int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> + unsigned int idx2
> +   = TREE_INT_CST_LOW (TRE

Re: [PATCH] inline-asm, i386, v2: Add "redzone" clobber support

2024-11-28 Thread Richard Biener
On Thu, Nov 7, 2024 at 11:25 AM Jakub Jelinek  wrote:
>
> On Thu, Nov 07, 2024 at 09:12:34AM +0100, Uros Bizjak wrote:
> > On Thu, Nov 7, 2024 at 9:00 AM Jakub Jelinek  wrote:
> > >
> > > On Thu, Nov 07, 2024 at 08:47:34AM +0100, Uros Bizjak wrote:
> > > > Maybe we should always recognize "redzone", even for targets without
> > > > it. This is the way we recognize "cc" even for targets without CC reg
> > > > (e.g. alpha). This would simplify the definition and processing - if
> > > > the hook returns NULL_RTX (the default), then it (obviously) won't be
> > > > added to the clobber list.
> > >
> > > Dunno, am open to that, but thought it would be just weird if one says
> > > "redzone" on targets which don't have such a concept.
> >
> > Let's look at the situation with x86_32 and x86_64. The "redzone" for
> > the former is just an afterthought, so we can safely say that it
> > doesn't support it. So, the code that targets both targets (e.g. linux
> > kernel) would (in a pedantic way) have to redefine many shared asm
> > defines, one to have clobber and one without it. We don't want that,
> > we want one definition and "let's compiler sort it out".
> >
> > For targets without clobber concept, well - don't add it to the
> > clobber list if it is always ineffective. One *can* add "cc" to all
> > alpha asms, but well.. ;)
>
> Ok, here is a variant of the patch which just ignores "redzone" clobber if
> it doesn't make sense.

The middle-end parts are OK.

> 2024-11-07  Jakub Jelinek  
>
> gcc/
> * target.def (redzone_clobber): New target hook.
> * varasm.cc (decode_reg_name_and_count): Return -5 for
> "redzone".
> * cfgexpand.cc (expand_asm_stmt): Handle redzone clobber.
> * config/i386/i386.h (struct machine_function): Add
> asm_redzone_clobber_seen member.
> * config/i386/i386.cc (ix86_compute_frame_layout): Don't
> use red zone if cfun->machine->asm_redzone_clobber_seen.
> (ix86_redzone_clobber): New function.
> (TARGET_REDZONE_CLOBBER): Redefine.
> * doc/extend.texi (Clobbers and Scratch Registers): Document
> the "redzone" clobber.
> * doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER.
> * doc/tm.texi: Regenerate.
> gcc/testsuite/
> * gcc.dg/asm-redzone-1.c: New test.
> * gcc.target/i386/asm-redzone-1.c: New test.
>
> --- gcc/target.def.jj   2024-11-06 18:53:10.836843793 +0100
> +++ gcc/target.def  2024-11-07 10:57:58.697898800 +0100
> @@ -3376,6 +3376,16 @@ to be used.",
>   bool, (machine_mode mode),
>   NULL)
>
> +DEFHOOK
> +(redzone_clobber,
> + "Define this to return some RTL for the @code{redzone} @code{asm} clobber\n\
> +if target has a red zone and wants to support the @code{redzone} clobber\n\
> +or return NULL if the clobber should be ignored.\n\
> +\n\
> +The default is to ignore the @code{redzone} clobber.",
> + rtx, (),
> + NULL)
> +
>  /* Support for named address spaces.  */
>  #undef HOOK_PREFIX
>  #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
> --- gcc/varasm.cc.jj2024-11-06 18:53:10.838843765 +0100
> +++ gcc/varasm.cc   2024-11-07 10:55:46.858763724 +0100
> @@ -965,9 +965,11 @@ set_user_assembler_name (tree decl, cons
>
>  /* Decode an `asm' spec for a declaration as a register name.
> Return the register number, or -1 if nothing specified,
> -   or -2 if the ASMSPEC is not `cc' or `memory' and is not recognized,
> +   or -2 if the ASMSPEC is not `cc' or `memory' or `redzone' and is not
> +   recognized,
> or -3 if ASMSPEC is `cc' and is not recognized,
> -   or -4 if ASMSPEC is `memory' and is not recognized.
> +   or -4 if ASMSPEC is `memory' and is not recognized,
> +   or -5 if ASMSPEC is `redzone' and is not recognized.
> Accept an exact spelling or a decimal number.
> Prefixes such as % are optional.  */
>
> @@ -1034,6 +1036,9 @@ decode_reg_name_and_count (const char *a
>}
>  #endif /* ADDITIONAL_REGISTER_NAMES */
>
> +  if (!strcmp (asmspec, "redzone"))
> +   return -5;
> +
>if (!strcmp (asmspec, "memory"))
> return -4;
>
> --- gcc/cfgexpand.cc.jj 2024-11-06 18:53:10.803844259 +0100
> +++ gcc/cfgexpand.cc2024-11-07 11:00:16.212953571 +0100
> @@ -3205,6 +3205,12 @@ expand_asm_stmt (gasm *stmt)
>   rtx x = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode));
>   clobber_rvec.safe_push (x);
> }
> + else if (j == -5)
> +   {
> + if (targetm.redzone_clobber)
> +   if (rtx x = targetm.redzone_clobber ())
> + clobber_rvec.safe_push (x);
> +   }
>   else
> {
>   /* Otherwise we should have -1 == empty string
> --- gcc/config/i386/i386.h.jj   2024-11-06 18:53:10.807844203 +0100
> +++ gcc/config/i386/i386.h  2024-11-07 10:55:46.904763076 +0100
> @@ -2881,6 +2881,9 @@ struct GTY(()) machine_function {
>/* True if red zone is used

[PATCH] RISC-V: Minimal support for ssdbltrp and smdbltrp extension.

2024-11-28 Thread Dongyan Chen
This patch adds support for the ssdbltrp[1] and smdbltrp[2] extensions,
enabling GCC to recognize and process them correctly at compile time.

[1] https://github.com/riscv/riscv-isa-manual/blob/main/src/ssdbltrp.adoc
[2] https://github.com/riscv/riscv-isa-manual/blob/main/src/smdbltrp.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.

---
 gcc/common/config/riscv/riscv-common.cc   | 6 ++
 gcc/common/config/riscv/riscv-ext-bitmask.def | 2 ++
 gcc/config/riscv/riscv.opt| 2 ++
 gcc/testsuite/gcc.target/riscv/arch-45.c  | 5 +
 gcc/testsuite/gcc.target/riscv/arch-46.c  | 5 +
 5 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-46.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4c9a72d1180..608f0950f0f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -222,6 +222,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"sscofpmf", "zicsr"},
   {"ssstateen", "zicsr"},
   {"sstc", "zicsr"},
+  {"ssdbltrp", "zicsr"},
+  {"smdbltrp", "zicsr"},
 
   {"xsfvcp", "zve32x"},
 
@@ -401,6 +403,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"sscofpmf",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
   {"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"svinval", ISA_SPEC_CLASS_NONE, 1, 0},
   {"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1725,6 +1729,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
   RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
   RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
+  RISCV_EXT_FLAG_ENTRY ("ssdbltrp", x_riscv_sv_subext, MASK_SSDBLTRP),
+  RISCV_EXT_FLAG_ENTRY ("smdbltrp", x_riscv_sv_subext, MASK_SMDBLTRP),
 
   RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
 
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def 
b/gcc/common/config/riscv/riscv-ext-bitmask.def
index a733533df98..9814b887b2d 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -80,5 +80,7 @@ RISCV_EXT_BITMASK ("zcf", 1,  5)
 RISCV_EXT_BITMASK ("zcmop",1,  6)
 RISCV_EXT_BITMASK ("zawrs",1,  7)
 RISCV_EXT_BITMASK ("svvptc",   1,  8)
+RISCV_EXT_BITMASK ("ssdbltrp", 1,  9)
+RISCV_EXT_BITMASK ("smdbltrp", 1,  10)
 
 #undef RISCV_EXT_BITMASK
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index a6a61a83db1..f5b3cf1103e 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -468,6 +468,10 @@ Mask(SVNAPOT) Var(riscv_sv_subext)
 
 Mask(SVVPTC) Var(riscv_sv_subext)
 
+Mask(SSDBLTRP) Var(riscv_sv_subext)
+
+Mask(SMDBLTRP) Var(riscv_sv_subext)
+
 TargetVariable
 int riscv_ztso_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-45.c 
b/gcc/testsuite/gcc.target/riscv/arch-45.c
new file mode 100644
index 000..85e2510b40a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-45.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssdbltrp -mabi=lp64" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-46.c 
b/gcc/testsuite/gcc.target/riscv/arch-46.c
new file mode 100644
index 000..85e2510b40a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-46.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smdbltrp -mabi=lp64" } */
+int foo()
+{
+}
-- 
2.43.0



[PATCH][PR117704] testsuite: Fix test failure on x86_32 by adding -mfpmath=sse+387

2024-11-28 Thread Jennifer Schmitz
The test gcc.dg/tree-ssa/pow_fold_1.c was failing for 32-bit x86 due to
incompatibility of '-fexcess-precision=16' with '-mfpmath=387'.
In order to resolve this, this patch adds -msse -mfpmath=sse+387 for i?86-*-*.

We tested this by running the test on an x86_64 machine with
--target_board={unix/-m32}.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/testsuite/
PR testsuite/117704
* gcc.dg/tree-ssa/pow_fold_1.c: Add -msse -mfpmath=sse+387
for i?86-*-*.
---
 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
index d98bcb0827e..cb9d52e9653 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */
 /* { dg-add-options float16 } */
+/* { dg-additional-options "-msse -mfpmath=sse+387" { target { i?86-*-* } } } 
*/
 /* { dg-require-effective-target float16_runtime } */
 /* { dg-require-effective-target c99_runtime } */
 
-- 



[PATCH v3] zero_extend(not) -> xor optimization [PR112398]

2024-11-28 Thread Alexey Merzlyakov
This patch adds an optimization for the following pattern:

  (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
  (xor:M (zero_extend:M (subreg:N (X:M))), mask)
  ... where the mask is GET_MODE_MASK (N).

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) can be simplified to just (X:M),
and the whole optimization becomes:

  (zero_extend:M (subreg:N (not:M (X:M)))) ->
  (xor:M (X:M), mask)

The patch targets code patterns like:
  not   a0,a0
  andi  a0,a0,0xff
to be optimized to:
  xori  a0,a0,255

PR rtl-optimization/112398
PR rtl-optimization/117476

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr112398.c: New test.
* gcc.dg/torture/pr117476-1.c: New test. From Zhendong Su.
* gcc.dg/torture/pr117476-2.c: New test. From Zdenek Sojka.

Signed-off-by: Alexey Merzlyakov 
---
 gcc/simplify-rtx.cc   | 23 +++
 gcc/testsuite/gcc.dg/torture/pr117476-1.c | 12 
 gcc/testsuite/gcc.dg/torture/pr117476-2.c | 20 
 gcc/testsuite/gcc.target/riscv/pr112398.c | 14 ++
 4 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117476-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr117476-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr112398.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 893c5f6e1ae..86b3f331928 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -1842,6 +1842,29 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
code, machine_mode mode,
  & ~GET_MODE_MASK (op_mode)) == 0)
return SUBREG_REG (op);
 
+  /* Try to optimize:
+	(zero_extend:M (subreg:N (not:M (X:M)))) ->
+	(xor:M (zero_extend:M (subreg:N (X:M))), mask)
+     where the mask is GET_MODE_MASK (N).
+     For the cases when X:M doesn't have any non-zero bits
+     outside of mode N, (zero_extend:M (subreg:N (X:M)))
+     will be simplified to just (X:M)
+     and the whole optimization becomes (xor:M (X:M), mask).  */
+  if (partial_subreg_p (op)
+ && GET_CODE (XEXP (op, 0)) == NOT
+ && GET_MODE (XEXP (op, 0)) == mode
+ && subreg_lowpart_p (op)
+ && HWI_COMPUTABLE_MODE_P (mode)
+ && is_a  (GET_MODE (op), &op_mode)
+ && (nonzero_bits (XEXP (XEXP (op, 0), 0), mode)
+ & ~GET_MODE_MASK (op_mode)) == 0)
+  {
+   unsigned HOST_WIDE_INT mask = GET_MODE_MASK (op_mode);
+   return simplify_gen_binary (XOR, mode,
+   XEXP (XEXP (op, 0), 0),
+   gen_int_mode (mask, mode));
+  }
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
   /* As we do not know which address space the pointer is referring to,
 we can do this only if the target does not support different pointer
diff --git a/gcc/testsuite/gcc.dg/torture/pr117476-1.c 
b/gcc/testsuite/gcc.dg/torture/pr117476-1.c
new file mode 100644
index 000..d2955624040
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117476-1.c
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/117476.
+   First case checking out of mode N non-zero bits. */
+/* { dg-do run } */
+
+int c = 0x1FF;
+
+int main()
+{
+  if (((c ^ 0xFF) & 0xFF) != 0)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr117476-2.c 
b/gcc/testsuite/gcc.dg/torture/pr117476-2.c
new file mode 100644
index 000..1973ebc45e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr117476-2.c
@@ -0,0 +1,20 @@
+/* PR rtl-optimization/117476.
+   Second case checking skipping of TI mode. */
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+
+unsigned __int128 g;
+
+void
+foo ()
+{
+  g += __builtin_add_overflow_p (~g, 0, 0ul);
+}
+
+int
+main ()
+{
+  foo();
+  if (!g)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr112398.c 
b/gcc/testsuite/gcc.target/riscv/pr112398.c
new file mode 100644
index 000..624a665b76c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr112398.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+#include 
+
+uint8_t neg_u8 (const uint8_t src)
+{
+  return ~src;
+}
+
+/* { dg-final { scan-assembler-times "xori\t" 1 } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */
-- 
2.34.1



Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed complexities

2024-11-28 Thread Richard Biener
On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic
 wrote:
>
> Hi,
>
> I think I managed to fix indentation from the previous version.
>
> When comparing the tables showing the candidates for the group 1 before
> and after applying this patch, it can be observed that complexities for
> the candidates where the computation depends on the invariant
> expressions or the invariant variables should be at least one, which
> aligns with the approach used in the commit c2b64ce.

Commit c2b64ce is

+2017-05-11  Bin Cheng  
+
+   * tree-ssa-address.c (struct mem_address): Move to header file.
+   (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
+   * tree-ssa-address.h (struct mem_address): Move from C file.
+   (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.

that hasn't any "approach" to complexity, it just makes a function global.

>
> = Before this patch =
> Group 1:
>   cand  costcompl.  inv.expr.   inv.vars
>   1 11  0   5;  NIL;
>   2 11  0   6;  NIL;
>   4 8   0   7;  NIL;
>   5 9   0   8;  NIL;
>   6 1   0   NIL;NIL;
>   7 1   1   NIL;NIL;
>   9 7   0   5;  NIL;
> = Before this patch =
> = After this patch =
> Group 1:
>   cand  costcompl.  inv.expr.   inv.vars
>   1 11  2   4;  NIL;

why does complexity go up to 2 from 0 here?

>   2 11  1   4;  NIL;
>   4 8   1   5;  NIL;
>   5 8   2   6;  NIL;

Likewise, and why does cost change?

>   6 1   0   NIL;NIL;
>   7 1   1   NIL;NIL;
>   9 7   2   4;  NIL;

Likewise.

This comparison is probably not very useful without showing the actual candidate
and its uses?  The above before/after figures do not match the testcase
ontop of trunk.

> = After this patch =
>
> Hence, if the invariant expressions or the invariant variables are used
> when representing use with candidate, the complexity should be larger
> for more complex expressions, so it is incremented by one. I am not sure
> whether inv_present could be expressed as parts.

The testcase looks mips specific - it has just a scan-tree-dump-not, which
can easily still PASS when the behavior regresses.  Can you instead
add a gcc.target/mips/ testcase that scans for actual assembler features?
If the testcase relies on inlining daxpy then declaring that static helps that
you just get dgefa in the final assembly.  If you want to scan both functions
I suggest to split the testcase into two to make it more reliable.

I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates
for the innermost loop of dgefa

.L12:
addu$3,$9,$2
addu$3,$3,$8
lwc1$f1,0($3)
lwc1$f0,0($2)
addiu   $7,$7,1
mul.s   $f1,$f2,$f1
addiu   $2,$2,4
slt $3,$7,$10
add.s   $f0,$f0,$f1
.setnoreorder
.setnomacro
bne $3,$0,.L12

and with the patch

.L12:
addu$3,$9,$2
addu$3,$3,$8
lwc1$f1,0($3)
lwc1$f0,0($2)
addiu   $7,$7,1
mul.s   $f1,$f2,$f1
addiu   $2,$2,4
slt $3,$7,$10
add.s   $f0,$f0,$f1
.setnoreorder
.setnomacro
bne $3,$0,.L12

that's suspiciously identical?!  In fact the whole testcase generates
identical code.

So besides not being able to see the actual problem (maybe I need some
-march/-mtune?)
the actual issue I have with the patch is that aff_inv is tried to be
distributed to other
components and for parts that fail to be distributed we cost it via

  if (comp_inv != NULL_TREE)
cost = force_var_cost (data, comp_inv, inv_vars);

A simpler way of ensuring the complexity is at least one would have been to change that to

  if (comp_inv != NULL_TREE)
{
  cost = force_var_cost (data, comp_inv, inv_vars);
  /* Ensure complexity is at least one.  */
  cost.complexity = MAX (1, cost.complexity);
}

or alternatively just do that for the if (comp_inv && inv_expr &&
!simple_inv) case
(it's a bit odd we adjust cost there only for 'inv_expr != NULL').

The patch you posted instead of just adjusting complexity seems to change
the way we distribute the invariant - in particular we now distribute it to
parts.offset even when that is not supported (!(ok_with_ratio_p ||
ok_without_ratio_p)),
that's an odd change.

complexity is supposed to be a tie-breaker, so I think that having
bigger complexity
for when we can't move it fully to index is OK - in the end any part
that cannot be
moved will end up being applied to base I think (that we have
essentially two functions
for this, one for costing and one for actual code emission, is a bit
unfortunate).

In the end I can't ack this patch as I cannot reproduce an effect on
the testcase
you added and because of the odd change part of the patch.

Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Richard Biener
On Thu, 28 Nov 2024, Jakub Jelinek wrote:

> On Thu, Nov 28, 2024 at 01:03:01PM +0100, Richard Biener wrote:
> > > I think auto_inline and inline would be just confusing, even in the 
> > > negative
> > > forms.  We actually "auto-inline" even functions not declared inline, just
> > > with different heuristics.
> > 
> > But inline __attribute__((feeble_inline)) is exactly 'auto-inline', no?
> 
> No.  inline is that 'auto-inline' (with the meaning, do what IPA does with
> DECL_DECLARED_INLINE_P right now, use it as a hint to inline stuff, higher
> limits and the like).
> inline __attribute__((feeble_inline)) is that 'default', i.e. for IPA
> purposes handle it as if it wasn't explicitly inline.  Except that the FE
> do what they should do with any inline, e.g. make it comdat, for constexpr
> constexpr, handle static variables in those specially, ...

OK, then it's just notational difference.  auto-inline to me is
applied to everything _not_ declared inline while 'inline' has
alternate cost metrics.  Aka "auto-inline" makes GCC add "inline".

Oh, there's -finline-functions which is enabling this "auto-inline",
keyed on whether a function is declared inline.  Does feeble_inline
also "undo" 'inline' to the level requiring -finline-functions to
consider a function as inline candidate?  (I know it's not _that_
simple, given -finline-functions-called-once considers not
inline declared functions as well, likewise
-finline-small-functions, cf. --param early-inlining-insns).

Richard.

>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-28 Thread Paul Richard Thomas
Hi Harald,


>
>> I'll wait until tomorrow to see if Paul intervenes.  Otherwise I will
>> proceed and push.
>>
>
I succeeded in breaking things even more! Please proceed and push.

Thanks

Paul


[PATCH] docs: Fix up __sync_* documentation [PR117642]

2024-11-28 Thread Jakub Jelinek
Hi!

The PR14311 commit which added support for the __sync_* builtins documented
that a warning is emitted if a particular operation cannot be implemented.
But neither that commit nor anything later implemented such a warning; the
mentioned calls were always generated silently (which in most cases results
in linker errors, of course, because those functions aren't implemented
anywhere, neither in libatomic nor elsewhere in code shipped with gcc).

So, the following patch just adjusts the documentation to match the
implementation.

Ok for trunk?

2024-11-28  Jakub Jelinek  

PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.

--- gcc/doc/extend.texi.jj  2024-11-28 11:48:15.232659061 +0100
+++ gcc/doc/extend.texi 2024-11-28 13:33:05.986713542 +0100
@@ -13562,16 +13562,11 @@ builtins (@pxref{__atomic Builtins}).  T
 code which should use the @samp{__atomic} builtins instead.
 
 Not all operations are supported by all target processors.  If a particular
-operation cannot be implemented on the target processor, a warning is
-generated and a call to an external function is generated.  The external
-function carries the same name as the built-in version,
-with an additional suffix
+operation cannot be implemented on the target processor, a call to an
+external function is generated.  The external function carries the same name
+as the built-in version, with an additional suffix
 @samp{_@var{n}} where @var{n} is the size of the data type.
 
-@c ??? Should we have a mechanism to suppress this warning?  This is almost
-@c useful for implementing the operation under the control of an external
-@c mutex.
-
 In most cases, these built-in functions are considered a @dfn{full barrier}.
 That is,
 no memory operand is moved across the operation, either forward or

Jakub



Re: [PATCH] docs: Fix up __sync_* documentation [PR117642]

2024-11-28 Thread Richard Biener
On Thu, Nov 28, 2024 at 1:52 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The PR14311 commit which added support for the __sync_* builtins documented
> that a warning is emitted if a particular operation cannot be implemented.
> But neither that commit nor anything later implemented such a warning; the
> mentioned calls were always generated silently (which in most cases results
> in linker errors, of course, because those functions aren't implemented
> anywhere, neither in libatomic nor elsewhere in code shipped with gcc).
>
> So, the following patch just adjusts the documentation to match the
> implementation.
>
> Ok for trunk?

OK

> 2024-11-28  Jakub Jelinek  
>
> PR target/117642
> * doc/extend.texi: Remove documentation of warning for unimplemented
> __sync_* operations, such warning has never been implemented.
>
> --- gcc/doc/extend.texi.jj  2024-11-28 11:48:15.232659061 +0100
> +++ gcc/doc/extend.texi 2024-11-28 13:33:05.986713542 +0100
> @@ -13562,16 +13562,11 @@ builtins (@pxref{__atomic Builtins}).  T
>  code which should use the @samp{__atomic} builtins instead.
>
>  Not all operations are supported by all target processors.  If a particular
> -operation cannot be implemented on the target processor, a warning is
> -generated and a call to an external function is generated.  The external
> -function carries the same name as the built-in version,
> -with an additional suffix
> +operation cannot be implemented on the target processor, a call to an
> +external function is generated.  The external function carries the same name
> +as the built-in version, with an additional suffix
>  @samp{_@var{n}} where @var{n} is the size of the data type.
>
> -@c ??? Should we have a mechanism to suppress this warning?  This is almost
> -@c useful for implementing the operation under the control of an external
> -@c mutex.
> -
>  In most cases, these built-in functions are considered a @dfn{full barrier}.
>  That is,
>  no memory operand is moved across the operation, either forward or
>
> Jakub
>


Re: [PATCH] Introduce feeble_inline attribute [PR93008]

2024-11-28 Thread Jan Hubicka
> 
> I think a 4 state flag { never_inline, default, auto_inline, always_inline }
> would be fine.  The question is how to call the macro(s) and values
> and how to merge those from different decls and what we do currently
> e.g. for noinline, always_inline, on the same or on different decls
> of the same function.
I was also thinking a bit of the name, but it seemed too late to jump in
:)

Generally inliner has the following modes
 - noinline
 - conservative inlining (driven by -auto limits)
 - aggressive inlining (driven by -single limits)
 - disregarding inline limits (inline when you can, former extern inline
   of GNU C)
 - always inline (error when you can not inlined, at least sometimes -
   we included a design problem allowing always inline functions to have
   address taken, be recursive or be exported since historically
   always_inline was upgraded from disregarding to stronger
   interpretation)
Moreover meaning of conservative and aggressive is sensitive to
optimization level and also can be overwritten by inline hints.

So if we go for multi stage flag we probably want to have those 5 levels
+ default option.  There is also PR about switching inline limits (O2
wrt O3) which I am not sure how to fit into this picture.

-auto and -single names are historical and not very good. -single
predates me and -auto is my fault.
Perhaps the flags to switch conservative and aggressive inlining
could be called something like inline_conservatively and inline_aggressively,
or conservative_inline and aggressive_inline.  feeble_inline is the
conservative option...

Honza
> 
> > Note I've had to lookup what 'feeble' means - given we use -auto
> 
feeble was used as a synonym of weak; as I wrote,
weak_inline could be confusing.
> Another possibility would be weaker_inline though, that one can't
> confuse with weak attribute.
> 
> > for --params I'd have chosen __attribute__((auto_inline)), possibly
> > "completed" by __attribute__((inline)) to mark a function as
> > wanting 'inline' heuristics but not 'inline' semantics.
> 
> I think auto_inline and inline would be just confusing, even in the negative
> forms.  We actually "auto-inline" even functions not declared inline, just
> with different heuristics.
> 
>   Jakub
> 


Re: [PATCH][PR117704] testsuite: Fix test failure on x86_32 by adding -mfpmath=sse+387

2024-11-28 Thread Uros Bizjak
On Thu, Nov 28, 2024 at 12:22 PM Jennifer Schmitz  wrote:
>
> The test gcc.dg/tree-ssa/pow_fold_1.c was failing for 32-bit x86 due to
> incompatibility of '-fexcess-precision=16' with '-mfpmath=387'.
> In order to resolve this, this patch adds -msse -mfpmath=sse+387 for i?86-*-*.

This is already fixed by [1], which adds the missing -mfpmath=sse to the
add_options_for_float16 dejagnu procedure.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669910.html

Uros.

>
> We tested this by running the test on an x86_64 machine with
> --target_board={unix/-m32}.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/testsuite/
> PR testsuite/117704
> * gcc.dg/tree-ssa/pow_fold_1.c: Add -msse -mfpmath=sse+387
> for i?86-*-*.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> index d98bcb0827e..cb9d52e9653 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */
>  /* { dg-add-options float16 } */
> +/* { dg-additional-options "-msse -mfpmath=sse+387" { target { i?86-*-* } } 
> } */
>  /* { dg-require-effective-target float16_runtime } */
>  /* { dg-require-effective-target c99_runtime } */
>
> --


Address UNRESOLVED for 'g++.dg/tree-ssa/empty-loop.C' (was: optimize basic_string)

2024-11-28 Thread Thomas Schwinge
Hi!

On 2024-11-26T16:43:07+0100, Jan Hubicka  wrote:
> I also noticed that this patch trigger empty-loop.C failure which I
> originaly attributed to different change.  I filled PR117764 on that.
> We are no longer able to eliminate empty loops early, but we still 
> optimize them late.

> --- a/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
> @@ -30,5 +30,8 @@ int foo (vector &v, list &l, set 
> &s, map &m
>  
>return 0;
>  }
> -/* { dg-final { scan-tree-dump-not "if" "cddce2"} } */
> +/* Adding __builtin_unreachable to std::string::size() prevents cddce2 from
> +   eliminating the loop early, see PR117764.  */
> +/* { dg-final { scan-tree-dump-not "if" "cddce2" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-not "if" "cddce3"} } */

Pushed to trunk branch commit 3e8d3079c31567d3e9f43cc2cb100ddef25f48a2
"Address UNRESOLVED for 'g++.dg/tree-ssa/empty-loop.C'", see attached.


Regards
 Thomas


>From 3e8d3079c31567d3e9f43cc2cb100ddef25f48a2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 28 Nov 2024 14:31:17 +0100
Subject: [PATCH] Address UNRESOLVED for 'g++.dg/tree-ssa/empty-loop.C'

As of commit 1046c32de4956c3d706a2ff8683582fd21b8f360 "optimize basic_string",
we've got:

PASS: g++.dg/tree-ssa/empty-loop.C  -std=gnu++17 (test for excess errors)
[-PASS:-]{+XFAIL:+} g++.dg/tree-ssa/empty-loop.C  -std=gnu++17  scan-tree-dump-not cddce2 "if"
{+UNRESOLVED: g++.dg/tree-ssa/empty-loop.C  -std=gnu++17  scan-tree-dump-not cddce3 "if"+}
[Etc.]

	gcc/testsuite/
	* g++.dg/tree-ssa/empty-loop.C: Address UNRESOLVED.
---
 gcc/testsuite/g++.dg/tree-ssa/empty-loop.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C b/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
index b7e7e27cc042..adb6ab582dbc 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/empty-loop.C
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cddce2 -ffinite-loops -Wno-unused-result" } */
+/* { dg-options "-O2 -ffinite-loops -Wno-unused-result" } */
+/* { dg-additional-options "-fdump-tree-cddce2 -fdump-tree-cddce3" } */
 /* { dg-skip-if "requires hosted libstdc++ for string" { ! hostedlib } } */
 
 #include 
-- 
2.34.1



[PATCH v4 4/5] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)

2024-11-28 Thread Kwok Cheung Yeung
When constructing an iterator with a subset of the original, the 
original BLOCK_SUBBLOCKS is moved to the first new iterator. Otherwise 
this part of the patchset is unchanged.

From f40b72c1e750ec948ebf3ffd92da107679d0b702 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:53:58 +
Subject: [PATCH 4/5] openmp, fortran: Add support for map iterators in OpenMP
 target construct (Fortran)

This adds support for iterators in map clauses within OpenMP
'target' constructs in Fortran.

Some special handling for struct field maps has been added to libgomp in
order to handle arrays of derived types.

2024-11-27  Kwok Cheung Yeung  

gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_MAP.
* openmp.cc (gfc_free_omp_clauses): Free namespace in namelist for
OMP_LIST_MAP.
(gfc_match_omp_clauses): Parse 'iterator' modifier for 'map' clause.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_MAP.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in
OMP_LIST_MAP clauses.  Add expressions to iter_block rather than
block.

gcc/
* gimplify.cc (compute_omp_iterator_count): Account for difference
in loop boundaries in Fortran.
(build_omp_iterator_loop): Change upper boundary condition for
Fortran.  Insert block statements into innermost loop.
(remove_unused_omp_iterator_vars): Copy block subblocks of old
iterator to new iterator and remove original.
(contains_only_iterator_vars_1): New.
(contains_only_iterator_vars): New.
(extract_base_bit_offset): Add iterator argument.  Do not set
variable_offset if contains_only_iterator_vars is true.
(omp_accumulate_sibling_list): Add iterator argument to
extract_base_bit_offset.
* tree-pretty-print.cc (dump_block_node): Ignore BLOCK_SUBBLOCKS
containing iterator block statements.

gcc/testsuite/
* gfortran.dg/gomp/target-map-iterators-1.f90: New.
* gfortran.dg/gomp/target-map-iterators-2.f90: New.
* gfortran.dg/gomp/target-map-iterators-3.f90: New.
* gfortran.dg/gomp/target-map-iterators-4.f90: New.

libgomp/
* target.c (kind_to_name): Handle GOMP_MAP_STRUCT and
GOMP_MAP_STRUCT_UNORD.
(gomp_add_map): New.
(gomp_merge_iterator_maps): Expand fields of a struct mapping
breadth-first.
* testsuite/libgomp.fortran/target-map-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-3.f90: New.
---
 gcc/fortran/dump-parse-tree.cc|  9 +-
 gcc/fortran/openmp.cc | 35 ++--
 gcc/fortran/trans-openmp.cc   | 71 
 gcc/gimplify.cc   | 82 +++---
 .../gomp/target-map-iterators-1.f90   | 26 ++
 .../gomp/target-map-iterators-2.f90   | 33 
 .../gomp/target-map-iterators-3.f90   | 24 ++
 .../gomp/target-map-iterators-4.f90   | 31 +++
 gcc/tree-pretty-print.cc  |  4 +-
 libgomp/target.c  | 83 ++-
 .../target-map-iterators-1.f90| 45 ++
 .../target-map-iterators-2.f90| 45 ++
 .../target-map-iterators-3.f90| 56 +
 13 files changed, 489 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 8e6adfe2829..6db470a9017 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1359,7 +1359,8 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
   for (; n; n = n->next)
 {
   gfc_current_ns = ns_curr;
-  if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND)
+  if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND
+ || list_type == OMP_LIST_MAP)
{
  gfc_current_ns = n->u2.ns ? n->u2.ns : ns_curr;
  if (n->u2.ns != ns_iter)
@@ -1371,8 +1372,12 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
fputs ("AFFINITY (", dumpfile);
  else if (n->u.depend_doacross_op == OMP_DOACROSS_SINK_FIRST)
fputs ("D

[PATCH v4 3/5] openmp: Add support for iterators in 'target update' clauses (C/C++)

2024-11-28 Thread Kwok Cheung Yeung
The target update clause decls and sizes are now Gimplified in 
gimplify_scan_omp_clauses. The rest of the patch is mostly unchanged.

From 79159cbf815d458114e7c6da8dbb138ce24b7df1 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:51:34 +
Subject: [PATCH 3/5] openmp: Add support for iterators in 'target update'
 clauses (C/C++)

This adds support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

2024-11-27  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_from_to): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/cp/
* parser.cc (cp_parser_omp_clause_from_to): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Add argument for iterator
loop sequence.   Gimplify the clause decl and size into the iterator
loop if iterators are used.
(gimplify_omp_workshare): Add argument for iterator loops sequence
in call to gimplify_scan_omp_clauses.
(gimplify_omp_target_update): Call remove_unused_omp_iterator_vars and
build_omp_iterators_loops.  Add loop sequence as argument when calling
gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses and building
the Gimple statement.
* tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators
for to/from clauses with iterators.
* tree.cc (omp_clause_num_ops): Add extra operand for OMP_CLAUSE_FROM
and OMP_CLAUSE_TO.
* tree.h (OMP_CLAUSE_HAS_ITERATORS): Add check for OMP_CLAUSE_TO and
OMP_CLAUSE_FROM.
(OMP_CLAUSE_ITERATORS): Likewise.

gcc/testsuite/
* c-c++-common/gomp/target-update-iterators-1.c: New.
* c-c++-common/gomp/target-update-iterators-2.c: New.
* c-c++-common/gomp/target-update-iterators-3.c: New.

libgomp/
* target.c (gomp_update): Call gomp_merge_iterator_maps.  Free
allocated variables.
* testsuite/libgomp.c-c++-common/target-update-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-3.c: New.
---
 gcc/c/c-parser.cc | 105 +++--
 gcc/c/c-typeck.cc |   5 +-
 gcc/cp/parser.cc  | 111 --
 gcc/cp/semantics.cc   |   5 +-
 gcc/gimplify.cc   |  37 +++---
 .../gomp/target-update-iterators-1.c  |  20 
 .../gomp/target-update-iterators-2.c  |  23 
 .../gomp/target-update-iterators-3.c  |  17 +++
 gcc/tree-pretty-print.cc  |  10 ++
 gcc/tree.cc   |   4 +-
 gcc/tree.h|   6 +-
 libgomp/target.c  |  14 +++
 .../target-update-iterators-1.c   |  65 ++
 .../target-update-iterators-2.c   |  58 +
 .../target-update-iterators-3.c   |  67 +++
 15 files changed, 509 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-3.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-1.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-2.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-3.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index c3e57341850..5d1b17e5b25 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20037,8 +20037,11 @@ c_parser_omp_clause_device_type (c_parser *parser, 
tree list)
to ( variable-list )
 
OpenMP 5.1:
-   from ( [present :] variable-list )
-   to ( [present :] variable-list ) */
+   from ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+   to ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+
+   motion-modifier:
+ present | iterator (iterators-definition)  */
 
 static tree
 c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind,
@@ -20049,15 +20052,88 @@ c_parser_omp_clause_from_to (c_parser *parser, enum 
omp_clause_code kind,
   if (!parens.require_open (parser))
 return list;
 
+  int pos = 1, colon_pos = 0;
+  int iterator_length = 0;
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
+{
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type
+ == CPP_OPEN_PAREN)
+   {
+ unsigned int n = pos + 2;
+ if (c_parser_check_balanced_raw_token_sequence (parser,

[PATCH v4 2/5] openmp: Add support for iterators in map clauses (C/C++)

2024-11-28 Thread Kwok Cheung Yeung
A new field has been added to gomp_target to store the gimple_seq for 
the iterator loops. This means that gomp_target now needs a separate GSS 
code. Accessor functions have been added for this field, and the build 
function has been updated to take the new sequence.


When building the iterator loops for a clause, an artificial label is 
inserted into the loop body to mark the body as belonging to a 
particular iterator (as a single target statement may have multiple 
clauses with different iterators). The label, as well as the iterator 
variable and the array used to hold expanded values, are inserted at the 
end of the iterator in order to add extra statements to the loop body 
later on.


The function enter_omp_iterator_loop_context is used to recursively 
iterate through the gimple_seq containing the iterator loops, looking 
for the label associated with the loop body and updating the 
gimplification context as it goes. The gimple sequence for the loop body 
is then returned. exit_omp_iterator_loop_context is used to reset the 
context.


When gimplifying the clause decl/size or when adding code to add 
decls/sizes to the arrays used to hold the expanded iterator values in 
omp-low, the extra information added to the iterator vector is used to 
call enter_omp_iterator_loop_context, and any resulting Gimple 
statements are added to the returned gimple_seq, thereby adding them to 
the correct loop body.


remove_unused_omp_iterator_vars is called to handle iterators with 
unused iterator variables. It first finds the set of iterator variables 
used by a clause - if the entire set is used, then nothing needs to be 
done. If none are used, then the iterator is removed from the clause. If 
it is a subset, then the subset is looked up in a cache to find a 
suitable iterator, creating a new entry if not present. The variables in 
the clause are then remapped to those in the new iterator.

From ceb003984d80067ec1b92f70ac5bfe4ce2072d81 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:49:32 +
Subject: [PATCH 2/5] openmp: Add support for iterators in map clauses (C/C++)

This adds preliminary support for iterators in map clauses within OpenMP
'target' constructs (which includes constructs such as 'target enter data').

Iterators with non-constant loop bounds are not currently supported.

2024-11-27  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_map): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/cp/
* parser.cc (cp_parser_omp_clause_map): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/
* gimple-pretty-print.cc (dump_gimple_omp_target): Print expanded
iterator loops.
* gimple.cc (gimple_build_omp_target): Add argument for iterator
loops sequence.  Initialize iterator loops field.
* gimple.def (GIMPLE_OMP_TARGET): Set GSS symbol to GSS_OMP_TARGET.
* gimple.h (gomp_target): Set GSS symbol to GSS_OMP_TARGET.  Add extra
field for iterator loops.
(gimple_build_omp_target): Add argument for iterator loops sequence.
(gimple_omp_target_iterator_loops): New.
(gimple_omp_target_iterator_loops_ptr): New.
(gimple_omp_target_set_iterator_loops): New.
* gimplify.cc (find_var_decl): New.
(copy_omp_iterator): New.
(remap_omp_iterator_var_1): New.
(remap_omp_iterator_var): New.
(remove_unused_omp_iterator_vars): New.
(struct iterator_loop_info_t): New type.
(iterator_loop_info_map_t): New type.
(build_omp_iterators_loops): New.
(enter_omp_iterator_loop_context_1): New.
(enter_omp_iterator_loop_context): New.
(exit_omp_iterator_loop_context): New.
(gimplify_adjust_omp_clauses): Add argument for iterator loop
sequence.  Gimplify the clause decl and size into the iterator
loop if iterators are used.
(gimplify_omp_workshare): Call remove_unused_omp_iterator_vars and
build_omp_iterators_loops for OpenMP target expressions.  Add
loop sequence as argument when calling gimplify_adjust_omp_clauses
and building the Gimple statement.
* gimplify.h (enter_omp_iterator_loop_context): New prototype.
(exit_omp_iterator_loop_context): New prototype.
* gsstruct.def (GSS_OMP_TARGET): New.
* omp-low.cc (lower_omp_map_iterator_expr): New.
(lower_omp_map_iterator_size): New.
(finish_omp_map_iterators): New.
(lower_omp_target): Add sorry if iterators used with deep mapping.
Call lower_omp_map_iterator_expr before assigning to sender ref.
Call lower_omp_map_iterator_size before setting the size.  Insert
iterator loop sequence before the statements for t

[PATCH v4 5/5] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)

2024-11-28 Thread Kwok Cheung Yeung

This part of the patch is unchanged.

From e761481eb3d9b322267b3a22773caa0f0270275f Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:56:08 +
Subject: [PATCH 5/5] openmp, fortran: Add support for iterators in OpenMP
 'target update' constructs (Fortran)

This adds Fortran support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

2024-11-27  Kwok Cheung Yeung  

gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_TO and OMP_LIST_FROM.
* openmp.cc (gfc_free_omp_clauses): Free namespace for OMP_LIST_TO
and OMP_LIST_FROM.
(gfc_match_motion_var_list): Parse 'iterator' modifier.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_TO and
OMP_LIST_FROM.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in
OMP_LIST_TO and OMP_LIST_FROM clauses.  Add expressions to
iter_block rather than block.

gcc/testsuite/
* gfortran.dg/gomp/target-update-iterators-1.f90: New.
* gfortran.dg/gomp/target-update-iterators-2.f90: New.
* gfortran.dg/gomp/target-update-iterators-3.f90: New.

libgomp/
* testsuite/libgomp.fortran/target-update-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-3.f90: New.
---
 gcc/fortran/dump-parse-tree.cc|  7 +-
 gcc/fortran/openmp.cc | 62 +--
 gcc/fortran/trans-openmp.cc   | 50 ++--
 .../gomp/target-update-iterators-1.f90| 25 ++
 .../gomp/target-update-iterators-2.f90| 28 +++
 .../gomp/target-update-iterators-3.f90| 23 ++
 .../target-update-iterators-1.f90 | 68 
 .../target-update-iterators-2.f90 | 63 +++
 .../target-update-iterators-3.f90 | 78 +++
 9 files changed, 392 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-3.f90
 create mode 100644 
libgomp/testsuite/libgomp.fortran/target-update-iterators-1.f90
 create mode 100644 
libgomp/testsuite/libgomp.fortran/target-update-iterators-2.f90
 create mode 100644 
libgomp/testsuite/libgomp.fortran/target-update-iterators-3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 6db470a9017..a28b53ea8f5 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1360,7 +1360,8 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 {
   gfc_current_ns = ns_curr;
   if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND
- || list_type == OMP_LIST_MAP)
+ || list_type == OMP_LIST_MAP
+ || list_type == OMP_LIST_TO || list_type == OMP_LIST_FROM)
{
  gfc_current_ns = n->u2.ns ? n->u2.ns : ns_curr;
  if (n->u2.ns != ns_iter)
@@ -1376,6 +1377,10 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
fputs ("DEPEND (", dumpfile);
  else if (list_type == OMP_LIST_MAP)
fputs ("MAP (", dumpfile);
+ else if (list_type == OMP_LIST_TO)
+   fputs ("TO (", dumpfile);
+ else if (list_type == OMP_LIST_FROM)
+   fputs ("FROM (", dumpfile);
  else
gcc_unreachable ();
}
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 00ebe6b1e00..a586c2df537 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -194,7 +194,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   for (i = 0; i < OMP_LIST_NUM; i++)
 gfc_free_omp_namelist (c->lists[i],
   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND
-  || i == OMP_LIST_MAP,
+  || i == OMP_LIST_MAP
+  || i == OMP_LIST_TO || i == OMP_LIST_FROM,
   i == OMP_LIST_ALLOCATE,
   i == OMP_LIST_USES_ALLOCATORS,
   i == OMP_LIST_INIT);
@@ -1378,17 +1379,65 @@ gfc_match_motion_var_list (const char *str, 
gfc_omp_namelist **list,
   if (m != MATCH_YES)
 return m;
 
-  match m_present = gfc_match (" present : ");
+  gfc_namespace *ns_iter = NULL, *ns_curr = gfc_current_ns;
+  int present_modifier = 0, iterator_modifier = 0;
+  locus present_locus = gfc_current_locus, iterator_locus = gfc_current_locus;
 
-  m = gfc_match_omp_variable_list ("", list, false, NULL, headp, true, true);
+  for (;;)
+{
+  locus current_locus = gfc_current

[PATCH v4 0/5] openmp: Add support for iterators in OpenMP mapping clauses

2024-11-28 Thread Kwok Cheung Yeung
This is a revised version of the patch series posted at: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664541.html


The previous version performed the expansion of the iterators into loops 
at the omp-lowering stage, but that meant that the Gimplification of the 
clauses with iterators was also delayed to that point, which is undesirable.


To fix this, I have split the handling of iterators into two phases. The 
first stage happens during the gimplification pass - the iterator loops 
are created, then immediately Gimplified. The resulting Gimple sequence 
is later stored separately as an operand in the Gimple target statement 
rather than incorporated directly into the main program code. When the 
clause decl and sizes are gimplified in gimplify_scan_omp_clauses and 
gimplify_adjust_omp_clauses, we enter the bind context of the iterator 
loop body corresponding to the clause then gimplify the decl/size 
directly into the loop body (which works because the iterator variables 
are in scope at that point).


In the omp lowering pass, lower_omp_map_iterator_expr and 
lower_omp_map_iterator_size are still called just before sending the 
hostaddrs/sizes to the arrays read by libgomp, but they now enter the 
bind context of the relevant loop body and generate code there. After 
lowering is done, the gimple sequence containing the loops is inserted 
just before the Gimple code for the target statement.


The behaviour for clauses that do not use all variables specified in an 
iterator has also changed. This now produces a warning instead of an 
error, and the iterator is trimmed to exclude the unused 
variables. If no variables are used then the iterator is removed 
entirely. If an iterator is shared between multiple clauses with 
different sets of variables used, then the iterator is unshared and 
trimmed accordingly.
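As a hedged illustration (hypothetical source using the syntax this series adds; not taken from the patches themselves), a clause whose iterator binds a variable the clause never uses would now be trimmed with a warning rather than rejected:

```
/* Illustrative only: 'j' never appears in the mapped expression, so
   under this series the iterator is trimmed to iterator(i=0:n) and a
   warning is issued instead of an error.  */
#pragma omp target update to(iterator(i=0:n, j=0:m): a[i])
```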




[PATCH v4 1/5] openmp: Refactor handling of iterators

2024-11-28 Thread Kwok Cheung Yeung
This patch version has the omp_ prefixes added to the new functions for 
building iterator loops and computing the iterator count.


Also, they now take the iterator vector directly rather than the clause 
containing it. This is because depend clauses store the iterator in 
TREE_PURPOSE (TREE_DECL (c)), while map clauses store it in 
OMP_CLAUSE_ITERATORS (c).

From a25a986ae707395b3bf31a7d3f08e0d04554aed3 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:49:12 +
Subject: [PATCH 1/5] openmp: Refactor handling of iterators

Move code to calculate the iteration size and to generate the iterator
expansion loop into separate functions.

Use OMP_ITERATOR_DECL_P to check for iterators in clause declarations.

2024-11-27  Kwok Cheung Yeung  

gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Use OMP_ITERATOR_DECL_P.

gcc/c/
* c-typeck.cc (handle_omp_array_sections): Use OMP_ITERATOR_DECL_P.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* pt.cc (tsubst_omp_clause_decl): Use OMP_ITERATOR_DECL_P.
* semantics.cc (handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise.

gcc/
* gimplify.cc (gimplify_omp_affinity): Use OMP_ITERATOR_DECL_P.
(compute_omp_iterator_count): New.
(build_omp_iterator_loop): New.
(gimplify_omp_depend): Use OMP_ITERATOR_DECL_P,
compute_omp_iterator_count and build_omp_iterator_loop.
* tree-inline.cc (copy_tree_body_r): Use OMP_ITERATOR_DECL_P.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.h (OMP_ITERATOR_DECL_P): New macro.
---
 gcc/c-family/c-omp.cc|   4 +-
 gcc/c/c-typeck.cc|  13 +-
 gcc/cp/pt.cc |   4 +-
 gcc/cp/semantics.cc  |   8 +-
 gcc/gimplify.cc  | 321 +++
 gcc/tree-inline.cc   |   5 +-
 gcc/tree-pretty-print.cc |   8 +-
 gcc/tree.h   |   6 +
 8 files changed, 173 insertions(+), 196 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 7e20e5a5082..1d15878e7ef 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -762,9 +762,7 @@ c_finish_omp_depobj (location_t loc, tree depobj,
  kind = OMP_CLAUSE_DEPEND_KIND (clause);
  t = OMP_CLAUSE_DECL (clause);
  gcc_assert (t);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  error_at (OMP_CLAUSE_LOCATION (clause),
"% modifier may not be specified on "
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index f465123bfab..32db5893b46 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15151,9 +15151,7 @@ handle_omp_array_sections (tree &c, enum 
c_omp_region_type ort)
   tree *tp = &OMP_CLAUSE_DECL (c);
   if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AFFINITY)
-  && TREE_CODE (*tp) == TREE_LIST
-  && TREE_PURPOSE (*tp)
-  && TREE_CODE (TREE_PURPOSE (*tp)) == TREE_VEC)
+  && OMP_ITERATOR_DECL_P (*tp))
 tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
maybe_zero_len, first_non_one,
@@ -16350,9 +16348,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  /* FALLTHRU */
case OMP_CLAUSE_AFFINITY:
  t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  if (TREE_PURPOSE (t) != last_iterators)
last_iterators_remove
@@ -16452,10 +16448,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  break;
}
}
- if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST
- && TREE_PURPOSE (OMP_CLAUSE_DECL (c))
- && (TREE_CODE (TREE_PURPOSE (OMP_CLAUSE_DECL (c)))
- == TREE_VEC))
+ if (OMP_ITERATOR_DECL_P (OMP_CLAUSE_DECL (c)))
TREE_VALUE (OMP_CLAUSE_DECL (c)) = t;
  else
OMP_CLAUSE_DECL (c) = t;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 564e368ff43..df8e82bcafa 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17619,9 +17619,7 @@ tsubst_omp_clause_decl (tree decl, tree args, 
tsubst_flags_t complain,
 return decl;
 
   /* Handle OpenMP iterators.  */
-  if (TREE_CODE (decl) == TREE_LIST
-  && TREE_PURPOSE (decl)
-  && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC)
+  if (OMP_ITERATOR_DECL_P (decl))
 {
   tree ret;
   if (iterator_cache[0] == TREE_PURPOSE (decl))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index b6ff12f508e..04272db0914 100644
--- a/gcc/cp/semantics.cc

Re: [PATCH] [x86] [RFC] Prevent loop vectorization if it's in a deeply nested big loop.

2024-11-28 Thread Hongtao Liu
On Thu, Nov 28, 2024 at 4:57 PM Richard Biener
 wrote:
>
> On Thu, Nov 28, 2024 at 3:04 AM Hongtao Liu  wrote:
> >
> > On Wed, Nov 27, 2024 at 9:43 PM Richard Biener
> >  wrote:
> > >
> > > On Wed, Nov 27, 2024 at 4:26 AM liuhongt  wrote:
> > > >
> > > > When loop requires any kind of versioning which could increase register
> > > > pressure too much, and it's in a deeply nest big loop, don't do
> > > > vectorization.
> > > >
> > > > I tested the patch with both Ofast and O2 for SPEC2017, besides 
> > > > 548.exchange_r,
> > > > other benchmarks are same binary.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > Any comments?
> > >
> > > The vectorizer tries to version an outer loop when vectorizing a loop nest
> > > and the versioning condition is invariant.  See vect_loop_versioning.  
> > > This
> > > tries to handle such cases.  Often the generated runtime alias checks are
> > > not invariant because we do not consider the outer evolutions.  I think we
> > > should instead fix this there.
> > >
> > > Question below ...
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > pr target/117088
> > > > * config/i386/i386.cc
> > > > (ix86_vector_costs::ix86_vect_in_deep_nested_loop_p): New 
> > > > function.
> > > > (ix86_vector_costs::finish_cost): Prevent loop vectorization
> > > > if it's in a deeply nested loop and require versioning.
> > > > * config/i386/i386.opt (--param=vect-max-loop-depth=): New
> > > > param.
> > > > ---
> > > >  gcc/config/i386/i386.cc  | 89 
> > > >  gcc/config/i386/i386.opt |  4 ++
> > > >  2 files changed, 93 insertions(+)
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index 526c9df7618..608f40413d2 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -25019,6 +25019,8 @@ private:
> > > >
> > > >/* Estimate register pressure of the vectorized code.  */
> > > >void ix86_vect_estimate_reg_pressure ();
> > > > +  /* Check if vect_loop is in a deeply-nested loop.  */
> > > > +  bool ix86_vect_in_deep_nested_loop_p (class loop *vect_loop);
> > > >/* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used 
> > > > for
> > > >   estimation of register pressure.
> > > >   ??? Currently it's only used by vec_construct/scalar_to_vec
> > > > @@ -25324,6 +25326,84 @@ 
> > > > ix86_vector_costs::ix86_vect_estimate_reg_pressure ()
> > > >  }
> > > >  }
> > > >
> > > > +/* Return true if vect_loop is in a deeply-nested loop.
> > > > +   .i.e vect_loop_n in below loop structure.
> > > > +loop1
> > > > +{
> > > > + loop2
> > > > + {
> > > > +  loop3
> > > > +  {
> > > > +   vect_loop_1;
> > > > +   loop4
> > > > +   {
> > > > +vect_loop_2;
> > > > +loop5
> > > > +{
> > > > + vect_loop_3;
> > > > + loop6
> > > > + {
> > > > +  vect_loop_4;
> > > > +  loop7
> > > > +  {
> > > > +   vect_loop_5;
> > > > +   loop8
> > > > +   {
> > > > +   loop9
> > > > +   }
> > > > +  vect_loop_6;
> > > > +  }
> > > > + vect_loop_7;
> > > > + }
> > > > +}
> > > > +   }
> > > > + }
> > > > + It's a big hammer to fix O2 regression for 548.exchange_r after 
> > > > vectorization
> > > > + is enhanced by (r15-4225-g70c3db511ba14f)  */
> > > > +bool
> > > > +ix86_vector_costs::ix86_vect_in_deep_nested_loop_p (class loop 
> > > > *vect_loop)
> > > > +{
> > > > +  if (loop_depth (vect_loop) > (unsigned) ix86_vect_max_loop_depth)
> > > > +return true;
> > > > +
> > > > +  if (loop_depth (vect_loop) < 2)
> > > > +return false;
> > > > +
> > >
> > > while the above two are "obvious", what you check below isn't clear to me.
> > > Is this trying to compute whether 'vect_loop' is inside of a loop nest 
> > > which
> > > at any sibling of vect_loop (or even sibling of an outer loop of 
> > > vect_loop,
> > > recursively) is a sub-nest with a loop depth (relative to what?) exceeds
> > > ix86_vect_max_loop_depth?
> > Yes, the function tries to find if the vect_loop is in a "big outer
> > loop" which contains an innermost loop with loop_depth >
> > ix86_vect_max_loop_depth.
> > If yes, then prevent vectorization for the loop if its tripcount is
> > not constant VF-times.(requires any kind of versioning is not
> > accurate, and yes it's a big hammer.)
>
> I'll note it also doesn't seem to look at register pressure at all or limit
> the cut-off to the very-cheap cost model?
The default parameter ix86_vect_max_loop_depth reflects the register
pressure: each layer of the loop nest generally needs 2 registers (1 iv
+ 1 tripcount), so
ix86_vect_max_loop_depth > 8 will run out of the 16 registers.
Vectorization for an unknown tripcount needs 1 extra register for the "new
tripcount of the main vectorized loop", and that causes extra spills in the
outer loop.
If the tripcount of vect_loop is big, then the penalty can be
compensated by the vectori

Re: [PATCH] c: correct type compatibility for bit-fields [PR117828]

2024-11-28 Thread Joseph Myers
On Thu, 28 Nov 2024, Martin Uecker wrote:

> Bit-fields need additional checks for type compatibility.
> 
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> 
> c: correct type compatibility for bit-fields [PR117828]
> 
> Add missing test for consistency of bit-fields when comparing tagged
> types for compatibility.
> 
> PR c/117828
> 
> gcc/c/ChangeLog:
> * c-typeck.c (tagged_types_tu_compatible_p): Add check.
> 
> gcc/testsuite/ChangeLog:
> * c23-tag-bitfields-1.c: New test.
> * pr117828.c: New test.

OK (with the file names corrected in the ChangeLog entries, it's 
c-typeck.cc, and gcc.dg/ is needed in the test names).

-- 
Joseph S. Myers
josmy...@redhat.com
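For context, the case this patch covers can be sketched as follows (hypothetical example, not taken from the patch): under C23 tag-compatibility rules, two definitions of the same tagged type are compatible only if corresponding members also agree on bit-field widths, so the second definition here must be diagnosed:

```
/* Sketch (C23): same tag, same member name and type, but different
   bit-field widths -- the two types are not compatible, so this
   redefinition must be rejected.  */
struct s { int a : 3; };
struct s { int a : 4; };   /* error: incompatible redefinition */
```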



gimplify: Handle void BIND_EXPR as asm input [PR100501]

2024-11-28 Thread Joseph Myers
As reported in bug 100501 (plus duplicates), the gimplifier ICEs for C
tests involving a statement expression not returning a value as an asm
input.

The expected diagnostic for this case (as seen for C++ input) is one
coming from the gimplifier and so it seems reasonable to fix the
gimplifier to handle the GENERIC generated for this case by the C
front end, rather than trying to make the C front end detect it
earlier.  Thus, adjust two places in the gimplifier to work with
gimplifying a BIND_EXPR changing *expr_p to NULL_TREE.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to commit?

PR c/100501

gcc/
* gimplify.cc (gimplify_expr): Do not call gimple_test_f on
*expr_p when it has become null.
(gimplify_asm_expr): Handle TREE_VALUE (link) becoming null.

gcc/testsuite/
* gcc.dg/pr100501-1.c: New test.

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index fb0ca23bfb6c..090f8987d5d3 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7457,6 +7457,13 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
TREE_VALUE (link) = error_mark_node;
  tret = gimplify_expr (&TREE_VALUE (link), pre_p, post_p,
is_gimple_lvalue, fb_lvalue | fb_mayfail);
+ if (TREE_VALUE (link) == NULL_TREE)
+   {
+ /* This can occur when an asm input is a BIND_EXPR for a
+statement expression not returning a value.  */
+ tret = GS_ERROR;
+ TREE_VALUE (link) = error_mark_node;
+   }
  if (tret != GS_ERROR)
{
  /* Unlike output operands, memory inputs are not guaranteed
@@ -19662,10 +19669,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
 
   /* Otherwise we're gimplifying a subexpression, so the resulting
  value is interesting.  If it's a valid operand that matches
- GIMPLE_TEST_F, we're done. Unless we are handling some
- post-effects internally; if that's the case, we need to copy into
- a temporary before adding the post-effects to POST_P.  */
-  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
+ GIMPLE_TEST_F, or it's now NULL_TREE, we're done.  Unless we are
+ handling some post-effects internally; if that's the case, we need
+ to copy into a temporary before adding the post-effects to POST_P.  */
+  if (gimple_seq_empty_p (internal_post)
+  && (!*expr_p || (*gimple_test_f) (*expr_p)))
 goto out;
 
   /* Otherwise, we need to create a new temporary for the gimplified
diff --git a/gcc/testsuite/gcc.dg/pr100501-1.c 
b/gcc/testsuite/gcc.dg/pr100501-1.c
new file mode 100644
index ..b5b3781a9c2f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100501-1.c
@@ -0,0 +1,26 @@
+/* Test ICE for statement expression returning no value as asm input (bug
+   100501).  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int x;
+int g ();
+
+void
+f ()
+{
+  __asm__ ("" : : "m" (({}))); /* { dg-error "memory input 0 is not directly 
addressable" } */
+  __asm__ ("" : : "m" (({ ; }))); /* { dg-error "memory input 0 is not 
directly addressable" } */
+  __asm__ ("" : : "m" (({ (void) 0; }))); /* { dg-error "memory input 0 is not 
directly addressable" } */
+  __asm__ ("" : : "m" (({ f (); }))); /* { dg-error "memory input 0 is not 
directly addressable|using result of function returning 'void'" } */
+  __asm__ ("" : : "m" (({ f (); f (); }))); /* { dg-error "memory input 0 is 
not directly addressable" } */
+  __asm__ ("" : : "m" (({ x = g (); f (); }))); /* { dg-error "memory input 0 
is not directly addressable" } */
+  __asm__ ("" : : "m" (({ if (1) g (); }))); /* { dg-error "memory input 0 is 
not directly addressable" } */
+  __asm__ ("" : : "m" (({ if (1) g (); else g (); }))); /* { dg-error "memory 
input 0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ test : goto test; }))); /* { dg-error "memory input 
0 is not directly addressable" } */
+  __asm__ ("" : : "m" (({ return; }))); /* { dg-error "memory input 0 is not 
directly addressable" } */
+  __asm__ ("" : : "m" (({ while (1); }))); /* { dg-error "memory input 0 is 
not directly addressable" } */
+  __asm__ ("" : : "m" (({ do {} while (1); }))); /* { dg-error "memory input 0 
is not directly addressable" } */
+  __asm__ ("" : : "m" (({ for (;;); }))); /* { dg-error "memory input 0 is not 
directly addressable" } */
+  __asm__ ("" : : "m" (({ switch (x); }))); /* { dg-error "memory input 0 is 
not directly addressable" } */
+}

-- 
Joseph S. Myers
josmy...@redhat.com



[committed] testsuite: Fix up pr116675.c test [PR116675]

2024-11-28 Thread Jakub Jelinek
On Wed, Nov 20, 2024 at 08:02:34PM +0800, Cui, Lili wrote:
>   PR target/116675
>   * gcc.target/i386/pr116675.c: New test.

The test uses dg-do run and scan-assembler* at the same time,
which obviously doesn't work when pr116675.s isn't created at all,
so one gets
PASS: gcc.target/i386/pr116675.c execution test
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
The usual way to handle that is adding -save-temps option.

The test FAILs after that change though, for simple reason, the pand
regex doesn't match just pand instructions, but also the pandn ones.

I've added \t there to make sure it matches only pand.

Though, wonder if it wouldn't be safer to split the test into two,
one with just the 4 functions (why noinline, noclone rather than
noipa, btw?), that one would be dg-do compile and have the scan-assembler*
directives, and then another one which includes the first one and is
dg-do run and contains the runtime checking of those.  Because with the
checking (main etc.) one risks that those instructions can appear in there
as well.

In any case, I've committed this as obvious after testing it on
x86_64-linux -m32/-m64.

2024-11-28  Jakub Jelinek  

PR target/116675
* gcc.target/i386/pr116675.c: Add -save-temps to dg-options.
Scan for pand\t rather than pand.

--- gcc/testsuite/gcc.target/i386/pr116675.c.jj 2024-11-26 21:50:00.60285 
+0100
+++ gcc/testsuite/gcc.target/i386/pr116675.c2024-11-28 14:51:18.579548566 
+0100
@@ -1,6 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -msse2 -mno-ssse3" } */
-/* { dg-final { scan-assembler-times "pand" 4 } } */
+/* { dg-options "-O2 -msse2 -mno-ssse3 -save-temps" } */
+/* { dg-final { scan-assembler-times "pand\t" 4 } } */
 /* { dg-final { scan-assembler-times "pandn" 4 } } */
 /* { dg-final { scan-assembler-times "por" 4 } } */
 

Jakub



[committed] First two patches from Mariam's CRC work

2024-11-28 Thread Jeff Law
So these are updated versions of the first two of Mariam's patches for 
CRC optimization.  They introduce the basic building blocks that are 
used by subsequent patches as well as CRC builtin support.


The biggest conceptual change from Mariam's patch is to drop the 
assumption that we're going to be using word_mode in the table based 
expansion.  That in turn means we can support the table based CRC 
expansion independent of the target's word size and what modes are 
supported for basic ALU operations.


This has been tested on every cross target in my tester and it has been 
bootstrapped and regression tested on x86_64.  The full series has also 
been bootstrapped and regression tested on a variety of targets 
including, but not limited to aarch64, riscv64, ppc64le, and others.


Attaching committed patch #1 and #2 for the archivers.

Jeff

commit bb46d05ad64e4e0acb3307e76bab340aa8587d3e
Author: Mariam Arutunian 
Date:   Mon Nov 11 12:48:34 2024 -0700

[PATCH v6 01/12] Implement internal functions for efficient CRC computation.

Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.

gcc/

* doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): 
Document.
* expr.cc (calculate_crc): New function.
(assemble_crc_table): Likewise.
(generate_crc_table): Likewise.
(calculate_table_based_CRC): Likewise.
(expand_crc_table_based): Likewise.
(gen_common_operation_to_reflect): Likewise.
(reflect_64_bit_value): Likewise.
(reflect_32_bit_value): Likewise.
(reflect_16_bit_value): Likewise.
(reflect_8_bit_value): Likewise.
(generate_reflecting_code_standard): Likewise.
(expand_reversed_crc_table_based): Likewise.
* expr.h (generate_reflecting_code_standard): New function 
declaration.
(expand_crc_table_based): Likewise.
(expand_reversed_crc_table_based): Likewise.
* internal-fn.cc: (crc_direct): Define.
(direct_crc_optab_supported_p): Likewise.
(expand_crc_optab_fn): New function
* internal-fn.def (CRC, CRC_REV): New internal functions.
* optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian 
Co-authored-by: Joern Rennecke 
Co-authored-by: Jeff Law 

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index c4c37053833..69605bf75c0 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8578,6 +8578,20 @@ Return 1 if operand 1 is a normal floating point number 
and 0
 otherwise.  @var{m} is a scalar floating point mode.  Operand 0
 has mode @code{SImode}, and operand 1 has mode @var{m}.
 
+@cindex @code{crc@var{m}@var{n}4} instruction pattern
+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the
+polynomial without leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers.  The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is @code{HImode}, a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+
 @end table
 
 @end ifset
diff --git a/gcc/expr.cc b/gcc/expr.cc
index f4939140bb5..de25437660e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14177,3 +14177,350 @@ int_expr_size (const_tree exp)
 
   return tree_to_shwi (size);
 }
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+   CRC_BITS is CRC size.  */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+  unsigned HOST_WIDE_INT polynomial,
+  unsigned short crc_bits)
+{
+  unsigned HOST_WIDE_INT msb = HOST_WIDE_INT_1U << (crc_bits - 1);
+  crc = crc << (crc_bits - 8);
+  for (short i = 8; i > 0; --i)
+{
+  if (crc & msb)
+   crc = (crc << 1) ^ polynomial;
+  else
+   crc <<= 1;
+}
+  /* Zero out bits in crc beyond the specified number of crc_bits.  */
+  if (crc_bits < sizeof (crc) * CHAR_BIT)
+crc &= (HOST_WIDE_INT_1U << crc_bits) - 1;
+  return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+   given ID.
+   ID is

[committed] libstdc++: Reduce duplication in Doxygen comments for std::list

2024-11-28 Thread Jonathan Wakely
We have a number of comments which are duplicated for C++98 and C++11
overloads, where the signatures are slightly different. Instead of
duplicating the comments that are 90% identical, just use a single
comment that can apply to both. In some cases this means saying "an
iterator" instead of "A const iterator" but that's fine, a
std::list::const_iterator is still an iterator (and a non-const iterator
is a valid argument to those functions because they'll implicitly
convert to const_iterator).

In two cases the @return description just needs to say that it returns
void for C++98 and an iterator otherwise.

libstdc++-v3/ChangeLog:

* include/bits/stl_list.h: Reduce duplication in doxygen
comments.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/stl_list.h | 156 ---
 1 file changed, 42 insertions(+), 114 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_list.h 
b/libstdc++-v3/include/bits/stl_list.h
index df7f388ede5..cf3d05fcae9 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -1477,21 +1477,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   template
iterator
emplace(const_iterator __position, _Args&&... __args);
+#endif
 
-  /**
-   *  @brief  Inserts given value into %list before specified iterator.
-   *  @param  __position  A const_iterator into the %list.
-   *  @param  __x  Data to be inserted.
-   *  @return  An iterator that points to the inserted data.
-   *
-   *  This function will insert a copy of the given value before
-   *  the specified location.  Due to the nature of a %list this
-   *  operation can be done in constant time, and does not
-   *  invalidate iterators and references.
-   */
-  iterator
-  insert(const_iterator __position, const value_type& __x);
-#else
   /**
*  @brief  Inserts given value into %list before specified iterator.
*  @param  __position  An iterator into the %list.
@@ -1502,38 +1489,34 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  the specified location.  Due to the nature of a %list this
*  operation can be done in constant time, and does not
*  invalidate iterators and references.
-   */
-  iterator
-  insert(iterator __position, const value_type& __x);
-#endif
-
-#if __cplusplus >= 201103L
-  /**
-   *  @brief  Inserts given rvalue into %list before specified iterator.
-   *  @param  __position  A const_iterator into the %list.
-   *  @param  __x  Data to be inserted.
-   *  @return  An iterator that points to the inserted data.
*
-   *  This function will insert a copy of the given rvalue before
-   *  the specified location.  Due to the nature of a %list this
-   *  operation can be done in constant time, and does not
-   *  invalidate iterators and references.
-   */
+   *  @{
+   */
+#if __cplusplus >= 201103L
+  iterator
+  insert(const_iterator __position, const value_type& __x);
+
   iterator
   insert(const_iterator __position, value_type&& __x)
   { return emplace(__position, std::move(__x)); }
+#else
+  iterator
+  insert(iterator __position, const value_type& __x);
+#endif
+  /// @}
 
+#if __cplusplus >= 201103L
   /**
*  @brief  Inserts the contents of an initializer_list into %list
*  before specified const_iterator.
*  @param  __p  A const_iterator into the %list.
*  @param  __l  An initializer_list of value_type.
*  @return  An iterator pointing to the first element inserted
-   *   (or __position).
+   *   (or `__p`).
*
*  This function will insert copies of the data in the
-   *  initializer_list @a l into the %list before the location
-   *  specified by @a p.
+   *  initializer_list `__l` into the %list before the location
+   *  specified by `__p`.
*
*  This operation is linear in the number of elements inserted and
*  does not invalidate iterators and references.
@@ -1543,36 +1526,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { return this->insert(__p, __l.begin(), __l.end()); }
 #endif
 
-#if __cplusplus >= 201103L
-  /**
-   *  @brief  Inserts a number of copies of given data into the %list.
-   *  @param  __position  A const_iterator into the %list.
-   *  @param  __n  Number of elements to be inserted.
-   *  @param  __x  Data to be inserted.
-   *  @return  An iterator pointing to the first element inserted
-   *   (or __position).
-   *
-   *  This function will insert a specified number of copies of the
-   *  given data before the location specified by @a position.
-   *
-   *  This operation is linear in the number of elements inserted and
-   *  does not invalidate iterators and references.
-   */
-  iterator
-  insert(const_it

[committed] libstdc++: Fix allocator-extended move ctor for std::basic_stacktrace [PR117822]

2024-11-28 Thread Jonathan Wakely
libstdc++-v3/ChangeLog:

PR libstdc++/117822
* include/std/stacktrace (stacktrace(stacktrace&&, const A&)):
Fix typo in qualified-id for is_always_equal trait.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Test
allocator-extended constructors and allocator propagation.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/stacktrace   |   2 +-
 .../19_diagnostics/stacktrace/stacktrace.cc   | 207 +-
 2 files changed, 204 insertions(+), 5 deletions(-)
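The one-character fix matters because `is_always_equal` is only guaranteed to exist on `std::allocator_traits<A>`, which synthesizes it from `A::is_always_equal` when present and falls back to `std::is_empty<A>` otherwise; querying the allocator type directly, as the typo did, breaks for allocators that never declare the member. A minimal sketch (the `PersonalityAlloc` name is invented for illustration):

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stateful allocator that, like many user-written
// allocators, declares no is_always_equal member of its own.
template <typename T>
struct PersonalityAlloc
{
  using value_type = T;
  int id = 0;  // state: two instances may compare unequal
  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

// allocator_traits synthesizes the trait, so this is well-formed;
// the typo'd spelling PersonalityAlloc<int>::is_always_equal would
// simply fail to compile.
using Traits = std::allocator_traits<PersonalityAlloc<int>>;
static_assert(!Traits::is_always_equal::value,
              "stateful (non-empty) allocator is not always-equal");
```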

diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace
index 58d0c2a0fc2..2c0f6ba10a9 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -295,7 +295,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const allocator_type& __alloc) noexcept
   : _M_alloc(__alloc)
   {
-   if constexpr (_Allocator::is_always_equal::value)
+   if constexpr (_AllocTraits::is_always_equal::value)
  _M_impl = std::__exchange(__other._M_impl, {});
else if (_M_alloc == __other._M_alloc)
  _M_impl = std::__exchange(__other._M_impl, {});
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc 
b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
index 6bb22eacd92..ee1a6d221e3 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc
@@ -106,13 +106,165 @@ test_cons()
 VERIFY( s5 != s0 );
 VERIFY( s3 == s0 );
 
-// TODO test allocator-extended copy/move
+Stacktrace s6(s5, Alloc{6});
+VERIFY( ! s6.empty() );
+VERIFY( s6.size() != 0 );
+VERIFY( s6.begin() != s6.end() );
+VERIFY( s6 == s5 );
+VERIFY( s5 != s0 );
+VERIFY( s6.get_allocator().get_personality() == 6 );
 
-// TODO test allocator propagation
+Stacktrace s7(std::move(s6), Alloc{7});
+VERIFY( ! s7.empty() );
+VERIFY( s7.size() != 0 );
+VERIFY( s7.begin() != s7.end() );
+VERIFY( s7 == s5 );
+VERIFY( s5 != s0 );
+VERIFY( s7.get_allocator().get_personality() == 7 );
+  }
+
+  {
+using Alloc = __gnu_test::SimpleAllocator;
+using Stacktrace = std::basic_stacktrace;
+
+Stacktrace s0;
+VERIFY( s0.empty() );
+VERIFY( s0.size() == 0 );
+VERIFY( s0.begin() == s0.end() );
+
+Stacktrace s1(Alloc{});
+VERIFY( s1.empty() );
+VERIFY( s1.size() == 0 );
+VERIFY( s1.begin() == s1.end() );
+
+VERIFY( s0 == s1 );
+
+Stacktrace s2(s0);
+VERIFY( s2 == s0 );
+
+const Stacktrace curr = Stacktrace::current();
+
+Stacktrace s3(curr);
+VERIFY( ! s3.empty() );
+VERIFY( s3.size() != 0 );
+VERIFY( s3.begin() != s3.end() );
+VERIFY( s3 != s0 );
+
+Stacktrace s4(s3);
+VERIFY( ! s4.empty() );
+VERIFY( s4.size() != 0 );
+VERIFY( s4.begin() != s4.end() );
+VERIFY( s4 == s3 );
+VERIFY( s4 != s0 );
+
+Stacktrace s5(std::move(s3));
+VERIFY( ! s5.empty() );
+VERIFY( s5.size() != 0 );
+VERIFY( s5.begin() != s5.end() );
+VERIFY( s5 == s4 );
+VERIFY( s5 != s0 );
+VERIFY( s3 == s0 );
+
+Stacktrace s6(s5, Alloc{});
+VERIFY( ! s6.empty() );
+VERIFY( s6.size() != 0 );
+VERIFY( s6.begin() != s6.end() );
+VERIFY( s6 == s5 );
+VERIFY( s5 != s0 );
+
+Stacktrace s7(std::move(s6), Alloc{});
+VERIFY( ! s7.empty() );
+VERIFY( s7.size() != 0 );
+VERIFY( s7.begin() != s7.end() );
+VERIFY( s7 == s5 );
+VERIFY( s5 != s0 );
+  }
+
+{
+using Stacktrace = std::pmr::stacktrace;
+using Alloc = Stacktrace::allocator_type;
+
+Stacktrace s0;
+VERIFY( s0.empty() );
+VERIFY( s0.size() == 0 );
+VERIFY( s0.begin() == s0.end() );
+
+Stacktrace s1(Alloc{});
+VERIFY( s1.empty() );
+VERIFY( s1.size() == 0 );
+VERIFY( s1.begin() == s1.end() );
+
+VERIFY( s0 == s1 );
+
+Stacktrace s2(s0);
+VERIFY( s2 == s0 );
+
+const Stacktrace curr = Stacktrace::current();
+
+Stacktrace s3(curr);
+VERIFY( ! s3.empty() );
+VERIFY( s3.size() != 0 );
+VERIFY( s3.begin() != s3.end() );
+VERIFY( s3 != s0 );
+
+Stacktrace s4(s3);
+VERIFY( ! s4.empty() );
+VERIFY( s4.size() != 0 );
+VERIFY( s4.begin() != s4.end() );
+VERIFY( s4 == s3 );
+VERIFY( s4 != s0 );
+
+Stacktrace s5(std::move(s3));
+VERIFY( ! s5.empty() );
+VERIFY( s5.size() != 0 );
+VERIFY( s5.begin() != s5.end() );
+VERIFY( s5 == s4 );
+VERIFY( s5 != s0 );
+VERIFY( s3 == s0 );
+
+__gnu_test::memory_resource mr;
+Stacktrace s6(s5, &mr);
+VERIFY( ! s6.empty() );
+VERIFY( s6.size() != 0 );
+VERIFY( s6.begin() != s6.end() );
+VERIFY( s6 == s5 );
+VERIFY( s5 != s0 );
+VERIFY( s6.get_allocator() != s5.get_allocator() );
+
+Stacktrace s7(std::move(s6), Alloc{});
+VERIFY( ! s7.empty() );
+VERIFY( s7.size() != 0 );
+VERIFY( s7.begin() 

Re: [PATCH][ivopts]: perform affine fold to unsigned on non address expressions. [PR114932]

2024-11-28 Thread Richard Biener
On Thu, 7 Nov 2024, Tamar Christina wrote:

> Hi All,
> 
> When the patch for PR114074 was applied we saw a good boost in exchange2.
> 
> This boost was partially caused by a simplification of the addressing modes.
> With the patch applied IV opts saw the following form for the base addressing;
> 
>   Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) l0_19(D) *
> 324) + 36)
> 
> vs what we normally get:
> 
>   Base: (integer(kind=4) *) &block + ((sizetype) ((integer(kind=8)) l0_19(D)
> * 81) + 9) * 4
> 
> This is because the patch promoted multiplies where one operand is a constant
> from a signed multiply to an unsigned one, to attempt to fold away the 
> constant.
> 
> This patch attempts the same, but due to the various problems with SCEV and
> niters not being able to analyze the resulting forms (i.e. PR114322) we can't
> do it during SCEV or in a general form in fold-const the way extract_muldiv
> attempts.
> 
> Instead this applies the simplification during IVopts initialization when we
> create the IV.  Essentially when we know the IV won't overflow with regards to
> niters then we perform an affine fold which gets it to simplify the internal
> computation, even if this is signed because we know that for IVOPTs uses the
> IV won't ever overflow.  This allows IV opts to see the simplified form
> without influencing the rest of the compiler.
> 
> as mentioned in PR114074 it would be good to fix the missed optimization in 
> the
> other passes so we can perform this in general.
> 
> The reason this has a big impact on Fortran code is that Fortran doesn't seem
> to have unsigned integer types.  As such all its address computations are
> created with signed types, and folding does not happen on them due to the
> possible overflow.
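The unsigned promotion is what makes the fold valid unconditionally: with wraparound (unsigned) arithmetic, the two base forms quoted above denote the same affine expression, which a quick check confirms:

```python
# The fold IVopts performs on the base above:
#   ((unsigned long)l0 * 81 + 9) * 4  ->  (unsigned long)l0 * 324 + 36
# With 64-bit wraparound semantics this holds for every value,
# including values where the signed form could overflow.
M = 1 << 64
for i in (0, 1, 81, M - 81, M - 1):
    assert ((i * 81 + 9) * 4) % M == (i * 324 + 36) % M
```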
> 
> concretely on AArch64 this changes the results from generation:
> 
> mov x27, -108
> mov x24, -72
> mov x23, -36
> add x21, x1, x0, lsl 2
> add x19, x20, x22
> .L5:
> add x0, x22, x19
> add x19, x19, 324
> ldr d1, [x0, x27]
> add v1.2s, v1.2s, v15.2s
> str d1, [x20, 216]
> ldr d0, [x0, x24]
> add v0.2s, v0.2s, v15.2s
> str d0, [x20, 252]
> ldr d31, [x0, x23]
> add v31.2s, v31.2s, v15.2s
> str d31, [x20, 288]
> bl  digits_20_
> cmp x21, x19
> bne .L5
> 
> into:
> 
> .L5:
> ldr d1, [x19, -108]
> add v1.2s, v1.2s, v15.2s
> str d1, [x20, 216]
> ldr d0, [x19, -72]
> add v0.2s, v0.2s, v15.2s
> str d0, [x20, 252]
> ldr d31, [x19, -36]
> add x19, x19, 324
> add v31.2s, v31.2s, v15.2s
> str d31, [x20, 288]
> bl  digits_20_
> cmp x21, x19
> bne .L5
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
> 
> The patch folds every IV to unsigned to canonicalize them.  At the end of the
> pass we match.pd will then remove unneeded conversions.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu -m32, -m64 and some issues below:
> 
>  * gcc.dg/torture/bitint-49.c   -O1  execution test
>  * gcc.c-torture/execute/pr110115.c   -O1  execution test
> 
> These two start to fail now because of a bug in the stack slot sharing
> conflict function.  Basically the change alters the addressing from ADDR_REF
> to (unsigned) ADDR_REF, and add_scope_conflicts_2 does not look deep enough
> through the promotion to realize that the two values are live at the same
> time.
> 
> Both of these issues are fixed by Andrew's patch [1],  Since this patch 
> rewrites
> the entire thing, it didn't seem useful for me to provide a spot fix for this.
> 
> [1] 
> https://inbox.sourceware.org/gcc-patches/20241017024205.2660484-1-quic_apin...@quicinc.com/
> 
> Ok for master after Andrew's patch gets in?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114932
>   * tree-scalar-evolution.cc (alloc_iv): Perform affine unsigned fold.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114932
>   * gcc.dg/tree-ssa/pr64705.c: Update dump file scan.
>   * gcc.target/i386/pr115462.c: The testcase shares 3 IVs which calculate
>   the same thing but with a slightly different increment offset.  The test
>   checks for 3 complex addressing loads, one for each IV.  But with this
>   change they now all share one IV.  That is the loop now only has one
>   complex addressing.  This is ultimately driven by the backend costing
>   and the current costing says this is preferred so updating the testcase.
>   * gfortran.dg/addressing-modes_1.f90: N

Fix 'libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c' for C23 default (was: [committed] c: Default to -std=gnu23)

2024-11-28 Thread Thomas Schwinge
Hi!

On 2024-11-15T23:46:47+, Joseph Myers  wrote:
> Change the default language version for C compilation from -std=gnu17
> to -std=gnu23.

Wow, that came quickly.


> A few tests are updated to remove local definitions of
> bool, true and false (where making such an unconditional test change
> seemed to make more sense than changing the test conditionally earlier
> or building it with -std=gnu17); most test issues were already
> addressed in previous patches.

> NOTE: it's likely there are target-specific tests for non-x86 targets
> that need updating as a result of this change.  See commit
> 9fb5348e3021021e82d75e4ca4e6f8d51a34c24f ("testsuite: Prepare for
> -std=gnu23 default") for examples of changes to prepare the testsuite
> to work with a -std=gnu23 default.

Hidden behind effective-target 'openacc_radeon_accel_selected', I ran
into another one; I've pushed to trunk branch
commit bcb764ec7c063326a17eb6213313cc9c0fd348b3
"Fix 'libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c' for 
C23 default",
see attached.


Grüße
 Thomas


>From bcb764ec7c063326a17eb6213313cc9c0fd348b3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 28 Nov 2024 15:14:20 +0100
Subject: [PATCH] Fix
 'libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c' for C23
 default

With commit 55e3bd376b2214e200fa76d12b67ff259b06c212 "c: Default to -std=gnu23"
we've got:

[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  [-execution test-]{+compilation failed to produce executable+}
[Etc.]

..., due to:

[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:13: error: two or more data types in declaration specifiers
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:1: warning: useless type name in empty declaration

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
	[!__cplusplus]: Don't 'typedef int bool;'.
---
 .../testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
index 4b1fb5e0e761..ab8fc6c276be 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
@@ -12,9 +12,6 @@
 #include 
 #include 
 
-#ifndef __cplusplus
-typedef int bool;
-#endif
 #include 
 
 
-- 
2.34.1
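The underlying issue generalizes: in C23, `bool`, `true` and `false` are keywords, so a pre-C99 fallback like `typedef int bool;` is a hard error, whereas including `<stdbool.h>` (reduced to nearly a no-op in C23) stays valid in every dialect. A hedged sketch of the portable spelling the fixed test now relies on (the `nonzero` helper is illustrative, not libgomp code):

```c
#include <stdbool.h>  /* needed before C23, harmless under C23 */

/* With the typedef gone, `bool` resolves to the real boolean type in
   every dialect.  (Under C23 the old `typedef int bool;` would be
   rejected as a redeclaration of a keyword.)  */
bool nonzero(int bytes)
{
  return bytes != 0;
}
```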



Re: [PATCH] driver: -fhardened and -z lazy/-z norelro [PR117739]

2024-11-28 Thread Marek Polacek
On Thu, Nov 28, 2024 at 11:27:32AM +, Dimitri John Ledkov wrote:
> Did bootstrap with gcc-14 (clean cherrypick, minor offsets).
> Built and tested on arm64 & x86_64.
> It resolved the reported problem.
> Thank you for this patch.
 
Thanks a lot for testing it!
 
> On Tue, 26 Nov 2024, 22:37 Marek Polacek,  wrote:
> 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > -- >8 --
> > As the manual states, using "-fhardened -fstack-protector" will produce
> > a warning because -fhardened wants to enable -fstack-protector-strong,
> > but it can't since it's been overriden by the weaker -fstack-protector.
> >
> > -fhardened also attempts to enable -Wl,-z,relro,-z,now.  By the same
> > logic as above, "-fhardened -z norelro" or "-fhardened -z lazy" should
> > produce the same warning.  But we don't detect this combination, so
> > this patch fixes it.  I also renamed a variable to better reflect its
> > purpose.
> >
> > Also don't check warn_hardened in process_command, since it's always
> > true there.
> >
> > Also tweak wording in the manual as Jon Wakely suggested on IRC.
> >
> > PR driver/117739
> >
> > gcc/ChangeLog:
> >
> > * doc/invoke.texi: Tweak wording for -Whardened.
> > * gcc.cc (driver_handle_option): If -z lazy or -z norelro was
> > specified, don't enable linker hardening.
> > (process_command): Don't check warn_hardened.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * c-c++-common/fhardened-16.c: New test.
> > * c-c++-common/fhardened-17.c: New test.
> > * c-c++-common/fhardened-18.c: New test.
> > * c-c++-common/fhardened-19.c: New test.
> > * c-c++-common/fhardened-20.c: New test.
> > * c-c++-common/fhardened-21.c: New test.
> > ---
> >  gcc/doc/invoke.texi   |  4 ++--
> >  gcc/gcc.cc| 20 ++--
> >  gcc/testsuite/c-c++-common/fhardened-16.c |  5 +
> >  gcc/testsuite/c-c++-common/fhardened-17.c |  5 +
> >  gcc/testsuite/c-c++-common/fhardened-18.c |  5 +
> >  gcc/testsuite/c-c++-common/fhardened-19.c |  5 +
> >  gcc/testsuite/c-c++-common/fhardened-20.c |  5 +
> >  gcc/testsuite/c-c++-common/fhardened-21.c |  5 +
> >  8 files changed, 46 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-16.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-17.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-18.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-19.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-20.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fhardened-21.c
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 346ac1369b8..371f723539c 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -7012,8 +7012,8 @@ This warning is enabled by @option{-Wall}.
> >  Warn when @option{-fhardened} did not enable an option from its set (for
> >  which see @option{-fhardened}).  For instance, using @option{-fhardened}
> >  and @option{-fstack-protector} at the same time on the command line causes
> > -@option{-Whardened} to warn because @option{-fstack-protector-strong} is
> > -not enabled by @option{-fhardened}.
> > +@option{-Whardened} to warn because @option{-fstack-protector-strong} will
> > +not be enabled by @option{-fhardened}.
> >
> >  This warning is enabled by default and has effect only when
> > @option{-fhardened}
> >  is enabled.
> > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > index 92c92996401..d2718d263bb 100644
> > --- a/gcc/gcc.cc
> > +++ b/gcc/gcc.cc
> > @@ -305,9 +305,10 @@ static size_t dumpdir_length = 0;
> > driver added to dumpdir after dumpbase or linker output name.  */
> >  static bool dumpdir_trailing_dash_added = false;
> >
> > -/* True if -r, -shared, -pie, or -no-pie were specified on the command
> > -   line.  */
> > -static bool any_link_options_p;
> > +/* True if -r, -shared, -pie, -no-pie, -z lazy, or -z norelro were
> > +   specified on the command line, and therefore -fhardened should not
> > +   add -z now/relro.  */
> > +static bool avoid_linker_hardening_p;
> >
> >  /* True if -static was specified on the command line.  */
> >  static bool static_p;
> > @@ -4434,10 +4435,17 @@ driver_handle_option (struct gcc_options *opts,
> > }
> > /* Record the part after the last comma.  */
> > add_infile (arg + prev, "*");
> > +   if (strcmp (arg, "-z,lazy") == 0 || strcmp (arg, "-z,norelro") ==
> > 0)
> > + avoid_linker_hardening_p = true;
> >}
> >do_save = false;
> >break;
> >
> > +case OPT_z:
> > +  if (strcmp (arg, "lazy") == 0 || strcmp (arg, "norelro") == 0)
> > +   avoid_linker_hardening_p = true;
> > +  break;
> > +
> >  case OPT_Xlinker:
> >add_infile (arg, "*");
> >do_save = false;
> > @@ -4642,7 +4650,7 @@ driver_handle_option (struct gcc_options 

Re: [PATCH] c++: P2865R5, Remove Deprecated Array Comparisons from C++26 [PR117788]

2024-11-28 Thread Jason Merrill

On 11/27/24 9:06 PM, Marek Polacek wrote:

Not a bugfix, but this should only affect C++26.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements P2865R5 by promoting the warning to error in C++26
only.  -Wno-array-compare shouldn't disable the error, so adjust the call
sites as well.


I think it's fine for -Wno-array-compare to suppress the error (and
-Wno-error=array-compare to reduce it to a warning), so how about 
DK_PERMERROR rather than DK_ERROR?


We also need SFINAE for this when !tf_warning_or_error.

Jason



Re: [PATCH] arm, mve: Do not DLSTP transform loops if VCTP is not first

2024-11-28 Thread Christophe Lyon
Hi Andre,

On Thu, 28 Nov 2024 at 17:37, Andre Vieira
 wrote:
>
> Hi,
>
> This rejects any loops where any predicated instruction comes before the vctp
> that generates the loop predicate.  Even though this is not a requirement for
> dlstp transformation we have found potential issues where you can end up with 
> a
> wrong transformation, so it is safer to reject such loops.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> * gcc/config/arm/arm.cc (arm_mve_get_loop_vctp): Reject loops with a
> predicated instruction before the vctp.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
> (test10a): ... this.
> (test10b): Variation of test10a with a small change to trigger an
> issue.

Thanks, the patch LGTM except a minor nit:

 /* Using a VPR that gets re-generated within the loop.  */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
[...]

+/* Using a VPR that gets re-generated within the loop.  */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)

Can you update the comment before test10b, to highlight the difference
with test10a?

Thanks,

Christophe


> ---
>  gcc/config/arm/arm.cc | 21 ++-
>  .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 +-
>  2 files changed, 35 insertions(+), 6 deletions(-)
>


[PATCH] c++: define __cpp_pack_indexing [PR113798]

2024-11-28 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Forgot to do this in my original patch.

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_pack_indexing=202311L for C++26.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/feat-cxx26.C (__cpp_pack_indexing): Add test.
---
 gcc/c-family/c-cppbuiltin.cc| 1 +
 gcc/testsuite/g++.dg/cpp26/feat-cxx26.C | 6 ++
 2 files changed, 7 insertions(+)
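The macro exists so client code can key off it; a hedged sketch of the intended usage follows (the pack-indexing branch needs a C++26 compiler, the fallback works in older dialects):

```cpp
#include <tuple>
#include <type_traits>

#if defined(__cpp_pack_indexing) && __cpp_pack_indexing >= 202311L
// C++26: index the pack directly.
template <typename... Ts>
using first_t = Ts...[0];
#else
// Portable fallback for pre-C++26 dialects.
template <typename... Ts>
using first_t = std::tuple_element_t<0, std::tuple<Ts...>>;
#endif

static_assert(std::is_same_v<first_t<int, char, double>, int>,
              "first element of the pack is int");
```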

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index c354c794b55..195f8ae5e40 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1092,6 +1092,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_structured_bindings=202403L");
  cpp_define (pfile, "__cpp_deleted_function=202403L");
  cpp_define (pfile, "__cpp_variadic_friend=202403L");
+ cpp_define (pfile, "__cpp_pack_indexing=202311L");
}
   if (flag_concepts && cxx_dialect > cxx14)
cpp_define (pfile, "__cpp_concepts=202002L");
diff --git a/gcc/testsuite/g++.dg/cpp26/feat-cxx26.C 
b/gcc/testsuite/g++.dg/cpp26/feat-cxx26.C
index c387a7dfe60..d74ff0e427b 100644
--- a/gcc/testsuite/g++.dg/cpp26/feat-cxx26.C
+++ b/gcc/testsuite/g++.dg/cpp26/feat-cxx26.C
@@ -622,3 +622,9 @@
 #elif __cpp_variadic_friend != 202403
 #  error "__cpp_variadic_friend != 202403"
 #endif
+
+#ifndef __cpp_pack_indexing
+# error "__cpp_pack_indexing"
+#elif __cpp_pack_indexing != 202311
+#  error "__cpp_pack_indexing != 202311"
+#endif

base-commit: ab2cce593ef6085a5f517cdca2520c5c44acbfad
-- 
2.47.0



Re: [PATCH v4 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-28 Thread Claudio Bantaloukas



On 21/11/2024 15:41, Richard Sandiford wrote:

Claudio Bantaloukas  writes:

This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm

The first two are available under a combination of the FP8DOT4 and SVE2 
features.
Alternatively under the SSVE_FP8DOT4 feature under streaming mode.
The final two are available under a combination of the FP8DOT2 and SVE2 
features.
Alternatively under the SSVE_FP8DOT2 feature under streaming mode.

Some of the comments from the previous patches apply here too
(e.g. the boilerplate at the start of the tests, and testing the
highest in-range index).

Done

It looks like the patch is missing a change to doc/invoke.texi.

Done


Otherwise it's just banal trivia, sorry:


diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 022163f0726..65df48a3e65 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -835,21 +835,28 @@ public:
rtx
expand (function_expander &e) const override
{
-/* In the optab, the multiplication operands come before the accumulator
-   operand.  The optab is keyed off the multiplication mode.  */
-e.rotate_inputs_left (0, 3);
  insn_code icode;
-if (e.type_suffix_ids[1] == NUM_TYPE_SUFFIXES)
-  icode = e.convert_optab_handler_for_sign (sdot_prod_optab,
-   udot_prod_optab,
-   0, e.result_mode (),
-   GET_MODE (e.args[0]));
+if (e.fpm_mode == aarch64_sve::FPM_set)
+  {
+   icode = code_for_aarch64_sve_dot (e.result_mode ());
+  }

Formatting nit, but: no braces around single statements, with the body
then being indented by 2 spaces relative to the "if".

Done

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 09f343e7118..9f79f6e28c7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -3994,6 +3994,34 @@ struct ternary_bfloat_def
  };
  SHAPE (ternary_bfloat)
  
+/* sv_t svfoo[_t0](sv_t, svmfloat8_t, svmfloat8_t).  */

+struct ternary_mfloat8_def
+: public ternary_resize2_base<8, TYPE_mfloat, TYPE_mfloat>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+gcc_assert (group.fpm_mode == FPM_set);
+b.add_overloaded_functions (group, MODE_none);
+build_all (b, "v0,v0,vM,vM", group, MODE_none);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+type_suffix_index type;
+if (!r.check_num_arguments (4)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+   || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
+   || !r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t)
+   || !r.require_scalar_type (3, "int64_t"))

uint64_t

Done

+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, GROUP_none);
+  }
+};
+SHAPE (ternary_mfloat8)
+
  /* sv_t svfoo[_t0](sv_t, svbfloat16_t, svbfloat16_t, uint64_t)
  
 where the final argument is an integer constant expression in the range

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index c84c153e913..7d90e3b5e20 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -363,3 +363,15 @@ DEF_SVE_FUNCTION_GS_FPM (svmlallbb_lane, 
ternary_mfloat8_lane, s_float_mf8, none
  DEF_SVE_FUNCTION_GS_FPM (svmlallbt_lane, ternary_mfloat8_lane, s_float_mf8, 
none, none, set)
  DEF_SVE_FUNCTION_GS_FPM (svmlalltb_lane, ternary_mfloat8_lane, s_float_mf8, 
none, none, set)
  #undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS \
+  streaming_compatible (AARCH64_FL_SVE2 | AARCH64_FL_FP8DOT4, 
AARCH64_FL_SSVE_FP8DOT4)

Elsewhere we've been putting the non-streaming and streaming requirements
on separate lines if the whole thing doesn't fit on one line:

#define REQUIRED_EXTENSIONS \
   streaming_compatible (AARCH64_FL_SVE2 | AARCH64_FL_FP8DOT4, \
 AARCH64_FL_SSVE_FP8DOT4)

Same below.

Done


Looks good to me otherwise, thanks.

Richard


[PATCH] libstdc++: Simplify std::_Destroy using 'if constexpr'

2024-11-28 Thread Jonathan Wakely
This is another place where we can use 'if constexpr' to replace
dispatching to a specialized class template, improving compile times and
avoiding a function call.

libstdc++-v3/ChangeLog:

* include/bits/stl_construct.h (_Destroy(FwdIter, FwdIter)): Use
'if constexpr' instead of dispatching to a member function of a
class template.
(_Destroy_n(FwdIter, Size)): Likewise.
(_Destroy_aux, _Destroy_n_aux): Only define for C++98.
---

This seems worthwhile, as another small reduction in compile times,
similar to a number of recent patches.

Tested x86_64-linux.

 libstdc++-v3/include/bits/stl_construct.h | 33 ++-
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_construct.h 
b/libstdc++-v3/include/bits/stl_construct.h
index 9d6111396e1..6889a9bfa0e 100644
--- a/libstdc++-v3/include/bits/stl_construct.h
+++ b/libstdc++-v3/include/bits/stl_construct.h
@@ -166,6 +166,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions" // for if-constexpr
+
+#if __cplusplus < 201103L
   template
 struct _Destroy_aux
 {
@@ -185,6 +189,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 static void
 __destroy(_ForwardIterator, _ForwardIterator) { }
 };
+#endif
 
   /**
* Destroy a range of objects.  If the value_type of the object has
@@ -201,15 +206,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // A deleted destructor is trivial, this ensures we reject such types:
   static_assert(is_destructible<_Value_type>::value,
"value type is destructible");
-#endif
+  if constexpr (!__has_trivial_destructor(_Value_type))
+   for (; __first != __last; ++__first)
+ std::_Destroy(std::__addressof(*__first));
 #if __cpp_constexpr_dynamic_alloc // >= C++20
-  if (std::__is_constant_evaluated())
-   return std::_Destroy_aux::__destroy(__first, __last);
+  else if (std::__is_constant_evaluated())
+   for (; __first != __last; ++__first)
+ std::destroy_at(std::__addressof(*__first));
 #endif
+#else
   std::_Destroy_aux<__has_trivial_destructor(_Value_type)>::
__destroy(__first, __last);
+#endif
 }
 
+#if __cplusplus < 201103L
   template
 struct _Destroy_n_aux
 {
@@ -234,6 +245,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return __first;
}
 };
+#endif
 
   /**
* Destroy a range of objects.  If the value_type of the object has
@@ -250,14 +262,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // A deleted destructor is trivial, this ensures we reject such types:
   static_assert(is_destructible<_Value_type>::value,
"value type is destructible");
-#endif
+  if constexpr (!__has_trivial_destructor(_Value_type))
+   for (; __count > 0; (void)++__first, --__count)
+ std::_Destroy(std::__addressof(*__first));
 #if __cpp_constexpr_dynamic_alloc // >= C++20
-  if (std::__is_constant_evaluated())
-   return std::_Destroy_n_aux::__destroy_n(__first, __count);
+  else if (std::__is_constant_evaluated())
+   for (; __count > 0; (void)++__first, --__count)
+ std::destroy_at(std::__addressof(*__first));
 #endif
+  else
+   std::advance(__first, __count);
+  return __first;
+#else
   return std::_Destroy_n_aux<__has_trivial_destructor(_Value_type)>::
__destroy_n(__first, __count);
+#endif
 }
+#pragma GCC diagnostic pop
 
 #if __glibcxx_raw_memory_algorithms // >= C++17
   template 
-- 
2.47.0



Re: Re [PATCH v5] replace atoi with strtol in varasm.cc (decode_reg_name_and_count) [PR114540]

2024-11-28 Thread Heiko Eißfeldt

There are three important parts missing.

I don't see you in MAINTAINERS file, so you need to decide if you assign
copyright to FSF or submit this under DCO.

I wonder if it is ok to add myself to MAINTAINERS file?

See https://gcc.gnu.org/contribute.html#legal for more details (if you
already have FSF assignment on file, somebody would need to check that,
I don't have access to that).

I wanted to assign copyright to the FSF but have not yet received an
answer from ass...@fsf.org (nor from ass...@gnu.org) to my request for
the required documents. So for now I go with DCO.

Another part is that a ChangeLog entry is missing (as documented in
https://gcc.gnu.org/codingconventions.html#ChangeLogs).
For your patch, that would be something like:
PR middle-end/114540
* varasm.cc (decode_reg_name_and_count): Use strtoul instead of atoi
and simplify verification that the whole asmspec contains just decimal
digits.
(if testcase is added, empty line and
* testcase_filename_relative_to_gcc/testsuite/: New test.
added too; and if you go with DCO, followed by empty line and Signed-Off-By: 
line).

The third part is missing testcase, for PR like this it should be probably
in gcc/testsuite/gcc.dg/pr114540.c, have
/* PR middle-end/114540 */
/* { dg-do compile } */

and add /* { dg-error "whatever error to expect" } */
directives to the lines on which errors will appear after the patch.

Done.
diff --git a/ChangeLog b/ChangeLog
index a0b48aa45cb..9b8dac9fc91 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2024-11-28  Heiko Eißfeldt  
+
+   * MAINTAINERS: Add myself to write after approval.
+
 2024-11-25  Sandra Loosemore  
 
* MAINTAINERS: Remove references to nios2.
diff --git a/MAINTAINERS b/MAINTAINERS
index 26455d1cabf..0fab8de6ac7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -453,6 +453,7 @@ David Edelsohn  dje 

 Bernd Edlinger  edlinger
 Phil Edwardspme 
 Mark Eggleston  markeggleston   

+Heiko Eißfeldt  -   
 Steve Ellceysje 
 Ben Ellistonbje 
 Mohan Embar membar  
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2be87f2079c..53295e34434 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2024-11-28  Heiko Eißfeldt  
+
+   PR middle-end/114540
+   * varasm.cc (decode_reg_name_and_count): Use strtoul instead of atoi
+   and simplify verification that the whole asmspec contains just decimal
+   digits.
+
+   * gcc.dg/pr114540.c: New test.
+
+   Signed-off-by:  Heiko Eißfeldt  
+   Co-authored-by: Jakub Jelinek  
+
+
 2024-11-27  Uros Bizjak  
 
PR target/36503
diff --git a/gcc/testsuite/gcc.dg/pr114540.c b/gcc/testsuite/gcc.dg/pr114540.c
new file mode 100644
index 000..6d1aadc443f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114540.c
@@ -0,0 +1,25 @@
+/* PR middle-end/114540 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+void f()
+{
+asm("":::"2147483648");/* INT_MAX + 1   { dg-error 
"unknown register name" } */
+asm("":::"4294967296");/* UINT_MAX + 1  { dg-error 
"unknown register name" } */
+asm("":::"18446744073709551616");  /* ULONG_MAX + 1 { dg-error 
"unknown register name" } */
+asm("":::"9223372036854775808");   /* LONG_MAX + 1  { dg-error 
"unknown register name" } */
+asm("":::"9223372036854775807");   /* LONG_MAX  { dg-error 
"unknown register name" } */
+asm("":::"2147483647");/* INT_MAX   { dg-error 
"unknown register name" } */
+asm("":::"2147483647&"); /* INT_MAX + garbage char  { dg-error 
"unknown register name" } */
+asm("":::"0"); /* real reg */
+
+register int a asm("2147483648"); /* INT_MAX + 1  { 
dg-error "invalid register name for" } */
+register int b asm("4294967296"); /* UINT_MAX + 1 { 
dg-error "invalid register name for" } */
+register int c asm("18446744073709551616");  /* ULONG_MAX + 1 { 
dg-error "invalid register name for" } */
+register int d asm("9223372036854775808"); /* LONG_MAX + 1{ 
dg-error "invalid register name for" } */
+register int e asm("9223372036854775807"); /* LONG_MAX{ 
dg-error "invalid register name for" } */
+register int f asm("2147483647"); /* INT_MAX  { 
dg-error "invalid register name for" } */
+register int g asm("2147483647&"); /* INT_MAX + garbage char  { 
dg-error "invalid register name for" } */
+register int h asm("0"); /* real reg */
+}
+
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index be11123180c..261621a18c7 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -990,16 +990,21 @@ decode_reg_name_and_count (const char *asmspec, int 
*pnregs)
   asmspec = strip_reg_name (asmspec);
 
   /* Allow a decimal number as a 

Re: [PATCH v5] c++: Implement P2662R3, Pack Indexing [PR113798]

2024-11-28 Thread Jason Merrill

On 11/27/24 9:05 PM, Marek Polacek wrote:

On Wed, Nov 27, 2024 at 04:19:33PM -0500, Jason Merrill wrote:

On 11/6/24 3:33 PM, Marek Polacek wrote:

On Mon, Nov 04, 2024 at 11:10:05PM -0500, Jason Merrill wrote:

On 10/30/24 4:59 PM, Marek Polacek wrote:

On Wed, Oct 30, 2024 at 09:01:36AM -0400, Patrick Palka wrote:

On Tue, 29 Oct 2024, Marek Polacek wrote:

+static tree
+cp_parser_pack_index (cp_parser *parser, tree pack)
+{
+  if (cxx_dialect < cxx26)
+pedwarn (cp_lexer_peek_token (parser->lexer)->location,
+OPT_Wc__26_extensions, "pack indexing only available with "
+"%<-std=c++2c%> or %<-std=gnu++2c%>");
+  /* Consume the '...' token.  */
+  cp_lexer_consume_token (parser->lexer);
+  /* Consume the '['.  */
+  cp_lexer_consume_token (parser->lexer);
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_SQUARE))
+{
+  error_at (cp_lexer_peek_token (parser->lexer)->location,
+   "pack index missing");


Maybe cp_parser_error?


Unsure.  This:

template
void foo(Ts...[]);

then generates:

error: variable or field 'foo' declared void
error: expected primary-expression before '...' token
error: pack index missing before ']' token

which doesn't seem better.


I guess the question is whether we need to deal with the vexing parse. But
in this case it'd be ill-formed regardless, so what you have is fine.


@@ -6368,6 +6416,12 @@ cp_parser_primary_expression (cp_parser *parser,
  = make_location (caret_loc, start_loc, finish_loc);
decl.set_location (combined_loc);
+
+   /* "T...[constant-expression]" is a C++26 pack-index-expression.  */
+   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
+   && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE))
+ decl = cp_parser_pack_index (parser, decl);


Shouldn't this be in cp_parser_id_expression?


It should, but I need to wait until after finish_id_expression, so that
DECL isn't just an identifier node.


Ah, makes sense.


+ ~ computed-type-specifier


Hmm, seems we never implemented ~decltype.


Looks like CWG 1753: .


Thanks.


@@ -4031,6 +4036,15 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
  *walk_subtrees = 0;
  return NULL_TREE;
+case PACK_INDEX_TYPE:
+case PACK_INDEX_EXPR:
+  /* We can have an expansion of an expansion, such as "Ts...[Is]...",
+so do look into the index.  */
+  cp_walk_tree (&PACK_INDEX_INDEX (t), &find_parameter_packs_r, ppd,
+   ppd->visited);
+  *walk_subtrees = 0;
+  return NULL_TREE;


Do we need to handle these specifically here?  I'd think the handling in
cp_walk_subtrees would be enough.


I think I do, otherwise the Ts...[Is]... test doesn't work.
It is used when calling check_for_bare_parameter_packs.


Makes sense.


I'm not seeing a test for https://eel.is/c++draft/diff#cpp23.dcl.dcl-2 or
the code to handle this case differently in C++23 vs 26.

Ah, right.  I've added the test (pack-indexing11.C) but we don't
compile it C++23 as we should due to:

pack-indexing11.C:7:13: error: expected ',' or '...' before '[' token
  7 | void f(T... [1]);
| ^

which seems like a bug.  Opened .

Is fixing that a requirement for this patch?


No.  Really, given that we're reusing this grammar, it's probably fine to
never fix it.


I've closed it.
  

This patch implements C++26 Pack Indexing, as described in
.

The issue discussing how to mangle pack indexes has not been resolved
yet  and I've
made no attempt to address it so far.

Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
Both carry two operands: the pack expansion and the index.  They are
handled in tsubst_pack_index: substitute the index and the pack and
then extract the element from the vector (if possible).

To handle pack indexing in a decltype or with decltype(auto), there is
also the new PACK_INDEX_PARENTHESIZED_P flag.

With this feature, it's valid to write something like

using U = tmpl;

where we first expand the template argument into

Ts...[Is#0], Ts...[Is#1], ...

and then substitute each individual pack index.

+  MARK_TS_TYPE_NON_COMMON (PACK_INDEX_TYPE);


I wonder about trying to use the tree_common symtab member for the type
index so we don't need non_common, but that's not necessary.


+   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS)
+   && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE))


This happens a lot in the parser changes, how about factoring it into
cp_parser_next_tokens_are_pack_index?


Done.
  

Or change cp_parser_pack_index to cp_parser_maybe_pack_index that does this
check, then returns the argument if we aren't looking at a pack index?


+c

Re: [PATCH] arm: [MVE intrinsics] fix vctpq intrinsic implementation [PR target/117814]

2024-11-28 Thread Andre Vieira (lists)

Hi Christophe,

On 28/11/2024 10:22, Christophe Lyon wrote:

The VCTP instruction creates a Vector Tail Predicate in VPR.P0, based
on the input value, but also constrained by a VPT block (if present),
or if used within a DLSTP/LETP loop.

Therefore we need to inform the compiler that this intrinsic reads the
FPCXT register, otherwise it could make incorrect assumptions: for
instance in test7() from gcc.target/arm/mve/dlstp-compile-asm-2.c it
would hoist p1 = vctp32q (g) outside of the loop.


We chatted about this offlist but it's good to share here for others
too. I do not believe the transformation GCC is doing here is wrong. The
transformation we do for test7, along with some others in the testsuite,
relies on an analysis that checks whether masks used within the loop,
other than the loop predicate mask, have a relevant side effect. In
other words, any instruction that is not predicated by the loop
predicate, be that unpredicated or predicated by another mask, triggers
an analysis to check whether the results are used in a safe way. Check
the comments above 'arm_mve_impl_predicated_p' in arm.cc.


For test7 the non-loop predicate 'p1' is used to predicate a load inside
the loop; when dlstp'ed, that load will be masked by 'p & p1' instead,
which means it could be loading less than initially intended. However,
the results of that load are used in a vadd predicated by 'p', which
means any values it would have loaded if not masked by 'p' would have
been discarded in the add, so the extra masking has no relevant effect.


Furthermore, I also believe the compiler is already aware that VCTP 
writes P0, given it has an input operand with the predicate 
'vpr_register_operand' and the register constraint '=Up'. During DLSTP 
transformation we rely on reads and writes to such operands to do our 
transformation and it should also provide other backend passes with 
enough information.


So I don't think this patch is needed.




[PATCH] arm, mve: Do not DLSTP transform loops if VCTP is not first

2024-11-28 Thread Andre Vieira
Hi,

This rejects any loop where a predicated instruction comes before the vctp
that generates the loop predicate.  Even though this is not a requirement
for dlstp transformation, we have found potential issues where you can end
up with a wrong transformation, so it is safer to reject such loops.

OK for trunk?

gcc/ChangeLog:

* config/arm/arm.cc (arm_mve_get_loop_vctp): Reject loops with a
predicated instruction before the vctp.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger an
issue.
---
 gcc/config/arm/arm.cc | 21 ++-
 .../gcc.target/arm/mve/dlstp-invalid-asm.c| 20 +-
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 7292fddef80..29c0f478f36 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -34749,11 +34749,22 @@ arm_mve_get_loop_vctp (basic_block bb)
  instruction.  We require arm_get_required_vpr_reg_param to be false
  to make sure we pick up a VCTP, rather than a VCTP_M.  */
   FOR_BB_INSNS (bb, insn)
-if (NONDEBUG_INSN_P (insn))
-  if (arm_get_required_vpr_reg_ret_val (insn)
-	  && (arm_mve_get_vctp_lanes (insn) != 0)
-	  && !arm_get_required_vpr_reg_param (insn))
-	return insn;
+{
+  if (!NONDEBUG_INSN_P (insn))
+	continue;
+  /* If we encounter a predicated instruction before the VCTP then we can
+	 not dlstp transform this loop because we would be imposing extra
+	 predication on that instruction which was not present in the original
+	 code.  */
+  if (arm_get_required_vpr_reg_param (insn))
+	return NULL;
+  if (arm_get_required_vpr_reg_ret_val (insn))
+	{
+	  if (arm_mve_get_vctp_lanes (insn) != 0)
+	return insn;
+	  return NULL;
+	}
+}
   return NULL;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
index 26df2d30523..f26754cc482 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-invalid-asm.c
@@ -128,7 +128,7 @@ void test9 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 
 /* Using a VPR that gets re-generated within the loop.  */
-void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
+void test10a (int32_t *a, int32_t *b, int32_t *c, int n)
 {
   mve_pred16_t p = vctp32q (n);
   while (n > 0)
@@ -145,6 +145,24 @@ void test10 (int32_t *a, int32_t *b, int32_t *c, int n)
 }
 }
 
+/* Using a VPR that gets re-generated within the loop.  */
+void test10b (int32_t *a, int32_t *b, int32_t *c, int n)
+{
+  mve_pred16_t p = vctp32q (n-4);
+  while (n > 0)
+{
+  int32x4_t va = vldrwq_z_s32 (a, p);
+  p = vctp32q (n);
+  int32x4_t vb = vldrwq_z_s32 (b, p);
+  int32x4_t vc = vaddq_x_s32 (va, vb, p);
+  vstrwq_p_s32 (c, vc, p);
+  c += 4;
+  a += 4;
+  b += 4;
+  n -= 4;
+}
+}
+
 /* Using vctp32q_m instead of vctp32q.  */
 void test11 (int32_t *a, int32_t *b, int32_t *c, int n, mve_pred16_t p0)
 {


[committed] i386: Macroize compound shift patterns some more

2024-11-28 Thread Uros Bizjak
Merge ashl<mode>3 and <insn><mode>3 compound define_insn_and_split
patterns to form <insn><mode>3 macroized patterns.

No functional changes.

gcc/ChangeLog:

* config/i386/i386.md (*<insn><mode>3_mask): Macroize pattern
from *ashl<mode>3_mask and *<insn><mode>3_mask
using any_shift code iterator.
(*<insn><mode>3_mask_1): Macroize pattern
from *ashl<mode>3_mask_1 and *<insn><mode>3_mask_1
using any_shift code iterator.
(*<insn><mode>3_add): Macroize pattern
from *ashl<mode>3_add and *<insn><mode>3_add
using any_shift code iterator.
(*<insn><mode>3_add_1): Macroize pattern
from *ashl<mode>3_add_1 and *<insn><mode>3_add_1
using any_shift code iterator.
(*<insn><mode>3_sub): Macroize pattern
from *ashl<mode>3_sub and *<insn><mode>3_sub
using any_shift code iterator.
(*<insn><mode>3_sub_1): Macroize pattern
from *ashl<mode>3_sub_1 and *<insn><mode>3_sub_1
using any_shift code iterator.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2fc48006bca..8eb9cb682b1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15847,157 +15847,6 @@ (define_expand "@x86_shift<mode>_adj_2"
   DONE;
 })
 
-;; Avoid useless masking of count operand.
-(define_insn_and_split "*ashl<mode>3_mask"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand")
-   (ashift:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand")
- (subreg:QI
-   (and
- (match_operand 2 "int248_register_operand" "c,r")
- (match_operand 3 "const_int_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
-  == GET_MODE_BITSIZE (<MODE>mode)-1
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-{
-  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
-  operands[2] = gen_lowpart (QImode, operands[2]);
-}
-  [(set_attr "isa" "*,bmi2")])
-
-(define_insn_and_split "*ashl<mode>3_mask_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand")
-   (ashift:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand")
- (and:QI
-   (match_operand:QI 2 "register_operand" "c,r")
-   (match_operand:QI 3 "const_int_operand"
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
-  == GET_MODE_BITSIZE (<MODE>mode)-1
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-  ""
-  [(set_attr "isa" "*,bmi2")])
-
-(define_insn_and_split "*ashl<mode>3_add"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand")
-   (ashift:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand")
- (subreg:QI
-   (plus
- (match_operand 2 "int248_register_operand" "c,r")
- (match_operand 3 "const_int_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-{
-  operands[2] = force_reg (GET_MODE (operands[2]), operands[2]);
-  operands[2] = gen_lowpart (QImode, operands[2]);
-}
-  [(set_attr "isa" "*,bmi2")])
-
-(define_insn_and_split "*ashl<mode>3_add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand")
-   (ashift:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand")
- (plus:QI
-   (match_operand:QI 2 "register_operand" "c,r")
-   (match_operand:QI 3 "const_int_operand"
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)
-(match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])]
-  ""
-  [(set_attr "isa" "*,bmi2")])
-
-(define_insn_and_split "*ashl<mode>3_sub"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand")
-   (ashift:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand")
- (subreg:QI
-   (minus
- (match_operand 3 "const_int_operand")
- (match_operand 2 "int248_register_operand" "c,r")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
-   && (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0
-   && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(parallel
- [(set (match_dup 4)
-  (neg:QI (match_dup 2)))
-  (clobber (reg:CC FLAGS_REG))])
-   (parallel
- [(set (match_dup 0)
-  (ashift:SWI48 (match_dup 1)

Re: [PATCH] vect: Do not try to duplicate_and_interleave one-element mode.

2024-11-28 Thread Richard Sandiford
"Robin Dapp"  writes:
>> Could you walk me through the failure in more detail?  It sounds
>> like can_duplicate_and_interleave_p eventually gets to the point of
>> subdividing the original elements, instead of either combining consecutive
>> elements (the best case), or leaving them as-is (the expected fallback
>> for SVE).  But it sounds like those attempts fail in this case, but an
>> attempt to subdivide the elements succeeds.  Is that right?  And if so,
>> why does that happen?
>
> Apologies for the very late response.
>
> What I see is that we start with a base_vector_type vector([1,1]) long int
> and a count of 2, so ELT_BYTES = 16.
> We don't have a TI vector mode (and creating a single-element vector by
> interleaving is futile anyway) so the first attempt fails.
> The type in the second attempt is vector([1,1]) unsigned long but this
> is rejected because of
>
>   && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
>  2, &half_nelts))
>
> Then we try vector([2,2]) unsigned int which "succeeds".  This, however,
> eventually causes the ICE when we try to build a vector with 0 elements.

Ah, ok, thanks.

How about going for a slight variation of your original patch.  After:

  nvectors *= 2;

add:

  /* We need to be able to fuse COUNT / NVECTORS elements together.  */
  if (!multiple_p (count, nvectors))
    return false;

OK like that if it works.

Richard

> Maybe another option would be to decline 1-element vectors right away?
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index eac16e80ecd..d3e52489fa8 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -427,7 +427,9 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
> int count,
> tree *permutes)
>  {
>tree base_vector_type = get_vectype_for_scalar_type (vinfo, elt_type, 
> count);
> -  if (!base_vector_type || !VECTOR_MODE_P (TYPE_MODE (base_vector_type)))
> +  if (!base_vector_type
> +  || !VECTOR_MODE_P (TYPE_MODE (base_vector_type))
> +  || maybe_lt (GET_MODE_NUNITS (TYPE_MODE (base_vector_type)), 2))
>  return false;
>
> Regards
>  Robin


Re: [PATCH v2] RISC-V: Minimal support for ssdbltrp and smdbltrp extension.

2024-11-28 Thread yulong



在 2024/11/28 19:24, Dongyan Chen 写道:

This patch support ssdbltrp[1] and smdbltrp[2] extension.
To enable GCC to recognize and process ssdbltrp and smdbltrp extension 
correctly at compile time.

[1] https://github.com/riscv/riscv-isa-manual/blob/main/src/ssdbltrp.adoc
[2] https://github.com/riscv/riscv-isa-manual/blob/main/src/smdbltrp.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* common/config/riscv/riscv-ext-bitmask.def (RISCV_EXT_BITMASK): Ditto.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-45.c: New test.
* gcc.target/riscv/arch-46.c: New test.

---
  gcc/common/config/riscv/riscv-common.cc   | 6 ++
  gcc/common/config/riscv/riscv-ext-bitmask.def | 2 ++
  gcc/config/riscv/riscv.opt| 4 
  gcc/testsuite/gcc.target/riscv/arch-45.c  | 5 +
  gcc/testsuite/gcc.target/riscv/arch-46.c  | 5 +
  5 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-45.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-46.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 4c9a72d1180..608f0950f0f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -222,6 +222,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
{"sscofpmf", "zicsr"},
{"ssstateen", "zicsr"},
{"sstc", "zicsr"},
+  {"ssdbltrp", "zicsr"},
+  {"smdbltrp", "zicsr"},
  
{"xsfvcp", "zve32x"},
  
@@ -401,6 +403,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =

{"sscofpmf",  ISA_SPEC_CLASS_NONE, 1, 0},
{"ssstateen", ISA_SPEC_CLASS_NONE, 1, 0},
{"sstc",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"ssdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"smdbltrp",  ISA_SPEC_CLASS_NONE, 1, 0},
  
{"svinval", ISA_SPEC_CLASS_NONE, 1, 0},

{"svnapot", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1725,6 +1729,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
RISCV_EXT_FLAG_ENTRY ("svinval", x_riscv_sv_subext, MASK_SVINVAL),
RISCV_EXT_FLAG_ENTRY ("svnapot", x_riscv_sv_subext, MASK_SVNAPOT),
RISCV_EXT_FLAG_ENTRY ("svvptc", x_riscv_sv_subext, MASK_SVVPTC),
+  RISCV_EXT_FLAG_ENTRY ("ssdbltrp", x_riscv_sv_subext, MASK_SSDBLTRP),
+  RISCV_EXT_FLAG_ENTRY ("smdbltrp", x_riscv_sv_subext, MASK_SMDBLTRP),
  
RISCV_EXT_FLAG_ENTRY ("ztso", x_riscv_ztso_subext, MASK_ZTSO),
  
diff --git a/gcc/common/config/riscv/riscv-ext-bitmask.def b/gcc/common/config/riscv/riscv-ext-bitmask.def

index a733533df98..9814b887b2d 100644
--- a/gcc/common/config/riscv/riscv-ext-bitmask.def
+++ b/gcc/common/config/riscv/riscv-ext-bitmask.def
@@ -80,5 +80,7 @@ RISCV_EXT_BITMASK ("zcf",   1,  5)
  RISCV_EXT_BITMASK ("zcmop", 1,  6)
  RISCV_EXT_BITMASK ("zawrs", 1,  7)
  RISCV_EXT_BITMASK ("svvptc",1,  8)
+RISCV_EXT_BITMASK ("ssdbltrp",   1,  9)
+RISCV_EXT_BITMASK ("smdbltrp",   1,  10)

Pay attention to the code format. Use tabs instead of spaces.
  
  #undef RISCV_EXT_BITMASK

diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index a6a61a83db1..5900da57ca2 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -468,6 +468,10 @@ Mask(SVNAPOT) Var(riscv_sv_subext)
  
  Mask(SVVPTC) Var(riscv_sv_subext)
  
+Mask(SSDBLTRP) Var(riscv_sv_subext)

+
+Mask(SMDBLTRP) Var(riscv_sv_subext)
+
  TargetVariable
  int riscv_ztso_subext
  


I think it's better to split this patch into two commits.

Waiting for others to comment.


diff --git a/gcc/testsuite/gcc.target/riscv/arch-45.c 
b/gcc/testsuite/gcc.target/riscv/arch-45.c
new file mode 100644
index 000..85e2510b40a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-45.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_ssdbltrp -mabi=lp64" } */
+int foo()
+{
+}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-46.c 
b/gcc/testsuite/gcc.target/riscv/arch-46.c
new file mode 100644
index 000..c95cc729cce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-46.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_smdbltrp -mabi=lp64" } */
+int foo()
+{
+}




[PATCH v3 1/8] aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS

2024-11-28 Thread Tejas Belagod
This patch enables ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS to indicate
that C/C++ language operations are available natively on SVE ACLE types.

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
__ARM_FEATURE_SVE_VECTOR_OPERATORS.
---
 gcc/config/aarch64/aarch64-c.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 689c763cd45..3cc2c97c6d8 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -193,15 +193,19 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
   aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
   cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
+  cpp_undef (pfile, "__ARM_FEATURE_SVE_VECTOR_OPERATORS");
   if (TARGET_SVE)
 {
   int bits;
+  int ops = 1;
   if (!BITS_PER_SVE_VECTOR.is_constant (&bits))
-   bits = 0;
+   {
+ bits = 0;
+ ops = 2;
+   }
   builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
+  builtin_define_with_int_value ("__ARM_FEATURE_SVE_VECTOR_OPERATORS", 
ops);
 }
-  aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE_VECTOR_OPERATORS",
-   pfile);
   aarch64_def_or_undef (TARGET_SVE_I8MM,
"__ARM_FEATURE_SVE_MATMUL_INT8", pfile);
   aarch64_def_or_undef (TARGET_SVE_F32MM,
-- 
2.25.1



[PATCH v3 0/8] aarch64: Enable C/C++ operations on SVE ACLE types.

2024-11-28 Thread Tejas Belagod
Hi,

This is v3 of the series
  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669255.html

based on review comments. Changes in this version include:

1. Better way of handling poly-sized vector checking in gimple-fold.cc
2. Changelog fix.

Thanks all for the reviews. Based on Richard's comment

https://gcc.gnu.org/pipermail/gcc-patches/2024-November/669289.html

I will commit it at EoB today if there are no further reviews.


This patchset enables C/C++ operations on SVE ACLE types.  The changes enable
operations on SVE ACLE types to have the same semantics as GNU vector types.
These operations (+, -, &, |, etc.) behave exactly as they would on
GNU vector types.  The operations are self-contained, i.e. we still don't
allow mixing GNU and SVE vector types in, e.g., binary operations, because
the type of the expression would be ambiguous and this causes PCS issues.

Other operations like implicit conversions behave as they would with GNU 
vectors i.e.

gnu_uv = sv_uv; // This is possible as long as the size, shape and 
element-signedness
// of both vectors are the same.
gnu_uv = sv_sv; // Error as implicit conversion from signed to unsigned is not 
possible
// even though size and shape may be similar.

Such assignments would have to go through an explicit cast

gnu_uv = (gnu_uv)sv_sv;

Following unary operations are supported:
  
  sve_type_var[0];
  &sve_type_var[0];
  sve_type_var[n];
  &sve_type_var[n];
  +sve_type_var;
  -sve_type_var;
  ~sve_type_var;
  !sve_type_var; /* Allowed in C++ */
  *sve_type_var; /* Error! */
  __real sve_type_var; /* Error! */
  __imag sve_type_var; /* Error! */
  ++sve_type_var;
  --sve_type_var;
  sve_type_var++;
  sve_type_var--;

Following binary ops are supported:

  sve_type_var + sve_type_var;
  sve_type_var - sve_type_var;
  sve_type_var * sve_type_var;
  sve_type_var / sve_type_var;
  sve_type_var % sve_type_var;
  sve_type_var & sve_type_var;
  sve_type_var | sve_type_var;
  sve_type_var ^ sve_type_var;
  sve_type_var == sve_type_var;
  sve_type_var != sve_type_var;
  sve_type_var <= sve_type_var;
  sve_type_var < sve_type_var;
  sve_type_var > sve_type_var;
  sve_type_var >= sve_type_var;
  sve_type_var << sve_type_var;
  sve_type_var >> sve_type_var;
  sve_type_var && sve_type_var; /* Allowed in C++ */
  sve_type_var || sve_type_var; /* Allowed in C++ */

/* Vector-scalar binary arithmetic.  The reverse is also supported,
   eg. <scalar> + sve_type_var.  */

  sve_type_var + <scalar>;
  sve_type_var - <scalar>;
  sve_type_var * <scalar>;
  sve_type_var / <scalar>;
  sve_type_var % <scalar>;
  sve_type_var & <scalar>;
  sve_type_var | <scalar>;
  sve_type_var ^ <scalar>;
  sve_type_var == <scalar>;
  sve_type_var != <scalar>;
  sve_type_var <= <scalar>;
  sve_type_var < <scalar>;
  sve_type_var > <scalar>;
  sve_type_var >= <scalar>;
  sve_type_var << <scalar>;
  sve_type_var >> <scalar>;
  sve_type_var && <scalar>; /* Allowed in C++ */
  sve_type_var || <scalar>; /* Allowed in C++ */
  sve_type_var + <scalar>;
  sve_type_var - <scalar>;
  sve_type_var * <scalar>;
  sve_type_var / <scalar>;
  sve_type_var % <scalar>;
  sve_type_var & <scalar>;
  sve_type_var | <scalar>;
  sve_type_var ^ <scalar>;
  sve_type_var == <scalar>;
  sve_type_var != <scalar>;
  sve_type_var <= <scalar>;
  sve_type_var < <scalar>;
  sve_type_var > <scalar>;
  sve_type_var >= <scalar>;
  sve_type_var << <scalar>;
  sve_type_var >> <scalar>;
  sve_type_var && <scalar>; /* Allowed in C++ */
  sve_type_var || <scalar>; /* Allowed in C++ */

Ternary operations:

  <scalar> ? sve_type_var : sve_type_var;

  sve_type_var ? sve_type_var : sve_type_var; /* Allowed in C++ */

Builtins:

  /* Vector built-ins.  */

  __builtin_shuffle (sve_type_var, sve_type_var, sve_type_var);
  __builtin_convertvector (sve_type_var, <type>);

These operations are supported for both fixed length and variable length 
vectors.

One outstanding fail
PASS->FAIL: g++.dg/ext/sve-sizeless-1.C  -std=gnu++11  (test for errors, line 
163)

I've left another outstanding fail as is - the test where an address is taken 
of an SVE vector element. I'm not
sure what the behaviour should be here.

Otherwise regression tested and bootstrapped on aarch64-linux-gnu. Bootstrapped 
on x86-linux-gnu.

OK for trunk?

Thanks,
Tejas.

Tejas Belagod (8):
  aarch64: Fix ACLE macro __ARM_FEATURE_SVE_VECTOR_OPERATORS
  aarch64: Make C/C++ operations possible on SVE ACLE types.
  c: Range-check indexing of SVE ACLE vectors
  gimple: Handle variable-sized vectors in BIT_FIELD_REF
  c: Fix constructor bounds checking for VLA and construct VLA vector
constants
  aarch64: Add testcase for C/C++ ops on SVE ACLE types.
  aarch64: Update SVE ACLE tests
  cp: Fix another assumption in the FE about constant vector indices.

 gcc/c-family/c-common.cc  |  10 +-
 gcc/c/c-typeck.cc |  16 +-
 gcc/config/aarch64/aarch64-c.cc   |  10 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc|   5 +-
 gcc/cp/decl.cc|  11 +-
 gcc/gimple-fold.cc|   2 +-
 gcc/testsuite/g++.dg/ext/sve-sizeless-1.C |  11 +
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C |   9 +
 .../sve/acle/general-c++/gnu_vectors_1.C  | 4

[PATCH v3 3/8] c: Range-check indexing of SVE ACLE vectors

2024-11-28 Thread Tejas Belagod
This patch adds a range check for indexing into non-GNU (SVE ACLE) vectors,
warning when a known index is outside the bounds of a fixed-size vector.  For
VLA vectors, we don't diagnose.

gcc/ChangeLog:

* c-family/c-common.cc (convert_vector_to_array_for_subscript): Add
range-check for target vector types.
---
 gcc/c-family/c-common.cc | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 721407157bc..260c2e005e6 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9045,10 +9045,12 @@ convert_vector_to_array_for_subscript (location_t loc,
   ret = !lvalue_p (*vecp);
 
   index = fold_for_warn (index);
-  if (TREE_CODE (index) == INTEGER_CST)
-if (!tree_fits_uhwi_p (index)
-   || maybe_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type)))
- warning_at (loc, OPT_Warray_bounds_, "index value is out of bound");
+  /* Warn out-of-bounds index for vectors only if known.  */
+  if (poly_int_tree_p (index))
+   if (!tree_fits_poly_uint64_p (index)
+   || known_ge (tree_to_poly_uint64 (index),
+ TYPE_VECTOR_SUBPARTS (type)))
+   warning_at (loc, OPT_Warray_bounds_, "index value is out of bound");
 
   /* We are building an ARRAY_REF so mark the vector as addressable
  to not run into the gimplifiers premature setting of DECL_GIMPLE_REG_P
-- 
2.25.1



[PATCH v3 8/8] cp: Fix another assumption in the FE about constant vector indices.

2024-11-28 Thread Tejas Belagod
This patch adds handling for the poly-sized indices of VLA vectors.

gcc/ChangeLog:

* cp/decl.cc (reshape_init_array_1): Handle poly indices.

gcc/testsuite/ChangeLog:

* g++.dg/ext/sve-sizeless-1.C: Update test to test initialize error.
* g++.dg/ext/sve-sizeless-2.C: Likewise.
---
 gcc/cp/decl.cc| 11 ---
 gcc/testsuite/g++.dg/ext/sve-sizeless-1.C | 11 +++
 gcc/testsuite/g++.dg/ext/sve-sizeless-2.C |  9 +
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 80485f0a428..4b6a5191a8a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -6894,15 +6894,20 @@ reshape_init_array_1 (tree elt_type, tree max_index, 
reshape_iter *d,
 
   if (sized_array_p)
 {
+  poly_uint64 midx;
   /* Minus 1 is used for zero sized arrays.  */
   if (integer_all_onesp (max_index))
return new_init;
 
-  if (tree_fits_uhwi_p (max_index))
-   max_index_cst = tree_to_uhwi (max_index);
+  if (tree_fits_poly_uint64_p (max_index))
+   midx = tree_to_poly_uint64 (max_index);
   /* sizetype is sign extended, not zero extended.  */
   else
-   max_index_cst = tree_to_uhwi (fold_convert (size_type_node, max_index));
+   midx = tree_to_poly_uint64 (fold_convert (size_type_node, max_index));
+
+  /* For VLA vectors, we restrict the number of elements in the constructor
+to the lower bound of the VLA elements.  */
+  max_index_cst = constant_lower_bound (midx);
 }
 
   /* Loop until there are no more initializers.  */
diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
index adee37a0551..0a5c80b92b8 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-1.C
@@ -124,6 +124,8 @@ void thrower2 () throw (svint8_t); // { dg-error {cannot 
throw or catch SVE type
 void thrower3 () throw (svint8_t); // { dg-error {cannot throw or catch SVE 
type 'svint8_t'} "" { target c++98_only } }
 #endif
 
+extern int bar (void);
+
 // Main tests for statements and expressions.
 
 void
@@ -161,6 +163,15 @@ statements (int n)
   svint8_t init_sve_sc5 = {};
   svint8_t init_sve_sc6 = { sve_sc1 };
   svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'svint8_t'} }
+  svint32_t init_sve_vc1 = { 0, 1 };
+  svint32_t init_sve_vc2 = { 0, bar () };
+  svint32_t init_sve_vc3 = { bar (), n };
+  svint32_t init_sve_vc4 = { 0, 1, 2, 3 };
+  svint32_t init_sve_vc5 = { 0, 1, bar (), 3 };
+  svint32_t init_sve_vc6 = { 0, 1, 2, 3, 4 }; // { dg-error {too many 
initializers for 'svint32_t'} }
+  svint32_t init_sve_vc7 = { 0, 1, 2, 3, bar () }; // { dg-error {too many 
initializers for 'svint32_t'} }
+  svint32_t init_sve_vc8 = { 0, 1, 2, 3, 4, 5 }; // { dg-error {too many 
initializers for 'svint32_t'} }
+  svint32_t init_sve_vc9 = { 0, bar (), 2, 3, 4, n }; // { dg-error {too many 
initializers for 'svint32_t'} }
 
   // Constructor calls.
 
diff --git a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C 
b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
index 394ac1e4579..87937d060d2 100644
--- a/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
+++ b/gcc/testsuite/g++.dg/ext/sve-sizeless-2.C
@@ -161,6 +161,15 @@ statements (int n)
   svint8_t init_sve_sc5 = {};
   svint8_t init_sve_sc6 = { sve_sc1 };
   svint8_t init_sve_sc7 = { sve_sh1 }; // { dg-error {cannot convert 
'svint16_t' to 'svint8_t'} }
+  svint32_t init_sve_vc1 = { 0, 1 };
+  svint32_t init_sve_vc2 = { 0, bar () };
+  svint32_t init_sve_vc3 = { bar (), n };
+  svint32_t init_sve_vc4 = { 0, 1, 2, 3, 4, 5, 6, 7 };
+  svint32_t init_sve_vc5 = { 0, 1, bar (), 3, 4, 5, n, 7 };
+  svint32_t init_sve_vc6 = { 0, 1, 2, 3, 4, 5, 6, 7, 8 }; // { dg-error {too 
many initializers for 'svint32_t'} }
+  svint32_t init_sve_vc7 = { 0, 1, 2, 3, bar (), 5, 6, 7, n }; // { dg-error 
{too many initializers for 'svint32_t'} }
+  svint32_t init_sve_vc8 = { 0, bar (), 2, 3, 4, n, 5, 6, 7, 8, 9 }; // { 
dg-error {too many initializers for 'svint32_t'} }
+  svint32_t init_sve_vc9 = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }; // { dg-error 
{too many initializers for 'svint32_t'} }
 
   // Constructor calls.
 
-- 
2.25.1



[PATCH v3 6/8] aarch64: Add testcase for C/C++ ops on SVE ACLE types.

2024-11-28 Thread Tejas Belagod
This patch adds a test case to cover C/C++ operators on SVE ACLE types.  This
does not cover all types, but covers most representative types.

gcc/testsuite:

* gcc.target/aarch64/sve/acle/general/cops.c: New test.
---
 .../aarch64/sve/acle/general/cops.c   | 579 ++
 1 file changed, 579 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c
new file mode 100644
index 000..f0dc9a9b21c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cops.c
@@ -0,0 +1,579 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2" } */
+
+#include 
+#include 
+
+#define DECL_FUNC_UNARY(type, name, op, intr, su, sz, id) \
+  __attribute__ ((noipa)) \
+  type func_ ## name ## type ## _unary (type a) { \
+return op (a); \
+  } \
+  void checkfunc_ ## name ## type ## _unary () { \
+type data = svindex_ ## su ## sz (0, 1); \
+type zr = svindex_ ## su ## sz (0, 0); \
+type one = svindex_ ## su ## sz (1, 0); \
+type mone = svindex_ ## su ## sz (-1, 0); \
+svbool_t pg = svptrue_b ## sz (); \
+type exp = intr ## su ## sz ## _z (pg, id, data); \
+type actual = func_ ## name ## type ## _unary (data); \
+svbool_t res = svcmpeq_ ## su ## sz (pg, exp, actual); \
+if (svptest_any (pg, svnot_b_z (pg, res))) \
+  __builtin_abort (); \
+  }
+
+#define DECL_FUNC_UNARY_FLOAT(type, name, op, intr, su, sz, id) \
+  __attribute__ ((noipa)) \
+  type func_ ## name ## type ## _unary (type a) { \
+return op (a); \
+  } \
+  void checkfunc_ ## name ## type ## _unary () { \
+type data = svdup_n_ ## su ## sz (2.0); \
+type zr = svdup_n_ ## su ## sz (0.0); \
+type one = svdup_n_ ## su ## sz (1.0); \
+type mone = svdup_n_ ## su ## sz (-1.0); \
+svbool_t pg = svptrue_b ## sz (); \
+type exp = intr ## su ## sz ## _z (pg, id, data); \
+type actual = func_ ## name ## type ## _unary (data); \
+svbool_t res = svcmpeq_ ## su ## sz (pg, exp, actual); \
+if (svptest_any (pg, svnot_b_z (pg, res))) \
+  __builtin_abort (); \
+  }
+
+#define DECL_FUNC_INDEX(rtype, type, intr, su, sz)  \
+  __attribute__ ((noipa)) \
+  rtype func_ ## rtype ## type ## _vindex (type a, int n) { \
+return (a[n]); \
+  } \
+  __attribute__ ((noipa)) \
+  rtype func_ ## rtype ## type ## _cindex (type a) { \
+return (a[0]); \
+  } \
+  void checkfunc_ ## rtype ## type ## _vindex () { \
+type a = svindex_ ## su ## sz (0, 1); \
+int n = 2; \
+if (2 != func_ ## rtype ## type ## _vindex (a, n)) \
+  __builtin_abort (); \
+  } \
+  void checkfunc_ ## rtype ## type ## _cindex () { \
+type a = svindex_ ## su ## sz (1, 0); \
+if (1 != func_ ## rtype ## type ## _cindex (a)) \
+  __builtin_abort (); \
+  }
+
+#define DECL_FUNC_INDEX_FLOAT(rtype, type, intr, su, sz)  \
+  __attribute__ ((noipa)) \
+  rtype func_ ## rtype ## type ## _vindex (type a, int n) { \
+return (a[n]); \
+  } \
+  __attribute__ ((noipa)) \
+  rtype func_ ## rtype ## type ## _cindex (type a) { \
+return (a[0]); \
+  } \
+  void checkfunc_ ## rtype ## type ## _vindex () { \
+type a = svdup_n_ ## su ## sz (2.0); \
+int n = 2; \
+if (2.0 != func_ ## rtype ## type ## _vindex (a, n)) \
+  __builtin_abort (); \
+  } \
+  void checkfunc_ ## rtype ## type ## _cindex () { \
+type a = svdup_n_ ## su ## sz (4.0); \
+if (4.0 != func_ ## rtype ## type ## _cindex (a)) \
+  __builtin_abort (); \
+  }
+
+#define DECL_FUNC_BINARY(type, name, op, intr, su, sz)  \
+  __attribute__ ((noipa)) \
+  type func_ ## name  ## type ## _binary(type a, type b) { \
+return (a) op (b); \
+  } \
+  void checkfunc_ ## name ## type ## _binary () { \
+type a = svindex_ ## su ## sz (0, 1); \
+type b = svindex_ ## su ## sz (0, 2); \
+svbool_t all_true = svptrue_b ## sz (); \
+type exp = intr ## su ## sz ## _z (all_true, a, b); \
+type actual = func_ ## name ## type ## _binary (a, b); \
+svbool_t res = svcmpeq_ ## su ## sz (all_true, exp, actual); \
+if (svptest_any (all_true, svnot_b_z (all_true, res))) \
+  __builtin_abort (); \
+  }
+
+#define DECL_FUNC_BINARY_SHIFT(type, name, op, intr, su, sz)  \
+  __attribute__ ((noipa)) \
+  type func_ ## name  ## type ## _binary(type a, type b) { \
+return (a) op (b); \
+  } \
+  void checkfunc_ ## name ## type ## _binary () { \
+type a = svindex_ ## su ## sz (0, 1); \
+svuint ## sz ## _t b = svindex_u ## sz (0, 2); \
+type c = svindex_ ## su ## sz (0, 2); \
+svbool_t all_true = svptrue_b ## sz (); \
+type exp = intr ## su ## sz ## _z (all_true, a, b); \
+type actual = func_ ## name ## type ## _binary (a, c); \
+svbool_t res = svcmpeq_ ## su ## sz (all_true, exp, actual); \
+if (svptest_any (all_true, svnot_b_z (all_true, res))) \
+  __builtin_abort (); \
+ 

[PATCH v3 2/8] aarch64: Make C/C++ operations possible on SVE ACLE types.

2024-11-28 Thread Tejas Belagod
This patch clears the TYPE_INDIVISIBLE flag so that SVE ACLE types are
treated as GNU vectors and share the semantics of operations defined on
GNU vectors.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Flip
the TYPE_INDIVISIBLE flag for SVE ACLE vector types.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 0fec1cd439e..adbadd303d4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -4576,6 +4576,9 @@ register_builtin_types ()
  vectype = build_truth_vector_type_for_mode (BYTES_PER_SVE_VECTOR,
  VNx16BImode);
  num_pr = 1;
+ /* Leave svbool_t as indivisible for now.  We don't yet support
+C/C++ operators on predicates.  */
+ TYPE_INDIVISIBLE_P (vectype) = 1;
}
  else
{
@@ -4592,12 +4595,12 @@ register_builtin_types ()
  && TYPE_ALIGN (vectype) == 128
  && known_eq (size, BITS_PER_SVE_VECTOR));
  num_zr = 1;
+ TYPE_INDIVISIBLE_P (vectype) = 0;
}
  vectype = build_distinct_type_copy (vectype);
  gcc_assert (vectype == TYPE_MAIN_VARIANT (vectype));
  SET_TYPE_STRUCTURAL_EQUALITY (vectype);
  TYPE_ARTIFICIAL (vectype) = 1;
- TYPE_INDIVISIBLE_P (vectype) = 1;
  make_type_sizeless (vectype);
}
   if (num_pr)
-- 
2.25.1



[PATCH v3 4/8] gimple: Handle variable-sized vectors in BIT_FIELD_REF

2024-11-28 Thread Tejas Belagod
Handle variable-sized vectors for BIT_FIELD_REF canonicalization.

gcc/ChangeLog:

* gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Handle variable
sized vector types in BIT_FIELD_REF canonicalization.
* tree-cfg.cc (verify_types_in_gimple_reference): Change object-size
checking for BIT_FIELD_REF to only error for offsets that are known_gt
the object size.  Out-of-range offsets can occur for indices that
reference VLA SVE vector elements beyond the minimum vector size, so
maybe_gt is not appropriate here.
---
 gcc/gimple-fold.cc | 2 +-
 gcc/tree-cfg.cc| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 5eedad54ced..4ad5ae03d91 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6293,7 +6293,7 @@ maybe_canonicalize_mem_ref_addr (tree *t, bool is_debug = 
false)
 (TYPE_SIZE (TREE_TYPE (*t;
  widest_int ext
= wi::add (idx, wi::to_widest (TYPE_SIZE (TREE_TYPE (*t;
- if (wi::les_p (ext, wi::to_widest (TYPE_SIZE (vtype
+ if (maybe_le (ext, wi::to_poly_widest (TYPE_SIZE (vtype
{
  *t = build3_loc (EXPR_LOCATION (*t), BIT_FIELD_REF,
   TREE_TYPE (*t),
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 9ac8304e676..87f9776c417 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -3175,7 +3175,7 @@ verify_types_in_gimple_reference (tree expr, bool 
require_lvalue)
  return true;
}
  if (!AGGREGATE_TYPE_P (TREE_TYPE (op))
- && maybe_gt (size + bitpos,
+ && known_gt (size + bitpos,
   tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (op)
{
  error ("position plus size exceeds size of referenced object in "
-- 
2.25.1



[PATCH v1] RISC-V: Fix incorrect optimization options passing to widen

2024-11-28 Thread pan2 . li
From: Pan Li 

Like the strided load/store tests, the vector widen testcases are designed
to be run with different sets of optimization options, but those options
were actually being ignored according to the execution log in gcc.log.
This patch corrects this in the same way as the earlier fix for strided
load/store.

The below test suites are passed for this patch.
* The rv64gcv full regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 448374d49db..26113238c4f 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -88,7 +88,7 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O2 -mrvv-max-lmul=m4} ]
 foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] 
\
-"" "$op"
+"$op" ""
 }
 
 # VLS-VLMAX tests
-- 
2.43.0






[PATCH v3 5/8] c: Fix constructor bounds checking for VLA and construct VLA vector constants

2024-11-28 Thread Tejas Belagod
This patch adds support for checking the bounds of SVE ACLE vector
initialization constructors.  It also adds support for constructing VLA
vector constants from initializer constructors.

gcc/ChangeLog:

* c-typeck.cc (process_init_element): Add check to restrict
constructor length to the minimum vector length allowed.
* tree.cc (build_vector_from_ctor): Add support to construct VLA vector
constants from init constructors.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: Update test to
test initialize error.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.
---
 gcc/c/c-typeck.cc| 16 +++-
 .../aarch64/sve/acle/general-c/sizeless-1.c  | 13 +
 .../aarch64/sve/acle/general-c/sizeless-2.c  | 12 
 gcc/tree.cc  | 16 +++-
 4 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e429ce9d176..709410bcf19 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -12069,12 +12069,18 @@ retry:
{
  tree elttype = TYPE_MAIN_VARIANT (TREE_TYPE (constructor_type));
 
-/* Do a basic check of initializer size.  Note that vectors
-   always have a fixed size derived from their type.  */
- if (tree_int_cst_lt (constructor_max_index, constructor_index))
+ /* Do a basic check of initializer size.  Note that vectors
+may not always have a fixed size derived from their type.  */
+ if (maybe_lt (tree_to_poly_uint64 (constructor_max_index),
+   tree_to_poly_uint64 (constructor_index)))
{
- pedwarn_init (loc, 0,
-   "excess elements in vector initializer");
+ /* Diagnose VLA out-of-bounds as errors.  */
+ if (tree_to_poly_uint64 (constructor_max_index).is_constant())
+   pedwarn_init (loc, 0,
+ "excess elements in vector initializer");
+ else
+   error_init (loc, "excess elements in vector initializer");
+
  break;
}
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-1.c
index b0389fa00a8..747bac464a5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-1.c
@@ -38,6 +38,8 @@ void ext_consume_sve_sc (svint8_t);
 void ext_consume_varargs (int, ...);
 svint8_t ext_produce_sve_sc ();
 
+extern int bar (void);
+
 /* Main tests for statements and expressions.  */
 
 void
@@ -69,6 +71,17 @@ statements (int n)
 
   int initi_a = sve_sc1; /* { dg-error {incompatible types when initializing 
type 'int' using type 'svint8_t'} } */
   int initi_b = { sve_sc1 }; /* { dg-error {incompatible types when 
initializing type 'int' using type 'svint8_t'} } */
+  svint32_t init_sve_vc1 = { 0, 1 };
+  svint32_t init_sve_vc2 = { 0, bar () };
+  svint32_t init_sve_vc3 = { bar (), n };
+  svint32_t init_sve_vc4 = { 0, 1, 2, 3 };
+  svint32_t init_sve_vc5 = { 0, 1, bar (), 3 };
+  svint32_t init_sve_vc6 = { 0, 1, 2, 3, 4 }; /* { dg-error {excess elements 
in vector initializer} } */
+  svint32_t init_sve_vc7 = { 0, 1, 2, 3, bar () }; /* { dg-error {excess 
elements in vector initializer} } */
+  svint32_t init_sve_vc8 = { 0, 1, 2, 3, 4, 5 }; /* { dg-error {excess 
elements in vector initializer} } */
+  svint32_t init_sve_vc9 = { 0, bar (), 2, 3, 4, n }; /* { dg-error {excess 
elements in vector initializer} } */
+
+
 
   /* Compound literals.  */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-2.c
index d16f40b5f2a..33cd21610ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/sizeless-2.c
@@ -38,6 +38,8 @@ void ext_consume_sve_sc (svint8_t);
 void ext_consume_varargs (int, ...);
 svint8_t ext_produce_sve_sc ();
 
+extern int bar (void);
+
 /* Main tests for statements and expressions.  */
 
 void
@@ -69,6 +71,16 @@ statements (int n)
 
   int initi_a = sve_sc1; /* { dg-error {incompatible types when initializing 
type 'int' using type 'svint8_t'} } */
   int initi_b = { sve_sc1 }; /* { dg-error {incompatible types when 
initializing type 'int' using type 'svint8_t'} } */
+  svint32_t init_sve_vc1 = { 0, 1 };
+  svint32_t init_sve_vc2 = { 0, bar () };
+  svint32_t init_sve_vc3 = { bar (), n };
+  svint32_t init_sve_vc4 = { 0, 1, 2, 3, 4, 5, 6, 7 };
+  svint32_t init_sve_vc5 = { 0, 1, bar (), 3, 4, 5, 6, 7 };
+  svint32_t init_sve_vc6 = { 0, 1, 2, 3, 4, 5, 6, 7, 8 }; /* { dg-warning 
{excess elements in vector initializer} } */
+  svint32_t init_sve_vc7 = { 0, 1, 2, 3, bar (), 5, 6, 7, 8 }; /* { dg-war

[PATCH v1] RISC-V: Fix RVV strided load/store testcases failure

2024-11-28 Thread pan2 . li
From: Pan Li 

This patch fixes the testcase failures of strided load/store that appeared
once the various optimization options were actually passed to the testcases.

* Add the -mno-vector-strict-align option.
* Adjust dg-final by any-opts and/or no-opts if the rtl dump changes
  on different optimization options (like O2, O3, zvl).

The below test suites are passed for this patch.
* The rv64gcv full regression test.

This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: Fix
the failed test by target any-opts and/or no-opts.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: Ditto

Signed-off-by: Pan Li 
---
 .../rvv/autovec/strided/strided_ld_st-1-f16.c | 32 ++---
 .../rvv/autovec/strided/strided_ld_st-1-f32.c | 32 ++---
 .../rvv/autovec/strided/strided_ld_st-1-f64.c |  2 +-
 .../rvv/autovec/strided/strided_ld_st-1-i16.c | 32 ++---
 .../rvv/autovec/strided/strided_ld_st-1-i32.c | 46 ---
 .../rvv/autovec/strided/strided_ld_st-1-i64.c |  2 +-
 .../rvv/autovec/strided/strided_ld_st-1-i8.c  | 32 ++---
 .../rvv/autovec/strided/strided_ld_st-1-u16.c | 32 ++---
 .../rvv/autovec/strided/strided_ld_st-1-u32.c | 46 ---
 .../rvv/autovec/strided/strided_ld_st-1-u64.c |  2 +-
 .../rvv/autovec/strided/strided_ld_st-1-u8.c  | 32 ++---
 11 files changed, 231 insertions(+), 59 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c
index a128e9fb20a..4098774ba38 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c
@@ -1,13 +1,31 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -fno-vect-cost-model 
-fdump-rtl-expand-details" } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -mno-vector-strict-align 
-fno-vect-cost-model -fdump-rtl-expand-details" } */
 
 #include "strided_ld_st.h"
 
 DEF_STRIDED_LD_ST_FORM_1(_Float16)
 
-/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_LOAD " 4 "expand" { 
target { any-opts "-O3" } } } } */
-/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_STORE " 4 "expand" { 
target { any-opts "-O3" } } } } */
-/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_LOAD " 2 "expand" { 
target { any-opts "-O2" } } } } */
-/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_STORE " 2 "expand" { 
target { any-opts "-O2" } } } } */
-/* { dg-final { scan-assembler-times {vlse16.v} 1 } } */
-/* { dg-final { scan-assembler-times {vsse16.v} 1 } } */
+/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_LOAD " 4 "expand" { 
target {
+ any-opts "-O3"
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
+/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_STORE " 4 "expand" { 
target {
+ any-opts "-O3"
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
+
+/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_LOAD " 2 "expand" { 
target {
+ any-opts "-O2"
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
+/* { dg-final { scan-rtl-dump-times ".MASK_LEN_STRIDED_STORE " 2 "expand" { 
target {
+ any-opts "-O2"
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
+
+/* { dg-final { scan-assembler-times {vlse16.v} 1 { target {
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
+/* { dg-final { scan-assembler-times {vsse16.v} 1 { target {
+ no-opts "-mrvv-vector-bits=zvl"
+   } } } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c
index 621c26a2df2..e1d1063ec8c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c
@@ -1,13 +1,31 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64d -fno-vect-cost-model 
-fdump-rtl-expand-details" } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -mno-vector-strict-align 
-fno-vect-cost-model -fdump-rtl-expand-details" } */
 
 #include "strided

Re: [PATCH]middle-end: rework vectorizable_store to iterate over single index [PR117557]

2024-11-28 Thread Richard Biener
On Wed, 27 Nov 2024, Tamar Christina wrote:

> Hi All,
> 
> The testcase
> 
> #include 
> #include 
> 
> #define N 8
> #define L 8
> 
> void f(const uint8_t * restrict seq1,
>const uint8_t *idx, uint8_t *seq_out) {
>   for (int i = 0; i < L; ++i) {
> uint8_t h = idx[i];
> memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
>   }
> }
> 
> compiled at -O3 -mcpu=neoverse-n1+sve
> 
> miscompiles to:
> 
> ld1wz31.s, p3/z, [x23, z29.s, sxtw]
> ld1wz29.s, p7/z, [x23, z30.s, sxtw]
> st1wz29.s, p7, [x24, z12.s, sxtw]
> st1wz31.s, p7, [x24, z12.s, sxtw]
> 
> rather than
> 
> ld1wz31.s, p3/z, [x23, z29.s, sxtw]
> ld1wz29.s, p7/z, [x23, z30.s, sxtw]
> st1wz29.s, p7, [x24, z12.s, sxtw]
> addvl   x3, x24, #2
> st1wz31.s, p3, [x3, z12.s, sxtw]
> 
> Two things go wrong here: the wrong mask is used, and the address pointers
> to the stores are wrong.
> 
> This issue is happening because the codegen loop in vectorizable_store is a
> nested loop where in the outer loop we iterate over ncopies and in the inner
> loop we loop over vec_num.
> 
> For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
> determined by only the outerloop index and the pointer address is only updated
> in the outer loop.
> 
> As such for SLP we always use the same predicate and the same memory location.
> This patch flattens the two loops and instead iterates over ncopies * vec_num
> and simplified the indexing.
> 
> This does not fully fix the gcc_r miscompile error in SPECCPU 2017 as the 
> error
> moves somewhere else.  I will look at that next but fixes some other libraries
> that also started failing.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 and no issues
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/117557
>   * tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
>   vec_num loops.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/117557
>   * gcc.target/aarch64/pr117557.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr117557.c 
> b/gcc/testsuite/gcc.target/aarch64/pr117557.c
> new file mode 100644
> index 
> ..80b3fde41109988db70eafd715224df0b0029cd1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr117557.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mcpu=neoverse-n1+sve -fdump-tree-vect" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include 
> +#include 
> +
> +#define N 8
> +#define L 8
> +
> +/*
> +**f:
> +**   ...
> +**   ld1wz[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
> +**   ld1wz[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
> +**   st1wz[0-9]+.s, p\1, \[x[0-9]+, z[0-9]+.s, sxtw\]
> +**   incbx([0-9]+), all, mul #2
> +**   st1wz[0-9]+.s, p\2, \[x\3, z[0-9]+.s, sxtw\]
> +**   ret
> +**   ...
> +*/
> +void f(const uint8_t * restrict seq1,
> +   const uint8_t *idx, uint8_t *seq_out) {
> +  for (int i = 0; i < L; ++i) {
> +uint8_t h = idx[i];
> +memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
> +  }
> +}
> +
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> c2d5818b2786123fac7afe290d85c7dd2bda4308..4759c274f3ccbb111a907576539b2a8efb7726a3
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9228,7 +9228,8 @@ vectorizable_store (vec_info *vinfo,
>gcc_assert (!grouped_store);
>auto_vec vec_offsets;
>unsigned int inside_cost = 0, prologue_cost = 0;
> -  for (j = 0; j < ncopies; j++)
> +  int num_stmts = ncopies * vec_num;
> +  for (j = 0; j < num_stmts; j++)
>   {
> gimple *new_stmt;
> if (j == 0)
> @@ -9246,14 +9247,14 @@ vectorizable_store (vec_info *vinfo,
>   vect_get_slp_defs (op_node, gvec_oprnds[0]);
> else
>   vect_get_vec_defs_for_operand (vinfo, first_stmt_info,
> -ncopies, op, gvec_oprnds[0]);
> +num_stmts, op, 
> gvec_oprnds[0]);
> if (mask)
>   {
> if (slp_node)
>   vect_get_slp_defs (mask_node, &vec_masks);
> else
>   vect_get_vec_defs_for_operand (vinfo, stmt_info,
> -ncopies,
> +num_stmts,
>  mask, &vec_masks,
>  mask_vectype);
>   }
> @@ -9279,281 +9280,280 @@ vectorizable_store (vec_info *vinfo,
>   }
>  
> new_stmt = NULL;
> -   for (i = 0; i < vec_

Re: [wwwdocs][committed] projects/gomp/: Update for OpenMP 6.0 spec release

2024-11-28 Thread Thomas Schwinge
Hi Tobias!

On 2024-11-14T18:18:57+0100, Tobias Burnus  wrote:
> maybe doing parallel work doesn't work well.

From my own experience: no, doesn't.  ;-)


> --- a/htdocs/projects/gomp/index.html
> +++ b/htdocs/projects/gomp/index.html

>  omp_target_memset and
> -  omp_target_memset_rect_async routines
> +  omp_target_rect_async routines
>  No

Rather: 'omp_target_memset_async'?


Grüße
 Thomas


[committed] libstdc++: Deprecate std::rel_ops namespace for C++20

2024-11-28 Thread Jonathan Wakely
This is deprecated in the C++20 standard and will be removed at some
point.

libstdc++-v3/ChangeLog:

* include/bits/stl_relops.h (rel_ops): Add deprecated attribute.
* testsuite/20_util/headers/utility/using_namespace_std_rel_ops.cc:
Add dg-warning for -Wdeprecated warnings.
* testsuite/20_util/rel_ops.cc: Likewise.
* testsuite/util/testsuite_containers.h: Disable -Wdeprecated
warnings when using rel_ops.
---

Nobody should be using this namespace in any version of C++, ever.

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/stl_relops.h   | 2 +-
 .../headers/utility/using_namespace_std_rel_ops.cc   | 2 +-
 libstdc++-v3/testsuite/20_util/rel_ops.cc| 2 +-
 libstdc++-v3/testsuite/util/testsuite_containers.h   | 9 +
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_relops.h 
b/libstdc++-v3/include/bits/stl_relops.h
index 06c85ca8da9..29e7af3c250 100644
--- a/libstdc++-v3/include/bits/stl_relops.h
+++ b/libstdc++-v3/include/bits/stl_relops.h
@@ -63,7 +63,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-  namespace rel_ops
+  namespace rel_ops _GLIBCXX20_DEPRECATED_SUGGEST("<=>")
   {
 /** @namespace std::rel_ops
  *  @brief  The generated relational operators are sequestered here.
diff --git 
a/libstdc++-v3/testsuite/20_util/headers/utility/using_namespace_std_rel_ops.cc 
b/libstdc++-v3/testsuite/20_util/headers/utility/using_namespace_std_rel_ops.cc
index 330bde88d63..b583eaa4713 100644
--- 
a/libstdc++-v3/testsuite/20_util/headers/utility/using_namespace_std_rel_ops.cc
+++ 
b/libstdc++-v3/testsuite/20_util/headers/utility/using_namespace_std_rel_ops.cc
@@ -21,5 +21,5 @@
 
 namespace gnu
 {
-  using namespace std::rel_ops;
+  using namespace std::rel_ops; // { dg-warning "deprecated" "" { target c++20 
} }
 }
diff --git a/libstdc++-v3/testsuite/20_util/rel_ops.cc 
b/libstdc++-v3/testsuite/20_util/rel_ops.cc
index 711822966d3..f84503293e1 100644
--- a/libstdc++-v3/testsuite/20_util/rel_ops.cc
+++ b/libstdc++-v3/testsuite/20_util/rel_ops.cc
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-using namespace std::rel_ops;
+using namespace std::rel_ops; // { dg-warning "deprecated" "" { target c++20 } 
}
 
 // libstdc++/3628
 void test01()
diff --git a/libstdc++-v3/testsuite/util/testsuite_containers.h 
b/libstdc++-v3/testsuite/util/testsuite_containers.h
index 4dd78d4ec9d..f48bb54f140 100644
--- a/libstdc++-v3/testsuite/util/testsuite_containers.h
+++ b/libstdc++-v3/testsuite/util/testsuite_containers.h
@@ -183,9 +183,12 @@ namespace __gnu_test
 {
   forward_members_unordered(const typename _Tp::value_type& v)
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
// Make sure that even if rel_ops is injected there is no ambiguity
// when comparing iterators.
using namespace std::rel_ops;
+#pragma GCC diagnostic pop
 
typedef _Tp test_type;
test_type container;
@@ -283,9 +286,12 @@ namespace __gnu_test
 {
   forward_members(_Tp& container)
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
// Make sure that even if rel_ops is injected there is no ambiguity
// when comparing iterators.
using namespace std::rel_ops;
+#pragma GCC diagnostic pop
 
typedef traits<_Tp> traits_type;
iterator_concept_checks(container)
   {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
// Make sure that even if rel_ops is injected there is no ambiguity
// when comparing iterators.
using namespace std::rel_ops;
+#pragma GCC diagnostic pop
 
assert( !(container.begin() < container.begin()) );
assert( !(container.cbegin() < container.cbegin()) );
-- 
2.47.0



Re: [PATCH v4 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-11-28 Thread Richard Sandiford
Thanks for using maskload/store_optab and sorry for the very slow review.
Been a bit swamped lately...

The patch seems to be using maskload and maskstore as though they were
variants of movMMcc, with the comparison being part of the load/store.
Instead, the current interface is that:

  maskload 

would do:

  op0 = op2 ? op1 : op3

where op2 is an already-computed condition.  Similarly:

  maskstore 

would do:

  if (op2) op0 = op1

where again op2 is an already-computed condition.  (Note that maskstore
only has three operands, not four.)

This would be the natural interface if we were trying to generate these
instructions from gimple.  But I suppose the interface makes things awkward
for this patch, where you're trying to generate the pattern from an RTL pass.

So this raises two questions:

(1) How should we handle the requirement to have a comparison operand,
instead of the normal precomputed boolean operand?

(2) maskload and maskstore take two modes: the mode of the data being
loaded/stored, and the mode of the condition.  What should the mode
of the condition be if the operand is a comparison?

TBH I'm not sure what to do here.  One option would be to emit a separate
cstore (emit_store_flag) and then pass the result of that cstore to operand 2
of the maskload/store.  The mode of the operand could be the integer
equivalent of the value being loaded/stored (e.g. SI when loading or
storing SF).  I think this would work best with any future gimple
support.  But it likely means that we rely on combine to eliminate
redundant comparison-of-cstore sequences.

Another option would be to allow operand 2 to be a comparison operand,
as for movMMcc.  Regarding (2), we could choose to use VOIDmode for
the second mode, since (a) that is the actual mode of a canonicalised
comparison and (b) it should safely isolate the comparison usage from
the non-comparison usage.  If no-one has any better suggestions,
I suppose we should do this.  It isn't mutually exclusive with the
first option: we could still handle precomputed boolean conditions
as well, in a future patch.

"Kong, Lingling"  writes:
> @@ -2132,6 +2134,54 @@ noce_emit_bb (rtx last_insn, basic_block bb, bool 
> simple)
>return true;
>  }
>  
> +/* Return TRUE if we could convert "if (test) *x = a; else skip" to
> +   scalar mask store and could do conditional faulting movcc, i.e.
> +   x86 cfcmov, especially when store x may cause memory faults and
> +   in else_bb x == b.  */
> +
> +static bool
> +can_use_scalar_mask_store (rtx x, rtx a, rtx b, bool a_simple)
> +{
> +  gcc_assert (MEM_P (x));
> +
> +  machine_mode x_mode = GET_MODE (x);
> +  if (convert_optab_handler (maskstore_optab, x_mode,
> +  x_mode) == CODE_FOR_nothing)

If we go for the option described above, the second mode here should
be the mode of if_info.cond.

> +return false;
> +
> +  if (!rtx_equal_p (x, b) || !may_trap_or_fault_p (x))
> +return false;
> +  if (!a_simple || !register_operand (a, x_mode))
> +return false;

Could you explain the purpose of the last if statement?  I would have
expected noce_try_cmove_arith to handle other forms of "a" correctly
(as long as they don't fault -- more on that below).

> +
> +  return true;
> +}
> +
> +/* Return TRUE if backend supports scalar maskload_optab/maskstore_optab,
> +   which suppressed memory faults when load or store a memory operand
> +   and the condition code evaluates to false.  */
> +
> +static bool
> +can_use_scalar_mask_load_store (struct noce_if_info *if_info)
> +{
> +  rtx a = if_info->a;
> +  rtx b = if_info->b;
> +  rtx x = if_info->x;
> +
> +  if (!MEM_P (a) && !MEM_P (b))
> +return false;
> +
> +  if (MEM_P (x))
> +return can_use_scalar_mask_store (x, a, b, if_info->then_simple);
> +  else
> +/* Return TRUE if backend supports scalar maskload_optab, we could 
> convert
> +   "if (test) x = *a; else x = b;" or "if (test) x = a; else x = *b;"
> +   to conditional faulting movcc, i.e. x86 cfcmov, especially when load a
> +   or b may cause memory faults.  */
> +return convert_optab_handler (maskstore_optab, GET_MODE (a),
> +   GET_MODE (a)) != CODE_FOR_nothing;

It looks like this should be maskload_optab.  "a" might be a VOIDmode
constant, so it's probably better to use GET_MODE (x) for the first mode.
The comment above about the second mode applies here too.

> +}
> +
>  /* Try more complex cases involving conditional_move.  */
>  
>  static bool
> @@ -2171,7 +2221,17 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
>/* ??? We could handle this if we knew that a load from A or B could
>   not trap or fault.  This is also true if we've already loaded
>   from the address along the path from ENTRY.  */

This comment is now a little out of place.

> -  else if (may_trap_or_fault_p (a) || may_trap_or_fault_p (b))
> +  /* Just wait cse_not_expected, then convert to conditional mov on their
> + addres

[committed] libstdc++: Reorder printer registrations in printers.py

2024-11-28 Thread Jonathan Wakely
Register StdIntegralConstantPrinter with the other C++11 printers, and
register StdTextEncodingPrinter after C++20 printers.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py: Reorder registrations.
---

No behaviour change.

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/python/libstdcxx/v6/printers.py | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index d05b79762fd..37ca51b2628 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2830,10 +2830,6 @@ def build_libstdcxx_dictionary():
 # vector
 libstdcxx_printer.add_version('std::', 'locale', StdLocalePrinter)
 
-libstdcxx_printer.add_version('std::', 'integral_constant',
-  StdIntegralConstantPrinter)
-libstdcxx_printer.add_version('std::', 'text_encoding',
-  StdTextEncodingPrinter)
 
 if hasattr(gdb.Value, 'dynamic_type'):
 libstdcxx_printer.add_version('std::', 'error_code',
@@ -2896,6 +2892,8 @@ def build_libstdcxx_dictionary():
   StdChronoDurationPrinter)
 libstdcxx_printer.add_version('std::chrono::', 'time_point',
   StdChronoTimePointPrinter)
+libstdcxx_printer.add_version('std::', 'integral_constant',
+  StdIntegralConstantPrinter)
 
 # std::regex components
 libstdcxx_printer.add_version('std::__detail::', '_State',
@@ -2971,6 +2969,9 @@ def build_libstdcxx_dictionary():
 # libstdcxx_printer.add_version('std::chrono::(anonymous namespace)', 
'Rule',
 #  StdChronoTimeZoneRulePrinter)
 
+# C++26 components
+libstdcxx_printer.add_version('std::', 'text_encoding',
+  StdTextEncodingPrinter)
 # Extensions.
 libstdcxx_printer.add_version('__gnu_cxx::', 'slist', StdSlistPrinter)
 
-- 
2.47.0



Re: Backport two LRA patches to gcc-14 branch

2024-11-28 Thread Uros Bizjak
On Wed, Nov 27, 2024 at 1:39 PM Vladimir Makarov  wrote:
>
>
> On 11/27/24 04:05, Uros Bizjak wrote:
> > Hello!
> >
> > I'd like to backport two LRA patches to gcc-14 branch:
> >
> > 1. [PR114942][LRA]: Don't reuse input reload reg of inout early clobber 
> > operand
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=9585317f0715699197b1313bbf939c6ea3c1ace6
> >
> > 2. [PR117105][LRA]: Use unique value reload pseudo for early clobber operand
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4b09e2c67ef593db171b0755b46378964421782b
> >
> > They both fix RA failure with strict_low_part family of instructions:
> >
> > (insn 24 55 54 4 (parallel [
> >  (set (strict_low_part (reg:QI 2 cx [orig:109 e ] [109]))
> >  (and:QI (subreg:QI (zero_extract:HI (reg/v:HI 2 cx
> > [orig:109 e ] [109])
> >  (const_int 8 [0x8])
> >  (const_int 8 [0x8])) 0)
> >  (reg:QI 1 dx [orig:115 _6 ] [115])))
> >  (clobber (reg:CC 17 flags))
> >
> > that were added by me for PR target/78904, so I have some interest in
> > the backport.
> >
> > The backport of two patches was bootstrapped and regression tested
> > with the current gcc-14 branch.
> >
> > Is the backport OK for branch?
> >
> OK.  They are both safe.  I don't expect any issues with them.

Done.

Thanks,
Uros.


Re: [PATCH] arm: [MVE intrinsics] fix vctpq intrinsic implementation [PR target/117814]

2024-11-28 Thread Christophe Lyon
On Thu, 28 Nov 2024 at 15:13, Andre Vieira (lists)
 wrote:
>
> Hi Christophe,
>
> On 28/11/2024 10:22, Christophe Lyon wrote:
> > The VCTP instruction creates a Vector Tail Predicate in VPR.P0, based
> > on the input value, but also constrained by a VPT block (if present),
> > or if used within a DLSTP/LETP loop.
> >
> > Therefore we need to inform the compiler that this intrinsic reads the
> > FPCXT register, otherwise it could make incorrect assumptions: for
> > instance in test7() from gcc.target/arm/mve/dlstp-compile-asm-2.c it
> > would hoist p1 = vctp32q (g) outside of the loop.
>
> We chatted about this offlist but it's good to share here for others
> too. I do not believe the transformation gcc is doing here is wrong. The
> transformation we do for test 7, along with some others in the
> testsuite, relies on analysis to check whether masks that are not the
> loop predicate mask, but are used within the loop, have a side effect. In
> other words, any instruction that is not predicated by the loop
> predicate, be that unpredicated or predicated by another mask, triggers
> an analysis to check whether the results are used in a safe way. Check
> the comments above 'arm_mve_impl_predicated_p' in arm.cc
>
> For test7 the non-loop predicate 'p1' is used to predicate a load inside
> the loop; when dlstp'ed, that load will be masked by 'p & p1' instead,
> which means it could be loading less than initially intended, however,
> the results of that load are used in a vadd predicated by 'p' which
> means any values that it would have loaded if not masked by 'p' would
> have been discarded in the add, so it has no relevant effect.
>
> Furthermore, I also believe the compiler is already aware that VCTP
> writes P0, given it has an input operand with the predicate
> 'vpr_register_operand' and the register constraint '=Up'. During DLSTP
> transformation we rely on reads and writes to such operands to do our
> transformation and it should also provide other backend passes with
> enough information.
>
> So I don't think this patch is needed.
>
Indeed.
I managed to wrongly convince myself that p1 = vctp32q (g) should not
be hoisted...

Let me drop this patch, but dlstp-compile-asm-2.c still needs fixing.

Thanks,

Christophe


Re: [PATCH 4/8] ipa: Better value ranges for zero pointer constants

2024-11-28 Thread Martin Jambor
Hi,

On Fri, Nov 15 2024, Martin Jambor wrote:
> Hi,
>
> On Thu, Nov 07 2024, Aldy Hernandez wrote:
>> Jan Hubicka  writes:
>>
 > 2024-11-01  Martin Jambor  
 >
 > * ipa-prop.cc (ipa_compute_jump_functions_for_edge): When 
 > creating
 > value-range jump functions from pointer type constant zero, do so
 > as if it was not a pointer.
 > ---
 >  gcc/ipa-prop.cc | 3 ++-
 >  1 file changed, 2 insertions(+), 1 deletion(-)
 >
 > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
 > index 9bd2e4bc60c..012f8a32386 100644
 > --- a/gcc/ipa-prop.cc
 > +++ b/gcc/ipa-prop.cc
 > @@ -2368,7 +2368,8 @@ ipa_compute_jump_functions_for_edge (struct 
 > ipa_func_body_info *fbi,
 > }
 >
 >value_range vr (TREE_TYPE (arg));
 > -  if (POINTER_TYPE_P (TREE_TYPE (arg)))
 > +  if (POINTER_TYPE_P (TREE_TYPE (arg))
 > + && !zerop (arg))
 
 integer_zerop (arg) - I also think this deserves a comment.
>
> thanks for the pointer, I was not aware of that function.  But given
> Honza's and Aldi's feedback, I may take a different path.
>
>>>
>>> Comment would indeed be nice.  It is not clear to me why special
>>> handling is needed here and ranger does not give the same or better
>>> value range than one we compute based on alignment+offset and non-zero
>>> ness?
>>
>> Yeah, this doesn't smell right.  Martin, could you look at what's going on?
>
> If you quickly glance at the code it is not surprising.  The pointer
> handling code checks whether it knows that the argument is non-zero, and
> depending on that either starts with a (freshly initialized) non_zero or
> varying value_range.  Afterwards, it proceeds to attempt to imbue it
> with alignment info.  That is, the code does not try to store the result
> of the get_range_query into the jump function, it is simply interested
> in the non-NULLness.
>
> I thought that was intentional but given Aldy's reaction perhaps it
> wasn't, so I decided to be bolder and rework the code a bit.  Please see
> an alternative patch below.
>
>>
>>>
>>> The code was needed since we did not have value ranges for pointer typed
>>> SSA names, but do we still need to special case them these days?
>>
>> Note that the prange implementation doesn't do anything extra we weren't
>> already doing with irange for pointers.  And the original code didn't
>> update ranges or value/mask pairs based on alignment, so you probably
>> still have to keep doing whatever alignment magic you were doing.
>
> I also work with the assumption that the extra code is necessary but my
> understanding of ranger and its capabilities is limited.
>
> Below is the new patch which has also passed bootstrap and testing on
> x86_64-linux on its own and along with the verifier patch.
>

Ping, please.

Martin


> -- 8<  8<  8< --
>
> When looking into cases where we know an actual argument of a call is
> a constant but we don't generate a singleton value-range for the jump
> function, I found out that the special handling of pointer constants
> does not work well for constant zero pointer values.  In fact the code
> only attempts to see if it can figure out that an argument is not zero
> and if it can figure out any alignment information.
>
> With this patch, we try to use the value_range that ranger can give us
> in the jump function if we can and we query ranger for all kinds of
> arguments, not just SSA_NAMES (and so also pointer integer constants).
> If we cannot figure out a useful range we fall back again on figuring
> out non-NULLness with tree_single_nonzero_warnv_p.
>
> With this patch, we generate
>
>   [prange] struct S * [0, 0] MASK 0x0 VALUE 0x0
>
> instead of for example:
>
>   [prange] struct S * [0, +INF] MASK 0xfff0 VALUE 0x0
>
> for a zero constant passed in a call.
>
> If you are wondering why we check whether the value range obtained
> from range_of_expr can be undefined, even when the function returns
> true, that is because that can apparently happen for default-definition
> SSA_NAMEs.
>
> gcc/ChangeLog:
>
> 2024-11-15  Martin Jambor  
>
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Try harder to
>   use the value range obtained from ranger for pointer values.
> ---
>  gcc/ipa-prop.cc | 35 ---
>  1 file changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index fd0d9b7c15c..50ec8e0cf28 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2382,28 +2382,27 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>value_range vr (TREE_TYPE (arg));
>if (POINTER_TYPE_P (TREE_TYPE (arg)))
>   {
> -   bool addr_nonzero = false;
> -   bool strict_overflow = false;
> -
> -   if (TREE_CODE (arg) == SSA_NAME
> -   && param_type
> -   && get_range_query (cfun)->range_

Re: [PATCH] c++: define __cpp_pack_indexing [PR113798]

2024-11-28 Thread Jakub Jelinek
On Thu, Nov 28, 2024 at 12:19:00PM -0500, Marek Polacek wrote:
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
> -- >8 --
> Forgot to do this in my original patch.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-cppbuiltin.cc (c_cpp_builtins): Predefine
>   __cpp_pack_indexing=202311L for C++26.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp26/feat-cxx26.C (__cpp_pack_indexing): Add test.

LGTM.

Jakub



Re: [PATCH 1/3] tree-optimization/117467 - Do not calculate an entry range for invariant names.

2024-11-28 Thread Jakub Jelinek
On Mon, Nov 25, 2024 at 07:55:46PM -0500, Andrew MacLeod wrote:
> From 97bea858ff782dc5c80490bb48cbd3241ad3413c Mon Sep 17 00:00:00 2001
> From: Andrew MacLeod 
> Date: Mon, 25 Nov 2024 09:50:33 -0500
> Subject: [PATCH 1/3] Do not calculate an entry range for invariant names.
> 
> If an SSA_NAME is invariant, do not calculate an on_entry value.
> 
>   PR tree-optimization/117467
>   * gimple-range-cache.cc (ranger_cache::entry_range): Do not
>   invoke range_from_dom for invariant ssa-names.

LGTM.

Jakub



Re: [PATCH 3/3] PR tree-optimization/117647 - Only add inferred ranges if they change the value.

2024-11-28 Thread Jakub Jelinek
On Mon, Nov 25, 2024 at 07:56:09PM -0500, Andrew MacLeod wrote:
> From 9434efb95a481ea57db8d47919d05cbe17b8bcba Mon Sep 17 00:00:00 2001
> From: Andrew MacLeod 
> Date: Sat, 23 Nov 2024 14:05:54 -0500
> Subject: [PATCH 3/3] Only add inferred ranges if they change the value.
> 
> Do not add an inferred range if it is already incorporated in the
> current range of an SSA_NAME.
> 
>   PR tree-optimization/117467
>   * gimple-range-infer.cc (infer_range_manager::add_ranges): Check
>   range_of_expr to see if the inferred range is needed.

LGTM.

Jakub



Re: [PATCH 2/3] PR tree-optimization/117467 - Add a range query to inferred ranges.

2024-11-28 Thread Jakub Jelinek
On Mon, Nov 25, 2024 at 07:56:00PM -0500, Andrew MacLeod wrote:
> From 0aee6d112bf4dd9accd7aaa8b48a520a878dedf9 Mon Sep 17 00:00:00 2001
> From: Andrew MacLeod 
> Date: Sat, 16 Nov 2024 08:29:30 -0500
> Subject: [PATCH 2/3] Add a range query to inferred ranges.
> 
> Provide a range_query for any inferred range processing which wants to
> examine the range of an argument to make decisions.  Add some comments.
> 
>   * gimple-range-cache.cc (ranger_cache::ranger_cache): Create the
>   infer oracle using THIS as the range_query.
>   * gimple_range_infer.cc (gimple_infer_range::gimple_infer_range):
>   Add a range_query to the constructor and use it.
>   (infer_range_manager::infer_range_manager): Add a range_query.
>   * gimple-range-infer.h (gimple_infer_range): Adjust prototype.
>   (infer_range_manager): Add a range_query.
>   * value-query.cc (range_query::create_infer_oracle): Add a range_query.

LGTM.

Jakub



[PATCH] libstdc++: Use hidden friends for __normal_iterator operators

2024-11-28 Thread Jonathan Wakely
As suggested by Jason, this makes all __normal_iterator operators into
friends so they can be found by ADL and don't need to be separately
exported in module std.

For the operator<=> comparing two iterators of the same type, I had to
use a deduced return type and add a requires-clause, because it's no
longer a template and so we no longer get substitution failures when
it's considered in oerload resolution.

I also had to reorder the __attribute__((always_inline)) and
[[nodiscard]] attributes, which have to be in a particular order when
used on friend functions.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__normal_iterator): Make all
non-member operators hidden friends.
* src/c++11/string-inst.cc: Remove explicit instantiations of
operators that are no longer templates.
---

Tested x86_64-linux.

This iterator type isn't defined in the standard, and users shouldn't be
doing funny things with it, so nothing prevents us from replacing its
operators with hidden friends.

 libstdc++-v3/include/bits/stl_iterator.h | 341 ---
 libstdc++-v3/src/c++11/string-inst.cc|  11 -
 2 files changed, 184 insertions(+), 168 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index e872598d7d8..656a47e5f76 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1164,188 +1164,215 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const _Iterator&
   base() const _GLIBCXX_NOEXCEPT
   { return _M_current; }
-};
 
-  // Note: In what follows, the left- and right-hand-side iterators are
-  // allowed to vary in types (conceptually in cv-qualification) so that
-  // comparison between cv-qualified and non-cv-qualified iterators be
-  // valid.  However, the greedy and unfriendly operators in std::rel_ops
-  // will make overload resolution ambiguous (when in scope) if we don't
-  // provide overloads whose operands are of the same type.  Can someone
-  // remind me what generic programming is about? -- Gaby
+private:
+  // Note: In what follows, the left- and right-hand-side iterators are
+  // allowed to vary in types (conceptually in cv-qualification) so that
+  // comparison between cv-qualified and non-cv-qualified iterators be
+  // valid.  However, the greedy and unfriendly operators in std::rel_ops
+  // will make overload resolution ambiguous (when in scope) if we don't
+  // provide overloads whose operands are of the same type.  Can someone
+  // remind me what generic programming is about? -- Gaby
 
 #ifdef __cpp_lib_three_way_comparison
-  template
-[[nodiscard, __gnu__::__always_inline__]]
-constexpr bool
-operator==(const __normal_iterator<_IteratorL, _Container>& __lhs,
-  const __normal_iterator<_IteratorR, _Container>& __rhs)
-noexcept(noexcept(__lhs.base() == __rhs.base()))
-requires requires {
-  { __lhs.base() == __rhs.base() } -> std::convertible_to;
-}
-{ return __lhs.base() == __rhs.base(); }
+  template
+   [[nodiscard, __gnu__::__always_inline__]]
+   friend
+   constexpr bool
+   operator==(const __normal_iterator& __lhs,
+  const __normal_iterator<_Iter, _Container>& __rhs)
+   noexcept(noexcept(__lhs.base() == __rhs.base()))
+   requires requires {
+ { __lhs.base() == __rhs.base() } -> std::convertible_to;
+   }
+   { return __lhs.base() == __rhs.base(); }
 
-  template
-[[nodiscard, __gnu__::__always_inline__]]
-constexpr std::__detail::__synth3way_t<_IteratorR, _IteratorL>
-operator<=>(const __normal_iterator<_IteratorL, _Container>& __lhs,
-   const __normal_iterator<_IteratorR, _Container>& __rhs)
-noexcept(noexcept(std::__detail::__synth3way(__lhs.base(), __rhs.base(
-{ return std::__detail::__synth3way(__lhs.base(), __rhs.base()); }
+  template
+   static constexpr bool __nothrow_synth3way
+ = noexcept(std::__detail::__synth3way(std::declval<_Iterator&>(),
+   std::declval<_Iter&>()));
 
-  template
-[[nodiscard, __gnu__::__always_inline__]]
-constexpr bool
-operator==(const __normal_iterator<_Iterator, _Container>& __lhs,
-  const __normal_iterator<_Iterator, _Container>& __rhs)
-noexcept(noexcept(__lhs.base() == __rhs.base()))
-requires requires {
-  { __lhs.base() == __rhs.base() } -> std::convertible_to;
-}
-{ return __lhs.base() == __rhs.base(); }
+  template
+   [[nodiscard, __gnu__::__always_inline__]]
+   friend
+   constexpr std::__detail::__synth3way_t<_Iterator, _Iter>
+   operator<=>(const __normal_iterator& __lhs,
+   const __normal_iterator<_Iter, _Container>& __rhs)
+   noexcept(__nothrow_synth3way<_Iter>)
+   requires requires {
+ std::__detail::__synth3way(__lhs.base(), __rhs.base

Generating dynamic tags for applications using MTE tagged stack

2024-11-28 Thread Indu Bhagat

Hi,

I need some feedback/discussion on GCC/Binutils command line options around
MTE tagged stack usage.  See "Proposed GCC/Binutils implementation for the user
space ABI for MTE stack" below in the email for the high-level design.

Thanks
Indu

-

MTE Background
--------------
Memory Tagging Extension (MTE) is an AArch64 extension.  This extension
allows coloring of 16-byte memory granules with 4-bit tag values.  The
extension provides additional instructions in the ISA and a new memory type,
Normal Tagged Memory, added to the Arm Architecture.  This hardware-assisted
mechanism can be used to detect memory bugs like buffer overrun or
use-after-free.  The detection is probabilistic.

Current glibc and kernel support
--------------------------------
A user program may exercise MTE on stack, heap and/or globals data accesses.
The applicable memory range must be mapped with the Normal-Tagged memory
attribute ([1]).  When available and enabled, the kernel advertises the
feature to userspace via HWCAP2_MTE.  The new flag PROT_MTE (for mmap () and
mprotect ()) specify that the associated pages allow access to the MTE
allocation tags.

glibc currently provides a tunable glibc.mem.tagging ([3]) and MTE aware
malloc.  The tunable can be used to enable the malloc subsystem to allocate
tagged memory with either precise or deferred faulting mode.  The GNU C
Library startup code will automatically enable memory tagging support in the
kernel if this tunable has any non-zero value.

User space ABI for MTE-enabled stack usage
--
As per the Memtag ABI Extension to ELF for the Arm® 64-bit Architecture
(AArch64) ([5]), the first two of the following dynamic tags are of interest
(as we are interested in stack tagging ATM):
  DT_AARCH64_MEMTAG_MODE
  DT_AARCH64_MEMTAG_STACK
  DT_AARCH64_MEMTAG_GLOBALS
AFAICT, these are not implemented in the Linux kernel nor specified in glibc
yet.

Proposed GCC/Binutils implementation for the user space ABI for MTE stack
--------------------------------------------------------------------------
1. Generating DT_AARCH64_MEMTAG_MODE

GCC
---
Add new GCC command line option: -fsanitize-memtag-mode=<mode>, where <mode>
is one of:
  - none
  - sync
  - async
This option does not affect code generation.  The driver simply passes this
to the linker by using the --aarch64-memtag-mode=<mode> ld option.  IOW, this
is a convenience command line option which is equivalent to
-Wl,--aarch64-memtag-mode=<mode>.

No configure time checks will be added on the GCC side.  If the linker does
not support --aarch64-memtag-mode=<mode>, the user will see an error right away
anyway.

Binutils
--------
Add new ld command line option for emultempl/aarch64elf:
--aarch64-memtag-mode=<mode>, where <mode> can be one of none, sync, or
async.


For mode of sync or async, a DT_AARCH64_MEMTAG_MODE dynamic tag with a value
of 0 or 1 respectively is emitted.

Q: Why does the MemTagABI ([5]) not assign a value for asymm mode yet?

Generated only for aarch64 elf.  Linker silently ignores when specified for
32-bit elf.  As per the MemtagABI doc, the dynamic tag, when present on
dynamically loaded objects, is ignored.

readelf displays the dynamic tag when present.

2. Generating DT_AARCH64_MEMTAG_STACK

GCC
---
In the RFC patch set I sent earlier ([4]), I used the new option
-fsanitize=memtag and was thinking that we would use params to control
whether stack or globals are colored, e.g. with
--param=memtag-instrument-stack=1 or --param=memtag-instrument-globals=1.
But this will not work because the driver will have no visibility of the
params.

So, a new GCC command line option, -fsanitize=memtag-stack, should instead be
used to tag stack variables of a function.  This option will trigger the driver
to pass --aarch64-memtag-stack to the linker, if linking.  (Later, for globals,
we can use -fsanitize=memtag-globals in GCC and --aarch64-memtag-globals in ld.)


Binutils
--------
Add new ld command line option for emultempl/aarch64elf:
  --aarch64-memtag-stack

GNU ld will emit a DT_AARCH64_MEMTAG_STACK dynamic tag with a value of
1 if --aarch64-memtag-stack is specified.

Generated only for aarch64 elf.  The linker silently ignores the option when
specified for 32-bit elf.  As per the MemtagABI doc, the dynamic tag, when
present on dynamically loaded objects, is ignored.

Q: Should it be --aarch64-memtag= instead, where the argument is either stack
or globals?

Q: It may be useful to validate that all components being linked have been
compiled with -fsanitize=memtag-stack.  WDYT?  If we need to validate, we
likely need a new assembler directive, and find a way to convey the information
to the linker.  Thoughts?
to the linker. Thoughts ?

Q: Should ld warn if it sees some EH frame info with augmentation char 'G' but
no --aarch64-memtag-stack on the user's command line?

readelf displays the dynamic tag when present.

3. Generating .cfi_mte_memtag_frame CFI directive

GCC
---
GCC will emit a .cfi_mte_memtag_frame (LLVM already uses this CFI directive)
after a .cfi_startproc.

[PATCH] libstdc++: Make std::basic_stacktrace swappable with unequal allocators

2024-11-28 Thread Jonathan Wakely
The standard says that it's undefined to swap two containers if the
allocators are not equal and do not propagate. This ensures that swap is
always O(1) and non-throwing, but has other undesirable consequences
such as LWG 2152. The 2016 paper P0178 ("Allocators and swap") proposed
making the non-member swap handle non-equal allocators, by performing an
O(n) deep copy when needed. This ensures that a.swap(b) is still O(1)
and non-throwing, but swap(a, b) is valid for all values of the type.

This change implements that for std::basic_stacktrace. The member swap
is changed so that for the undefined case (where we can't swap the
allocators, but can't swap storage separately from the allocators) we
just return without changing either object. This ensures that with
assertions disabled we don't separate allocated storage from the
allocator that can free it.

For the non-member swap, perform deep copies of the two ranges, avoiding
reallocation if there is already sufficient capacity.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (basic_stacktrace::swap): Refactor so
that the undefined case is a no-op when assertions are disabled.
(swap): Remove precondition and perform deep copies when member
swap would be undefined.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Check
swapping with unequal, non-propagating allocators.
---

As part of my ongoing quest to reduce the undefined behaviour surface in
the library, this helps to avoid UB when swapping stacktrace objects.

This is an RFC to see if people like the idea. If we do it here, we
could do it for other containers too.

For the common case there should be no additional cost, because the
'if constexpr' conditions will be true and swap(a, b) will just call
a.swap(b) unconditionally, which will swap the contents unconditionally.
We only do extra work in the cases that are currently undefined.

Tested x86_64-linux.

 libstdc++-v3/include/std/stacktrace   | 77 ---
 .../19_diagnostics/stacktrace/stacktrace.cc   | 23 ++
 2 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace
index f94a424e4cf..ab0788cde08 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -476,15 +476,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 
   // [stacktrace.basic.mod], modifiers
+
+  /** Exchange the contents of two stacktrace objects
+   *
+   * @pre The allocators must propagate on swap or must be equal.
+   */
   void
   swap(basic_stacktrace& __other) noexcept
   {
-   std::swap(_M_impl, __other._M_impl);
if constexpr (_AllocTraits::propagate_on_container_swap::value)
- std::swap(_M_alloc, __other._M_alloc);
+ {
+   using std::swap;
+   swap(_M_alloc, __other._M_alloc);
+ }
else if constexpr (!_AllocTraits::is_always_equal::value)
  {
-   __glibcxx_assert(_M_alloc == __other._M_alloc);
+   if (_M_alloc != __other._M_alloc)
+ {
+   __glibcxx_assert(_M_alloc == __other._M_alloc);
+   // If assertions are disabled but the allocators are unequal,
+   // we can't swap pointers, so just erroneously return.
+   return;
+ }
+ }
+   std::swap(_M_impl, __other._M_impl);
+  }
+
+  // [stacktrace.basic.nonmem], non-member functions
+
+  /** Exchange the contents of two stacktrace objects
+   *
+   * Unlike the `swap` member function, this can be used with unequal
+   * and non-propagating allocators. If the storage cannot be efficiently
+   * swapped then the stacktrace_entry elements will be exchanged
+   * one-by-one, reallocating if needed.
+   */
+  friend void
+  swap(basic_stacktrace& __a, basic_stacktrace& __b)
+  noexcept(_AllocTraits::propagate_on_container_swap::value
+|| _AllocTraits::is_always_equal::value)
+  {
+   if constexpr (_AllocTraits::propagate_on_container_swap::value
+   || _AllocTraits::is_always_equal::value)
+ __a.swap(__b);
+   else if (__a._M_alloc == __b._M_alloc) [[likely]]
+ __a.swap(__b);
+   else // O(N) swap for non-equal non-propagating allocators
+ {
+   basic_stacktrace* __p[2]{ std::__addressof(__a),
+ std::__addressof(__b) };
+   if (__p[0]->size() > __p[1]->size())
+ std::swap(__p[0], __p[1]);
+   basic_stacktrace& __a = *__p[0]; // shorter sequence
+   basic_stacktrace& __b = *__p[1]; // longer sequence
+
+   const auto __a_sz = __a.size();
+   auto __a_begin = __a._M_impl._M_frames;
+   auto __a_end = __a._M_impl._M_frames + __a_sz;
+   auto __b_begin = __b._M_impl._M_frames;
+
+   if (__a._M_impl._M_capacity < 

Re: [PATCH v4 4/5] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-28 Thread Claudio Bantaloukas



On 21/11/2024 14:33, Richard Sandiford wrote:

Claudio Bantaloukas  writes:

[...]
@@ -4004,6 +4008,44 @@ SHAPE (ternary_bfloat_lane)
  typedef ternary_bfloat_lane_base<2> ternary_bfloat_lanex2_def;
  SHAPE (ternary_bfloat_lanex2)
  

+/* sv_t svfoo[_t0](sv_t, svmfloat8_t, svmfloat8_t, uint64_t)
+
+   where the final argument is an integer constant expression in the range
+   [0, 15].  */
+struct ternary_mfloat8_lane_def
+: public ternary_resize2_lane_base<8, TYPE_mfloat, TYPE_mfloat>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+gcc_assert (group.fpm_mode == FPM_set);
+b.add_overloaded_functions (group, MODE_none);
+build_all (b, "v0,v0,vM,vM,su64", group, MODE_none);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+return c.require_immediate_lane_index (3, 2, 1);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+type_suffix_index type;
+if (!r.check_num_arguments (5)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+   || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
+   || !r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t)
+   || !r.require_integer_immediate (3)
+   || !r.require_scalar_type (4, "int64_t"))

uint64_t

done, although I wonder if "fpm_t, aka uint64_t" would be better.



+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, GROUP_none);
+  }
+};
+SHAPE (ternary_mfloat8_lane)
+
  /* sv_t svfoo[_t0](sv_t, svbfloatt16_t, svbfloat16_t)
 sv_t svfoo[_n_t0](sv_t, svbfloat16_t, bfloat16_t).  */
  struct ternary_bfloat_opt_n_def
@@ -4019,6 +4061,46 @@ struct ternary_bfloat_opt_n_def
  };
  SHAPE (ternary_bfloat_opt_n)
  
+/* sv_t svfoo[_t0](sv_t, svmfloat8_t, svmfloat8_t)
+   sv_t svfoo[_n_t0](sv_t, svmfloat8_t, mfloat8_t).  */
+struct ternary_mfloat8_opt_n_def
+: public ternary_resize2_opt_n_base<8, TYPE_mfloat, TYPE_mfloat>
+{
+  void
+  build (function_builder &b, const function_group_info &group) const override
+  {
+gcc_assert (group.fpm_mode == FPM_set);
+b.add_overloaded_functions (group, MODE_none);
+build_all (b, "v0,v0,vM,vM", group, MODE_none);
+build_all (b, "v0,v0,vM,sM", group, MODE_n);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+type_suffix_index type;
+if (!r.check_num_arguments (4)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
+   || !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
+   || !r.require_scalar_type (3, "int64_t"))
+  return error_mark_node;
+
+tree scalar_form
+   = r.lookup_form (MODE_n, type, TYPE_SUFFIX_mf8, GROUP_none);
+if (r.scalar_argument_p (2))
+  {
+   if (scalar_form)
+ return scalar_form;
+   return error_mark_node;

It looks like this would return error_mark_node without reporting
an error first.


+  }
+if (scalar_form && !r.require_vector_or_scalar_type (2))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type, TYPE_SUFFIX_mf8, GROUP_none);
+  }

In this context (unlike finish_opt_n_resolution) we know that there is
a bijection between the vector and scalar forms.  So I think we can just
add require_vector_or_scalar_type to the initial checks:

 if (!r.check_num_arguments (4)
|| (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES
|| !r.require_vector_type (1, VECTOR_TYPE_svmfloat8_t)
|| !r.require_vector_or_scalar_type (2)
|| !r.require_scalar_type (3, "int64_t"))
   return error_mark_node;

 auto mode = r.mode_suffix_id;
 if (r.scalar_argument_p (2))
   mode = MODE_n;
 else if (!r.require_vector_type (2, VECTOR_TYPE_svmfloat8_t))
   return error_mark_node;

 return r.resolve_to (mode, type, TYPE_SUFFIX_mf8, GROUP_none);

(untested).

Done, all tests pass.

[...]
+;; -
+;;  [FP] Mfloat8 Multiply-and-accumulate operations
+;; -
+;; Includes:
+;; - FMLALB (vectors, FP8 to FP16)
+;; - FMLALT (vectors, FP8 to FP16)
+;; - FMLALB (indexed, FP8 to FP16)
+;; - FMLALT (indexed, FP8 to FP16)
+;; - FMLALLBB (vectors)
+;; - FMLALLBB (indexed)
+;; - FMLALLBT (vectors)
+;; - FMLALLBT (indexed)
+;; - FMLALLTB (vectors)
+;; - FMLALLTB (indexed)
+;; - FMLALLTT (vectors)
+;; - FMLALLTT (indexed)
+;; -
+
+(define_insn "@aarch64_sve_add_"
+  [(set (match_operand:SVE_FULL_HSF 0 "register_operand")
+   (unspec:SVE_FULL_HSF
+ [(match_operand:SVE_FULL_HSF 1 "register_operand")
+  (match_operand:VNx16QI 2 "register_operand")
+  (match_operand:VNx16QI 3 "register_operand")
+  (reg:DI FPM_REGNUM)]
+ SVE2_FP8_TERNARY))]
+  "TARGET_SSVE_FP8FMA"
+  {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
+

[committed] libstdc++: Use std::_Destroy in std::stacktrace

2024-11-28 Thread Jonathan Wakely
This benefits from the optimizations in std::_Destroy which avoid doing
any work when using std::allocator.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (basic_stacktrace::_M_impl::_M_resize):
Use std::_Destroy to destroy removed elements.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/stacktrace | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/stacktrace 
b/libstdc++-v3/include/std/stacktrace
index 2c0f6ba10a9..f94a424e4cf 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -601,8 +601,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
void
_M_resize(size_type __n, allocator_type& __alloc) noexcept
{
- for (size_type __i = __n; __i < _M_size; ++__i)
-   _AllocTraits::destroy(__alloc, &_M_frames[__i]);
+ std::_Destroy(_M_frames + __n, _M_frames + _M_size, __alloc);
  _M_size = __n;
}
 
-- 
2.47.0



[PATCH v5 4/5] aarch64: add SVE2 FP8 multiply accumulate intrinsics

2024-11-28 Thread Claudio Bantaloukas

This patch adds support for the following intrinsics:
- svmlalb[_f16_mf8]_fpm
- svmlalb[_n_f16_mf8]_fpm
- svmlalt[_f16_mf8]_fpm
- svmlalt[_n_f16_mf8]_fpm
- svmlalb_lane[_f16_mf8]_fpm
- svmlalt_lane[_f16_mf8]_fpm
- svmlallbb[_f32_mf8]_fpm
- svmlallbb[_n_f32_mf8]_fpm
- svmlallbt[_f32_mf8]_fpm
- svmlallbt[_n_f32_mf8]_fpm
- svmlalltb[_f32_mf8]_fpm
- svmlalltb[_n_f32_mf8]_fpm
- svmlalltt[_f32_mf8]_fpm
- svmlalltt[_n_f32_mf8]_fpm
- svmlallbb_lane[_f32_mf8]_fpm
- svmlallbt_lane[_f32_mf8]_fpm
- svmlalltb_lane[_f32_mf8]_fpm
- svmlalltt_lane[_f32_mf8]_fpm

These are available under a combination of the FP8FMA and SVE2 features.
Alternatively under the SSVE_FP8FMA feature under streaming mode.

gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8fma, ssve-fp8fma): Add new options.
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Add unspec_for_mfp8.
(unspec_for): Return unspec_for_mfp8 on fpm-using cases.
(sme_1mode_function): Fix call to parent ctor.
(sme_2mode_function_t): Likewise.
(unspec_based_mla_function, unspec_based_mla_lane_function): Handle
fpm-using cases.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Treat M as TYPE_SUFFIX_mf8
(ternary_mfloat8_lane_def): Add new class.
(ternary_mfloat8_opt_n_def): Likewise.
(ternary_mfloat8_lane): Add new shape.
(ternary_mfloat8_opt_n): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8_lane, ternary_mfloat8_opt_n): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svmlalb_lane, svmlalb, svmlalt_lane, svmlalt): Update definitions
with mfloat8_t unspec in ctor.
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt, svmlal_impl): Add new FUNCTIONs.
(svqrshr, svqrshrn, svqrshru, svqrshrun): Update definitions with
nop mfloat8 unspec in ctor.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svmlalb, svmlalt, svmlalb_lane, svmlalt_lane, svmlallbb, svmlallbt,
svmlalltb, svmlalltt, svmlalltt_lane, svmlallbb_lane, svmlallbt_lane,
svmlalltb_lane): Add new DEF_SVE_FUNCTION_GS_FPMs.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svmlallbb_lane, svmlallbb, svmlallbt_lane, svmlallbt, svmlalltb_lane,
svmlalltb, svmlalltt_lane, svmlalltt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_h_float_mf8, TYPES_s_float_mf8): Add new types.
(h_float_mf8, s_float_mf8): Add new SVE_TYPES_ARRAY.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_): Add new.
(@aarch64_sve_add_): Add new.
(@aarch64_sve_add_lane_): Likewise.
(@aarch64_sve_add_lane_): Likewise.
* config/aarch64/aarch64.h
(TARGET_FP8FMA, TARGET_SSVE_FP8FMA): Likewise.
* config/aarch64/iterators.md
(VNx8HF_ONLY): Add new.
(UNSPEC_FMLALB_FP8, UNSPEC_FMLALLBB_FP8, UNSPEC_FMLALLBT_FP8,
UNSPEC_FMLALLTB_FP8, UNSPEC_FMLALLTT_FP8, UNSPEC_FMLALT_FP8): Likewise.
(SVE2_FP8_TERNARY_VNX8HF, SVE2_FP8_TERNARY_VNX4SF): Likewise.
(SVE2_FP8_TERNARY_LANE_VNX8HF, SVE2_FP8_TERNARY_LANE_VNX4SF): Likewise.
(sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Likewise.
* doc/invoke.texi: Document fp8fma and sve-fp8fma extensions.

gcc/testsuite/

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z_REV, TEST_DUAL_LANE_REG, TEST_DUAL_ZD) Add fpm0 argument.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_opt_n_1.c: Add
new shape test.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Add new test.
* gcc.target/aarch64/sve2/acle/asm/mlalb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_mf8.c: Likewise.
* lib/target-supports.exp: Add check_effective_target for fp8fma and
ssve-fp8fma
---
 .../aarch64/aarch64-option-extensions.def |  4 +
 .../aarch64/aarch64-sve-builtins-functions.h  | 16 +++-
 .../aarch64/aarch64-sve-builtins-shapes.cc| 78 
 .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
 .

[PATCH v5 3/5] aarch64: add svcvt* FP8 intrinsics

2024-11-28 Thread Claudio Bantaloukas

This patch adds the following intrinsics:
- svcvt1_bf16[_mf8]_fpm
- svcvt1_f16[_mf8]_fpm
- svcvt2_bf16[_mf8]_fpm
- svcvt2_f16[_mf8]_fpm
- svcvtlt1_bf16[_mf8]_fpm
- svcvtlt1_f16[_mf8]_fpm
- svcvtlt2_bf16[_mf8]_fpm
- svcvtlt2_f16[_mf8]_fpm
- svcvtn_mf8[_f16_x2]_fpm (unpredicated)
- svcvtnb_mf8[_f32_x2]_fpm
- svcvtnt_mf8[_f32_x2]_fpm

The underlying instructions are only available when SVE2 is enabled and the PE
is not in streaming SVE mode. They are also available when SME2 is enabled and
the PE is in streaming SVE mode.

gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_signature): Add an fpm_t (uint64_t) argument to functions that
set the fpm register.
(unary_convertxn_narrowt_def): New class.
(unary_convertxn_narrowt): New shape.
(unary_convertxn_narrow_def): New class.
(unary_convertxn_narrow): New shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(unary_convertxn_narrowt): Declare.
(unary_convertxn_narrow): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svcvt_fp8_impl): New class.
(svcvtn_impl): Handle fp8 cases.
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new FUNCTION.
(svcvtnb): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new DEF_SVE_FUNCTION_GS_FPM.
(svcvtn): Likewise.
(svcvtnb, svcvtnt): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svcvt1, svcvt2, svcvtlt1, svcvtlt2, svcvtnb, svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_cvt_mf8, TYPES_cvtn_mf8, TYPES_cvtnx_mf8): Add new types arrays.
(function_builder::get_name): Append _fpm to functions that set fpmr.
(function_resolver::check_gp_argument): Deal with the fpm_t argument.
(function_expander::expand): Set the fpm register before
calling the insn if the function warrants it.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_fp8_cvt): Add new.
(@aarch64_sve2_fp8_cvtn): Likewise.
(@aarch64_sve2_fp8_cvtnb): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64.h (TARGET_SSVE_FP8): Add new.
* config/aarch64/iterators.md
(VNx8SF_ONLY, SVE_FULL_HFx2): New mode iterators.
(UNSPEC_F1CVT, UNSPEC_F1CVTLT, UNSPEC_F2CVT, UNSPEC_F2CVTLT): Add new.
(UNSPEC_FCVTNB, UNSPEC_FCVTNT): Likewise.
(UNSPEC_FP8FCVTN): Likewise.
(FP8CVT_UNS, fp8_cvt_uns_op): Likewise.

gcc/testsuite/

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h
(TEST_DUAL_Z): Add fpm0 argument
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c:
Add new tests.
* gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtnb_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/cvtnt_mf8.c: Likewise.
* lib/target-supports.exp: Add aarch64_asm_fp8_ok check.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc| 78 +++
 .../aarch64/aarch64-sve-builtins-shapes.h |  2 +
 .../aarch64/aarch64-sve-builtins-sve2.cc  | 28 ++-
 .../aarch64/aarch64-sve-builtins-sve2.def | 12 +++
 .../aarch64/aarch64-sve-builtins-sve2.h   |  6 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc| 31 +++-
 gcc/config/aarch64/aarch64-sve2.md| 51 
 gcc/config/aarch64/aarch64.h  |  5 ++
 gcc/config/aarch64/iterators.md   | 24 ++
 .../aarch64/sve/acle/asm/test_sve_acle.h  |  2 +-
 .../acle/general-c/unary_convertxn_narrow_1.c | 60 ++
 .../general-c/unary_convertxn_narrowt_1.c | 38 +
 .../aarch64/sve2/acle/asm/cvt_mf8.c   | 48 
 .../aarch64/sve2/acle/asm/cvtlt_mf8.c | 50 
 .../aarch64/sve2/acle/asm/cvtn_mf8.c  | 30 +++
 .../aarch64/sve2/acle/asm/cvtnb_mf8.c | 20 +
 .../aarch64/sve2/acle/asm/cvtnt_mf8.c | 31 
 gcc/testsuite/lib/target-supports.exp |  2 +-
 18 files changed, 513 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrow_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/unary_convertxn_narrowt_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtlt_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtn_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtnb_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/cvtnt_mf8.c


[PATCH v5 0/5] aarch64: Add fp8 sve foundation

2024-11-28 Thread Claudio Bantaloukas


The ACLE defines a new set of fp8 vector types and intrinsics that operate on
these, some of them operating on the vectors as if they were bags of bits and
some requiring an additional argument of type fpm_t.

The following patches introduce:
- the types
- intrinsics that operate without the fpm_t type
- foundational changes that will be used to implement intrinsics requiring an
  fpm_t argument at the end
- fp8 conversion intrinsics
- fp8 multiply accumulate intrinsics

Compared to v1 of this series, this version adds:
- A change to fix return of scalar fp8 values
- Added tests for sve<->simd conversions
- Support for svcvt* intrinsics along with supporting shapes

Compared to v2 of this series, this version:
- Removes the first patch to fix return of scalar fp8 (already merged)
- Uses b_data to add mf8 rather than TYPES_all_data directly
- Updated test register matching with regex rather than hardcoded regs
- fixed formatting in aarch64-sve-builtins-base.cc,
  aarch64-sve-builtins-sve2.cc, aarch64-sve-builtins.cc
- removed fpm mode from DEF_SVE_FUNCTION_GS
- added DEF_SVE_FUNCTION_GS_FPM
- renamed unary_convert_narrowxn_fpm to unary_convertxn_narrowt
- renamed unary_convertxn_fpm to unary_convertxn_narrow
- use require_scalar_type rather than require_derived_scalar_type
- moved emit_move_insn for fpmr into function_expander::expand
- simplified instruction patterns
- addressed style request from code review
- Added fp8 multiply accumulate intrinsics

Compared to v3 of this series, this version:
- fixes some tests in patch 1 to deal with corrected syntax of the tbl
  instruction.
- Added fp8 sve dot intrinsics

Compared to v4 of this series, this version:
- updates test conditions for svcvt*, svmlal*, svdot* intrinsics to allow
  testing them under STREAMING_COMPATIBLE.
- updates ternary_mfloat8 and ternary_mfloat8_lane shapes to require uint64_t
  rather than int64_t.
- updates ternary_mfloat8_opt_n shape to require uint64_t rather than int64_t
  and improves readability and error clarity of resolve method.
- updates and adds further error tests on the shapes above.
- duplicates @aarch64_sve_add_ and aarch64_sve_add_lane_ define_insns to avoid
  creating invalid combinations. Kept the mode to allow use via 
- fixes formatting of aarch64.h, invoke.texi and aarch64-sve-builtins-sve2.def
- adds mode_iterator VNx8HF_ONLY
- split SVE2_FP8_TERNARY, SVE2_FP8_TERNARY_LANE iterators
- split sve2_fp8_fma_op
- adds upper limit asm test for svmlalb_lane_f16_mf8_fpm, svdot_lane_f16_mf8_fpm
  and svdot_lane_f32_mf8_fpm
- adds invoke.texi entried for fp8dot4, fp8dot2 and ssve variants


Is this ok for master? I do not have commit rights yet, if ok, can someone
commit it on my behalf?

Regression tested on aarch64-unknown-linux-gnu.

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (5):
  aarch64: Add basic svmfloat8_t support to arm_sve.h
  aarch64: specify fpm mode in function instances and groups
  aarch64: add svcvt* FP8 intrinsics
  aarch64: add SVE2 FP8 multiply accumulate intrinsics
  aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

 .../aarch64/aarch64-option-extensions.def |  12 +
 .../aarch64/aarch64-sve-builtins-base.cc  |  77 +++--
 .../aarch64/aarch64-sve-builtins-functions.h  |  16 +-
 .../aarch64/aarch64-sve-builtins-shapes.cc| 208 +++-
 .../aarch64/aarch64-sve-builtins-shapes.h |  12 +-
 .../aarch64/aarch64-sve-builtins-sve2.cc  | 101 --
 .../aarch64/aarch64-sve-builtins-sve2.def |  43 +++
 .../aarch64/aarch64-sve-builtins-sve2.h   |  14 +
 gcc/config/aarch64/aarch64-sve-builtins.cc|  71 -
 gcc/config/aarch64/aarch64-sve-builtins.def   |  11 +-
 gcc/config/aarch64/aarch64-sve-builtins.h |  28 +-
 gcc/config/aarch64/aarch64-sve2.md| 173 ++
 gcc/config/aarch64/aarch64.h  |  32 ++
 gcc/config/aarch64/iterators.md   |  63 
 gcc/doc/invoke.texi   |  17 +
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../aarch64/sve/acle/asm/clasta_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/clastb_mf8.c |  52 +++
 .../aarch64/sve/acle/asm/create2_1.c  |  15 +
 .../aarch64/sve/acle/asm/create3_1.c  |  11 +
 .../aarch64/sve/acle/asm/create4_1.c  |  12 +
 .../aarch64/sve/acle/asm/dup_lane_mf8.c   | 124 
 .../gcc.target/aarch64/sve/acle/asm/dup_mf8.c |  31 ++
 .../aarch64/sve/acle/asm/dup_neonq_mf8.c  |  30 ++
 .../aarch64/sve/acle/asm/dupq_lane_mf8.c  |  48 +++
 .../gcc.target/aarch64/sve/acle/asm/ext_mf8.c |  73 +
 .../aarch64/sve/acle/asm/get2_mf8.c   |  55 
 .../aarch64/sve/acle/asm/get3_mf8.c   | 108 +++
 .../aarch64/sve/acle/asm/get4_mf8.c   | 179 +++
 .../aarch64/sve/acle/asm/get_neonq_mf8.c  |  33 ++
 .../aarch64/sve/acle/asm/insr_mf8.c   |  22 ++
 .../aarch64/sve/acle/asm/lasta_mf8.c  |  12 +
 .../aarch64/sv

Re: [PATCH] Fortran: fix crash with bounds check writing array section [PR117791]

2024-11-28 Thread Harald Anlauf

Hi Paul,

Am 28.11.24 um 13:55 schrieb Paul Richard Thomas:

Hi Harald,





I'll wait until tomorrow to see if Paul intervenes.  Otherwise I will
proceed and push.




I succeeded in breaking things even more! Please proceed and push.


I'm sort of glad you failed, too!  ;-)

Pushed as r15-5766.

Note that the patch does not touch the following, probably more common
cases, where one index is just an (ordinary or implied-do) variable:

  write(*,*) 'line 4:',(array(:,j), j=sort_2(i(1:2)),sort_2(i(1:2)))
  write(*,*) 'line 5:',(array(:,j), j=1,int (2-sort_2(i(1:2

These still get optimized and use _gfortran_transfer_array_write.

Thanks,
Harald


Thanks

Paul






[PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-28 Thread Claudio Bantaloukas

This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm

The first two are available under a combination of the FP8DOT4 and SVE2 
features.
Alternatively under the SSVE_FP8DOT4 feature under streaming mode.
The final two are available under a combination of the FP8DOT2 and SVE2 
features.
Alternatively under the SSVE_FP8DOT2 feature under streaming mode.

gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8dot4, ssve-fp8dot4): Add new extensions.
(fp8dot2, ssve-fp8dot2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8.
(svdotprod_lane_impl): Likewise.
(svdot_lane): Provide an unspec for fp8 types.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_mfloat8_def): Add new class.
(ternary_mfloat8): Add new shape.
(ternary_mfloat8_lane_group_selection_def): Add new class.
(ternary_mfloat8_lane_group_selection): Add new shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal
with the combination of features providing support for 32 and 16 bit
floating point.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot): Add new.
(@aarch64_sve_dot_lane): Likewise.
* config/aarch64/aarch64.h:
(TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines.
(TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise.
* config/aarch64/iterators.md
(UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs.
* doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2
extensions.

gcc/testsuite

* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise.
* lib/target-supports.exp: Add dg-require-effective-target support for
aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok,
aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.
---
 .../aarch64/aarch64-option-extensions.def |   8 +
 .../aarch64/aarch64-sve-builtins-base.cc  |  56 +++---
 .../aarch64/aarch64-sve-builtins-shapes.cc|  48 +
 .../aarch64/aarch64-sve-builtins-shapes.h |   8 +-
 .../aarch64/aarch64-sve-builtins-sve2.def |  14 ++
 gcc/config/aarch64/aarch64-sve2.md|  41 +
 gcc/config/aarch64/aarch64.h  |  18 ++
 gcc/config/aarch64/iterators.md   |   2 +
 gcc/doc/invoke.texi   |  12 ++
 .../sve/acle/general-c/ternary_mfloat8_1.c|  33 
 .../ternary_mfloat8_lane_group_selection_1.c  |  49 +
 .../aarch64/sve2/acle/asm/dot_lane_mf8.c  | 172 ++
 .../aarch64/sve2/acle/asm/dot_mf8.c   | 101 ++
 gcc/testsuite/lib/target-supports.exp |   3 +-
 14 files changed, 541 insertions(+), 24 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/dot_mf8.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index f39c9e6f897..089a0a74ec0 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -251,6 +251,14 @@ AARCH64_OPT_EXTENSION("ssve-fp8fma", SSVE_FP8FMA, (SME2,FP8), (), (), "ssve-fp8f
  
 AARCH64_OPT_EXTENSION("faminmax", FAMINMAX, (SIMD), (), (), "faminmax")
 
+AARCH64_OPT_EXTENSION("fp8dot4", FP8DOT4, (FP8FMA), (), (), "fp8dot4")
+
+AARCH64_OPT_EXTENSION("ssve-fp8dot4", SSVE_FP8DOT4, (SSVE_FP8FMA), (), (), "ssve-fp8dot4")
+
+AARCH64_OPT_EXTENSION("fp8dot2", FP8DOT2, (FP8DOT4), (), (), "fp8dot2")
+ 
+AARCH64_OPT_EXTENSION("ssve-fp8dot2", SSVE_FP8DOT2, (SSVE_FP8DOT4), (), (), "ssve-fp8dot2")
+ 
 #undef AARCH64_OPT_FMV_EXTENSION
 #undef AARCH64_OPT_EXTENSION
 #undef AARCH64_FMV_FEATURE
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 95e66dc2adf..b97941932ab 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -838,21 +838,26 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-/* In the optab, the multiplication operands come before the accumulator
-   operand.  Th
