Re: [PATCH 2/3] [APX CCMP] Adjust strategy for selecting ccmp candidates

2024-06-06 Thread Hongyu Wang
Thanks, this is the patch I'm going to check-in.

For the general ccmp scenario, the tree sequence looks like

_1 = (a < b)
_2 = (c < d)
_3 = _1 & _2

The current ccmp expansion tries both compare orders for _1 and _2,
computes cost1/cost2 for expanding _1 or _2 first, and then returns
the sequence with the lower cost.

It is possible that one expansion succeeds and the other fails.
For example, x86 has int ccmp but not fp ccmp, so a combined fp and
int comparison must be ordered such that the fp comparison happens
first.  The costs are not meaningful for failed expansions.

Check the expand_ccmp_next results ret and ret2, and return the valid
one before comparing costs.

gcc/ChangeLog:

* ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
  expand_ccmp_next and return the valid one first instead of
  comparing costs.
---
 gcc/ccmp.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
index 7cb525addf4..4d50708d986 100644
--- a/gcc/ccmp.cc
+++ b/gcc/ccmp.cc
@@ -247,7 +247,15 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
cost2 = seq_cost (prep_seq_2, speed_p);
cost2 += seq_cost (gen_seq_2, speed_p);
  }
-   if (cost2 < cost1)
+
+   /* It's possible that one expansion succeeds and the other
+  fails.
+  For example, x86 has int ccmp but not fp ccmp, and so a
+  combined fp and int comparison must be ordered such that
+  the fp comparison happens first. The costs are not
+  meaningful for failed expansions.  */
+
+   if (ret2 && (!ret || cost2 < cost1))
  {
*prep_seq = prep_seq_2;
*gen_seq = gen_seq_2;
--
2.31.1
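
The selection rule in the patch can be sketched in isolation. This is a minimal model, not code from ccmp.cc: `Candidate`, `prefer_second` and `old_prefer_second` are hypothetical names, `ok` stands for a non-null ret/ret2, and a failed expansion is given cost 0 to mirror the empty sequence described in the thread.

```cpp
#include <cassert>

// Hypothetical stand-in for one expansion attempt.
// `ok` mirrors a non-null ret/ret2; `cost` mirrors cost1/cost2.
struct Candidate {
  bool ok;
  int cost;
};

// Old rule, for contrast: only the costs are compared, so a failed
// (empty, hence cheap) second expansion can win.
bool old_prefer_second(const Candidate &first, const Candidate &second) {
  return second.cost < first.cost;
}

// New rule: the second ordering is chosen only if it succeeded, and
// either the first one failed or the second one is cheaper.
bool prefer_second(const Candidate &first, const Candidate &second) {
  return second.ok && (!first.ok || second.cost < first.cost);
}

void example_usage() {
  const Candidate int_first = {true, 12};  // expansion succeeded
  const Candidate fp_first = {false, 0};   // expansion failed, empty sequence

  // The old rule picks the failed expansion because its cost is lower.
  assert(old_prefer_second(int_first, fp_first));
  // The new rule keeps the valid one.
  assert(!prefer_second(int_first, fp_first));
  // With the roles swapped, the valid second ordering is taken.
  assert(prefer_second(fp_first, int_first));
}
```

When both attempts succeed, the rule reduces to the plain cost comparison that was there before.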

Richard Sandiford wrote on Wed, Jun 5, 2024 at 17:21:

>
> Hongyu Wang  writes:
> > CC'd Richard for the ccmp part, as it was previously added only for aarch64.
> > The original logic is not affected: if aarch64_gen_ccmp_first
> > succeeds, aarch64_gen_ccmp_next will also succeed, since cmp/fcmp
> > and ccmp/fccmp support all GPI/GPF modes and prepare_operand fixes
> > up any input that cmp supports but ccmp does not, so ret and ret2
> > are both valid when the costs are compared.
> > Thanks in advance.
>
> Sorry for the slow review.
>
> > Hongyu Wang wrote on Wed, May 15, 2024 at 16:22:
> >>
> >> For general ccmp scenario, the tree sequence is like
> >>
> >> _1 = (a < b)
> >> _2 = (c < d)
> >> _3 = _1 & _2
> >>
> >> The current ccmp expansion tries both compare orders for _1 and _2,
> >> compares the costs (cost1/cost2) of expanding _1 or _2 first, and then
> >> returns the sequence with the lower cost.
> >>
> >> For x86 ccmp, we don't support an FP compare as a ccmp operand, but we
> >> do support an fp comi + int ccmp sequence.  With the current cost
> >> comparison model, fp comi + int ccmp can never be generated, since the
> >> code doesn't check whether expand_ccmp_next returns a valid result and
> >> the rtl cost of the empty ccmp sequence is always smaller.
> >>
> >> Check the expand_ccmp_next results ret and ret2, and return the valid
> >> one before comparing costs.
> >>
> >> gcc/ChangeLog:
> >>
> >> * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> >> expand_ccmp_next, returns the valid one first before
> >> comparing cost.
> >> ---
> >>  gcc/ccmp.cc | 12 +++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> >> index 7cb525addf4..4b424220068 100644
> >> --- a/gcc/ccmp.cc
> >> +++ b/gcc/ccmp.cc
> >> @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
> >>   cost2 = seq_cost (prep_seq_2, speed_p);
> >>   cost2 += seq_cost (gen_seq_2, speed_p);
> >> }
> >> - if (cost2 < cost1)
> >> +
> >> + /* For x86 target the ccmp does not support fp operands, but
> >> +have fcomi insn that can produce eflags and then do int
> >> +ccmp. So if one of the op is fp compare, ret1 or ret2 can
> >> +fail, and the cost of the corresponding empty seq will
> >> +always be smaller, then the NULL sequence will be returned.
> >> +Add check for ret and ret2, returns the available one if
> >> +the other is NULL.  */
>
> I think the more fundamental point is that the cost of a failed
> expansion isn't meaningful.  So how about:
>
>   /* It's possible that one expansion succeeds and the other fails.
>  For example, x86 has int ccmp but not fp ccmp, and so a combined
>  fp and int comparison must be ordered such that the fp comparison
>  happens first.  The costs are not meaningful for failed
>  expansions.  */
>
> >> + if ((!ret && ret2)
> >> + || (!(ret && !ret2)
> >> + && cost2 < cost1))
>
> I think this simplifies to:
>
>   if (ret2 && (!ret1 || cost2 < cost1))
>
> OK with those changes, thanks.
>
> Richard
>
> >> {
> >>   *prep_seq = prep_seq_2;
> >>   *gen_seq 

[PATCH] RISC-V: Regenerate opt urls.

2024-06-06 Thread Robin Dapp
Hi,

I wasn't aware that I needed to regenerate the opt urls when
adding an option.  For this patch I did it now.

I suppose this doesn't require an extra OK, but I'm still going to
wait a few minutes before applying.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerate.
---
 gcc/config/riscv/riscv.opt.urls | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index d87e9d5c9a8..622cb6e7b44 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -47,6 +47,12 @@ UrlSuffix(gcc/RISC-V-Options.html#index-mcmodel_003d-4)
 mstrict-align
 UrlSuffix(gcc/RISC-V-Options.html#index-mstrict-align-4)
 
+mscalar-strict-align
+UrlSuffix(gcc/RISC-V-Options.html#index-mscalar-strict-align)
+
+mvector-strict-align
+UrlSuffix(gcc/RISC-V-Options.html#index-mvector-strict-align)
+
 ; skipping UrlSuffix for 'mexplicit-relocs' due to finding no URLs
 
 mrelax
-- 
2.45.1


[wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

Updates https://gcc.gnu.org/gcc-15/changes.html
and https://gcc.gnu.org/projects/gomp/

Comments?

Tobias

 htdocs/gcc-15/changes.html  | 27 ++-
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..94528ebd 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+
+  https://gcc.gnu.org/projects/gomp/";>OpenMP
+  
+
+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPU devices, enabled only when using the
+  unified_shared_memory clause to the requires
+  directive. For details, see the offload-target specifics section in the
+  https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html";
+  >GNU Offloading and Multi Processing Runtime Library Manual.
+
+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+
+  
+
+
 
 
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 requires directive
-
+
   GCC 9
   GCC 12
   GCC 13
-  GCC 14
+  GCC 14
+  GCC 15
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
   unified_shared_memory
-  complete but no non-host devices provides unified_shared_memory
+  complete but no non-host devices provides unified_shared_memory
+  complete; see also https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html";>
+  Offload-Target Specifics
 
   
   
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Loop transformation constructs
-No
+GCC 15
 
   
   


[PATCH] [APX ZU] Support APX zero-upper

2024-06-06 Thread Kong, Lingling
Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.

gcc/ChangeLog:

* config/i386/i386-opts.h (enum apx_features): Add apx_zu.
* config/i386/i386.h (TARGET_APX_ZU): Define.
* config/i386/i386.md (*imulhizu): New define_insn.
(*setcc__zu): Ditto.
* config/i386/i386.opt: Add enum value for zu.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-zu-1.c: New test.
* gcc.target/i386/apx-zu-2.c: Ditto.

Bootstrapped & regtested on x86-64-pc-linux-gnu with binutils 2.42 branch.
OK for trunk?
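
For readers unfamiliar with the zero-upper forms, the difference from the legacy instructions can be modelled on plain integers. This is only a sketch under stated assumptions: the function names are hypothetical, registers are modelled as 64-bit values, and flag effects are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Legacy SETcc writes only the low byte of its destination register and
// leaves bits 8..63 untouched, so a prior zeroing instruction is needed
// to obtain a clean 0/1 value in the full register.
std::uint64_t setcc_model(std::uint64_t dest, bool cond) {
  return (dest & ~std::uint64_t{0xFF}) | static_cast<std::uint64_t>(cond);
}

// Zero-upper SETcc: the low byte receives the condition and all
// remaining bits of the destination are cleared.
std::uint64_t setzu_model(std::uint64_t /*dest*/, bool cond) {
  return static_cast<std::uint64_t>(cond);
}

// Zero-upper 16-bit IMUL by an immediate: the low 16 bits of the product
// are written and the upper bits of the destination are cleared.
std::uint64_t imulzu16_model(std::uint16_t src, std::int16_t imm) {
  const std::uint32_t product =
      static_cast<std::uint32_t>(src) *
      static_cast<std::uint32_t>(static_cast<std::uint16_t>(imm));
  return product & 0xFFFFu;
}

void example_usage() {
  const std::uint64_t stale = 0xDEADBEEF00ULL;  // leftover register contents
  assert(setcc_model(stale, true) == 0xDEADBEEF01ULL);  // upper bits survive
  assert(setzu_model(stale, true) == 1);                // upper bits cleared
  assert(imulzu16_model(300, 300) == 24464);            // 90000 mod 65536
}
```

This is why the new test scans for the absence of `xor` and of the plain `set*` mnemonics: with the zero-upper form no separate zeroing of the destination should be needed.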

---
 gcc/config/i386/i386-opts.h  |  3 +-
 gcc/config/i386/i386.h   |  1 +
 gcc/config/i386/i386.md  | 25 ++--
 gcc/config/i386/i386.opt |  3 ++
 gcc/testsuite/gcc.target/i386/apx-zu-1.c | 38 
 gcc/testsuite/gcc.target/i386/apx-zu-2.c | 19 
 6 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-zu-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-zu-2.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 5fcc4927978..c7ec0d9fd39 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -142,8 +142,9 @@ enum apx_features {
   apx_ppx = 1 << 3,
   apx_nf = 1 << 4,
   apx_ccmp = 1 << 5,
+  apx_zu = 1 << 6,
   apx_all = apx_egpr | apx_push2pop2 | apx_ndd
-   | apx_ppx | apx_nf | apx_ccmp,
+   | apx_ppx | apx_nf | apx_ccmp | apx_zu,
 };
 
 #endif
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7051c6c13e4..dc1a1f44320 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -57,6 +57,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
 #define TARGET_APX_NF (ix86_apx_features & apx_nf)
 #define TARGET_APX_CCMP (ix86_apx_features & apx_ccmp)
+#define TARGET_APX_ZU (ix86_apx_features & apx_zu)
 
 #include "config/vxworks-dummy.h"
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ffcf63e1cba..a2765f65754 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9967,6 +9967,19 @@
(const_string "direct")))
(set_attr "mode" "")])
 
+(define_insn "*imulhizu"
+  [(set (match_operand:SWI48x 0 "register_operand" "=r,r")
+   (zero_extend:SWI48x
+ (mult:HI (match_operand:HI 1 "nonimmediate_operand" "%rm,rm")
+  (match_operand:HI 2 "immediate_operand" "K,n"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_APX_ZU"
+  "@
+   imulzu{w}\t{%2, %1, %w0|%w0, %1, %2}
+   imulzu{w}\t{%2, %1, %w0|%w0, %1, %2}"
+  [(set_attr "type" "imul")
+   (set_attr "mode" "HI")])
+
 (define_insn "*mulsi3_1_zext"
   [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
@@ -18354,11 +18367,19 @@
 ;; For all sCOND expanders, also expand the compare or test insn that
 ;; generates cc0.  Generate an equality comparison if `seq' or `sne'.
 
+(define_insn "*setcc__zu"
+  [(set (match_operand:SWI248 0 "register_operand" "=r")
+   (match_operator:SWI248 1 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)]))]
+  "TARGET_APX_ZU"
+  "setzu%C1\t%b0"
+  [(set_attr "type" "setcc")])
+
 (define_insn_and_split "*setcc_di_1"
   [(set (match_operand:DI 0 "register_operand" "=q")
(match_operator:DI 1 "ix86_comparison_operator"
  [(reg FLAGS_REG) (const_int 0)]))]
-  "TARGET_64BIT && !TARGET_PARTIAL_REG_STALL"
+  "!TARGET_APX_ZU && TARGET_64BIT && !TARGET_PARTIAL_REG_STALL"
   "#"
   "&& reload_completed"
   [(set (match_dup 2) (match_dup 1))
@@ -18391,7 +18412,7 @@
   [(set (match_operand:SWI24 0 "register_operand" "=q")
(match_operator:SWI24 1 "ix86_comparison_operator"
  [(reg FLAGS_REG) (const_int 0)]))]
-  "!TARGET_PARTIAL_REG_STALL
+  "!TARGET_APX_ZU && !TARGET_PARTIAL_REG_STALL
&& (!TARGET_ZERO_EXTEND_WITH_AND || optimize_function_for_size_p (cfun))"
   "#"
   "&& reload_completed"
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7017cc87cec..353fffb2343 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1342,6 +1342,9 @@ Enum(apx_features) String(nf) Value(apx_nf) Set(6)
 EnumValue
 Enum(apx_features) String(ccmp) Value(apx_ccmp) Set(7)
 
+EnumValue
+Enum(apx_features) String(zu) Value(apx_zu) Set(8)
+
 EnumValue
 Enum(apx_features) String(all) Value(apx_all) Set(1)
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-zu-1.c b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
new file mode 100644
index 000..927a87673a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-zu-1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mapxf -march=x86-64 -O2" } */
+/* { dg-final { scan-assembler-not "setle"} } */
+/* { dg-final { scan-assembler-not "setge"} } */
+/* { dg-final { scan-assembler-not "sete"} } */
+/* { dg-final { scan-assembler-not "xor"} } */
+/* 

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Hi,

Just some comments on the fuseable_load_p part, since that's what
we were discussing last time.

It looks like this now relies on:

Ajit Agarwal  writes:
> +  /* We use DF data flow because we change location rtx
> +  which is easier to find and modify.
> +  We use mix of rtl-ssa def-use and DF data flow
> +  where it is easier.  */
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);

But please don't do this!  For one thing, building DU/UD chains
as well as rtl-ssa is really expensive in terms of compile time.
But more importantly, modifications need to happen via rtl-ssa
to ensure that the IL is kept up-to-date.  If we don't do that,
later fuse attempts will be based on stale data and so could
generate incorrect code.

> +// Check whether load can be fusable or not.
> +// Return true if fuseable otherwise false.
> +bool
> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
> +{
> +  for (auto def : info->defs())
> +{
> +  auto set = dyn_cast (def);
> +  for (auto use1 : set->nondebug_insn_uses ())
> + use1->set_is_live_out_use (true);
> +}

What was the reason for adding this loop?

> +
> +  rtx_insn *rtl_insn = info ->rtl ();
> +  rtx body = PATTERN (rtl_insn);
> +  rtx dest_exp = SET_DEST (body);
> +
> +  if (REG_P (dest_exp) &&
> +  (DF_REG_DEF_COUNT (REGNO (dest_exp)) > 1

The rtl-ssa way of checking this is:

  crtl->ssa->is_single_dominating_def (...)

> +   || DF_REG_EQ_USE_COUNT (REGNO (dest_exp)) > 0))
> +return  false;

Why are uses in notes a problem?  In the worst case, we should just be
able to remove the note instead.

> +
> +  rtx addr = XEXP (SET_SRC (body), 0);
> +
> +  if (GET_CODE (addr) == PLUS
> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
> +{
> +  if (INTVAL (XEXP (addr, 1)) == -16)
> + return false;
> +  }

What's special about -16?

> +
> +  df_ref use;
> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +
> +  if (!def_link || !def_link->ref
> +   || DF_REF_IS_ARTIFICIAL (def_link->ref))
> + continue;
> +
> +  while (def_link && def_link->ref)
> + {
> +   rtx_insn *insn = DF_REF_INSN (def_link->ref);
> +   if (GET_CODE (PATTERN (insn)) == PARALLEL)
> + return false;

Why do you need to skip PARALLELs?

> +
> +   rtx set = single_set (insn);
> +   if (set == NULL_RTX)
> + return false;
> +
> +   rtx op0 = SET_SRC (set);
> +   rtx_code code = GET_CODE (op0);
> +
> +   // This check is added as register pairs are not generated
> +   // by RA for neg:V2DF (fma: V2DF (reg1)
> +   //  (reg2)
> +   //  (neg:V2DF (reg3)))
> +   if (GET_RTX_CLASS (code) == RTX_UNARY)
> + return false;

What's special about (neg (fma ...))?

> +
> +   def_link = def_link->next;
> + }
> + }
> +  return true;
> +}

Thanks,
Richard


Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-06 Thread Evgeny Karpov
Thursday, June 6, 2024 1:42 AM
Jonathan Yong <10wa...@gmail.com> wrote:
>
> Where is HAVE_64BIT_POINTERS used?
> 

Sorry, it was missed in the posted changes for review.

Regards,
Evgeny

diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index 8a6f0e8e8a5..0c9d5424942 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif

 #undef SUB_LINK_ENTRY
-#if TARGET_64BIT_DEFAULT || defined (TARGET_AARCH64_MS_ABI)
+#if HAVE_64BIT_POINTERS
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY64
 #else
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY32



Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-06-06 Thread Richard Sandiford
YunQiang Su  writes:
> YunQiang Su wrote on Wed, May 29, 2024 at 10:02:
>>
>> Richard Sandiford wrote on Wed, May 29, 2024 at 05:28:
>> >
>> > YunQiang Su  writes:
>> > > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross
>> > > toolchain, the final fallback is the system's `as/ld`.  In fact, we
>> > > can try -as/ld/objcopy before falling back to the native
>> > > as/ld/objcopy.
>> > >
>> > > This patch is derived from Debian's patch:
>> > >   gcc-search-prefixed-as-ld.diff
>> >
>> > I'm probably making you repeat a previous discussion, sorry, but could
>> > you describe the use case in more detail?  The current approach to
>> > handling cross toolchains has been used for many years.  Presumably
>> > this patch is supporting a different way of organising things,
>> > but I wasn't sure from the description what it was.
>> >
>> > AIUI, we currently assume that cross as, ld and objcopy will be
>> > installed under those names in $prefix/$target_alias/bin (aka 
>> > $tooldir/bin).
>> > E.g.:
>> >
>> >bin/aarch64-elf-as = aarch64-elf/bin/as
>> >
>> > GCC should then find as in aarch64-elf/bin.
>> >
>> > Is that not true in your case?
>> >
>>
>> Yes. This patch is only about the final fallback. I mean aarch64-elf/bin/as
>> still has higher priority than bin/aarch64-elf-as.
>>
>> In the current code, we find gas with:
>> /prefix/aarch64-elf/bin/as > $PATH/as
>>
>> And this patch adds a new one between them:
>> /prefix/aarch64-elf/bin/as > $PATH/aarch64-elf-as > $PATH/as
>>
>> > To be clear, I'm not saying the patch is wrong.  I'm just trying to
>> > understand why the patch is needed.
>> >
>>
>> Yes. If gcc is configured correctly, it is not so useful.
>> In some cases, for some lazy users, it may be useful,
>> for example when binutils is installed into a different prefix than libc etc.
>>
>> For example, binutils is installed into /usr/aarch64-elf/bin, while
>> libc is installed into /usr/local/aarch64-elf/.
>>
>
> Any thoughts on this? Does this use case make sense?

Yeah, I think it makes sense.  GCC and binutils are separate packages.
Users could cherry-pick a GCC installation and a separate binutils
installation rather than bundling them together into a single
toolchain.  And not everyone will have permission to change $tooldir.

So I agree we should support searching the user's path for an
as/ld/etc. based on the tool prefix.  Unfortunately, I don't think
I understand the code & constraints well enough to do a review.

In particular, it seems unfortunate that we need to do a trial
subcommand invocation before committing to the prefixed name.
And, if we continue to search for "as" in the user's path as a fallback,
it's not 100% obvious that "${triple}-as" later in the path should trump
"as" earlier in the path.

In some ways, it seems more consistent to do the replacement without
first doing a trial invocation.  But I don't know whether that would
break existing use cases.  (To be clear, I wouldn't feel comfortable
approving a patch to do that without buy-in from other maintainers.)

Thanks,
Richard
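
The lookup order discussed in this thread (tooldir first, then the prefixed name on the user's path, then the bare name) can be written down as a small sketch. It is not the driver's code: `candidate_paths` and `find_tool` are hypothetical helpers, the `exists` callback stands in for whatever check the driver performs, and the trial invocation mentioned above is not modelled.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Build the candidate list in the order described in the thread:
//   $tooldir/bin/as  >  $PATH/<triple>-as  >  $PATH/as
std::vector<std::string> candidate_paths(
    const std::string &tooldir_bin,
    const std::vector<std::string> &path_dirs,
    const std::string &triple,
    const std::string &tool) {
  std::vector<std::string> out;
  out.push_back(tooldir_bin + "/" + tool);
  for (const std::string &dir : path_dirs)
    out.push_back(dir + "/" + triple + "-" + tool);
  for (const std::string &dir : path_dirs)
    out.push_back(dir + "/" + tool);
  return out;
}

// Return the first candidate accepted by `exists`, or an empty string.
std::string find_tool(
    const std::vector<std::string> &candidates,
    const std::function<bool(const std::string &)> &exists) {
  for (const std::string &candidate : candidates)
    if (exists(candidate))
      return candidate;
  return std::string();
}

void example_usage() {
  const std::vector<std::string> candidates = candidate_paths(
      "/prefix/aarch64-elf/bin", {"/usr/local/bin", "/usr/bin"},
      "aarch64-elf", "as");
  // Only a native as early in the path and a prefixed as later in the path.
  const auto exists = [](const std::string &p) {
    return p == "/usr/local/bin/as" || p == "/usr/bin/aarch64-elf-as";
  };
  assert(find_tool(candidates, exists) == "/usr/bin/aarch64-elf-as");
}
```

In this model every prefixed name on the path is tried before any bare name, so a `${triple}-as` late in the path wins over an `as` early in the path. That is exactly the ordering question raised above; interleaving the two loops per directory would give the other answer.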


[PATCH v2] Target-independent store forwarding avoidance.

2024-06-06 Thread Manolis Tsamis
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

     strb    w2, [x1, 1]
     ldr     x0, [x1]  # Expensive store forwarding to larger load.

To:

     ldr     x0, [x1]
     strb    w2, [x1]
     bfi     x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

  Neoverse-N1:      +29.4%
  Intel Coffeelake: +13.1%
  AMD 5950X:        +17.5%
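
The equivalence the transformation relies on can be checked with a small model. This is a sketch under stated assumptions, not code from the pass: memory is an 8-byte array read in little-endian order, `bit_insert` is a hypothetical stand-in for a bfi-style instruction, and for a byte stored at offset k the inserted field is taken to start at bit 8*k.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Memory = std::array<std::uint8_t, 8>;

// Little-endian 64-bit load from the simulated memory.
std::uint64_t load64(const Memory &mem) {
  std::uint64_t value = 0;
  for (unsigned i = 0; i < 8; ++i)
    value |= static_cast<std::uint64_t>(mem[i]) << (8 * i);
  return value;
}

// Replace `width` bits of `dest`, starting at bit `lsb`, with the low
// bits of `src` (the effect of a bit-field insert instruction).
std::uint64_t bit_insert(std::uint64_t dest, std::uint64_t src,
                         unsigned lsb, unsigned width) {
  assert(width >= 1 && width < 64 && lsb + width <= 64);
  const std::uint64_t mask = ((std::uint64_t{1} << width) - 1) << lsb;
  return (dest & ~mask) | ((src << lsb) & mask);
}

// Original order: narrow store, then a wide load that depends on it.
std::uint64_t store_then_load(Memory mem, unsigned offset, std::uint8_t byte) {
  assert(offset < 8);
  mem[offset] = byte;
  return load64(mem);
}

// Reordered: wide load first, then the store, then patch the loaded
// value so the register holds what the original order would have read.
std::uint64_t load_then_insert(Memory mem, unsigned offset, std::uint8_t byte) {
  assert(offset < 8);
  const std::uint64_t loaded = load64(mem);
  mem[offset] = byte;  // the store still happens (on the local copy here)
  return bit_insert(loaded, byte, 8 * offset, 8);
}

void example_usage() {
  const Memory mem = {0x10, 0x32, 0x54, 0x76, 0x98, 0xBA, 0xDC, 0xFE};
  assert(store_then_load(mem, 1, 0xAB) == 0xFEDCBA987654AB10ULL);
  assert(load_then_insert(mem, 1, 0xAB) == 0xFEDCBA987654AB10ULL);
}
```

Both orders yield the same register value for every byte offset, but the reordered one never asks the hardware to forward a narrow store into a wider load.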

gcc/ChangeLog:

* Makefile.in: Add avoid-store-forwarding.o.
* common.opt: New option -favoid-store-forwarding.
* params.opt: New param store-forwarding-max-distance.
* doc/invoke.texi: Document new pass.
* doc/passes.texi: Document new pass.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v2:
- Allow modes that are not scalar_int_mode.
- Introduce simple costing to avoid unprofitable transformations.
- Reject bit insert sequences that spill to memory.
- Document new pass.
- Fix and add testcases.

 gcc/Makefile.in   |   1 +
 gcc/avoid-store-forwarding.cc | 578 ++
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |   9 +
 gcc/doc/passes.texi   |   8 +
 gcc/params.opt|   4 +
 gcc/passes.def|   1 +
 .../aarch64/avoid-store-forwarding-1.c|  28 +
 .../aarch64/avoid-store-forwarding-2.c|  39 ++
 .../aarch64/avoid-store-forwarding-3.c|  31 +
 .../aarch64/avoid-store-forwarding-4.c|  24 +
 gcc/tree-pass.h   |   1 +
 12 files changed, 728 insertions(+)
 create mode 100644 gcc/avoid-store-forwarding.cc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c983b0c102a..1fd68c7d182 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1683,6 +1683,7 @@ OBJS = \
statistics.o \
stmt.o \
stor-layout.o \
+   avoid-store-forwarding.o \
store-motion.o \
streamer-hooks.o \
stringpool.o \
diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
new file mode 100644
index 000..b641451a6b7
--- /dev/null
+++ b/gcc/avoid-store-forwarding.cc
@@ -0,0 +1,578 @@
+/* Avoid store forwarding optimization pass.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by VRULL GmbH.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "alias.h"
+#include "rtlanal.h"
+#include "tree-pass.h"
+#include "cselib.h"
+#include "predict.h"
+#include "insn-config.h"
+#include "expmed.h"
+#include "recog.h"
+#include "regset.h"
+#include "df.h"
+#include "expr.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "vec.h"
+
+/* This pass tries to detect and avoid cases of store forwarding.
+   On many processors there is a large penalty when smaller stores are
+   forwarded to larger loads.  The idea used to avoid the stall is to move
+   the store after the load and in addition emit a bit insert sequence so
+   the load register has the correct value.  For example the following:
+
+     strb    w2, [x1, 1]
+     ldr     x0, [x1]
+
+   Will be transformed to:
+
+     ldr     x0, [x1]
+     and     w2, w2, 255
+     strb    w2, [x1]
+  

Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-06-06 Thread Manolis Tsamis
On Fri, May 24, 2024 at 9:27 AM Richard Biener  wrote:
>
> On Thu, 23 May 2024, Manolis Tsamis wrote:
>
> > This pass detects cases of expensive store forwarding and tries to avoid
> > them by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >      strb    w2, [x1, 1]
> >      ldr     x0, [x1]  # Expensive store forwarding to larger load.
> >
> > To:
> >
> >      ldr     x0, [x1]
> >      strb    w2, [x1]
> >      bfi     x0, x2, 0, 8
>
> How do we represent atomics?  If the latter is a load-acquire or release
> the transform would be invalid.
>
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following
> > speedups have been observed.
> >
> >   Neoverse-N1:      +29.4%
> >   Intel Coffeelake: +13.1%
> >   AMD 5950X:        +17.5%
> >
> >   PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add avoid-store-forwarding.o.
> >   * common.opt: New option -favoid-store-forwarding.
> >   * params.opt: New param store-forwarding-max-distance.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >   * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/avoid-store-forwarding-1.c: New test.
> >   * gcc.dg/avoid-store-forwarding-2.c: New test.
> >   * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/Makefile.in   |   1 +
> >  gcc/avoid-store-forwarding.cc | 554 ++
> >  gcc/common.opt|   4 +
> >  gcc/params.opt|   4 +
> >  gcc/passes.def|   1 +
> >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> >  gcc/tree-pass.h   |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> >   statistics.o \
> >   stmt.o \
> >   stor-layout.o \
> > + avoid-store-forwarding.o \
> >   store-motion.o \
> >   streamer-hooks.o \
> >   stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with GCC; see the file COPYING3.  If not see
> > +   .  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "alias.h"
> > +#include "rtlanal.h"
> > +#include "tree-pass.h"
> > +#include "cselib.h"
> > +#include "predict.h"
> > +#include "insn-config.h"
> > +#include "expmed.h"
> > +#include "recog.h"
> > +#include "regset.h"
> > +#include "df.h"
> > +#include "expr.h"
> > +#include "memmodel.h"
> > +#include "emit-rtl.h"
> > +#include "vec.h"
> > +
> > +/* This pass tries to detect and avoid cases of store forwarding.
> > +   On many processors there is a large penalty when smaller stores are
> > +   forwarded to larger loads.  The idea used to avoid the stall is to move
> > +   the store after the load and in addition emit a bit insert sequence so
> > +   the load register has the correct value.  For example the following:
> > +
> > +     strb    w2, [x1, 1]
> > +     ldr     x0, [x1]
> > +
> > +   Will be transformed to:
> > +
> > +     ldr     x0, [x1]
> > +     and     w2, w2, 255
> > +     strb    w2, [x1]
> > + bfi x0

Re: [PATCH v2] aarch64: Add vector floating point extend pattern [PR113880, PR113869]

2024-06-06 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch adds vector floating point extend pattern for V2SF->V2DF and
> V4HF->V4SF conversions by renaming the existing 
> aarch64_float_extend_lo_
> pattern to the standard optab one, i.e., extend2. This allows the
> vectorizer to vectorize certain floating point widening operations for the
> aarch64 target.
>
>   PR target/113880
>   PR target/113869
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
>   builtin codes to standard optab ones.
>   * config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_): 
> Rename
>   to...
>   (extend2): ... This.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/extend-vec.c: New test.

OK, thanks, and sorry for the slow review.

Richard

> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-builtins.cc|  9 
>  gcc/config/aarch64/aarch64-simd.md|  2 +-
>  gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++
>  3 files changed, 31 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index f8eeccb554d..25189888d17 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -534,6 +534,15 @@ BUILTIN_VDQ_BHSI (urhadd, uavg, _ceil, 0)
>  BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0)
>  BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)
>  
> +/* The builtins below should be expanded through the standard optabs
> +   CODE_FOR_extend2. */
> +#undef VAR1
> +#define VAR1(F,T,N,M) \
> +  constexpr insn_code CODE_FOR_aarch64_##F##M = CODE_FOR_##T##N##M##2;
> +
> +VAR1 (float_extend_lo_, extend, v2sf, v2df)
> +VAR1 (float_extend_lo_, extend, v4hf, v4sf)
> +
>  #undef VAR1
>  #define VAR1(T, N, MAP, FLAG, A) \
>{#N #A, UP (A), CF##MAP (N, A), 0, TYPES_##T, FLAG_##FLAG},
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 868f4486218..c5e2c9f00d0 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3132,7 +3132,7 @@
>  DONE;
>}
>  )
> -(define_insn "aarch64_float_extend_lo_"
> +(define_insn "extend2"
>[(set (match_operand: 0 "register_operand" "=w")
>   (float_extend:
> (match_operand:VDF 1 "register_operand" "w")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c 
> b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> new file mode 100644
> index 000..f6241d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */
> +void
> +f (float *__restrict a, double *__restrict b)
> +{
> +  b[0] = a[0];
> +  b[1] = a[1];
> +}
> +
> +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */
> +void
> +f1 (_Float16 *__restrict a, float *__restrict b)
> +{
> +
> +  b[0] = a[0];
> +  b[1] = a[1];
> +  b[2] = a[2];
> +  b[3] = a[3];
> +}


Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Richard Biener
On Thu, Jun 6, 2024 at 3:19 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> After revisiting all the comments in the mail thread, I would like to confirm
> whether my understanding is correct according to the generated match code.
> For now the generated code looks like below:
>
> else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
>   {
>     basic_block _b1 = gimple_bb (_a1);
>     if (gimple_phi_num_args (_a1) == 2)
>       {
>         basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>         basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>         basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>         basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>         gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
>         if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>           && EDGE_COUNT (_other_db_1->succs) == 1
>           && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>           {
>             tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>             tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>             tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>             bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
>             tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
>             tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The flow may look like the one below; in other words, only a flow like
> this can be handled.
>
> +------+
> | cond |---+
> +------+   v
>    |    +-------+
>    |    | other |
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+
>
> Thus, I think it cannot handle the 2 PHI flows below (or even more
> complicated shapes)
>
> +------+
> | cond |---+
> +------+   |
>    |       |
>    v       |
> +------+   |
> | mid  |   v
> +------++-------+
>    |    | other |
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+
>
> +------+
> | cond |---+
> +------+   |
>    |       v
>    |    +-------+
>    |    | mid-0 |----+
>    |    +-------+    |
>    |       |         v
>    |       |       +-------+
>    |       |       | mid-1 |
>    |       v       +-------+
>    |    +-------+    |
>    |    | other |<---+
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+

Correct.

> So I am not sure whether we need to (or whether it is reasonable to) take
> care of all the PHI gimple flows (which may be impossible?), or keep the
> simplest one for now and add more case by case.
> Thanks a lot.

I'd only keep the simplest one for now.  More complex cases can be
handled easily using dominators, but those might not always be
available or up-to-date when doing match queries.  So let's revisit
when we run into a case where the simple form isn't enough.

Richard.
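
To make the accepted shape concrete, here is a minimal sketch in plain C of the predecessor check that the generated match code quoted above performs. The toy_bb structure and all toy_* names are hypothetical stand-ins for illustration only; they are not GCC's basic_block or edge APIs, and the sketch assumes a block carries at most two predecessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG node, for illustration only.  */
struct toy_bb
{
  bool ends_in_cond;              /* Last statement is a conditional branch.  */
  size_t n_preds;
  size_t n_succs;
  const struct toy_bb *preds[2];
};

/* Return true if PHI_BB has the simple shape: one predecessor is the
   condition block, and the other is a single-entry, single-exit block
   reached only from that condition block.  */
static bool
toy_simple_phi_shape_p (const struct toy_bb *phi_bb)
{
  if (phi_bb->n_preds != 2)
    return false;

  const struct toy_bb *p0 = phi_bb->preds[0];
  const struct toy_bb *p1 = phi_bb->preds[1];
  const struct toy_bb *cond_bb = p0->ends_in_cond ? p0 : p1;
  const struct toy_bb *other_bb = p0->ends_in_cond ? p1 : p0;

  return cond_bb->ends_in_cond
         && other_bb->n_preds == 1
         && other_bb->n_succs == 1
         && other_bb->preds[0] == cond_bb;
}

/* cond -> PHI and cond -> other -> PHI: the shape that is handled.  */
static bool
toy_triangle_matches (void)
{
  struct toy_bb cond = { .ends_in_cond = true, .n_succs = 2 };
  struct toy_bb other = { .n_preds = 1, .n_succs = 1, .preds = { &cond } };
  struct toy_bb phi = { .n_preds = 2, .preds = { &cond, &other } };
  return toy_simple_phi_shape_p (&phi);
}

/* cond -> mid -> PHI and cond -> other -> PHI: neither PHI predecessor
   ends in a condition, so the check rejects it.  */
static bool
toy_diamond_matches (void)
{
  struct toy_bb cond = { .ends_in_cond = true, .n_succs = 2 };
  struct toy_bb mid = { .n_preds = 1, .n_succs = 1, .preds = { &cond } };
  struct toy_bb other = { .n_preds = 1, .n_succs = 1, .preds = { &cond } };
  struct toy_bb phi = { .n_preds = 2, .preds = { &mid, &other } };
  return toy_simple_phi_shape_p (&phi);
}

/* Small self-check tying the two shapes to the expected answers.  */
static void
toy_self_check (void)
{
  assert (toy_triangle_matches ());
  assert (!toy_diamond_matches ());
}
```

The check looks only at the immediate predecessors of the PHI block, which is why it needs no dominator information; the price is that any shape with an extra block on the fall-through path is rejected.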

>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Wednesday, June 5, 2024 9:44 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> Thanks Richard for the comments.  I will address them in v7, and it looks
> like I also need to resolve a conflict up to a point.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 4:50 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> On Thu, May 30, 2024 at 3:37 PM  wrote:
> >
> > From: Pan Li 
> >
> > Now that one gassign form of the unsigned .SAT_ADD is supported, we
> > would like to support more forms, both branch and branchless.
> > There are 5 other forms of .SAT_ADD, listed below:
> >
> > Form 1:
> >   #define SAT_ADD_U_1(T) \
> >   T sat_add_u_1_##T(T x, T y) \
> >   { \
> >     return (T)(x + y) >= x ? (x + y) : -1; \
> >   }
> >
> > Form 2:
> >   #define SAT_ADD_U_2(T) \
> >   T sat_add_u_2_##T(T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = __builtin_add_overflow (x, y, &ret); \
> >     return (T)(-overflow) | ret; \
> >   }
> >
> > Form 3:
> >   #define SAT_ADD_U_3(T) \
> >   T sat_add_u_3_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
> >   }
> >
> > Form 4:
> >   #define SAT_ADD_U_4(T) \
> >   T sat_add_u_4_##T (T x, T y) \
> >   { \
> >     T ret; \
> >     return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
> >   }
> >
> > Form 5:
> >   #define SAT_ADD_U_5(T) \
> >   T sat_add_u_5_##T(T x, T y) \
> >   { \
> >     return (T)(x + y) < x ? -1 : (x + y); \
> >   }
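
As a quick sanity check that these forms compute the same function, the following sketch instantiates forms 1, 2 and 3 by hand for uint8_t and compares them exhaustively against a widen-then-clamp reference. The sat_add_u8_* names are made up for this illustration, and __builtin_add_overflow is assumed to be available (it is a GCC/Clang builtin).

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: compare the wrapped sum against an operand.  */
static uint8_t
sat_add_u8_form1 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) >= x ? (uint8_t) (x + y) : (uint8_t) -1;
}

/* Form 2: branchless, using the overflow flag as an all-ones mask.  */
static uint8_t
sat_add_u8_form2 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  uint8_t overflow = __builtin_add_overflow (x, y, &ret);
  return (uint8_t) (-overflow) | ret;
}

/* Form 3: select on the overflow flag.  */
static uint8_t
sat_add_u8_form3 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint8_t) -1 : ret;
}

/* Reference: add in a wider type, then clamp to the 8-bit maximum.  */
static uint8_t
sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned int sum = (unsigned int) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Compare every form against the reference for all 8-bit input pairs.  */
static void
sat_add_u8_self_check (void)
{
  for (unsigned int x = 0; x <= UINT8_MAX; x++)
    for (unsigned int y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t want = sat_add_u8_ref ((uint8_t) x, (uint8_t) y);
        assert (sat_add_u8_form1 ((uint8_t) x, (uint8_t) y) == want);
        assert (sat_add_u8_form2 ((uint8_t) x, (uint8_t) y) == want);
        assert (sat_add_u8_form3 ((uint8_t) x, (uint8_t) y) == want);
      }
}
```

Forms 4 and 5 are forms 3 and 1 with the condition inverted and the arms swapped, so they are omitted from the sketch.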
> >
> > Take the for

Re: [PATCH v2] Vect: Support IFN SAT_SUB for unsigned vector int

2024-06-06 Thread Richard Biener
On Thu, Jun 6, 2024 at 8:26 AM  wrote:
>
> From: Pan Li 
>
> This patch adds support for .SAT_SUB on unsigned vector integers.
> Given the example code below:
>
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   for (unsigned i = 0; i < n; i++)
>     out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
> }
>
> Before this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
>   ivtmp_56 = _77 * 8;
>   vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
>   vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
>
>   mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
>   _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
>
>   .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
>   vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
>   vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
>   vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
>   ivtmp_76 = ivtmp_75 - _77;
>   ...
> }
>
> After this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
>   ivtmp_60 = _76 * 8;
>   vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
>   vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
>
>   vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
>
>   .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
>   vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
>   vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
>   vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
>   ivtmp_75 = ivtmp_74 - _76;
>   ...
> }
>
> The test suites below pass with this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression tests.

OK.

Richard.
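
For readers following along, here is a small scalar sketch of the branchless forms involved: the mask form from the example loop, and the two multiply forms that the match.pd hunk below adds as case 3 and case 4. The sat_sub_u64_* names are invented for this illustration; the sketch only checks that the three forms agree with a plain conditional reference on a handful of boundary values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask form, as in the vec_sat_sub_u64 example loop.  */
static uint64_t
sat_sub_u64_mask (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t) (x >= y));
}

/* Multiply form with gt: (X - Y) * (X > Y).  */
static uint64_t
sat_sub_u64_mul_gt (uint64_t x, uint64_t y)
{
  return (x - y) * (uint64_t) (x > y);
}

/* Multiply form with ge: (X - Y) * (X >= Y).  */
static uint64_t
sat_sub_u64_mul_ge (uint64_t x, uint64_t y)
{
  return (x - y) * (uint64_t) (x >= y);
}

/* Reference: subtract only when it cannot wrap, otherwise saturate to 0.  */
static uint64_t
sat_sub_u64_ref (uint64_t x, uint64_t y)
{
  return x > y ? x - y : 0;
}

/* Compare all forms against the reference on boundary values.  */
static void
sat_sub_u64_self_check (void)
{
  static const uint64_t samples[] = {
    0, 1, 2, UINT64_C (0x8000000000000000), UINT64_MAX - 1, UINT64_MAX
  };
  const size_t n = sizeof (samples) / sizeof (samples[0]);

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        uint64_t want = sat_sub_u64_ref (samples[i], samples[j]);
        assert (sat_sub_u64_mask (samples[i], samples[j]) == want);
        assert (sat_sub_u64_mul_gt (samples[i], samples[j]) == want);
        assert (sat_sub_u64_mul_ge (samples[i], samples[j]) == want);
      }
}
```

The gt and ge variants differ only when X equals Y, where X - Y is already zero, so both give the same result.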

> gcc/ChangeLog:
>
> * match.pd: Add new form for vector mode recog.
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
> new match func decl;
> (vect_recog_build_binary_gimple_call): Extract helper func to
> build gcall with given internal_fn.
> (vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 14 +++
>  gcc/tree-vect-patterns.cc | 85 ---
>  2 files changed, 84 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7c1ad428a3c..ebc60eba8dc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3110,6 +3110,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 3 (branchless with gt):
> +   SAT_U_SUB = (X - Y) * (X > Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (gt @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
> +/* Unsigned saturation sub, case 4 (branchless with ge):
> +   SAT_U_SUB = (X - Y) * (X >= Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (ge @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 81e8fdc9122..cef901808eb 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4488,6 +4488,32 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  }
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +
> +static gcall *
> +vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +internal_fn fn, tree *type_out,
> +tree op_0, tree op_1)
> +{
> +  tree itype = TREE_TYPE (op_0);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype != NULL_TREE
> +&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
> +{
> +  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> +
> +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +  gimple_set_location (call, gimple_location (stmt));
> +
> +  *type_out = vtype;
> +
> +  return call;
> +}
> +
> +  return NULL;
> +}
>
>  /*
>   * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> @@ -4510,27 +4536,55 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>if (!is_gimple_assign (last_stmt))
>  return NULL;
>
> -  tree res_ops[2];
> +  tre

RE: [PATCH v2] Vect: Support IFN SAT_SUB for unsigned vector int

2024-06-06 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 6:50 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; ubiz...@gmail.com
Subject: Re: [PATCH v2] Vect: Support IFN SAT_SUB for unsigned vector int

On Thu, Jun 6, 2024 at 8:26 AM  wrote:
>
> From: Pan Li 
>
> This patch adds support for .SAT_SUB on unsigned vector integers.
> Given the example code below:
>
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   for (unsigned i = 0; i < n; i++)
>     out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
> }
>
> Before this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
>   ivtmp_56 = _77 * 8;
>   vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
>   vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);
>
>   mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
>   _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });
>
>   .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
>   vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
>   vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
>   vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
>   ivtmp_76 = ivtmp_75 - _77;
>   ...
> }
>
> After this patch:
> void
> vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
>   ivtmp_60 = _76 * 8;
>   vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
>   vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);
>
>   vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);
>
>   .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
>   vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
>   vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
>   vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
>   ivtmp_75 = ivtmp_74 - _76;
>   ...
> }
>
> The test suites below pass with this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression tests.

OK.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add new form for vector mode recog.
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
> new match func decl;
> (vect_recog_build_binary_gimple_call): Extract helper func to
> build gcall with given internal_fn.
> (vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  | 14 +++
>  gcc/tree-vect-patterns.cc | 85 ---
>  2 files changed, 84 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7c1ad428a3c..ebc60eba8dc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3110,6 +3110,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1
>
> +/* Unsigned saturation sub, case 3 (branchless with gt):
> +   SAT_U_SUB = (X - Y) * (X > Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (gt @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
> +/* Unsigned saturation sub, case 4 (branchless with ge):
> +   SAT_U_SUB = (X - Y) * (X >= Y).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (mult:c (minus @0 @1) (convert (ge @0 @1)))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 81e8fdc9122..cef901808eb 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4488,6 +4488,32 @@ vect_recog_mult_pattern (vec_info *vinfo,
>  }
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +
> +static gcall *
> +vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
> +internal_fn fn, tree *type_out,
> +tree op_0, tree op_1)
> +{
> +  tree itype = TREE_TYPE (op_0);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype != NULL_TREE
> +&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
> +{
> +  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
> +
> +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +  gimple_set_location (call, gimple_location (stmt));
> +
> +  *type_out = vtype;
> +
> + 

RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Li, Pan2
> I'd only keep the simplest one for now.  More complex cases can be
> handled easily using dominators, but those might not always be
> available or up-to-date when doing match queries.  So let's revisit
> when we run into a case where the simple form isn't enough.

Got it, thanks.  I will send v7 if there are no surprises from the test suites.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 6:47 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

On Thu, Jun 6, 2024 at 3:19 AM Li, Pan2  wrote:
>
> Hi Richard,
>
> After revisiting all the comments in the mail thread, I would like to confirm
> whether my understanding is correct according to the generated match code.
> For now the generated code looks like below:
>
> else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
>   {
>     basic_block _b1 = gimple_bb (_a1);
>     if (gimple_phi_num_args (_a1) == 2)
>       {
>         basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>         basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>         basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>         basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>         gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
>         if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>           && EDGE_COUNT (_other_db_1->succs) == 1
>           && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>           {
>             tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>             tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>             tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>             bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
>             tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
>             tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The flow may look like the one below; in other words, only a flow like
> this can be handled.
>
> +------+
> | cond |---+
> +------+   v
>    |    +-------+
>    |    | other |
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+
>
> Thus, I think it cannot handle the 2 PHI flows below (or even more
> complicated shapes)
>
> +------+
> | cond |---+
> +------+   |
>    |       |
>    v       |
> +------+   |
> | mid  |   v
> +------++-------+
>    |    | other |
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+
>
> +------+
> | cond |---+
> +------+   |
>    |       v
>    |    +-------+
>    |    | mid-0 |----+
>    |    +-------+    |
>    |       |         v
>    |       |       +-------+
>    |       |       | mid-1 |
>    |       v       +-------+
>    |    +-------+    |
>    |    | other |<---+
>    |    +-------+
>    v       |
> +-----+    |
> | PHI | <--+
> +-----+

Correct.

> So I am not sure whether we need to (or whether it is reasonable to) take
> care of all the PHI gimple flows (which may be impossible?), or keep the
> simplest one for now and add more case by case.
> Thanks a lot.

I'd only keep the simplest one for now.  More complex cases can be
handled easily using dominators, but those might not always be
available or up-to-date when doing match queries.  So let's revisit
when we run into a case where the simple form isn't enough.

Richard.

>
> Pan
>
> -Original Message-
> From: Li, Pan2
> Sent: Wednesday, June 5, 2024 9:44 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: RE: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> Thanks Richard for the comments.  I will address them in v7, and it looks
> like I also need to resolve a conflict up to a point.
>
> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 4:50 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v4] Match: Support more form for scalar unsigned SAT_ADD
>
> On Thu, May 30, 2024 at 3:37 PM  wrote:
> >
> > From: Pan Li 
> >
> > Now that one gassign form of the unsigned .SAT_ADD is supported, we
> > would like to support more forms, both branch and branchless.
> > There are 5 other forms of .SAT_ADD, listed below:
> >
> > Form 1:
> >   #define SAT_ADD_U_1(T) \
> >   T sat_add_u_1_##T(T x, T y) \
> >   { \
> >     return (T)(x + y) >= x ? (x + y) : -1; \
> >   }
> >
> > Form 2:
> >   #define SAT_ADD_U_2(T) \
> >   T sat_add_u_2_##T(T x, T y) \
> >   { \
> >     T ret; \
> >     T overflow = _

[PATCH]AArch64: correct constraint on Upl early clobber alternatives

2024-06-06 Thread Tamar Christina
Hi All,

I made an oversight in the previous patch, where I added a ?Upa
alternative to the Upl cases.  This causes the tie to be created with
the larger register file rather than the constrained one.

This fixes the affected patterns.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Built SPEC CPU 2017 with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md (@aarch64_pred_cmp,
*cmp_cc, *cmp_ptest,
@aarch64_pred_cmp_wide,
*aarch64_pred_cmp_wide_cc,
*aarch64_pred_cmp_wide_ptest): Fix Upl tie alternative.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_): Fix
Upl tie alternative.

---
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index d902bce62fde88b6d85f8d71f305e7fc76a4d34e..d69db34016a55b4324faa129a3ac1f47227ba776 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8134,13 +8134,13 @@ (define_insn "@aarch64_pred_cmp"
  UNSPEC_PRED_Z))
(clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  {@ [ cons: =0 , 1   , 3 , 4; attrs: pred_clobber ]
- [ &Upa , Upl , w , ; yes ] 
cmp\t%0., %1/z, %3., #%4
- [ ?Upa , 0Upl, w , ; yes ] ^
- [ Upa  , Upl , w , ; no  ] ^
- [ &Upa , Upl , w , w; yes ] 
cmp\t%0., %1/z, %3., %4.
- [ ?Upa , 0Upl, w , w; yes ] ^
- [ Upa  , Upl , w , w; no  ] ^
+  {@ [ cons: =0 , 1  , 3 , 4; attrs: pred_clobber ]
+ [ &Upa , Upl, w , ; yes ] 
cmp\t%0., %1/z, %3., #%4
+ [ ?Upl , 0  , w , ; yes ] ^
+ [ Upa  , Upl, w , ; no  ] ^
+ [ &Upa , Upl, w , w; yes ] 
cmp\t%0., %1/z, %3., %4.
+ [ ?Upl , 0  , w , w; yes ] ^
+ [ Upa  , Upl, w , w; no  ] ^
   }
 )
 
@@ -8170,13 +8170,13 @@ (define_insn_and_rewrite "*cmp_cc"
  UNSPEC_PRED_Z))]
   "TARGET_SVE
&& aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
-  {@ [ cons: =0 , 1, 2 , 3; attrs: pred_clobber ]
- [ &Upa ,  Upl , w , ; yes ] 
cmp\t%0., %1/z, %2., #%3
- [ ?Upa ,  0Upl, w , ; yes ] ^
- [ Upa  ,  Upl , w , ; no  ] ^
- [ &Upa ,  Upl , w , w; yes ] 
cmp\t%0., %1/z, %2., %3.
- [ ?Upa ,  0Upl, w , w; yes ] ^
- [ Upa  ,  Upl , w , w; no  ] ^
+  {@ [ cons: =0 , 1   , 2 , 3; attrs: pred_clobber ]
+ [ &Upa ,  Upl, w , ; yes ] 
cmp\t%0., %1/z, %2., #%3
+ [ ?Upl ,  0  , w , ; yes ] ^
+ [ Upa  ,  Upl, w , ; no  ] ^
+ [ &Upa ,  Upl, w , w; yes ] 
cmp\t%0., %1/z, %2., %3.
+ [ ?Upl ,  0  , w , w; yes ] ^
+ [ Upa  ,  Upl, w , w; no  ] ^
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8205,12 +8205,12 @@ (define_insn_and_rewrite "*cmp_ptest"
   "TARGET_SVE
&& aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
   {@ [ cons: =0, 1, 2 , 3; attrs: pred_clobber ]
- [ &Upa,  Upl , w , ; yes ] 
cmp\t%0., %1/z, %2., #%3
- [ ?Upa,  0Upl, w , ; yes ] ^
- [ Upa ,  Upl , w , ; no  ] ^
- [ &Upa,  Upl , w , w; yes ] 
cmp\t%0., %1/z, %2., %3.
- [ ?Upa,  0Upl, w , w; yes ] ^
- [ Upa ,  Upl , w , w; no  ] ^
+ [ &Upa,  Upl, w , ; yes ] 
cmp\t%0., %1/z, %2., #%3
+ [ ?Upl,  0  , w , ; yes ] ^
+ [ Upa ,  Upl, w , ; no  ] ^
+ [ &Upa,  Upl, w , w; yes ] 
cmp\t%0., %1/z, %2., %3.
+ [ ?Upl,  0  , w , w; yes ] ^
+ [ Upa ,  Upl, w , w; no  ] ^
   }
   "&& !rtx_equal_p (operands[4], operands[6])"
   {
@@ -8263,10 +8263,10 @@ (define_insn "@aarch64_pred_cmp_wide"
  UNSPEC_PRED_Z))
(clobber (reg:CC_NZC CC_REGNUM))]
   "TARGET_SVE"
-  {@ [ cons: =0, 1, 2, 3, 4; attrs: pred_clobber ]
- [ &Upa,  Upl ,  , w, w; yes ] 
cmp\t%0., %1/z, %3., %4.d
- [ ?Upa,  0Upl,  , w, w; yes ] ^
- [ Upa ,  Upl ,  , w, w; no  ] ^
+  {@ [ cons: =0, 1   , 2, 3, 4; attrs: pred_clobber ]
+ [ &Upa,  Upl,  , w, w; yes ] 
cmp\t%0., %1/z, %3., %4.d
+ [ ?Upl,  0  ,  , w, w; yes ] ^
+ [ Upa ,  Upl,  , w, w; no

Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-06-06 Thread Tobias Burnus

Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:


Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators have been moved to start at "100"


100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.


Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantics are clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshedding,
or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf, has:
- ompx_pinned_mem_alloc, tagged as an IBM extension but not documented further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
typedef enum omp_allocator_handle_t
...
  llvm_omp_target_host_mem_alloc = 100,
  llvm_omp_target_shared_mem_alloc = 101,
  llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
  llvm_omp_target_host_mem_space = 100,
  llvm_omp_target_shared_mem_space = 101,
  llvm_omp_target_device_mem_space = 102,

Remark: I did not find any documentation, and while I
understand host and shared in principle, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and, contrary to LLVM, they document them along with their semantics, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *


The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.


...


diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)


...


   #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
-   "predefined_alloc_mapping must match omp_memspace_handle_t");
+   "predefined_omp_alloc_mapping must match omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))


I am surprised that this compiles: Why do you re-#define this macro?
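
One possible explanation, offered as a sketch rather than as a statement about the patch itself: C accepts a second definition of a function-like macro as long as it is identical to the first (same parameter names and the same replacement tokens), so a repeated ARRAY_SIZE definition compiles even though it is redundant. The example_mapping array and the example_* names below are hypothetical and exist only to make the snippet self-contained.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

/* Hypothetical table standing in for a predefined-allocator mapping.  */
static const int example_mapping[] = { 10, 20, 30 };

_Static_assert (ARRAY_SIZE (example_mapping) == 3,
                "example_mapping must have three entries");

/* Identical redefinition: redundant, but valid C.  A definition with a
   different replacement list would be diagnosed instead.  */
#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))

/* Return the number of entries in the example table.  */
static size_t
example_mapping_size (void)
{
  return ARRAY_SIZE (example_mapping);
}

/* Runtime check mirroring the compile-time assertion above.  */
static void
example_self_check (void)
{
  assert (example_mapping_size () == 3);
}
```

Even if it compiles, dropping the second definition would be the cleaner choice.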

* * *


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
 omp_cgroup_mem_alloc = 6,
 omp_pteam_mem_alloc = 7,
 omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,


See remark regarding "100" at the top of this email.


--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100


Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

3 |   !$omp allocate(a) allocator(100)

  | 21

Error: Predefined allocator required in ALLOCATOR clause at (1) as the list 
item 'a' at (2) has the SAV

Re: arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

2024-06-06 Thread Richard Earnshaw (lists)
On 05/06/2024 17:07, Andre Vieira (lists) wrote:
> Hi,
> 
> This patch adds missing assembly directives to the CMSE library wrapper used
> to call functions with the cmse_nonsecure_call attribute.  Without the .type
> directive the linker fails to produce the correct veneer if a call to
> this wrapper function is too far from the wrapper itself.  The .size directive
> was added for completeness, though we don't necessarily have a use case for it.
>
> I did not add a testcase, as I couldn't get dejagnu to disassemble the linked
> binary to check that we used an appropriate branch instruction.  I did, however,
> test it locally, and with this change the GNU linker now generates an appropriate
> veneer and a call to that veneer when __gnu_cmse_nonsecure_call is too far away.
> 
> OK for trunk and backport to any release branches still in support (after 
> waiting a week or so)?
> 
> libgcc/ChangeLog:
> 
> PR target/115360
> * config/arm/cmse_nonsecure_call.S: Add .type and .size directives.

OK.

R.


Re: Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T21:12:05+0100, I wrote:
> Re the newlib commit 05a2d7a8b3277b469e7cb121115bba398adc8559
> "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"
> that I've just pushed to newlib main branch:
>
> On 2023-01-19T23:00:05+0100, I wrote:
>> This is still not properly resolving 
>> '[nvptx] "exit" in offloaded region doesn't terminate process', but is
>> one step into that direction, and allows for simplifying some GCC code.
>
>> --- a/newlib/libc/machine/nvptx/_exit.c
>> +++ b/newlib/libc/machine/nvptx/_exit.c
>
>> @@ -26,7 +27,15 @@ void __attribute__((noreturn))
>>  _exit (int status)
>>  {
>>if (__exitval_ptr)
>> -*__exitval_ptr = status;
>> -  for (;;)
>> -asm ("exit;" ::: "memory");
>> +{
>> +  *__exitval_ptr = status;
>> +  for (;;)
>> +   asm ("exit;" ::: "memory");
>> +}
>> +  else /* offloading */
>> +{
>> +  /* Map to 'abort'; see 
>> +'[nvptx] "exit" in offloaded region doesn't terminate process'.  */
>> +  abort ();
>> +}
>>  }
>
> That has put "the PR85463 stuff" into the one central place, and allows
> for simplifying GCC as per the attached
> 'Clean up after newlib "nvptx: In offloading execution, map '_exit' to 
> 'abort' [GCC PR85463]"',
> which I've just pushed to GCC devel/omp/gcc-12 branch in
> commit 094b379f461bb4b635327cde26eabc0966159fec, and intend to push to
> GCC master branch once the latter depends on updated newlib for other
> (functional) reasons.

Better late than never: I've now pushed to GCC trunk branch
commit 395ac0417a17ba6405873f891f895417d696b603
'Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' 
[GCC PR85463]"',
see attached.


Grüße
 Thomas


From 395ac0417a17ba6405873f891f895417d696b603 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 14:34:06 +0200
Subject: [PATCH] Clean up after newlib "nvptx: In offloading execution, map
 '_exit' to 'abort' [GCC PR85463]"

	PR target/85463
	libgfortran/
	* runtime/minimal.c [__nvptx__] (exit): Don't override.
	libgomp/
	* config/nvptx/error.c (exit): Don't override.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
---
 libgfortran/runtime/minimal.c   |  8 
 libgomp/config/nvptx/error.c|  7 ---
 .../testsuite/libgomp.oacc-fortran/error_stop-1.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-2.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-3.f   |  8 +---
 libgomp/testsuite/libgomp.oacc-fortran/stop-1.f | 13 +
 libgomp/testsuite/libgomp.oacc-fortran/stop-2.f |  6 +-
 libgomp/testsuite/libgomp.oacc-fortran/stop-3.f | 12 
 8 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/libgfortran/runtime/minimal.c b/libgfortran/runtime/minimal.c
index f13b3a4bf90..619f818c844 100644
--- a/libgfortran/runtime/minimal.c
+++ b/libgfortran/runtime/minimal.c
@@ -31,14 +31,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 
-#if __nvptx__
-/* Map "exit" to "abort"; see PR85463 '[nvptx] "exit" in offloaded region
-   doesn't terminate process'.  */
-# undef exit
-# define exit(status) do { (void) (status); abort (); } while (0)
-#endif
-
-
 #if __nvptx__
 /* 'printf' is all we have.  */
 # undef estr_vprintf
diff --git a/libgomp/config/nvptx/error.c b/libgomp/config/nvptx/error.c
index 7e668276004..f7a2536c29b 100644
--- a/libgomp/config/nvptx/error.c
+++ b/libgomp/config/nvptx/error.c
@@ -58,11 +58,4 @@
 #endif
 
 
-/* The 'exit (EXIT_FAILURE);' of an Fortran (only, huh?) OpenMP 'error'
-   directive with 'severity (fatal)' causes a hang, so 'abort' instead of
-   'exit'.  */
-#undef exit
-#define exit(status) abort ()
-
-
 #include "../../error.c"
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f b/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
index de727749a53..3918d6853f6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
@@ -16,14 +16,16 @@
   END PROGRAM MAIN
 
 ! { dg-output "CheCKpOInT(\n|\r\n|\r)+" }
+
 ! { dg-output "ERROR STOP (\n|\r\n|\r)+" }
 !
 ! In gfortran's main program, libfortran's set_options is called - which sets
 ! compiler_options.backtrace = 1 by default.  For an offload libgfortran, this
 ! is never called and, hence, "Error termination." is never printed.  Thus:
 ! { dg-output "Error termination.*" { target { ! { openacc_nvidia_accel_selected || openacc_radeon_accel_selected } } } }
-!
-! PR85463:
+
+! PR85463.  The 'exit' implementation used with nvptx
+! offloadi

nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading)

2024-06-06 Thread Thomas Schwinge
Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]

..., which I then recently revised; see
commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'".

> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for 
> offloading".

Similarly revised, I've now pushed to trunk branch
commit 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96
"nvptx offloading: Global constructor, destructor support, via nvptx-tools 
'ld'",
see attached.


Regards
 Thomas


From 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 12:40:50 +0200
Subject: [PATCH] nvptx offloading: Global constructor, destructor support, via
 nvptx-tools 'ld'

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

	libgcc/
	* config/nvptx/gbl-ctors.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/gbl-ctors.c |  55 +++
 libgomp/plugin/plugin-nvptx.c   | 117 +++-
 2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
index a2ca053e5e3..a56d64f8ef8 100644
--- a/libgcc/config/nvptx/gbl-ctors.c
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -68,6 +68,61 @@ __gbl_ctors (void)
 }
 
 
+/* For nvptx offloading configurations, need '.entry' wrappers.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+/* OpenMP */
+
+/* See 'crt0.c', 'mgomp.c'.  */
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_dtors ();
+}
+
+# else
+
+/* OpenACC */
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __static_do_global_dtors ();
+}
+
+# endif
+
+
 /* The following symbol just provides a means for the nvptx-tools 'ld' to
trigger linking in this file.  */
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 4cedc5390a3..0f3a3be1898 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -346,6 +346,11 @@ static struct ptx_device **ptx_devices;
default is set here.  */
 static unsigned lowlat_pool_size = 8 * 1024;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -565,6 +570,18 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
 return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+   image != NULL;
+   image = image->next)
+{
+  if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+   "__do_global_dtors__entry"
+   /* or "__do_global_dtors__entry__mgomp" */))
+	ret = false;
+}
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
 {
   struct ptx_free_block *b_next = b->next;
@@ -585,7 +602,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
 CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1317,6 +1335,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+			const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFu

Re: [PATCH]AArch64: correct constraint on Upl early clobber alternatives

2024-06-06 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> I made an oversight in the previous patch, where I added a ?Upa
> alternative to the Upl cases.  This causes the tie to be created
> with the larger register file rather than the constrained one.
>
> This fixes the affected patterns.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> Built SPEC CPU 2017 with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md (@aarch64_pred_cmp,
>   *cmp_cc, *cmp_ptest,
>   @aarch64_pred_cmp_wide,
>   *aarch64_pred_cmp_wide_cc,
>   *aarch64_pred_cmp_wide_ptest): Fix Upl tie alternative.
>   * config/aarch64/aarch64-sve2.md (@aarch64_pred_): Fix
>   Upl tie alternative.

OK, thanks.  For the record...

> ---
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> d902bce62fde88b6d85f8d71f305e7fc76a4d34e..d69db34016a55b4324faa129a3ac1f47227ba776
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8134,13 +8134,13 @@ (define_insn "@aarch64_pred_cmp"
> UNSPEC_PRED_Z))
> (clobber (reg:CC_NZC CC_REGNUM))]
>"TARGET_SVE"
> -  {@ [ cons: =0 , 1   , 3 , 4; attrs: pred_clobber ]
> - [ &Upa , Upl , w , ; yes ] 
> cmp\t%0., %1/z, %3., #%4
> - [ ?Upa , 0Upl, w , ; yes ] ^
> - [ Upa  , Upl , w , ; no  ] ^
> - [ &Upa , Upl , w , w; yes ] 
> cmp\t%0., %1/z, %3., %4.
> - [ ?Upa , 0Upl, w , w; yes ] ^
> - [ Upa  , Upl , w , w; no  ] ^
> +  {@ [ cons: =0 , 1  , 3 , 4; attrs: pred_clobber ]
> + [ &Upa , Upl, w , ; yes ] 
> cmp\t%0., %1/z, %3., #%4
> + [ ?Upl , 0  , w , ; yes ] ^
> + [ Upa  , Upl, w , ; no  ] ^
> + [ &Upa , Upl, w , w; yes ] 
> cmp\t%0., %1/z, %3., %4.
> + [ ?Upl , 0  , w , w; yes ] ^
> + [ Upa  , Upl, w , w; no  ] ^
>}
>  )
>  
> @@ -8170,13 +8170,13 @@ (define_insn_and_rewrite "*cmp_cc"
> UNSPEC_PRED_Z))]
>"TARGET_SVE
> && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
> -  {@ [ cons: =0 , 1, 2 , 3; attrs: pred_clobber ]
> - [ &Upa ,  Upl , w , ; yes ] 
> cmp\t%0., %1/z, %2., #%3
> - [ ?Upa ,  0Upl, w , ; yes ] ^
> - [ Upa  ,  Upl , w , ; no  ] ^
> - [ &Upa ,  Upl , w , w; yes ] 
> cmp\t%0., %1/z, %2., %3.
> - [ ?Upa ,  0Upl, w , w; yes ] ^
> - [ Upa  ,  Upl , w , w; no  ] ^
> +  {@ [ cons: =0 , 1   , 2 , 3; attrs: pred_clobber ]
> + [ &Upa ,  Upl, w , ; yes ] 
> cmp\t%0., %1/z, %2., #%3
> + [ ?Upl ,  0  , w , ; yes ] ^
> + [ Upa  ,  Upl, w , ; no  ] ^
> + [ &Upa ,  Upl, w , w; yes ] 
> cmp\t%0., %1/z, %2., %3.
> + [ ?Upl ,  0  , w , w; yes ] ^
> + [ Upa  ,  Upl, w , w; no  ] ^
>}
>"&& !rtx_equal_p (operands[4], operands[6])"
>{
> @@ -8205,12 +8205,12 @@ (define_insn_and_rewrite "*cmp_ptest"
>"TARGET_SVE
> && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
>{@ [ cons: =0, 1, 2 , 3; attrs: pred_clobber ]
> - [ &Upa,  Upl , w , ; yes ] 
> cmp\t%0., %1/z, %2., #%3
> - [ ?Upa,  0Upl, w , ; yes ] ^
> - [ Upa ,  Upl , w , ; no  ] ^
> - [ &Upa,  Upl , w , w; yes ] 
> cmp\t%0., %1/z, %2., %3.
> - [ ?Upa,  0Upl, w , w; yes ] ^
> - [ Upa ,  Upl , w , w; no  ] ^
> + [ &Upa,  Upl, w , ; yes ] 
> cmp\t%0., %1/z, %2., #%3
> + [ ?Upl,  0  , w , ; yes ] ^
> + [ Upa ,  Upl, w , ; no  ] ^
> + [ &Upa,  Upl, w , w; yes ] 
> cmp\t%0., %1/z, %2., %3.
> + [ ?Upl,  0  , w , w; yes ] ^
> + [ Upa ,  Upl, w , w; no  ] ^
>}
>"&& !rtx_equal_p (operands[4], operands[6])"
>{
> @@ -8263,10 +8263,10 @@ (define_insn "@aarch64_pred_cmp_wide"
> UNSPEC_PRED_Z))
> (clobber (reg:CC_NZC CC_REGNUM))]
>"TARGET_SVE"
> -  {@ [ cons: =0, 1, 2, 3, 4; attrs: pred_clobber ]
> - [ &Upa,  Upl ,  , w, w; yes ] 
> cmp\t%0., %1/z, %3., %4.d
> - [ ?Upa,  0Upl,  , w, w; yes ] ^
> - [ Upa ,  Upl ,  ,

Re: nvptx, libgcc: Stub unwinding implementation

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:04:02+0100, I wrote:
> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
> configuration of libgfortran.  One prerequisite patch, based on WIP work
> by Andrew Stubbs, is: "nvptx, libgcc: Stub unwinding implementation"

Pushed to trunk branch commit a29c5852a606588175d11844db84da0881227100
"nvptx, libgcc: Stub unwinding implementation", see attached.


Regards
 Thomas


From a29c5852a606588175d11844db84da0881227100 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:11:04 +0200
Subject: [PATCH] nvptx, libgcc: Stub unwinding implementation

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs 
---
 libgcc/config/nvptx/t-nvptx|  3 ++-
 libgcc/config/nvptx/unwind-nvptx.c | 37 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/nvptx/unwind-nvptx.c

diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index 260ed6334db..1ff574c2982 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,6 +1,7 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/mgomp.c \
-	$(srcdir)/config/nvptx/atomic.c
+	$(srcdir)/config/nvptx/atomic.c \
+	$(srcdir)/config/nvptx/unwind-nvptx.c
 
 # Until we have libstdc++-v3/libsupc++ proper.
 LIB2ADD += $(srcdir)/c++-minimal/guard.c
diff --git a/libgcc/config/nvptx/unwind-nvptx.c b/libgcc/config/nvptx/unwind-nvptx.c
new file mode 100644
index 000..d08ba266be1
--- /dev/null
+++ b/libgcc/config/nvptx/unwind-nvptx.c
@@ -0,0 +1,37 @@
+/* Stub unwinding implementation.
+
+   Copyright (C) 2019-2024 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "unwind.h"
+
+_Unwind_Reason_Code
+_Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument)
+{
+  return 0;
+}
+
+_Unwind_Ptr
+_Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
+{
+  return 0;
+}
-- 
2.34.1



Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus

Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / https://gcc.gnu.org/r15-1070

First, thanks for adding I/O support to nvptx offloading.

I have a wording nit, to be confirmed by a native speaker:


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

...

+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.

Otherwise, it seems fine at a glance, and I am happy that this
feature now finally works :-)


Hooray, no longer using reverse offload ("!$omp target 
device(ancestor:1)") for Fortran I/O when debugging.
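
For the C side of the documentation text quoted above, here is a minimal sketch of calling printf inside an OpenMP target region.  It is only an illustration and not part of the commit: 'print_from_target' is a hypothetical name, and when built without '-fopenmp' (or when no offload device is available) the region simply runs on the host.

```c
#include <stdio.h>

/* Hypothetical helper: print VALUE from inside a target region and
   return what printf returned.  Without -fopenmp the pragma is ignored
   and the block runs on the host.  */
int
print_from_target (int value)
{
  int written = 0;

  /* 'written' is mapped back so that the caller can see the result;
     'value' is a scalar and is passed to the region by value.  */
#pragma omp target map(tofrom: written)
  {
    written = printf ("value = %d\n", value);
  }

  return written;
}
```

With nvptx offloading enabled, the same source would be expected to print from the device, per the libgomp.texi wording above.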


Thanks,

Tobias


nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:16:00+0100, I wrote:
> On 2023-01-20T22:04:02+0100, I wrote:
>> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
>> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
>> configuration of libgfortran.
>
> This is achieved by 'nvptx, libgfortran: Switch out of "minimal" mode',
> see attached, again based on WIP work by Andrew Stubbs.

I've recently slightly revised this, in particular:

> The OpenACC XFAILs: "[...] overflows the stack [...]"

... I now avoid by use of commit 0d25989d60d15866ef4737d66e02432f50717255
"nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment 
variable [PR97384, PR105274]".

The underlying issue remains...

> [...] unresolved at this point; see the discussion around
> "Handling of large stack objects in GPU code generation -- maybe transform 
> into heap allocation?",
> and my "nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold'"
> experimenting.  (The latter works to some extent, but also has other
> issues that I shall detail at some later point in time.)

(No progress.)


Pushed to trunk branch commit 3a4775d4403f2e88b589e88a9937cc1fd45a0e87
'nvptx, libgfortran: Switch out of "minimal" mode', see attached.

This, unsurprisingly, also greatly improves GCC/Fortran test results for
nvptx target.


Regards
 Thomas


From 3a4775d4403f2e88b589e88a9937cc1fd45a0e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:13:24 +0200
Subject: [PATCH] nvptx, libgfortran: Switch out of "minimal" mode

..., in order to enable (portions of) Fortran I/O, for example.

	libgfortran/
	* configure.ac: No longer set 'LIBGFOR_MINIMAL' for nvptx.
	* configure: Regenerate.
	libgomp/
	* libgomp.texi (nvptx): Update.
	* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
	* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Adjust.

Co-authored-by: Andrew Stubbs 
---
 libgfortran/configure | 21 --
 libgfortran/configure.ac  | 17 +++-
 libgomp/libgomp.texi  | 10 +++--
 .../libgomp.fortran/target-print-1-nvptx.f90  | 11 -
 .../libgomp.fortran/target-print-1.f90|  3 --
 .../libgomp.oacc-fortran/error_stop-2-nvptx.f | 39 ++
 .../libgomp.oacc-fortran/error_stop-2.f   |  3 +-
 .../libgomp.oacc-fortran/print-1-nvptx.f90| 40 +++
 .../libgomp.oacc-fortran/print-1.f90  |  4 +-
 .../libgomp.oacc-fortran/stop-2-nvptx.f   | 36 +
 .../testsuite/libgomp.oacc-fortran/stop-2.f   |  3 +-
 11 files changed, 134 insertions(+), 53 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.fortran/target-print-1-nvptx.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/stop-2-nvptx.f

diff --git a/libgfortran/configure b/libgfortran/configure
index 774dd52fc95..11a1bc5f070 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6207,17 +6207,12 @@ else
 fi
 
 
-# For GPU offloading, not everything in libfortran can be supported.
-# Currently, the only target that has this problem is nvptx.  The
-# following is a (partial) list of features that are unsupportable on
-# this particular target:
-# * Constructors
-# * alloca
-# * C library support for I/O, with printf as the one notable exception
-# * C library support for other features such as signal, environment
-#   variables, time functions
-
- if test "x${target_cpu}" = xnvptx; then
+# "Minimal" mode is for targets that cannot (yet) support all features of
+# libgfortran.  It avoids the need for working constructors, alloca, and C
+# library support for I/O, signals, environment variables, time functions, etc.
+# At present there are no targets that require this mode.
+
+ if false; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
@@ -12852,7 +12847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12855 "configure"
+#line 12850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12958,7 +12953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12961 "configure"
+#line 12956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 46585a3ee14..cca1ea0ea97 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -209,17 +209,12 @@ AM_CONDITIONAL(LIBGFOR_USE_SYMVER, [test "x$gfortran_use_symver" != xno])
 A

Re: [PATCH] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-06-06 Thread Richard Sandiford
Richard Ball  writes:
> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
> added by this patch.
>
> Ok for trunk and a backport into gcc-14?
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>   Add missing __ARM_NEON_SVE_BRIDGE.

After this patch was posted, there was some internal discussion
involving LLVM & GNU devs about what this kind of macro means, now that
we have FMV.  The feeling was that __ARM_NEON_SVE_BRIDGE should just
indicate whether the compiler provides the file, not whether AdvSIMD
& SVE are enabled.  I think we should therefore add this to
aarch64_define_unconditional_macros instead.
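
To make the distinction concrete, here is a small sketch of how user code might treat the two questions separately under that reading.  It is an illustration only: the helper names are hypothetical, and guarding the include on __ARM_FEATURE_SVE is a conservative assumption on my part rather than something ACLE requires.

```c
/* Does the compiler provide <arm_neon_sve_bridge.h>?  Under the reading
   above, this does not depend on which features are enabled.  */
#ifdef __ARM_NEON_SVE_BRIDGE
# define HAS_BRIDGE_HEADER 1
#else
# define HAS_BRIDGE_HEADER 0
#endif

/* Are AdvSIMD and SVE both enabled for this translation unit?  */
#if defined (__ARM_NEON) && defined (__ARM_FEATURE_SVE)
# define SIMD_AND_SVE_ENABLED 1
#else
# define SIMD_AND_SVE_ENABLED 0
#endif

/* Conservative assumption: only include the header when both hold.  */
#if HAS_BRIDGE_HEADER && SIMD_AND_SVE_ENABLED
# include <arm_neon_sve_bridge.h>
#endif

/* Hypothetical helpers that expose the two answers at run time.  */
int
has_bridge_header (void)
{
  return HAS_BRIDGE_HEADER;
}

int
can_call_bridge_intrinsics (void)
{
  return HAS_BRIDGE_HEADER && SIMD_AND_SVE_ENABLED;
}
```

On a non-AArch64 host both helpers return 0; the second can only be nonzero where the first is.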

Sorry for the slow review.  I was waiting for the outcome of that
discussion before replying.

Thanks,
Richard

> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index 
> fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..1121be118cf8d05e3736ad4ee75568ff7cb92bfd
>  100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", 
> pfile);
>aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", 
> pfile);
>aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
> +  aarch64_def_or_undef (TARGET_SVE, "__ARM_NEON_SVE_BRIDGE", pfile);
>  
>/* Not for ACLE, but required to keep "float.h" correct if we switch
>   target between implementations that do or do not support ARMv8.2-A


Re: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]

2024-06-06 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is 
> implemented
> using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 (V4SI->V4HI).
>
>   PR target/113882
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2): New pattern.

Could we handle this by extending the target-independent code instead?
Richard mentioned in comment 1 that the current set of intermediate
conversions is hard-coded, but it didn't sound like he was implying that
the set shouldn't change.
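
For reference, the two-step sequence the patch describes corresponds, lane by lane, to the scalar C below.  This is only an illustrative sketch; 'convert_two_step' and 'convert_direct' are hypothetical names, and the two agree only for inputs whose truncated value fits in a short.

```c
#include <stdint.h>

/* Hypothetical helper: convert four floats to shorts in two steps,
   mirroring V4SF -> V4SI (fcvtzs) followed by V4SI -> V4HI (xtn).  */
void
convert_two_step (int16_t *restrict a, const float *restrict b)
{
  for (int i = 0; i < 4; i++)
    {
      /* Float to 32-bit integer, truncating toward zero.  */
      int32_t wide = (int32_t) b[i];
      /* Narrow to 16 bits; in-range values are unchanged.  */
      a[i] = (int16_t) wide;
    }
}

/* The one-step form, as written in the testcase's 'f'.  */
void
convert_direct (int16_t *restrict a, const float *restrict b)
{
  for (int i = 0; i < 4; i++)
    a[i] = (int16_t) b[i];
}
```

A target-independent solution would need to discover the intermediate V4SI step by itself, which is what the hard-coded set of conversions currently does not cover.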

Thanks,
Richard

> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/fix_trunc2.c: New test.
>
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-simd.md| 13 +
>  gcc/testsuite/gcc.target/aarch64/fix_trunc2.c | 14 ++
>  2 files changed, 27 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 868f4486218..096f7b56a27 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3032,6 +3032,19 @@ (define_expand 
> "2"
>"TARGET_SIMD"
>{})
>  
> +
> +(define_expand "fix_truncv4sfv4hi2"
> +  [(match_operand:V4HI 0 "register_operand")
> +   (match_operand:V4SF 1 "register_operand")]
> +  "TARGET_SIMD"
> +  {
> +rtx tmp = gen_reg_rtx (V4SImode);
> +emit_insn (gen_fix_truncv4sfv4si2 (tmp, operands[1]));
> +emit_insn (gen_truncv4siv4hi2 (operands[0], tmp));
> +DONE;
> +  }
> +)
> +
>  (define_expand "ftrunc2"
>[(set (match_operand:VHSDF 0 "register_operand")
>   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
> diff --git a/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c 
> b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> new file mode 100644
> index 000..57cc00913a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +void
> +f (short *__restrict a, float *__restrict b)
> +{
> +  a[0] = b[0];
> +  a[1] = b[1];
> +  a[2] = b[2];
> +  a[3] = b[3];
> +}
> +
> +/* { dg-final { scan-assembler-times {fcvtzs\tv[0-9]+.4s, v[0-9]+.4s} 1 } } 
> */
> +/* { dg-final { scan-assembler-times {xtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */


Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread Bert Wesarg
Dear David,

On Tue, May 28, 2024 at 10:07 PM David Malcolm  wrote:
>
> No functional change intended.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as r15-874-g9bda2c4c81b668.
>
> libcpp/ChangeLog:
> * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> * include/label-text.h: New file.
> * include/rich-location.h: Include "label-text.h".
> (class label_text): Move to label-text.h.
>
> Signed-off-by: David Malcolm 
> ---
>  libcpp/Makefile.in |   2 +-
>  libcpp/include/label-text.h| 102 +
>  libcpp/include/rich-location.h |  79 +
>  3 files changed, 105 insertions(+), 78 deletions(-)
>  create mode 100644 libcpp/include/label-text.h
>
> diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> index ebbca3fb..7e47153264c0 100644
> --- a/libcpp/Makefile.in
> +++ b/libcpp/Makefile.in
> @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
>
>  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
>  include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
> -include/rich-location.h
> +include/rich-location.h include/label-text.h

This does not seem to be enough for the new header to get installed.
I get compile errors when compiling a plug-in with this patch:

In file included from
/home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/diagnostic.h:24,
from 
/home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-plugin/../src/adapters/compiler/gcc-plugin/scorep_plugin_inst_descriptor.cpp:43:
/home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/rich-location.h:25:10:
fatal error: label-text.h: No such file or directory
25 | #include "label-text.h"
| ^~
compilation terminated.

Best,
Bert

>
>
>  TAGS: $(TAGS_SOURCES)
> diff --git a/libcpp/include/label-text.h b/libcpp/include/label-text.h
> new file mode 100644
> index ..13562cda41f9
> --- /dev/null
> +++ b/libcpp/include/label-text.h
> @@ -0,0 +1,102 @@
> +/* A very simple string class.
> +   Copyright (C) 2015-2024 Free Software Foundation, Inc.
> +
> +This program is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +This program is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with this program; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.
> +
> + In other words, you are welcome to use, share and improve this program.
> + You are forbidden to forbid anyone else to use, share and improve
> + what you give them.   Help stamp out software-hoarding!  */
> +
> +#ifndef LIBCPP_LABEL_TEXT_H
> +#define LIBCPP_LABEL_TEXT_H
> +
> +/* A struct for the result of range_label::get_text: a NUL-terminated buffer
> +   of localized text, and a flag to determine if the caller should "free" the
> +   buffer.  */
> +
> +class label_text
> +{
> +public:
> +  label_text ()
> +  : m_buffer (NULL), m_owned (false)
> +  {}
> +
> +  ~label_text ()
> +  {
> +if (m_owned)
> +  free (m_buffer);
> +  }
> +
> +  /* Move ctor.  */
> +  label_text (label_text &&other)
> +  : m_buffer (other.m_buffer), m_owned (other.m_owned)
> +  {
> +other.release ();
> +  }
> +
> +  /* Move assignment.  */
> +  label_text & operator= (label_text &&other)
> +  {
> +if (m_owned)
> +  free (m_buffer);
> +m_buffer = other.m_buffer;
> +m_owned = other.m_owned;
> +other.release ();
> +return *this;
> +  }
> +
> +  /* Delete the copy ctor and copy-assignment operator.  */
> +  label_text (const label_text &) = delete;
> +  label_text & operator= (const label_text &) = delete;
> +
> +  /* Create a label_text instance that borrows BUFFER from a
> + longer-lived owner.  */
> +  static label_text borrow (const char *buffer)
> +  {
> +return label_text (const_cast <char *> (buffer), false);
> +  }
> +
> +  /* Create a label_text instance that takes ownership of BUFFER.  */
> +  static label_text take (char *buffer)
> +  {
> +return label_text (buffer, true);
> +  }
> +
> +  void release ()
> +  {
> +m_buffer = NULL;
> +m_owned = false;
> +  }
> +
> +  const char *get () const
> +  {
> +return m_buffer;
> +  }
> +
> +  bool is_owner () const
> +  {
> +return m_owned;
> +  }
> +
> +private:
> +  char *m_buffer;
> +  bool m_owned;
> +
> +  label_text (char *buffer, bool owned)
> +  : m_buffer (buffer), m_owned (owned)
> +  {}
> +};
> +
> +#endif /* !LIBCPP_LABEL_TEXT_H  */
> diff --git a/libcpp/include/rich-location.h b/libcpp/include/rich-location

Re: [PATCH] aarch64: Add fix_truncv4sfv4hi2 pattern [PR113882]

2024-06-06 Thread Richard Biener
On Thu, 6 Jun 2024, Richard Sandiford wrote:

> Pengxuan Zheng  writes:
> > This patch adds the fix_truncv4sfv4hi2 (V4SF->V4HI) pattern which is 
> > implemented
> > using fix_truncv4sfv4si2 (V4SF->V4SI) and then truncv4siv4hi2 (V4SI->V4HI).
> >
> > PR target/113882
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (fix_truncv4sfv4hi2): New pattern.
> 
> Could we handle this by extending the target-independent code instead?
> Richard mentioned in comment 1 that the current set of intermediate
> conversions is hard-coded, but it didn't sound like he was implying that
> the set shouldn't change.

Yes, much like non-SLP uses supportable_narrowing_operation with any
number of intermediate conversions the SLP case should do something
similar.

Richard.

> Thanks,
> Richard
> 
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/fix_trunc2.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 13 +
> >  gcc/testsuite/gcc.target/aarch64/fix_trunc2.c | 14 ++
> >  2 files changed, 27 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md 
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 868f4486218..096f7b56a27 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3032,6 +3032,19 @@ (define_expand 
> > "2"
> >"TARGET_SIMD"
> >{})
> >  
> > +
> > +(define_expand "fix_truncv4sfv4hi2"
> > +  [(match_operand:V4HI 0 "register_operand")
> > +   (match_operand:V4SF 1 "register_operand")]
> > +  "TARGET_SIMD"
> > +  {
> > +rtx tmp = gen_reg_rtx (V4SImode);
> > +emit_insn (gen_fix_truncv4sfv4si2 (tmp, operands[1]));
> > +emit_insn (gen_truncv4siv4hi2 (operands[0], tmp));
> > +DONE;
> > +  }
> > +)
> > +
> >  (define_expand "ftrunc2"
> >[(set (match_operand:VHSDF 0 "register_operand")
> > (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")]
> > diff --git a/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c 
> > b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > new file mode 100644
> > index 000..57cc00913a3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/fix_trunc2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +void
> > +f (short *__restrict a, float *__restrict b)
> > +{
> > +  a[0] = b[0];
> > +  a[1] = b[1];
> > +  a[2] = b[2];
> > +  a[3] = b[3];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {fcvtzs\tv[0-9]+.4s, v[0-9]+.4s} 1 } 
> > } */
> > +/* { dg-final { scan-assembler-times {xtn\tv[0-9]+.4h, v[0-9]+.4s} 1 } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread pan2 . li
From: Pan Li 

After we support one gassign form of the unsigned .SAT_ADD, we
would like to support more forms, including both branch and
branchless ones.  There are 5 other forms of .SAT_ADD, listed below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
return (T)(x + y) < x ? -1 : (x + y); \
  }

Take form 3 above as an example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [35.00%]
  else
    goto <bb 3>; [65.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   3
;;2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;succ:   EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;succ:   EXIT
}
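
As a sanity check on the five forms above (not part of the patch), the
sketch below instantiates them for uint8_t and compares each one against
a reference computed in a wider type, over all 65536 input pairs.  The
function names and the self-check helper are made up for illustration;
__builtin_add_overflow is the GCC/Clang builtin already used in forms 2-4.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1: the truncated sum is >= x exactly when the add did not wrap.  */
static uint8_t
sat_add_u8_form1 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) >= x ? (uint8_t) (x + y) : (uint8_t) -1;
}

/* Form 2: turn the overflow flag into an all-ones mask and OR it in.  */
static uint8_t
sat_add_u8_form2 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  uint8_t overflow = __builtin_add_overflow (x, y, &ret);
  return (uint8_t) (-overflow) | ret;
}

/* Form 3: select the maximum value when the builtin reports overflow.  */
static uint8_t
sat_add_u8_form3 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint8_t) -1 : ret;
}

/* Form 4: same as form 3 with the condition inverted.  */
static uint8_t
sat_add_u8_form4 (uint8_t x, uint8_t y)
{
  uint8_t ret;
  return __builtin_add_overflow (x, y, &ret) == 0 ? ret : (uint8_t) -1;
}

/* Form 5: form 1 with the comparison and the two arms swapped.  */
static uint8_t
sat_add_u8_form5 (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) < x ? (uint8_t) -1 : (uint8_t) (x + y);
}

/* Check every form against a reference computed in a wider type.
   Returns the number of (x, y) pairs checked.  */
static unsigned
sat_add_u8_selfcheck (void)
{
  unsigned checked = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        unsigned wide = x + y;
        uint8_t expect = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
        assert (sat_add_u8_form1 ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_u8_form2 ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_u8_form3 ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_u8_form4 ((uint8_t) x, (uint8_t) y) == expect);
        assert (sat_add_u8_form5 ((uint8_t) x, (uint8_t) y) == expect);
        checked++;
      }
  return checked;
}
```

Calling sat_add_u8_selfcheck () from a test driver exercises all five
forms; the same pattern should carry over to the wider unsigned types.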

When the flag '^' is applied to a cond_expr, genmatch generates matching
code similar to the following:

else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
  {
    basic_block _b1 = gimple_bb (_a1);
    if (gimple_phi_num_args (_a1) == 2)
      {
        basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
        basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
        basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
                            ? _pb_0_1 : _pb_1_1;
        basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
                                  ? _pb_1_1 : _pb_0_1;
        gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
        if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
            && EDGE_COUNT (_other_db_1->succs) == 1
            && EDGE_PRED (_other_db_1, 0)->src == _db_1)
          {
            tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
            tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
            tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node,
                               _cond_lhs_1, _cond_rhs_1);
            bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
            tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
            tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);


The test suites below pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (cmp_operand): Add match_phi comparison.
(dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more forms for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
new func impl to build call for phi gimple.
(match_unsigned_saturation_add): Add new func impl to match the
.SAT_ADD for phi gimple.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.

Signed-off-by: Pan Li 
---
 gcc/doc/match-and-simplify.texi |  16 
 gcc/genmatch.cc | 126 +++-
 gcc/match.pd|  43 ++-
 gcc/tree-ssa-math-opts.cc   |  56 +-
 4 files changed, 236 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 01f19e2f62c..63d5af159f5 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -361,6 +361,22 @@ Usually the types of the generated result expressions are
 determined from the context, but sometimes like in the above case
 it is required that you specify them explicitly.
 
+Another modifier for generated expressions is @code{^} which
+tells the machinery to try m

[PATCH, OpenACC 2.7, v2] Implement reductions for arrays and structs

2024-06-06 Thread Chung-Lin Tang
Hi Thomas,
This is v2 of the C/C++/middle-end parts of array/struct
support for OpenACC reductions.

The main changes are much-improved support for sub-arrays
and some new testcases.

Tested on mainline using x86_64 host and nvptx/amdgcn offloading.
Will backport to the upcoming omp/devel/gcc-14 branch once approved for mainline.

Thanks,
Chung-Lin

2024-06-06  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new
functions.

gcc/ChangeLog:

* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for
supported array reduction cases.
(gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR.
(oacc_array_reduction_bias): New function.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set.
Handle array reductions using new oacc_array_reduction_bias function.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'&array_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

* tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/reduction-9.c: New test.
* c-c++-common/goacc/reduction-10.c: New test.
* c-c++-common/goacc/reduction-11.c: New test.
* c-c++-common/goacc/reduction-12.c: New test.
* c-c++-common/goacc/reduction-13.c: New test.
* c-c++-common/goacc/reduction-14.c: New test.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction.h
(check_reduction_array_xx): New macro.
(operator_apply): Likewise.
(check_reduction_array_op): Likewise.
(check_reduction_arraysec_op): Likewise.
(function_apply): Likewise.
(check_reduction_array_macro): Likewise.
(check_reduction_arraysec_macro): Likewise.
(check_reduction_xxx_xx_all): Likewise.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-arrays-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/reduction-structs-1.c: New test.
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 2d9e9c0969f..61991a218f8 100

[PATCH] RISC-V: Handle non-grouped stores as single-lane SLP

2024-06-06 Thread Richard Biener
The following enables single-lane loop SLP discovery for non-grouped stores
and adjusts vectorizable_store to properly handle those.

For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
not running into the "not falling back to strided accesses" bail-out.
I have not investigated in detail.  Similar for gcc.dg/vect/slp-19c.c.

The gcc.dg/vect/O3-pr39675-2.c and gcc.dg/vect/slp-19[abc].c SLPs
depend on the load permute lowering as the single-lane store we
now want to handle is fed from a single lane from groups of size four.
I've updated the expected number of SLPs but they FAIL.

For gfortran.dg/vect/fast-math-mgrid-resid.f predictive commoning
now unrolls the loop; the vectorization factor is the same.  I think
association during SLP build might be the reason for the difference.

There is a set of i386 target assembler test FAILs,
gcc.target/i386/pr88531-2[bc].c in particular fail because the
target cannot identify SLP emulated gathers, see another mail from me.
Others need adjustment; I've adjusted only one with this patch.

I'm probably delaying this a bit until the load permute lowering
is good enough for pushing.

* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
loop SLP discovery for non-grouped stores.
* tree-vect-stmts.cc (vectorizable_store): Always set
vec_num for SLP.

* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-19a.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-19c.c: Likewise.
* gcc.dg/vect/slp-4-big-array.c: Likewise.
* gcc.dg/vect/slp-4.c: Likewise.
* gcc.dg/vect/slp-5.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-37.c: Likewise.
* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
initialization loop.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL.  SLP can handle
inner loop inductions with multiple vector stmt copies.
* gfortran.dg/vect/vect-8.f90: Adjust expected number of
vectorized loops.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Expect predictive
commoning with unrolling.
* gcc.target/i386/vectorize1.c: Adjust what we scan for.
---
 gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c  |  2 +-
 .../gcc.dg/vect/fast-math-vect-call-1.c   |  2 +-
 .../gcc.dg/vect/no-scevccp-outer-12.c |  3 +--
 gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c |  5 ++--
 gcc/testsuite/gcc.dg/vect/slp-12b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-12c.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19a.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19c.c   |  4 ++--
 gcc/testsuite/gcc.dg/vect/slp-37.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4-big-array.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-5.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-7.c |  4 ++--
 gcc/testsuite/gcc.dg/vect/slp-perm-7.c|  4 ++--
 gcc/testsuite/gcc.dg/vect/slp-reduc-5.c   |  3 ++-
 gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c  |  1 +
 gcc/testsuite/gcc.target/i386/vectorize1.c|  4 ++--
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |  2 +-
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  2 +-
 gcc/tree-vect-slp.cc  | 23 +++
 gcc/tree-vect-stmts.cc| 11 +
 22 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
index c3f0f6dc1be..ddaac56cc0b 100644
--- a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
+++ b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
@@ -27,5 +27,5 @@ foo ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
index ad22f6e82b3..6c9b7c37b6e 100644
--- a/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
+++ b/gcc/testsuite/gcc.dg/vect/fast-math-vect-call-1.c
@@ -101,4 +101,4 @@ main ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" { target { vect_call_copysignf && vect_call_sqrtf } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target { { vect_call_copysignf && vect_call_sqrtf }

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Ajit Agarwal
Hello Richard:

On 06/06/24 2:28 pm, Richard Sandiford wrote:
> Hi,
> 
> Just some comments on the fuseable_load_p part, since that's what
> we were discussing last time.
> 
> It looks like this now relies on:
> 
> Ajit Agarwal  writes:
>> +  /* We use DF data flow because we change location rtx
>> + which is easier to find and modify.
>> + We use mix of rtl-ssa def-use and DF data flow
>> + where it is easier.  */
>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>> +  df_analyze ();
>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> 
> But please don't do this!  For one thing, building DU/UD chains
> as well as rtl-ssa is really expensive in terms of compile time.
> But more importantly, modifications need to happen via rtl-ssa
> to ensure that the IL is kept up-to-date.  If we don't do that,
> later fuse attempts will be based on stale data and so could
> generate incorrect code.
> 

Sure, I have made changes to use only rtl-ssa and no longer use
UD/DU chains.  I will send the changes in a separate follow-up
patch.

>> +// Check whether load can be fusable or not.
>> +// Return true if fuseable otherwise false.
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> +  for (auto def : info->defs())
>> +{
>> +  auto set = dyn_cast (def);
>> +  for (auto use1 : set->nondebug_insn_uses ())
>> +use1->set_is_live_out_use (true);
>> +}
> 
> What was the reason for adding this loop?
>

The purpose of adding this loop is to avoid an assert failure at
gcc/rtl-ssa/changes.cc:252.

 
>> +
>> +  rtx_insn *rtl_insn = info ->rtl ();
>> +  rtx body = PATTERN (rtl_insn);
>> +  rtx dest_exp = SET_DEST (body);
>> +
>> +  if (REG_P (dest_exp) &&
>> +  (DF_REG_DEF_COUNT (REGNO (dest_exp)) > 1
> 
> The rtl-ssa way of checking this is:
> 
>   crtl->ssa->is_single_dominating_def (...)
> 
>> +   || DF_REG_EQ_USE_COUNT (REGNO (dest_exp)) > 0))
>> +return  false;
> 
> Why are uses in notes a problem?  In the worst case, we should just be
> able to remove the note instead.
>

We can remove this; it is no longer required.  I will make this
change in subsequent patches.
 
>> +
>> +  rtx addr = XEXP (SET_SRC (body), 0);
>> +
>> +  if (GET_CODE (addr) == PLUS
>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>> +{
>> +  if (INTVAL (XEXP (addr, 1)) == -16)
>> +return false;
>> +  }
> 
> What's special about -16?
> 

Tests such as libgomp/for-8 fail when loads with offsets -16 and 0 are fused.
That's why I have added this check.


>> +
>> +  df_ref use;
>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>> +{
>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>> +
>> +  if (!def_link || !def_link->ref
>> +  || DF_REF_IS_ARTIFICIAL (def_link->ref))
>> +continue;
>> +
>> +  while (def_link && def_link->ref)
>> +{
>> +  rtx_insn *insn = DF_REF_INSN (def_link->ref);
>> +  if (GET_CODE (PATTERN (insn)) == PARALLEL)
>> +return false;
> 
> Why do you need to skip PARALLELs?
>

vec_select with a PARALLEL gives failures in final.cc: "can't split-up with
subreg 128 (reg OO".  That's why I have added this check.

 
>> +
>> +  rtx set = single_set (insn);
>> +  if (set == NULL_RTX)
>> +return false;
>> +
>> +  rtx op0 = SET_SRC (set);
>> +  rtx_code code = GET_CODE (op0);
>> +
>> +  // This check is added as register pairs are not generated
>> +  // by RA for neg:V2DF (fma: V2DF (reg1)
>> +  //  (reg2)
>> +  //  (neg:V2DF (reg3)))
>> +  if (GET_RTX_CLASS (code) == RTX_UNARY)
>> +return false;
> 
> What's special about (neg (fma ...))?
>

I am not sure why the register allocator fails to allocate register pairs
for a NEG unary operation with an fma operand.  I have not yet debugged the
register allocator to find out why.
 
>> +
>> +  def_link = def_link->next;
>> +}
>> + }
>> +  return true;
>> +}
> 
> Thanks,
> Richard

Thanks & Regards
Ajit


[PATCH] testsuite: go: Require split-stack support for go.test/test/index0.go [PR87589]

2024-06-06 Thread Rainer Orth
The index0-out.go test FAILs on Solaris (SPARC and x86, 32 and 64-bit),
as well as several others:

FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments 

The test SEGVs because it tries a stack access way beyond the stack
area.  As Ian analyzed in the PR, the testcase currently requires
split-stack support, so this patch requires just that.

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-06-05  Rainer Orth  

gcc/testsuite:
PR go/87589
* go.test/go-test.exp (go-gc-tests): Require split-stack support
for index0.go.

# HG changeset patch
# Parent  f29c8ac19b89d7ed6c5d957b99e03c8a387f6c31
testsuite: go: Require split-stack support for go.test/test/index0.go [PR87589]

diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp
--- a/gcc/testsuite/go.test/go-test.exp
+++ b/gcc/testsuite/go.test/go-test.exp
@@ -477,7 +477,8 @@ proc go-gc-tests { } {
 	if { ( [file tail $test] == "select2.go" \
 		   || [file tail $test] == "stack.go" \
 		   || [file tail $test] == "peano.go" \
-		   || [file tail $test] == "nilptr2.go" ) \
+		   || [file tail $test] == "nilptr2.go" \
+		   || [file tail $test] == "index0.go" ) \
 		 && ! [check_effective_target_split_stack] } {
 	# These tests fails on targets without split stack.
 	untested $name


[PATCH] Add SLP_TREE_MEMORY_ACCESS_TYPE

2024-06-06 Thread Richard Biener
It turns out target costing code looks at STMT_VINFO_MEMORY_ACCESS_TYPE
to identify operations from (emulated) gathers for example.  This
doesn't work for SLP loads since we do not set STMT_VINFO_MEMORY_ACCESS_TYPE
there as the vectorization strategy might differ between different
stmt uses.  It seems we got away with setting it for stores though.
The following adds a memory_access_type field to slp_tree and sets it
from load and store vectorization code.  All the costing doesn't record
the SLP node (that was only done selectively for some corner case).  The
costing is really in need of a big overhaul, the following just massages
the two relevant ops to fix gcc.target/i386/pr88531-2[bc].c FAILs when
switching on SLP for non-grouped stores.  In particular currently
we either have a SLP node or a stmt_info in the cost hook but not both.
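
To make the intended costing condition easier to follow, here is a
stand-alone sketch of the predicate, not the actual i386 hook.  The enum
and the two structs are hypothetical, heavily simplified stand-ins for
vect_memory_access_type, stmt_vec_info and slp_tree; the check on the
cost kind (vec_construct / vec_to_scalar) is left out.  The point is only
that the same access-type test is applied to whichever of the two
objects the hook was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the memory access kinds that matter here.  */
enum access_type
{
  ACC_INVARIANT,
  ACC_CONTIGUOUS,
  ACC_ELEMENTWISE,
  ACC_GATHER_SCATTER
};

/* Hypothetical stand-in for the stmt_vec_info fields used below.  */
struct stmt_model
{
  enum access_type access;
  bool is_load_or_store;
  bool step_is_constant;
};

/* Hypothetical stand-in for the slp_tree fields used below.  */
struct node_model
{
  enum access_type access;
  bool step_is_constant;
};

/* True for element-wise accesses with a variable step and for
   (emulated) gathers/scatters.  */
static bool
access_needs_scaling (enum access_type access, bool step_is_constant)
{
  return (access == ACC_ELEMENTWISE && !step_is_constant)
         || access == ACC_GATHER_SCATTER;
}

/* Either STMT or NODE may be NULL; use whichever one is available.  */
static bool
scale_construction_cost_p (const struct stmt_model *stmt,
                           const struct node_model *node)
{
  if (stmt && stmt->is_load_or_store
      && access_needs_scaling (stmt->access, stmt->step_is_constant))
    return true;
  if (node && access_needs_scaling (node->access, node->step_is_constant))
    return true;
  return false;
}
```

With only an SLP node available, a gather is still recognized, which is
the case the pr88531-2[bc].c tests depend on.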

So the following is a hack(?).  Other targets look possibly affected as
well.  I do want to postpone rewriting all of the costing to after
all-SLP.

Any comments?

* tree-vectorizer.h (_slp_tree::memory_access_type): Add.
(SLP_TREE_MEMORY_ACCESS_TYPE): New.
(record_stmt_cost): Add another overload.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
memory_access_type.
* tree-vect-stmts.cc (vectorizable_store): Set
SLP_TREE_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.  Also record the SLP node
when costing emulated gather offset decompose and vector
composition.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Also
recognize SLP emulated gather/scatter.
---
 gcc/config/i386/i386.cc |  22 ++---
 gcc/tree-vect-slp.cc|   1 +
 gcc/tree-vect-stmts.cc  |  16 +--
 gcc/tree-vectorizer.h   | 102 
 4 files changed, 91 insertions(+), 50 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 4126ab24a79..32ecf31d8d1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25150,13 +25150,21 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
  (AGU and load ports).  Try to account for this by scaling the
  construction cost by the number of elements involved.  */
   if ((kind == vec_construct || kind == vec_to_scalar)
-  && stmt_info
-  && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
- || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
-  && ((STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
-  && (TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)))
-  != INTEGER_CST))
- || STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER))
+  && ((stmt_info
+  && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
+  || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
+  && ((STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
+   && (TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)))
+   != INTEGER_CST))
+  || (STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)
+  == VMAT_GATHER_SCATTER)))
+ || (node
+ && ((SLP_TREE_MEMORY_ACCESS_TYPE (node) == VMAT_ELEMENTWISE
+ && (TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF
+   (SLP_TREE_REPRESENTATIVE (node))))
+ != INTEGER_CST))
+ || (SLP_TREE_MEMORY_ACCESS_TYPE (node)
+ == VMAT_GATHER_SCATTER)))))
 {
   stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
   stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e1e47b786c2..c359e8a0bbc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -122,6 +122,7 @@ _slp_tree::_slp_tree ()
   SLP_TREE_CODE (this) = ERROR_MARK;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
   SLP_TREE_REPRESENTATIVE (this) = NULL;
+  SLP_TREE_MEMORY_ACCESS_TYPE (this) = VMAT_INVARIANT;
   SLP_TREE_REF_COUNT (this) = 1;
   this->failed = NULL;
   this->max_nunits = 1;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index bd7dd149d11..8049c458136 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8316,6 +8316,8 @@ vectorizable_store (vec_info *vinfo,
   if (costing_p) /* transformation not required.  */
 {
   STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
+  if (slp_node)
+   SLP_TREE_MEMORY_ACCESS_TYPE (slp_node) = memory_access_type;
 
   if (loop_vinfo
  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
@@ -8356,7 +8358,10 @@ vectorizable_store (vec_info *vinfo,
  && first_stmt_info != stmt_info)
return true;
 }
-  gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
+  if (slp_node)
+gcc_assert (memory_access_type == SLP_TREE_MEMORY_ACCESS_TYPE (slp_node));
+  else
+gcc_assert (

Re: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Richard Biener
On Thu, Jun 6, 2024 at 3:37 PM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD,  list as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take the forms 3 of above as example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The flag '^' acts on cond_expr will generate matching code similar as below:
>
> else if (gphi *_a1 = dyn_cast  (_d1))
>   {
> basic_block _b1 = gimple_bb (_a1);
> if (gimple_phi_num_args (_a1) == 2)
>   {
> basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
> basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
> basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1))
> ? _pb_0_1 : _pb_1_1;
> basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
> (_pb_0_1))
>   ? _pb_1_1 : _pb_0_1;
> gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
> if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>   && EDGE_COUNT (_other_db_1->succs) == 1
>   && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   {
> tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
> tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
> tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node,
>_cond_lhs_1, _cond_rhs_1);
> bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & 
> EDGE_TRUE_VALUE;
> tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
> tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The below test suites are passed for this patch.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * doc/match-and-simplify.texi: Add doc for the matching flag '^'.
> * genmatch.cc (cmp_operand): Add match_phi comparation.
> (dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
> (dt_operand::gen_phi_on_cond): Add new func to gen phi matching
> on cond_expr.
> (parser::parse_expr): Add handling for the expr flag '^'.
> * match.pd: Add more form for unsigned .SAT_ADD.
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> new func impl to build call for phi gimple.
> (match_unsigned_saturation_add): Add new func impl to match the
> .SAT_ADD for phi gimple.
> (math_opts_dom_walker::after_dom_children): Add phi matching
> try for all gimple phi stmt.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/doc/match-and-simplify.texi |  16 
>  gcc/genmatch.cc | 126 +++-
>  gcc/match.pd|  43 ++-
>  gcc/tree-ssa-math-opts.cc   |  56 +-
>  4 files changed, 236 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
> index 01f19e2f62c..63d5af159f5 100644
> --- a

[PATCH] go: Fix gccgo -v on Solaris with ld

2024-06-06 Thread Rainer Orth
The Go testsuite's go.sum file ends in

Couldn't determine version of 
/var/gcc/regression/master/11.4-gcc-64/build/gcc/gccgo

on Solaris.  It turns out this happens because gccgo -v is confused:

[...]
gcc version 15.0.0 20240531 (experimental) [master 
a0d60660f2aae2d79685f73d568facb2397582d8] (GCC) 
COMPILER_PATH=./:/usr/ccs/bin/
LIBRARY_PATH=./:/lib/amd64/:/usr/lib/amd64/:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-g1' '-B' './' '-v' '-shared-libgcc' '-mtune=generic' 
'-march=x86-64' '-dumpdir' 'a.'
 ./collect2 -V -M ./libgcc-unwind.map -Qy /usr/lib/amd64/crt1.o ./crtp.o 
/usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/lib/amd64/values-xpg6.o 
./crtbegin.o -L. -L/lib/amd64 -L/usr/lib/amd64 -t -lgcc_s -lgcc -lc -lgcc_s 
-lgcc ./crtend.o /usr/lib/amd64/crtn.o
ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.3297
Undefined                       first referenced
 symbol                             in file
main                                /usr/lib/amd64/crt1.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

trying to invoke the linker without adding any object file.  This only
happens when Solaris ld is in use.  gccgo passes -t to the linker in
that case, but does it unconditionally, even with -v.

When configured to use GNU ld, gccgo -v is fine instead.

This patch avoids this by restricting the -t to actually linking.
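
The effect of the change can be modelled in a few lines.  This is only a
sketch: append_solaris_ld_options and its array-of-strings interface are
invented for illustration (the real code calls generate_option on
new_decoded_options), and it assumes, following the condition used in the
patch, that library > 0 identifies an invocation that actually links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the driver step: append the Solaris ld "-t"
   option to OUT at index J only when LIBRARY > 0.  Returns the new
   option count.  */
static size_t
append_solaris_ld_options (const char **out, size_t j, int library)
{
  assert (out != NULL);
  if (library > 0)
    out[j++] = "-Wl,-t";
  return j;
}
```

With library <= 0, as assumed for a bare gccgo -v, no option is appended,
so the driver has nothing to hand to the linker.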

Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (ld and gld).

Ok for trunk?

I believe this has to go via gofrontend, though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-06-05  Rainer Orth  

gcc/go:
* gospec.cc (lang_specific_driver) [TARGET_SOLARIS !USE_GLD]: Only
add -t if linking.

# HG changeset patch
# Parent  361d219108e4be29bd4c59bb6cb74e460a0e9126
go: Fix gccgo -v on Solaris with ld

diff --git a/gcc/go/gospec.cc b/gcc/go/gospec.cc
--- a/gcc/go/gospec.cc
+++ b/gcc/go/gospec.cc
@@ -443,8 +443,11 @@ lang_specific_driver (struct cl_decoded_
  using the GNU linker, the Solaris linker needs an option to not
  warn about this.  Everything works without this option, but you
  get unsightly warnings at link time.  */
-  generate_option (OPT_Wl_, "-t", 1, CL_DRIVER, &new_decoded_options[j]);
-  j++;
+  if (library > 0)
+{
+  generate_option (OPT_Wl_, "-t", 1, CL_DRIVER, &new_decoded_options[j]);
+  j++;
+}
 #endif
 
   *in_decoded_options_count = j;


Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Sandra Loosemore

On 6/6/24 06:06, Tobias Burnus wrote:

Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / 
https://gcc.gnu.org/r15-1070


First, thanks for adding I/O support to nvptx offloading.

I have a wording nit, to be confirmed by a native speaker:


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

...

+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.


Yes, "...not yet within OpenACC compute regions", please.

-Sandra




Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>> Hi,
>> 
>> Just some comments on the fuseable_load_p part, since that's what
>> we were discussing last time.
>> 
>> It looks like this now relies on:
>> 
>> Ajit Agarwal  writes:
>>> +  /* We use DF data flow because we change location rtx
>>> +which is easier to find and modify.
>>> +We use mix of rtl-ssa def-use and DF data flow
>>> +where it is easier.  */
>>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>> +  df_analyze ();
>>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>> 
>> But please don't do this!  For one thing, building DU/UD chains
>> as well as rtl-ssa is really expensive in terms of compile time.
>> But more importantly, modifications need to happen via rtl-ssa
>> to ensure that the IL is kept up-to-date.  If we don't do that,
>> later fuse attempts will be based on stale data and so could
>> generate incorrect code.
>> 
>
> Sure I have made changes to use only rtl-ssa and not to use
> UD/DU chains. I will send the changes in separate subsequent
> patch.

Thanks.  Before you send the patch though:

>>> +// Check whether load can be fusable or not.
>>> +// Return true if fuseable otherwise false.
>>> +bool
>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>> +{
>>> +  for (auto def : info->defs())
>>> +{
>>> +  auto set = dyn_cast<set_info *> (def);
>>> +  for (auto use1 : set->nondebug_insn_uses ())
>>> +   use1->set_is_live_out_use (true);
>>> +}
>> 
>> What was the reason for adding this loop?
>>
>
> The purpose of adding it is to avoid an assert failure in gcc/rtl-ssa/changes.cc:252

That assert is making sure that we don't delete a definition of a
register (or memory) while a real insn still uses it.  If the assert
is firing then something has gone wrong.

Live-out uses are a particular kind of use that occur at the end of
basic blocks.  It's incorrect to mark normal insn uses as live-out.

When an assert fails, it's important to understand why the failure
occurs, rather than brute-force the assert condition to true.

>>> [...]
>>> +
>>> +  rtx addr = XEXP (SET_SRC (body), 0);
>>> +
>>> +  if (GET_CODE (addr) == PLUS
>>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>>> +{
>>> +  if (INTVAL (XEXP (addr, 1)) == -16)
>>> +   return false;
>>> +  }
>> 
>> What's special about -16?
>> 
>
> Tests like libgomp/for-8 fail with a fused load with offsets -16 and 0.
> That's why I have added this check.

But why does it fail though?  It sounds like the testcase is pointing
out a problem in the pass (or perhaps elsewhere).  It's important that
we try to understand and fix the underlying problem.

>>> +
>>> +  df_ref use;
>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>> +{
>>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>>> +
>>> +  if (!def_link || !def_link->ref
>>> + || DF_REF_IS_ARTIFICIAL (def_link->ref))
>>> +   continue;
>>> +
>>> +  while (def_link && def_link->ref)
>>> +   {
>>> + rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>> + if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>> +   return false;
>> 
>> Why do you need to skip PARALLELs?
>>
>
> vec_select with parallel gives failures in final.cc: "can't split-up with subreg 128 (reg OO"
> That's why I have added this.

But in (vec_select ... (parallel ...)), the parallel won't be the 
PATTERN (insn).  It'll instead be a suboperand of the vec_select.

Here too it's important to understand why the final.cc failure occurs
and what the correct fix is.
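
To illustrate the point about where the parallel sits, here is a toy model (not GCC's rtx API; the Expr type, the Code enum and the helper names are all made up for illustration). It only shows that a check on the root of a pattern does not see a PARALLEL that is an operand of a vec_select:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for RTL expressions.  These names are illustrative only
// and are not GCC's rtx interface.
enum class Code { SET, REG, VEC_SELECT, PARALLEL, CONST_INT };

struct Expr {
  Code code;
  std::vector<Expr> operands;
};

// Same shape as the check in the patch: looks only at the root code.
bool root_is_parallel(const Expr &pattern) {
  return pattern.code == Code::PARALLEL;
}

// Walks the whole tree, so it also finds a PARALLEL used as an operand.
bool contains_parallel(const Expr &e) {
  if (e.code == Code::PARALLEL)
    return true;
  for (const Expr &op : e.operands)
    if (contains_parallel(op))
      return true;
  return false;
}

// Builds (set (reg) (vec_select (reg) (parallel [(const_int)]))).
Expr make_vec_select_insn() {
  Expr selector{Code::PARALLEL, {Expr{Code::CONST_INT, {}}}};
  Expr vec_select{Code::VEC_SELECT, {Expr{Code::REG, {}}, selector}};
  return Expr{Code::SET, {Expr{Code::REG, {}}, vec_select}};
}
```

In this model root_is_parallel is false for the vec_select pattern while contains_parallel is true, so a root-level test would not be what filters out such an insn.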

>>> +
>>> + rtx set = single_set (insn);
>>> + if (set == NULL_RTX)
>>> +   return false;
>>> +
>>> + rtx op0 = SET_SRC (set);
>>> + rtx_code code = GET_CODE (op0);
>>> +
>>> + // This check is added as register pairs are not generated
>>> + // by RA for neg:V2DF (fma: V2DF (reg1)
>>> + //  (reg2)
>>> + //  (neg:V2DF (reg3)))
>>> + if (GET_RTX_CLASS (code) == RTX_UNARY)
>>> +   return false;
>> 
>> What's special about (neg (fma ...))?
>>
>
> I am not sure why the register allocator fails to allocate register pairs for a
> NEG unary operation with an fma operand. I have not debugged the register
> allocator to find out why.

I don't think it'll be specific to (neg (fma ...)).  Here too I think the
test showed up a problem with the pass and we should try to understand
and fix it.

More generally, it seems like you've run the testsuite, seen failures,
isolated a particular change that was wrong in some way, and then added
checks for that particular bit of input rtl.  But that isn't how the
testsuite is meant to be used.

The code needs to make sense from first principles, with comments to
explain decisions that are nonobvious.  (Well, to some extent anyway :))
The testsuite exists to verify the code.  If a testsuite failure occurs,
we need to understand what went wrong i

Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus

Sandra Loosemore wrote:

On 6/6/24 06:06, Tobias Burnus wrote:
+@item I/O within OpenMP target regions and OpenACC compute regions is supported
+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.


Yes, "...not yet within OpenACC compute regions", please.


Thanks! Committed as https://gcc.gnu.org/r15-1072-g423522aacd9f30

Tobias



[PATCH] arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.

2024-06-06 Thread Richard Ball
The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs
which causes suboptimal codegen due to missed optimisation
opportunities. This patch also adds a test for thumb2
switch statements as none exist currently.

gcc/ChangeLog:
PR target/115353
* config/arm/arm.h (enum arm_auto_incmodes):
Correct CASE_VECTOR_SHORTEN_MODE query.

gcc/testsuite/ChangeLog:

* gcc.target/arm/thumb2-switchstatement.c: New test.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 449e6935b32f8f272df709ba43aa2ba7de37e6b3..0cd5d733952d7620f452d9d90cec9103b3fb5300 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2111,8 +2111,8 @@ enum arm_auto_incmodes
   ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 0, HImode)	\
   : SImode)\
: (TARGET_THUMB2			\
-  ? ((min > 0 && max < 0x200) ? QImode\
-  : (min > 0 && max <= 0x2) ? HImode\
+  ? ((min >= 0 && max < 0x200) ? QImode\
+  : (min >= 0 && max < 0x2) ? HImode\
   : SImode)\
: ((min >= 0 && max < 1024)		\
   ? (ADDR_DIFF_VEC_FLAGS (body).offset_unsigned = 1, QImode)	\
diff --git a/gcc/testsuite/gcc.target/arm/thumb2-switchstatement.c b/gcc/testsuite/gcc.target/arm/thumb2-switchstatement.c
new file mode 100644
index ..8badf318e626de1911e297bff8e93ac72160224f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb2-switchstatement.c
@@ -0,0 +1,144 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb --param case-values-threshold=1 -fno-reorder-blocks -fno-tree-dce -O2" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define NOP "nop;"
+#define NOP2 NOP NOP
+#define NOP4 NOP2 NOP2
+#define NOP8 NOP4 NOP4
+#define NOP16 NOP8 NOP8
+#define NOP32 NOP16 NOP16
+#define NOP64 NOP32 NOP32
+#define NOP128 NOP64 NOP64
+#define NOP256 NOP128 NOP128
+#define NOP512 NOP256 NOP256
+#define NOP1024 NOP512 NOP512
+#define NOP2048 NOP1024 NOP1024
+#define NOP4096 NOP2048 NOP2048
+#define NOP8192 NOP4096 NOP4096
+#define NOP16384 NOP8192 NOP8192
+#define NOP32768 NOP16384 NOP16384
+#define NOP65536 NOP32768 NOP32768
+#define NOP131072 NOP65536 NOP65536
+
+enum z
+{
+  a = 1,
+  b,
+  c,
+  d,
+  e,
+  f = 7,
+};
+
+inline void QIFunction (const char* flag)
+{
+  asm volatile (NOP32);
+  return;
+}
+
+inline void HIFunction (const char* flag)
+{
+  asm volatile (NOP512);
+  return;
+}
+
+inline void SIFunction (const char* flag)
+{
+  asm volatile (NOP131072);
+  return;
+}
+
+/*
+**QImode_test:
+**	...
+**	tbb	\[pc, r[0-9]+\]
+**	...
+*/
+__attribute__ ((noinline)) __attribute__ ((noclone)) const char* QImode_test(enum z x)
+{
+  switch (x)
+{
+  case d:
+QIFunction("QItest");
+return "InlineASM";
+  case f:
+return "TEST";
+  default:
+return "Default";
+}
+}
+
+/* { dg-final { scan-assembler ".byte" } } */
+
+/*
+**HImode_test:
+**	...
+**	tbh	\[pc, r[0-9]+, lsl #1\]
+**	...
+*/
+__attribute__ ((noinline)) __attribute__ ((noclone)) const char* HImode_test(enum z x)
+{
+  switch (x)
+  {
+case d:
+  HIFunction("HItest");
+  return "InlineASM";
+case f:
+  return "TEST";
+default:
+  return "Default";
+  }
+}
+
+/* { dg-final { scan-assembler ".2byte" } } */
+
+/*
+**SImode_test:
+**	...
+**	adr	(r[0-9]+), .L[0-9]+
+**	ldr	pc, \[\1, r[0-9]+, lsl #2\]
+**	...
+*/
+__attribute__ ((noinline)) __attribute__ ((noclone)) const char* SImode_test(enum z x)
+{
+  switch (x)
+  {
+case d:
+  SIFunction("SItest");
+  return "InlineASM";
+case f:
+  return "TEST";
+default:
+  return "Default";
+  }
+}
+
+/* { dg-final { scan-assembler ".word" } } */
+
+/*
+**backwards_branch_test:
+**	...
+**	adr	(r[0-9]+), .L[0-9]+
+**	ldr	pc, \[\1, r[0-9]+, lsl #2\]
+**	...
+*/
+__attribute__ ((noinline)) __attribute__ ((noclone)) const char* backwards_branch_test(enum z x, int flag)
+{
+  if (flag == 5)
+  {
+backwards:
+  asm volatile (NOP512);
+  return "ASM";
+  }
+  switch (x)
+  {
+case d:
+  goto backwards;
+case f:
+  return "TEST";
+default:
+  return "Default";
+  }
+}
\ No newline at end of file
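
For readers who want to see the effect of the missing equals signs in isolation, the following is a simplified sketch of the Thumb-2 arm of the query, not the macro from arm.h. The function name, the TableMode enum and the allow_zero_min switch are invented for illustration, and the limits are passed in by the caller rather than taken from the header:

```cpp
#include <cassert>

enum class TableMode { QI, HI, SI };

// Simplified model of the Thumb-2 arm of the query.  MIN and MAX are the
// smallest and largest offsets in the dispatch table.  The limits are
// parameters; they are placeholders, not values copied from arm.h.
// ALLOW_ZERO_MIN selects between the old test (min > 0) and the
// corrected test (min >= 0).
TableMode shorten_mode(long min, long max, long qi_limit, long hi_limit,
                       bool allow_zero_min) {
  bool min_ok = allow_zero_min ? (min >= 0) : (min > 0);
  if (min_ok && max < qi_limit)
    return TableMode::QI;   // byte-sized table entries
  if (min_ok && max < hi_limit)
    return TableMode::HI;   // halfword-sized table entries
  return TableMode::SI;     // full-word entries, also used for negative offsets
}
```

With a table whose smallest offset is exactly 0, the strict comparison falls through to the word-sized mode even though all offsets would fit in a byte; the non-strict comparison picks the short mode. A negative minimum (a backwards branch) still ends up with word-sized entries in both variants.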


Re: [PATCH] arm: Fix CASE_VECTOR_SHORTEN_MODE for thumb2.

2024-06-06 Thread Richard Earnshaw (lists)
On 06/06/2024 15:40, Richard Ball wrote:
> The CASE_VECTOR_SHORTEN_MODE query is missing some equals signs
> which causes suboptimal codegen due to missed optimisation
> opportunities. This patch also adds a test for thumb2
> switch statements as none exist currently.
> 
> gcc/ChangeLog:
>   PR target/115353
>   * config/arm/arm.h (enum arm_auto_incmodes):
>   Correct CASE_VECTOR_SHORTEN_MODE query.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/thumb2-switchstatement.c: New test.

OK.

R.


Re: [PATCH V2] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-06-06 Thread Richard Ball
v2: Change macro definition following internal discussion.

__ARM_NEON_SVE_BRIDGE was missed in the original patch and is
added by this patch.

Ok for trunk and a backport into gcc-14?

gcc/ChangeLog:

* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Add missing __ARM_NEON_SVE_BRIDGE.

On 6/6/24 13:20, Richard Sandiford wrote:
> Richard Ball  writes:
>> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
>> added by this patch.
>>
>> Ok for trunk and a backport into gcc-14?
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>>  Add missing __ARM_NEON_SVE_BRIDGE.
> 
> After this patch was posted, there was some internal discussion
> involving LLVM & GNU devs about what this kind of macro means, now that
> we have FMV.  The feeling was that __ARM_NEON_SVE_BRIDGE should just
> indicate whether the compiler provides the file, not whether AdvSIMD
> & SVE are enabled.  I think we should therefore add this to
> aarch64_define_unconditional_macros instead.
> 
> Sorry for the slow review.  I was waiting for the outcome of that
> discussion before replying.
> 
> Thanks,
> Richard
> 
>> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
>> index fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..1121be118cf8d05e3736ad4ee75568ff7cb92bfd 100644
>> --- a/gcc/config/aarch64/aarch64-c.cc
>> +++ b/gcc/config/aarch64/aarch64-c.cc
>> @@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>>aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
>>aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
>>aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
>> +  aarch64_def_or_undef (TARGET_SVE, "__ARM_NEON_SVE_BRIDGE", pfile);
>>  
>>/* Not for ACLE, but required to keep "float.h" correct if we switch
>>   target between implementations that do or do not support ARMv8.2-A

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..d042e5fbd8c562df2e4538b51b960c194d2ca2c9 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -75,6 +75,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
 
   builtin_define ("__ARM_STATE_ZA");
   builtin_define ("__ARM_STATE_ZT0");
+  builtin_define ("__ARM_NEON_SVE_BRIDGE");
 
   /* Define keyword attributes like __arm_streaming as macros that expand
  to the associated [[...]] attribute.  Use __extension__ in the attribute
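
As a rough illustration of what the v2 semantics mean for user code: the macro is meant to say that the compiler ships the bridge header, not that SVE is enabled. The sketch below assumes that reading; the function names are made up, and __ARM_FEATURE_SVE is the existing ACLE feature macro for SVE being enabled. It compiles on any target and simply reports false where the macros are absent.

```cpp
#include <cassert>

// True when the compiler advertises the NEON-SVE bridge header.  Under the
// v2 patch this does not imply that SVE is enabled for this translation unit.
constexpr bool has_neon_sve_bridge_header() {
#ifdef __ARM_NEON_SVE_BRIDGE
  return true;
#else
  return false;
#endif
}

// Whether SVE code generation is enabled here is a separate question,
// answered by the ACLE feature macro __ARM_FEATURE_SVE.
constexpr bool sve_enabled_here() {
#ifdef __ARM_FEATURE_SVE
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: code that wants to call the bridge intrinsics
// without per-function target attributes would presumably check both.
constexpr bool bridge_usable_in_this_tu() {
  return has_neon_sve_bridge_header() && sve_enabled_here();
}
```

Keeping the two questions apart is the point of moving the definition to aarch64_define_unconditional_macros: with function multiversioning, whether SVE is enabled can differ from one function to the next, while the presence of the header cannot.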


[r15-1056 Regression] FAIL: gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c -flto -ffat-lto-objects execution test on Linux/x86_64

2024-06-06 Thread haochen.jiang
On Linux/x86_64,

4653b682ef161c3c2fc7bf8462b8f9206a1349e6 is the first bad commit
commit 4653b682ef161c3c2fc7bf8462b8f9206a1349e6
Author: Richard Biener 
Date:   Tue Mar 5 15:46:24 2024 +0100

Allow single-lane SLP in-order reductions

caused

FAIL: gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c execution test
FAIL: gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c -flto -ffat-lto-objects execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1056/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[r15-1058 Regression] FAIL: gcc.target/i386/pr77881.c scan-assembler js[ \t].?L on Linux/x86_64

2024-06-06 Thread haochen.jiang
On Linux/x86_64,

c989e59fc99d994159114304d4e715c72bedff0a is the first bad commit
commit c989e59fc99d994159114304d4e715c72bedff0a
Author: Hongyu Wang 
Date:   Wed Mar 27 10:13:06 2024 +0800

[APX CCMP] Support APX CCMP

caused

FAIL: gcc.target/i386/pr77881.c scan-assembler js[ \t].?L

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1058/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr77881.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr77881.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr77881.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr77881.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH V2] aarch64: Add missing ACLE macro for NEON-SVE Bridge

2024-06-06 Thread Richard Sandiford
Richard Ball  writes:
> v2: Change macro definition following internal discussion.
>
> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
> added by this patch.
>
> Ok for trunk and a backport into gcc-14?

Yes, thanks.

Richard

> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
>   Add missing __ARM_NEON_SVE_BRIDGE.
>
> On 6/6/24 13:20, Richard Sandiford wrote:
>> Richard Ball  writes:
>>> __ARM_NEON_SVE_BRIDGE was missed in the original patch and is
>>> added by this patch.
>>>
>>> Ok for trunk and a backport into gcc-14?
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
>>> Add missing __ARM_NEON_SVE_BRIDGE.
>> 
>> After this patch was posted, there was some internal discussion
>> involving LLVM & GNU devs about what this kind of macro means, now that
>> we have FMV.  The feeling was that __ARM_NEON_SVE_BRIDGE should just
>> indicate whether the compiler provides the file, not whether AdvSIMD
>> & SVE are enabled.  I think we should therefore add this to
>> aarch64_define_unconditional_macros instead.
>> 
>> Sorry for the slow review.  I was waiting for the outcome of that
>> discussion before replying.
>> 
>> Thanks,
>> Richard
>> 
>>> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
>>> index fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..1121be118cf8d05e3736ad4ee75568ff7cb92bfd 100644
>>> --- a/gcc/config/aarch64/aarch64-c.cc
>>> +++ b/gcc/config/aarch64/aarch64-c.cc
>>> @@ -260,6 +260,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>>>aarch64_def_or_undef (TARGET_SME_I16I64, "__ARM_FEATURE_SME_I16I64", pfile);
>>>aarch64_def_or_undef (TARGET_SME_F64F64, "__ARM_FEATURE_SME_F64F64", pfile);
>>>aarch64_def_or_undef (TARGET_SME2, "__ARM_FEATURE_SME2", pfile);
>>> +  aarch64_def_or_undef (TARGET_SVE, "__ARM_NEON_SVE_BRIDGE", pfile);
>>>  
>>>/* Not for ACLE, but required to keep "float.h" correct if we switch
>>>   target between implementations that do or do not support ARMv8.2-A
>
> diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
> index fe1a20e4e546a68e5f7eddff3bbb0d3e831fbd9b..d042e5fbd8c562df2e4538b51b960c194d2ca2c9 100644
> --- a/gcc/config/aarch64/aarch64-c.cc
> +++ b/gcc/config/aarch64/aarch64-c.cc
> @@ -75,6 +75,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
>  
>builtin_define ("__ARM_STATE_ZA");
>builtin_define ("__ARM_STATE_ZT0");
> +  builtin_define ("__ARM_NEON_SVE_BRIDGE");
>  
>/* Define keyword attributes like __arm_streaming as macros that expand
>   to the associated [[...]] attribute.  Use __extension__ in the attribute


Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread Andrew Pinski
On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg  wrote:
>
> Dear David,
>
> On Tue, May 28, 2024 at 10:07 PM David Malcolm  wrote:
> >
> > No functional change intended.
> >
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > Pushed to trunk as r15-874-g9bda2c4c81b668.
> >
> > libcpp/ChangeLog:
> > * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > * include/label-text.h: New file.
> > * include/rich-location.h: Include "label-text.h".
> > (class label_text): Move to label-text.h.
> >
> > Signed-off-by: David Malcolm 
> > ---
> >  libcpp/Makefile.in |   2 +-
> >  libcpp/include/label-text.h| 102 +
> >  libcpp/include/rich-location.h |  79 +
> >  3 files changed, 105 insertions(+), 78 deletions(-)
> >  create mode 100644 libcpp/include/label-text.h
> >
> > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > index ebbca3fb..7e47153264c0 100644
> > --- a/libcpp/Makefile.in
> > +++ b/libcpp/Makefile.in
> > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> >
> >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> >  include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
> > -include/rich-location.h
> > +include/rich-location.h include/label-text.h
>
> this does not seem to be enough for the new header to be installed.
> I get compile errors when compiling a plug-in with this patch:
>
> In file included from
> /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/diagnostic.h:24,
> from 
> /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-plugin/../src/adapters/compiler/gcc-plugin/scorep_plugin_inst_descriptor.cpp:43:
> /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/rich-location.h:25:10:
> fatal error: label-text.h: No such file or directory
> 25 | #include "label-text.h"
> | ^~
> compilation terminated.

I have a fix which I am testing.

>
> Best,
> Bert
>
> >
> >
> >  TAGS: $(TAGS_SOURCES)
> > diff --git a/libcpp/include/label-text.h b/libcpp/include/label-text.h
> > new file mode 100644
> > index ..13562cda41f9
> > --- /dev/null
> > +++ b/libcpp/include/label-text.h
> > @@ -0,0 +1,102 @@
> > +/* A very simple string class.
> > +   Copyright (C) 2015-2024 Free Software Foundation, Inc.
> > +
> > +This program is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.
> > +
> > +This program is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with this program; see the file COPYING3.  If not see
> > +.
> > +
> > + In other words, you are welcome to use, share and improve this program.
> > + You are forbidden to forbid anyone else to use, share and improve
> > + what you give them.   Help stamp out software-hoarding!  */
> > +
> > +#ifndef LIBCPP_LABEL_TEXT_H
> > +#define LIBCPP_LABEL_TEXT_H
> > +
> > +/* A struct for the result of range_label::get_text: a NUL-terminated 
> > buffer
> > +   of localized text, and a flag to determine if the caller should "free" 
> > the
> > +   buffer.  */
> > +
> > +class label_text
> > +{
> > +public:
> > +  label_text ()
> > +  : m_buffer (NULL), m_owned (false)
> > +  {}
> > +
> > +  ~label_text ()
> > +  {
> > +if (m_owned)
> > +  free (m_buffer);
> > +  }
> > +
> > +  /* Move ctor.  */
> > +  label_text (label_text &&other)
> > +  : m_buffer (other.m_buffer), m_owned (other.m_owned)
> > +  {
> > +other.release ();
> > +  }
> > +
> > +  /* Move assignment.  */
> > +  label_text & operator= (label_text &&other)
> > +  {
> > +if (m_owned)
> > +  free (m_buffer);
> > +m_buffer = other.m_buffer;
> > +m_owned = other.m_owned;
> > +other.release ();
> > +return *this;
> > +  }
> > +
> > +  /* Delete the copy ctor and copy-assignment operator.  */
> > +  label_text (const label_text &) = delete;
> > +  label_text & operator= (const label_text &) = delete;
> > +
> > +  /* Create a label_text instance that borrows BUFFER from a
> > + longer-lived owner.  */
> > +  static label_text borrow (const char *buffer)
> > +  {
> > +return label_text (const_cast <char *> (buffer), false);
> > +  }
> > +
> > +  /* Create a label_text instance that takes ownership of BUFFER.  */
> > +  static label_text take (char *buffer)
> > +  {
> > +return label_text (buffer, true);
> > +  }
> > +
> > +  void release ()
> > +  {
> > +m_buffer = NULL;
> > +m_owned = false;
> > +  }
> > +
> > +  const char *get () const
> > +  
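
The borrow/take split in the quoted class can be sketched with a cut-down analogue. This is not the libcpp code: text_handle, is_owner and make_owned_copy are hypothetical names introduced only for this example, and the move assignment operator is left out for brevity.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical, cut-down analogue of the quoted class: a move-only handle
// that either borrows a buffer or owns one allocated with malloc.
class text_handle {
public:
  text_handle() : m_buffer(nullptr), m_owned(false) {}
  ~text_handle() {
    if (m_owned)
      std::free(m_buffer);
  }

  // Moving transfers the buffer and leaves the source empty.
  text_handle(text_handle &&other) noexcept
      : m_buffer(other.m_buffer), m_owned(other.m_owned) {
    other.release();
  }
  text_handle(const text_handle &) = delete;
  text_handle &operator=(const text_handle &) = delete;

  // Borrow BUFFER from a longer-lived owner; the handle will not free it.
  static text_handle borrow(const char *buffer) {
    return text_handle(const_cast<char *>(buffer), false);
  }
  // Take ownership of BUFFER; the handle frees it on destruction.
  static text_handle take(char *buffer) { return text_handle(buffer, true); }

  void release() {
    m_buffer = nullptr;
    m_owned = false;
  }
  const char *get() const { return m_buffer; }
  bool is_owner() const { return m_owned; }  // added for this example only

private:
  text_handle(char *buffer, bool owned) : m_buffer(buffer), m_owned(owned) {}
  char *m_buffer;
  bool m_owned;
};

// Makes a malloc'ed copy of TEXT so that free () is the matching deallocator.
text_handle make_owned_copy(const char *text) {
  std::size_t len = std::strlen(text) + 1;
  char *copy = static_cast<char *>(std::malloc(len));
  if (!copy)
    return text_handle();
  std::memcpy(copy, text, len);
  return text_handle::take(copy);
}
```

Deleting the copy operations is what keeps the owned case safe: there is never more than one handle that will free a given buffer.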

Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread David Malcolm
On Thu, 2024-06-06 at 08:40 -0700, Andrew Pinski wrote:
> On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg
>  wrote:
> > 
> > Dear David,
> > 
> > On Tue, May 28, 2024 at 10:07 PM David Malcolm
> >  wrote:
> > > 
> > > No functional change intended.
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > Pushed to trunk as r15-874-g9bda2c4c81b668.
> > > 
> > > libcpp/ChangeLog:
> > >     * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > >     * include/label-text.h: New file.
> > >     * include/rich-location.h: Include "label-text.h".
> > >     (class label_text): Move to label-text.h.
> > > 
> > > Signed-off-by: David Malcolm 
> > > ---
> > >  libcpp/Makefile.in |   2 +-
> > >  libcpp/include/label-text.h    | 102
> > > +
> > >  libcpp/include/rich-location.h |  79 +
> > >  3 files changed, 105 insertions(+), 78 deletions(-)
> > >  create mode 100644 libcpp/include/label-text.h
> > > 
> > > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > > index ebbca3fb..7e47153264c0 100644
> > > --- a/libcpp/Makefile.in
> > > +++ b/libcpp/Makefile.in
> > > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> > > 
> > >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> > >  include/cpplib.h include/line-map.h include/mkdeps.h
> > > include/symtab.h \
> > > -    include/rich-location.h
> > > +    include/rich-location.h include/label-text.h
> > 
> > > this does not seem to be enough for the new header to be
> > > installed.
> > > I get compile errors when compiling a plug-in with this patch:
> > 
> > In file included from
> > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-
> > gnu/15.0.0/plugin/include/diagnostic.h:24,
> > from
> > /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-
> > plugin/../src/adapters/compiler/gcc-
> > plugin/scorep_plugin_inst_descriptor.cpp:43:
> > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-
> > gnu/15.0.0/plugin/include/rich-location.h:25:10:
> > fatal error: label-text.h: No such file or directory
> > 25 | #include "label-text.h"
> > > ^~
> > compilation terminated.
> 
> I have a fix which I am testing.

Likewise (and sorry about the breakage)

Dave



Re: [PATCH 27/52] nios2: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-06 Thread Sandra Loosemore

On 6/2/24 21:01, Kewen Lin wrote:

This removes the {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE macro
definitions in the nios2 port.

gcc/ChangeLog:

* config/nios2/nios2.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.


Fine with me, but somewhat redundant since I'm still planning to remove 
the entire nios2 back end this release cycle.


-Sandra



[PATCH] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline

2024-06-06 Thread Torbjörn SVENSSON
I would like to push this patch to the following branches:

- releases/gcc-11
- releases/gcc-12
- releases/gcc-13
- releases/gcc-14
- trunk

Ok?

The problem was highlighted by https://linaro.atlassian.net/browse/GNU-1239

--

Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilog processing of sign extension.

This patch addresses CVE-2024-0151 for Armv8-M.baseline.

gcc/ChangeLog:

* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/config/arm/arm.cc | 68 ++-
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..077cb61f42a 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,23 @@ cmse_nonsecure_call_inline_register_clear (void)
  || TREE_CODE (ret_type) == BOOLEAN_TYPE)
  && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
{
- machine_mode ret_mode = TYPE_MODE (ret_type);
+ rtx ret_mode = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+ rtx si_mode = gen_rtx_REG (SImode, R0_REGNUM);
  rtx extend;
  if (TYPE_UNSIGNED (ret_type))
-   extend = gen_rtx_ZERO_EXTEND (SImode,
- gen_rtx_REG (ret_mode, R0_REGNUM));
+   extend = gen_rtx_SET (si_mode, gen_rtx_ZERO_EXTEND (SImode,
+   ret_mode));
+ else if (TARGET_THUMB1)
+   {
+ if (known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
+   extend = gen_thumb1_extendqisi2 (si_mode, ret_mode);
+ else
+   extend = gen_thumb1_extendhisi2 (si_mode, ret_mode);
+   }
  else
-   extend = gen_rtx_SIGN_EXTEND (SImode,
- gen_rtx_REG (ret_mode, R0_REGNUM));
- emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
-extend), insn);
-
+   extend = gen_rtx_SET (si_mode, gen_rtx_SIGN_EXTEND (SImode,
+   ret_mode));
+ emit_insn_after (extend, insn);
}
 
 
@@ -27250,6 +27256,52 @@ thumb1_expand_prologue (void)
   live_regs_mask = offsets->saved_regs_mask;
   lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
 
+  /* The AAPCS requires the callee to widen integral types narrower
+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (&args_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+   {
+ rtx arg_rtx;
+
+ if (VOID_TYPE_P (arg_type))
+   break;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ if (!first_param)
+   /* We should advance after processing the argument and pass
+  the argument we're advancing past.  */
+   arm_function_arg_advance (args_so_far, arg);
+ first_param = false;
+ arg_rtx = arm_function_arg (args_so_far, arg);
+ gcc_assert (REG_P (arg_rtx));
+ if ((TREE_CODE (arg_type) == INTEGER_TYPE
+ || TREE_CODE (arg_type) == ENUMERAL_TYPE
+ || TREE_CODE (arg_type) == BOOLEAN_TYPE)
+ && known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 4))
+   {
+ rtx res_reg = gen_rtx_REG (SImode, REGNO(arg_rtx));
+ if (TYPE_UNSIGNED (arg_type))
+   emit_set_insn (res_reg, gen_rtx_ZERO_EXTEND (SImode, arg_rtx));
+ else if (known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 2))
+   emit_insn (gen_thumb1_extendqisi2 (res_reg, arg_rtx));
+ else
+   emit_insn (gen_thumb1_extendhisi2 (res_reg, arg_rtx));
+   }
+   }
+}
+
   /* Extract a mask of the ones we can give to the Thumb's push instruction.  */
   l_mask = live_regs_mask & 0x40ff;
   /* Then count how many other high registers will need to be pushed.  */
-- 
2.25.1



Ping: [PATCHes 1-3] Add support for -mcpu=power11

2024-06-06 Thread Michael Meissner
Ping the 3 patches for adding -mcpu=power11 support.

Patch #1: Add support for -mcpu=power11
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653552.html

Patch #2: Add tuning support for power11
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653550.html

Patch #3: Add power11 tests
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653553.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH][_Hashtable] Fix some implementation inconsistencies

2024-06-06 Thread François Dumont

No chance ?

On 22/05/2024 06:50, François Dumont wrote:

Ping ?

On 13/05/2024 06:33, François Dumont wrote:

libstdc++: [_Hashtable] Fix some implementation inconsistencies

    Get rid of the different usages of the mutable keyword except in
    _Prime_rehash_policy where it is preserved for ABI compatibility
    reasons.

    Fix comment to explain that we need the computation of the bucket
    index to be noexcept to be able to rehash the container when needed.

    For Standard instantiations through std::unordered_xxx containers we
    already force caching of the hash code when the hash functor is not
    noexcept, so it is guaranteed.

    The purpose of the static_assert in _Hashtable on _M_bucket_index is
    thus limited to usages of _Hashtable with exotic _Hashtable_traits.

    libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Remove
    const qualification on _NodeGenerator instance.
    (_ReuseOrAllocNode<>::operator()(_Args&&...)): Remove const qualification.
    (_ReuseOrAllocNode<>::_M_nodes): Remove mutable.
    (_Insert_base<>::_M_insert_range): Remove _NodeGetter const qualification.
    (_Hash_code_base<>::_M_bucket_index(const _Hash_node_value<>&, size_t)):
    Simplify noexcept declaration, we already static_assert that _RangeHash
    functor is noexcept.
    * include/bits/hashtable.h: Rework comments. Remove const qualifier on
    _NodeGenerator& arguments.

Tested under Linux x64, ok to commit ?

François
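
The caching rule mentioned in the commit message can be sketched as a small trait. This is only an illustration of the noexcept part of the idea, not the libstdc++ implementation; needs_cached_hash, nothrow_hash, throwing_hash and bucket_for are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical trait: a hash code has to be cached in the node when calling
// the hash functor might throw, so that rehashing never needs to call it.
template <typename Hash, typename Key>
struct needs_cached_hash
    : std::integral_constant<
          bool, !noexcept(std::declval<const Hash &>()(
                    std::declval<const Key &>()))> {};

struct nothrow_hash {
  std::size_t operator()(int k) const noexcept {
    return static_cast<std::size_t>(k);
  }
};

struct throwing_hash {
  // Not declared noexcept, so it must be treated as potentially throwing.
  std::size_t operator()(int k) const { return static_cast<std::size_t>(k); }
};

// With a cached code, picking a bucket is pure arithmetic and cannot throw.
inline std::size_t bucket_for(std::size_t cached_code,
                              std::size_t bucket_count) noexcept {
  return cached_code % bucket_count;
}
```

Either the functor is noexcept, or its result is already stored in the node; in both cases the bucket index can be recomputed during a rehash without risk of an exception.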



[COMMITTED] Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]

2024-06-06 Thread Andrew Pinski
After r15-874-g9bda2c4c81b668, out of tree plugins won't compile
as the new libcpp header file label-text.h is not installed.

This adds the new header file to CPPLIB_H which is used for
the plugin headers to install.

Committed as obvious after a build and install, making sure
the new header file is installed.

gcc/ChangeLog:

* Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski 
---
 gcc/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c983b0c102a..f5adb647d3f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1038,6 +1038,7 @@ SYSTEM_H = system.h hwint.h $(srcdir)/../include/libiberty.h \
 PREDICT_H = predict.h predict.def
 CPPLIB_H = $(srcdir)/../libcpp/include/line-map.h \
$(srcdir)/../libcpp/include/rich-location.h \
+   $(srcdir)/../libcpp/include/label-text.h \
$(srcdir)/../libcpp/include/cpplib.h
 CODYLIB_H = $(srcdir)/../libcody/cody.hh
 INPUT_H = $(srcdir)/../libcpp/include/line-map.h input.h
-- 
2.43.0



Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread Andrew Pinski
On Thu, Jun 6, 2024 at 9:00 AM David Malcolm  wrote:
>
> On Thu, 2024-06-06 at 08:40 -0700, Andrew Pinski wrote:
> > On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg
> >  wrote:
> > >
> > > Dear David,
> > >
> > > On Tue, May 28, 2024 at 10:07 PM David Malcolm
> > >  wrote:
> > > >
> > > > No functional change intended.
> > > >
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > Pushed to trunk as r15-874-g9bda2c4c81b668.
> > > >
> > > > libcpp/ChangeLog:
> > > > * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > > > * include/label-text.h: New file.
> > > > * include/rich-location.h: Include "label-text.h".
> > > > (class label_text): Move to label-text.h.
> > > >
> > > > Signed-off-by: David Malcolm 
> > > > ---
> > > >  libcpp/Makefile.in |   2 +-
> > > >  libcpp/include/label-text.h| 102
> > > > +
> > > >  libcpp/include/rich-location.h |  79 +
> > > >  3 files changed, 105 insertions(+), 78 deletions(-)
> > > >  create mode 100644 libcpp/include/label-text.h
> > > >
> > > > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > > > index ebbca3fb..7e47153264c0 100644
> > > > --- a/libcpp/Makefile.in
> > > > +++ b/libcpp/Makefile.in
> > > > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> > > >
> > > >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> > > > >  include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
> > > > -include/rich-location.h
> > > > +include/rich-location.h include/label-text.h
> > >
> > > this does not seem to be enough for the new header to be
> > > installed.
> > > I get compile errors when compiling a plug-in with this patch:
> > >
> > > In file included from /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/diagnostic.h:24,
> > >                  from /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-plugin/../src/adapters/compiler/gcc-plugin/scorep_plugin_inst_descriptor.cpp:43:
> > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/rich-location.h:25:10: fatal error: label-text.h: No such file or directory
> > >    25 | #include "label-text.h"
> > >       |          ^~~~~~~~~~~~~~
> > > compilation terminated.
> >
> > I have a fix which I am testing.
>
> Likewise (and sorry about the breakage)

Committed as r15-1076-g6e6471806d886b .

>
> Dave
>


Re: [PATCH] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline

2024-06-06 Thread Christophe Lyon
Hi Torbjörn!

On Thu, 6 Jun 2024 at 18:47, Torbjörn SVENSSON
 wrote:
>
> I would like to push this patch to the following branches:
>
> - releases/gcc-11
> - releases/gcc-12
> - releases/gcc-13
> - releases/gcc-14
> - trunk
>
> Ok?
>
> The problem was highlighted by https://linaro.atlassian.net/browse/GNU-1239
>
> --
>
> Properly handle zero and sign extension for Armv8-M.baseline as
> Cortex-M23 can have the security extension active.
> Currently, there is an internal compiler error on Cortex-M23 for the
> epilog processing of sign extension.
>
> This patch addresses CVE-2024-0151 for Armv8-M.baseline.
>
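
(Illustration only, not taken from the patch: a minimal C sketch of what
"re-widening" a narrow value means.  It models r0 as a plain 32-bit integer
whose upper bits are untrusted.  The helper names are made up for this
example and are not GCC or CMSE APIs.)

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 8 bits of a 32-bit register image,
   discarding whatever the upper 24 bits held.  */
static uint32_t zero_extend8 (uint32_t reg)
{
  return reg & 0xffu;
}

/* Sign-extend the low 8 bits using the portable xor/subtract idiom.  */
static int32_t sign_extend8 (uint32_t reg)
{
  int32_t low = (int32_t) (reg & 0xffu);   /* 0 .. 255 */
  return (low ^ 0x80) - 0x80;              /* -128 .. 127 */
}

/* Same idea for a 16-bit value.  */
static int32_t sign_extend16 (uint32_t reg)
{
  int32_t low = (int32_t) (reg & 0xffffu); /* 0 .. 65535 */
  return (low ^ 0x8000) - 0x8000;          /* -32768 .. 32767 */
}

/* Small self-check: the upper bits never influence the result.  */
static void check_extensions (void)
{
  assert (zero_extend8 (0xdeadbeffu) == zero_extend8 (0x000000ffu));
  assert (sign_extend8 (0xdeadbeffu) == sign_extend8 (0x000000ffu));
  assert (sign_extend16 (0xabcd8000u) == sign_extend16 (0x00008000u));
}
```

With these helpers, a register image of 0xdeadbeff reads back as 255 when
treated as unsigned char and as -1 when treated as signed char, whatever
the upper 24 bits contained.
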
> gcc/ChangeLog:
>
> * config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
> Sign extend for Thumb1.
> (thumb1_expand_prologue): Add zero/sign extend.

Quick nitpicking: I think the ICE you are fixing was reported as
https://linaro.atlassian.net/browse/GNU-1205
(GNU-1239 is about your test improvements failing too, in addition to
the existing ones)
and your patch is actually about fixing GCC bug report 115253.

So your commit title should end with "[PR115253]" (or maybe "PR target/115253")
and your ChangeLog should also contain "PR target/115253".

You can use contrib/git_check_commit.py to check your patch is
correctly formatted (otherwise it will be rejected by the commit hooks
anyway).

I haven't looked into the details of the patch yet :-)

Thanks for looking at this,

Christophe

>
> Signed-off-by: Torbjörn SVENSSON 
> Co-authored-by: Yvan ROUX 
> ---
>  gcc/config/arm/arm.cc | 68 ++-
>  1 file changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index ea0c963a4d6..077cb61f42a 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -19220,17 +19220,23 @@ cmse_nonsecure_call_inline_register_clear (void)
>   || TREE_CODE (ret_type) == BOOLEAN_TYPE)
>   && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
> {
> - machine_mode ret_mode = TYPE_MODE (ret_type);
> + rtx ret_mode = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
> + rtx si_mode = gen_rtx_REG (SImode, R0_REGNUM);
>   rtx extend;
>   if (TYPE_UNSIGNED (ret_type))
> -   extend = gen_rtx_ZERO_EXTEND (SImode,
> - gen_rtx_REG (ret_mode, R0_REGNUM));
> +   extend = gen_rtx_SET (si_mode, gen_rtx_ZERO_EXTEND (SImode,
> +   ret_mode));
> + else if (TARGET_THUMB1)
> +   {
> + if (known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))
> +   extend = gen_thumb1_extendqisi2 (si_mode, ret_mode);
> + else
> +   extend = gen_thumb1_extendhisi2 (si_mode, ret_mode);
> +   }
>   else
> -   extend = gen_rtx_SIGN_EXTEND (SImode,
> - gen_rtx_REG (ret_mode, R0_REGNUM));
> - emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
> -extend), insn);
> -
> +   extend = gen_rtx_SET (si_mode, gen_rtx_SIGN_EXTEND (SImode,
> +   ret_mode));
> + emit_insn_after (extend, insn);
> }
>
>
> @@ -27250,6 +27256,52 @@ thumb1_expand_prologue (void)
>live_regs_mask = offsets->saved_regs_mask;
>lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
>
> +  /* The AAPCS requires the callee to widen integral types narrower
> + than 32 bits to the full width of the register; but when handling
> + calls to non-secure space, we cannot trust the callee to have
> + correctly done so.  So forcibly re-widen the result here.  */
> +  if (IS_CMSE_ENTRY (func_type))
> +{
> +  function_args_iterator args_iter;
> +  CUMULATIVE_ARGS args_so_far_v;
> +  cumulative_args_t args_so_far;
> +  bool first_param = true;
> +  tree arg_type;
> +  tree fndecl = current_function_decl;
> +  tree fntype = TREE_TYPE (fndecl);
> +  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
> +  args_so_far = pack_cumulative_args (&args_so_far_v);
> +  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
> +   {
> + rtx arg_rtx;
> +
> + if (VOID_TYPE_P (arg_type))
> +   break;
> +
> + function_arg_info arg (arg_type, /*named=*/true);
> + if (!first_param)
> +   /* We should advance after processing the argument and pass
> +  the argument we're advancing past.  */
> +   arm_function_arg_advance (args_so_far, arg);
> + first_param = false;
> + arg_rtx = arm_function_arg (args_so_far, arg);
> + gcc_assert (REG_P (arg_rtx));
> + if ((

[committed] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak
PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
new file mode 100644
index 000..4df38e5a720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned char T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2b.c b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
new file mode 100644
index 000..0f6345de704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned short T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */
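
For readers wondering what the loop body in these tests computes: the
expression appears to be a branchless form of unsigned saturating
subtraction, which is what psubusb/psubusw implement per element.  The
scalar sketch below is only an illustration and is not part of the
committed tests; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t T;

/* Same expression as the loop body in pr112600-2a.c, for one element:
   the mask is all ones when x >= y and zero otherwise.  */
static T sat_sub_branchless (T x, T y)
{
  return (x - y) & (-(T)(x >= y));
}

/* Straightforward reference: clamp the difference at zero.  */
static T sat_sub_ref (T x, T y)
{
  return x >= y ? (T) (x - y) : 0;
}

/* Compare both versions over every pair of 8-bit inputs.  */
static void check_sat_sub (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      assert (sat_sub_branchless ((T) x, (T) y) == sat_sub_ref ((T) x, (T) y));
}
```

Since T is only 8 bits wide here, the check covers the whole input space.
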


Re: [PATCH 06/52] m2: Replace uses of {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE

2024-06-06 Thread Gaius Mulley
"Kewen.Lin"  writes:

Hi Kewen,

> Nice!  Looking forward to you pushing this new one (I'm withdrawing the
> original patch).

all pushed now - thanks for the original patch!

regards,
Gaius


RE: [PATCH v2] aarch64: Add vector floating point extend pattern [PR113880, PR113869]

2024-06-06 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > This patch adds vector floating point extend pattern for V2SF->V2DF
> > and V4HF->V4SF conversions by renaming the existing
> > aarch64_float_extend_lo_<Vwide> pattern to the standard optab one,
> > i.e., extend<mode><Vwide>2. This allows the vectorizer to vectorize
> > certain floating point widening operations for the aarch64 target.
> >
> > PR target/113880
> > PR target/113869
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
> > builtin codes to standard optab ones.
> > * config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_<Vwide>):
> > Rename to...
> > (extend<mode><Vwide>2): ... This.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/extend-vec.c: New test.
> 
> OK, thanks, and sorry for the slow review.
> 
> Richard

Thanks, Richard! Pushed as r15-1079-g230d62a2cdd16c.

Thanks,
Pengxuan
> 
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  9 
> >  gcc/config/aarch64/aarch64-simd.md|  2 +-
> >  gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++
> >  3 files changed, 31 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index f8eeccb554d..25189888d17 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -534,6 +534,15 @@ BUILTIN_VDQ_BHSI (urhadd, uavg, _ceil, 0)
> >  BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0)
> >  BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)
> >
> > +/* The builtins below should be expanded through the standard optabs
> > +   CODE_FOR_extend<mode><Vwide>2. */
> > +#undef VAR1
> > +#define VAR1(F,T,N,M) \
> > +  constexpr insn_code CODE_FOR_aarch64_##F##M = CODE_FOR_##T##N##M##2;
> > +
> > +VAR1 (float_extend_lo_, extend, v2sf, v2df)
> > +VAR1 (float_extend_lo_, extend, v4hf, v4sf)
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >    {#N #A, UP (A), CF##MAP (N, A), 0, TYPES_##T, FLAG_##FLAG},
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index 868f4486218..c5e2c9f00d0 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3132,7 +3132,7 @@
> >  DONE;
> >}
> >  )
> > -(define_insn "aarch64_float_extend_lo_<Vwide>"
> > +(define_insn "extend<mode><Vwide>2"
> >    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> > (float_extend:<VWIDE>
> >   (match_operand:VDF 1 "register_operand" "w")))]
> > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> > new file mode 100644
> > index 000..f6241d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */
> > +void
> > +f (float *__restrict a, double *__restrict b)
> > +{
> > +  b[0] = a[0];
> > +  b[1] = a[1];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */
> > +void
> > +f1 (_Float16 *__restrict a, float *__restrict b)
> > +{
> > +
> > +  b[0] = a[0];
> > +  b[1] = a[1];
> > +  b[2] = a[2];
> > +  b[3] = a[3];
> > +}


Re: [PATCH] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-06 Thread Joseph Myers
On Tue, 4 Jun 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase ICEs in ipa-free-lang, because the
> fld_incomplete_type_of
>   gcc_assert (TYPE_CANONICAL (t2) != t2
>   && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
> assertion doesn't hold.
> This is because t is a struct S * type which was created while struct S
> was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
> of a pointer type is a type created with can_alias_all = false argument),
> while later on, in the struct definition, the may_alias attribute was used.
> fld_incomplete_type_of then creates an incomplete distinct copy of the
> structure (but with the original attributes), but pointers created for it
> are, because of the "may_alias" attribute, TYPE_REF_CAN_ALIAS_ALL, including
> their TYPE_CANONICAL: while that is created with the !can_alias_all
> argument, we later set it because of the "may_alias" attribute on the
> to_type.
> 
> This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
> TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
> variants) when the may_alias is added.
> 
> The following patch does that in the C FE as well.
> 
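
(A minimal sketch of the kind of source ordering described above, for
illustration only.  This is not the PR's testcase and is not claimed to
reproduce the ICE; the function names are made up.  It assumes a compiler
that supports the GNU may_alias attribute.)

```c
#include <assert.h>

struct S;                               /* S is still incomplete here */

/* The 'struct S *' pointer type is created now, before any attribute
   on S has been seen.  */
static int first_member (struct S *p);

/* The attribute only arrives with the definition.  */
struct __attribute__ ((__may_alias__)) S
{
  int i;
};

static int first_member (struct S *p)
{
  return p->i;
}

/* Read an int object through the may_alias struct pointer.  */
static int demo (void)
{
  int x = 42;
  assert (sizeof (struct S) == sizeof (int));
  return first_member ((struct S *) &x);
}
```

The point of the ordering is that the pointer type exists before the front
end learns that S is may_alias, which is the situation the patch fixes up.
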
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?
> 
> 2024-06-04  Jakub Jelinek  
> 
>   PR c/114493
>   * c-decl.cc (c_fixup_may_alias): New function.
>   (finish_struct): Call it if "may_alias" attribute is
>   specified.
> 
>   * gcc.dg/pr114493-1.c: New test.
>   * gcc.dg/pr114493-2.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

2024-06-06 Thread Michael Levine (BLOOMBERG/ 731 LEX)
To test the theory that this issue was unrelated to my patch, I moved the
out_value_result definition into std/numeric and restored
bits/ranges_algobase.h to the version in master.  I kept the #include line
in std/numeric even though it wasn't being used.  With the include line I
still see the same error: '__memcmp' is not a member of 'std'.

From: Michael Levine (BLOOMBERG/ 731 LEX) At: 05/30/24 13:43:58 UTC-4:00
To: jwak...@redhat.com
Cc: ppa...@redhat.com, gcc-patches@gcc.gnu.org, libstd...@gcc.gnu.org
Subject: Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]
When I remove the #include added for __memcmp (my apologies for writing
__memcpy) from libstdc++-v3/include/bits/ranges_algobase.h and try to
rerun the code, I get the following error:

In file included from $HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/numeric:69,
                 from ranges-iota-fix.cpp:1:
$HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h: In member function ‘constexpr bool std::ranges::__equal_fn::operator()(_Iter1, _Sent1, _Iter2, _Sent2, _Pred, _Proj1, _Proj2) const’:
$HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h:143:32: error: ‘__memcmp’ is not a member of ‘std’; did you mean ‘__memcmpable’?
  143 |   return !std::__memcmp(__first1, __first2, __len);
  |^~~~
  |__memcmpable

From: jwak...@redhat.com At: 05/24/24 10:12:57 UTC-4:00
To: Michael Levine (BLOOMBERG/ 731 LEX)
Cc: ppa...@redhat.com, gcc-patches@gcc.gnu.org, libstd...@gcc.gnu.org
Subject: Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

On 24/05/24 13:56 -, Michael Levine (BLOOMBERG/ 731 LEX) wrote:
>I've attached the v3 version of the patch as a single, squashed patch
>containing all of the changes.  I manually prepended my sign off to the patch.


>Signed-off-by: Michael Levine 
>---
>diff --git a/libstdc++-v3/include/bits/ranges_algo.h b/libstdc++-v3/include/bits/ranges_algo.h
>index 62faff173bd..d258be0b93f 100644
>--- a/libstdc++-v3/include/bits/ranges_algo.h
>+++ b/libstdc++-v3/include/bits/ranges_algo.h
>@@ -3521,58 +3521,6 @@ namespace ranges
> 
> #endif // __glibcxx_ranges_contains
> 
>-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
>-
>-  template
>-struct out_value_result
>-{
>-  [[no_unique_address]] _Out out;
>-  [[no_unique_address]] _Tp value;
>-
>-  template
>-requires convertible_to
>-&& convertible_to
>-constexpr
>-  operator out_value_result<_Out2, _Tp2>() const &
>- { return {out, value}; }
>-
>-  template
>-  requires convertible_to<_Out, _Out2>
>-   && convertible_to<_Tp, _Tp2>
>-   constexpr
>-  operator out_value_result<_Out2, _Tp2>() &&
>-  { return {std::move(out), std::move(value)}; }
>-};
>-
>-  template
>-using iota_result = out_value_result<_Out, _Tp>;
>-
>-  struct __iota_fn
>-  {
>-template _Sent, 
weakly_incrementable _Tp>
>-  requires indirectly_writable<_Out, const _Tp&>
>-  constexpr iota_result<_Out, _Tp>
>-  operator()(_Out __first, _Sent __last, _Tp __value) const
>-  {
>-while (__first != __last)
>-{
>-*__first = static_cast(__value);
>- ++__first;
>- ++__value;
>-   }
>-return {std::move(__first), std::move(__value)};
>-  }
>-
>-template _Range>
>-  constexpr iota_result, _Tp>
>-  operator()(_Range&& __r, _Tp __value) const
>-  { return (*this)(ranges::begin(__r), ranges::end(__r), 
std::move(__value)); }
>-  };
>-
>-  inline constexpr __iota_fn iota{};
>-
>-#endif // __glibcxx_ranges_iota
>-
> #if __glibcxx_ranges_find_last >= 202207L // C++ >= 23
> 
>   struct __find_last_fn
>diff --git a/libstdc++-v3/include/bits/ranges_algobase.h b/libstdc++-v3/include/bits/ranges_algobase.h
>index e26a73a27d6..965b36aed35 100644
>--- a/libstdc++-v3/include/bits/ranges_algobase.h
>+++ b/libstdc++-v3/include/bits/ranges_algobase.h
>@@ -35,6 +35,7 @@
> #include 
> #include 
> #include 
>+#include  // __memcpy

Why is this being added here? What is __memcpy?

I don't think out_value_result requires any new headers to be included
here, does it?

> #include  // ranges::begin, ranges::range etc.
> #include   // __invoke
> #include  // __is_byte
>@@ -70,6 +71,32 @@ namespace ranges
>  __is_move_iterator> = true;
>   } // namespace __detail
> 
>+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
>+
>+template
>+struct out_value_result
>+{
>+[[no_unique_address]] _Out out;
>+[[no_unique_address]] _Tp value;
>+
>+template
>+  requires convertible_to
>+&& convertible_to
>+constexpr
>+  operator out_value_result<_Out2, _Tp2>() const &
>+ { return {out, value}; }
>+
>+template
>+requires convertible_to<_Out, _Out2>
>+&& convertible_to<_Tp, _Tp2>
>+  constexpr
>+  operator out_

Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Ajit Agarwal
Hello Richard:

On 06/06/24 8:03 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>>> Hi,
>>>
>>> Just some comments on the fuseable_load_p part, since that's what
>>> we were discussing last time.
>>>
>>> It looks like this now relies on:
>>>
>>> Ajit Agarwal  writes:
>>>> +  /* We use DF data flow because we change location rtx
>>>> +   which is easier to find and modify.
>>>> +   We use mix of rtl-ssa def-use and DF data flow
>>>> +   where it is easier.  */
>>>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>>> +  df_analyze ();
>>>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>>>
>>> But please don't do this!  For one thing, building DU/UD chains
>>> as well as rtl-ssa is really expensive in terms of compile time.
>>> But more importantly, modifications need to happen via rtl-ssa
>>> to ensure that the IL is kept up-to-date.  If we don't do that,
>>> later fuse attempts will be based on stale data and so could
>>> generate incorrect code.
>>>
>>
>> Sure I have made changes to use only rtl-ssa and not to use
>> UD/DU chains. I will send the changes in separate subsequent
>> patch.
> 
> Thanks.  Before you send the patch though:
> 
>>>> +// Check whether load can be fusable or not.
>>>> +// Return true if fuseable otherwise false.
>>>> +bool
>>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>>> +{
>>>> +  for (auto def : info->defs())
>>>> +{
>>>> +  auto set = dyn_cast (def);
>>>> +  for (auto use1 : set->nondebug_insn_uses ())
>>>> +  use1->set_is_live_out_use (true);
>>>> +}
>>>
>>> What was the reason for adding this loop?
>>>
>>
>> The purpose of adding is to avoid assert failure in 
>> gcc/rtl-ssa/changes.cc:252
> 
> That assert is making sure that we don't delete a definition of a
> register (or memory) while a real insn still uses it.  If the assert
> is firing then something has gone wrong.
> 
> Live-out uses are a particular kind of use that occur at the end of
> basic blocks.  It's incorrect to mark normal insn uses as live-out.
> 
> When an assert fails, it's important to understand why the failure
> occurs, rather than brute-force the assert condition to true.
> 

The above assert failure occurs when there is a debug insn and its
use is not live-out.

>>>> [...]
>>>> +
>>>> +  rtx addr = XEXP (SET_SRC (body), 0);
>>>> +
>>>> +  if (GET_CODE (addr) == PLUS
>>>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>>>> +{
>>>> +  if (INTVAL (XEXP (addr, 1)) == -16)
>>>> +  return false;
>>>> +  }
>>>
>>> What's special about -16?
>>>
>>
>> The tests like libgomp/for-8 fails with fused load with offset -16 and 0.
>> Thats why I have added this check.
> 
> But why does it fail though?  It sounds like the testcase is pointing
> out a problem in the pass (or perhaps elsewhere).  It's important that
> we try to understand and fix the underlying problem.
> 

This check is not required anymore and will be removed in subsequent patches.
>>>> +
>>>> +  df_ref use;
>>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>>> +{
>>>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>>>> +
>>>> +  if (!def_link || !def_link->ref
>>>> +|| DF_REF_IS_ARTIFICIAL (def_link->ref))
>>>> +  continue;
>>>> +
>>>> +  while (def_link && def_link->ref)
>>>> +  {
>>>> +rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>>> +if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>>> +  return false;
>>>
>>> Why do you need to skip PARALLELs?
>>>
>>
>> vec_select with parallel give failures final.cc "can't split-up with subreg 
>> 128 (reg OO"
>> Thats why I have added this.
> 
> But in (vec_select ... (parallel ...)), the parallel won't be the 
> PATTERN (insn).  It'll instead be a suboperand of the vec_select.
> 
> Here too it's important to understand why the final.cc failure occurs
> and what the correct fix is.
> 

A subreg with a vec_select operand already exists before the fusion pass.
We overwrite it with a 128-bit subreg of the 256-bit OO mode operand.
Because of this, final.cc cannot split the insn at line 2807 and bails
out with fatal_insn.

Currently we don't support generating register pairs from an already
existing subreg vector operand.
We should bail out from the fusion pass in this case.
>>>> +
>>>> +rtx set = single_set (insn);
>>>> +if (set == NULL_RTX)
>>>> +  return false;
>>>> +
>>>> +rtx op0 = SET_SRC (set);
>>>> +rtx_code code = GET_CODE (op0);
>>>> +
>>>> +// This check is added as register pairs are not generated
>>>> +// by RA for neg:V2DF (fma: V2DF (reg1)
>>>> +//  (reg2)
>>>> +//  (neg:V2DF (reg3)))
>>>> +if (GET_RTX_CLASS (code) == RTX_UNARY)
>>>> +  return false;
>>>
>>> What's special about (neg (fma ...))?
>>>
>>
>> I am not sure why register allocator fails allocating register pairs with
>> NEG Unary operation with fma operand. I have not debugged register alloca

Re: [PATCH] testsuite: go: Require split-stack support for go.test/test/index0.go [PR87589]

2024-06-06 Thread Ian Lance Taylor
On Thu, Jun 6, 2024 at 7:00 AM Rainer Orth wrote:
>
> The index0-out.go test FAILs on Solaris (SPARC and x86, 32 and 64-bit),
> as well as several others:
>
> FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
>
> The test SEGVs because it tries a stack access way beyond the stack
> area.  As Ian analyzed in the PR, the testcase currently requires
> split-stack support, so this patch requires just that.
>
> Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
>
> Ok for trunk?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-06-05  Rainer Orth  
>
> gcc/testsuite:
> PR go/87589
> * go.test/go-test.exp (go-gc-tests): Require split-stack support
> for index0.go.



This is OK.  Thanks.

Ian


Re: [PATCH] go: Fix gccgo -v on Solaris with ld

2024-06-06 Thread Ian Lance Taylor
On Thu, Jun 6, 2024 at 7:13 AM Rainer Orth wrote:
>
> The Go testsuite's go.sum file ends in
>
> Couldn't determine version of /var/gcc/regression/master/11.4-gcc-64/build/gcc/gccgo
>
> on Solaris.  It turns out this happens because gccgo -v is confused:
>
> [...]
> gcc version 15.0.0 20240531 (experimental) [master a0d60660f2aae2d79685f73d568facb2397582d8] (GCC)
> COMPILER_PATH=./:/usr/ccs/bin/
> LIBRARY_PATH=./:/lib/amd64/:/usr/lib/amd64/:/lib/:/usr/lib/
> COLLECT_GCC_OPTIONS='-g1' '-B' './' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
>  ./collect2 -V -M ./libgcc-unwind.map -Qy /usr/lib/amd64/crt1.o ./crtp.o /usr/lib/amd64/crti.o /usr/lib/amd64/values-Xa.o /usr/lib/amd64/values-xpg6.o ./crtbegin.o -L. -L/lib/amd64 -L/usr/lib/amd64 -t -lgcc_s -lgcc -lc -lgcc_s -lgcc ./crtend.o /usr/lib/amd64/crtn.o
> ld: Software Generation Utilities - Solaris Link Editors: 5.11-1.3297
> Undefined                       first referenced
>  symbol                             in file
> main                                /usr/lib/amd64/crt1.o
> ld: fatal: symbol referencing errors
> collect2: error: ld returned 1 exit status
>
> trying to invoke the linker without adding any object file.  This only
> happens when Solaris ld is in use.  gccgo passes -t to the linker in
> that case, but does it unconditionally, even with -v.
>
> When configured to use GNU ld, gccgo -v is fine instead.
>
> This patch avoids this by restricting the -t to actually linking.
>
> Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11 (ld and gld).
>
> Ok for trunk?
>
> I believe this has to go via gofrontend, though.
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-06-05  Rainer Orth  
>
> gcc/go:
> * gospec.cc (lang_specific_driver) [TARGET_SOLARIS !USE_GLD]: Only
> add -t if linking.



This is OK.  Thanks.

You can just go ahead and commit this change.  The files in the gcc/go
directory itself live in the GCC tree.  The files in gcc/go/gofrontend
are copied in from a different repository.

Ian


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-06 Thread Richard Sandiford
Ajit Agarwal  writes:
> On 06/06/24 8:03 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> On 06/06/24 2:28 pm, Richard Sandiford wrote:
>>>> Hi,
>>>>
>>>> Just some comments on the fuseable_load_p part, since that's what
>>>> we were discussing last time.
>>>>
>>>> It looks like this now relies on:
>>>>
>>>> Ajit Agarwal  writes:
>>>>> +  /* We use DF data flow because we change location rtx
>>>>> +  which is easier to find and modify.
>>>>> +  We use mix of rtl-ssa def-use and DF data flow
>>>>> +  where it is easier.  */
>>>>> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
>>>>> +  df_analyze ();
>>>>> +  df_set_flags (DF_DEFER_INSN_RESCAN);
>>>>
>>>> But please don't do this!  For one thing, building DU/UD chains
>>>> as well as rtl-ssa is really expensive in terms of compile time.
>>>> But more importantly, modifications need to happen via rtl-ssa
>>>> to ensure that the IL is kept up-to-date.  If we don't do that,
>>>> later fuse attempts will be based on stale data and so could
>>>> generate incorrect code.
>>>>
>>>
>>> Sure I have made changes to use only rtl-ssa and not to use
>>> UD/DU chains. I will send the changes in separate subsequent
>>> patch.
>> 
>> Thanks.  Before you send the patch though:
>> 
>>>>> +// Check whether load can be fusable or not.
>>>>> +// Return true if fuseable otherwise false.
>>>>> +bool
>>>>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>>>>> +{
>>>>> +  for (auto def : info->defs())
>>>>> +{
>>>>> +  auto set = dyn_cast (def);
>>>>> +  for (auto use1 : set->nondebug_insn_uses ())
>>>>> + use1->set_is_live_out_use (true);
>>>>> +}
>>>>
>>>> What was the reason for adding this loop?
>>>>
>>>
>>> The purpose of adding is to avoid assert failure in 
>>> gcc/rtl-ssa/changes.cc:252
>> 
>> That assert is making sure that we don't delete a definition of a
>> register (or memory) while a real insn still uses it.  If the assert
>> is firing then something has gone wrong.
>> 
>> Live-out uses are a particular kind of use that occur at the end of
>> basic blocks.  It's incorrect to mark normal insn uses as live-out.
>> 
>> When an assert fails, it's important to understand why the failure
>> occurs, rather than brute-force the assert condition to true.
>> 
>
> The above assert failure occurs when there is a debug insn and its
> use is not live-out.

Uses in debug insns are never live-out uses.

It sounds like the bug is that we're failing to update all debug uses of
the original register.  We need to do that, or "reset" the debug insn if
substitution fails for some reason.

See fixup_debug_uses for what the target-independent part of the pass
does for debug insns that are affected by movement.  Hopefully the
update needed here will be simpler than that.

>>>>> [...]
>>>>> +
>>>>> +  rtx addr = XEXP (SET_SRC (body), 0);
>>>>> +
>>>>> +  if (GET_CODE (addr) == PLUS
>>>>> +  && XEXP (addr, 1) && CONST_INT_P (XEXP (addr, 1)))
>>>>> +{
>>>>> +  if (INTVAL (XEXP (addr, 1)) == -16)
>>>>> + return false;
>>>>> +  }
>>>>
>>>> What's special about -16?
>>>>
>>>
>>> The tests like libgomp/for-8 fails with fused load with offset -16 and 0.
>>> Thats why I have added this check.
>> 
>> But why does it fail though?  It sounds like the testcase is pointing
>> out a problem in the pass (or perhaps elsewhere).  It's important that
>> we try to understand and fix the underlying problem.
>> 
>
> This check is not required anymore and will remove from subsequent patches.

OK, great.

>>>>> +
>>>>> +  df_ref use;
>>>>> +  df_insn_info *insn_info = DF_INSN_INFO_GET (info->rtl ());
>>>>> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
>>>>> +{
>>>>> +  struct df_link *def_link = DF_REF_CHAIN (use);
>>>>> +
>>>>> +  if (!def_link || !def_link->ref
>>>>> +   || DF_REF_IS_ARTIFICIAL (def_link->ref))
>>>>> + continue;
>>>>> +
>>>>> +  while (def_link && def_link->ref)
>>>>> + {
>>>>> +   rtx_insn *insn = DF_REF_INSN (def_link->ref);
>>>>> +   if (GET_CODE (PATTERN (insn)) == PARALLEL)
>>>>> + return false;
>>>>
>>>> Why do you need to skip PARALLELs?
>>>>
>>>
>>> vec_select with parallel give failures final.cc "can't split-up with subreg 
>>> 128 (reg OO"
>>> Thats why I have added this.
>> 
>> But in (vec_select ... (parallel ...)), the parallel won't be the 
>> PATTERN (insn).  It'll instead be a suboperand of the vec_select.
>> 
>> Here too it's important to understand why the final.cc failure occurs
>> and what the correct fix is.
>> 
>
> subreg with vec_select operand already exists before fusion pass.
> We overwrite them with subreg 128 bits from 256 OO mode operand.

But why is that wrong?  What was the full rtl of the subreg before the
pass runs, what did the subreg look like after the pass, and why is the
change not correct?

In general, there are two main ways that an rtl change can be incorrect:

(1) The new rtl isn't well-formed (such as (subreg (subreg X A) B)).
In this case, the new rtl makes no inherent 

RE: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

2024-06-06 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, June 6, 2024 10:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
Subject: Re: [PATCH v7] Match: Support more form for scalar unsigned SAT_ADD

On Thu, Jun 6, 2024 at 3:37 PM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD, we
> would like to support more forms, including both the branch and
> branchless ones.  There are 5 other forms of .SAT_ADD, listed below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take form 3 above as an example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The flag '^' acting on cond_expr generates matching code similar to the following:
>
> else if (gphi *_a1 = dyn_cast <gphi *> (_d1))
>   {
>     basic_block _b1 = gimple_bb (_a1);
>     if (gimple_phi_num_args (_a1) == 2)
>       {
>         basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>         basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>         basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
>                             ? _pb_0_1 : _pb_1_1;
>         basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1))
>                                   ? _pb_1_1 : _pb_0_1;
>         gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
>         if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
>             && EDGE_COUNT (_other_db_1->succs) == 1
>             && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>           {
>             tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>             tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>             tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node,
>                                _cond_lhs_1, _cond_rhs_1);
>             bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
>             tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
>             tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
> 
>
> The test suites below pass with this patch.
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * doc/match-and-simplify.texi: Add doc for the matching flag '^'.
> * genmatch.cc (cmp_operand): Add match_phi comparison.
> (dt_node::gen_kids_1): Add cond_expr bool flag for phi match.
> (dt_operand::gen_phi_on_cond): Add new func to gen phi matching
> on cond_expr.
> (parser::parse_expr): Add handling for the expr flag '^'.
> * match.pd: Add more form for unsigned .SAT_ADD.
> * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add
> new func impl to build call for phi gimple.
> (match_unsigned_saturation_add): Add new func impl to match the
> .SAT_ADD for phi gimple.
> (math_opts_dom_walker::after_dom_children): Add phi matching
> try for all gimple phi stmt.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/doc/match-and-simplify.texi |  16 
>  gcc/genmatch.cc | 126 +++

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Gerald Pfeifer
On Thu, 6 Jun 2024, Tobias Burnus wrote:
> GCC 15 now supports unified-shared memory and the tile/unroll constructs 
> in OpenMP.
> 
> Updates https://gcc.gnu.org/gcc-15/changes.html
> and https://gcc.gnu.org/projects/gomp/

Nice!

> Comments?

--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
+
+  https://gcc.gnu.org/projects/gomp/";>OpenMP

Can you please make this a relative link, i.e. "../projects/gomp/"?

+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPUs devices, enabled only when using the

"GPU devices", I believe?

And I think just "enabled" is sufficient.

+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+

I'm thinking "loop transformation" in English? Or is this a specific term 
from the standard?

Fine with these changes.

Thanks,
Gerald


RE: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1

2024-06-06 Thread Li, Pan2
Committed the series now that the middle-end patch has been committed.

Pan

From: Li, Pan2
Sent: Monday, June 3, 2024 11:24 AM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: kito.cheng 
Subject: RE: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD 
form 1

Thanks Juzhe, I will commit it after the middle-end patch, along with the 
remaining 4 similar patches.

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Monday, June 3, 2024 11:19 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: kito.cheng <kito.ch...@gmail.com>; Li, Pan2 <pan2...@intel.com>
Subject: Re: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD 
form 1

LGTM. Thanks.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-06-03 11:09
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; Pan Li
Subject: [PATCH v1 1/5] RISC-V: Add testcases for scalar unsigned SAT_ADD form 1
From: Pan Li <pan2...@intel.com>

Now that the middle-end supports form 1 of unsigned SAT_ADD and
the RISC-V backend implements the scalar .SAT_ADD, add more test
cases to cover form 1 of unsigned .SAT_ADD.

Form 1:

  #define SAT_ADD_U_1(T)   \
  T sat_add_u_1_##T(T x, T y)  \
  {\
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Passed the riscv full regression tests.
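
As a rough illustration of what form 1 computes, the sketch below instantiates it for uint8_t next to the branchless form that sat_arith.h already carries, and compares both against a widened reference over every input pair. The macro names SAT_ADD_U_BRANCHLESS, the reference sat_add_ref_u8 and the counter count_sat_add_mismatches are hypothetical helpers for this example only; they are not part of the patch or the testsuite.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1 from the commit message.  */
#define SAT_ADD_U_1(T)                        \
  static T sat_add_u_1_##T (T x, T y)         \
  {                                           \
    return (T)(x + y) >= x ? (x + y) : -1;    \
  }

/* Branchless form as it appears in sat_arith.h (fmt_1);
   the macro name here is made up for the sketch.  */
#define SAT_ADD_U_BRANCHLESS(T)                   \
  static T sat_add_u_branchless_##T (T x, T y)    \
  {                                               \
    return (x + y) | (-(T)((T)(x + y) < x));      \
  }

SAT_ADD_U_1 (uint8_t)
SAT_ADD_U_BRANCHLESS (uint8_t)

/* Hypothetical reference: add in a wider type, then clamp.  */
static uint8_t
sat_add_ref_u8 (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Count the (x, y) pairs on which either form differs from the
   reference; 0 means all three agree on the whole uint8_t domain.  */
unsigned
count_sat_add_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t want = sat_add_ref_u8 ((uint8_t) x, (uint8_t) y);
        if (sat_add_u_1_uint8_t ((uint8_t) x, (uint8_t) y) != want
            || sat_add_u_branchless_uint8_t ((uint8_t) x, (uint8_t) y) != want)
          bad++;
      }
  return bad;
}
```

Both forms rely on the same observation: for unsigned operands the truncated sum is smaller than x exactly when the addition wrapped, so that comparison can drive either a select or an all-ones mask.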

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for form 1.
* gcc.target/riscv/sat_u_add-5.c: New test.
* gcc.target/riscv/sat_u_add-6.c: New test.
* gcc.target/riscv/sat_u_add-7.c: New test.
* gcc.target/riscv/sat_u_add-8.c: New test.
* gcc.target/riscv/sat_u_add-run-5.c: New test.
* gcc.target/riscv/sat_u_add-run-6.c: New test.
* gcc.target/riscv/sat_u_add-run-7.c: New test.
* gcc.target/riscv/sat_u_add-run-8.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.target/riscv/sat_arith.h|  8 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-5.c  | 19 ++
gcc/testsuite/gcc.target/riscv/sat_u_add-6.c  | 21 
gcc/testsuite/gcc.target/riscv/sat_u_add-7.c  | 18 +
gcc/testsuite/gcc.target/riscv/sat_u_add-8.c  | 17 +
.../gcc.target/riscv/sat_u_add-run-5.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-6.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-7.c| 25 +++
.../gcc.target/riscv/sat_u_add-run-8.c| 25 +++
9 files changed, 183 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 2ef9fd825f3..2abc83d7666 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -10,6 +10,13 @@ sat_u_add_##T##_fmt_1 (T x, T y)   \
   return (x + y) | (-(T)((T)(x + y) < x)); \
}
+#define DEF_SAT_U_ADD_FMT_2(T)   \
+T __attribute__((noinline))  \
+sat_u_add_##T##_fmt_2 (T x, T y) \
+{\
+  return (T)(x + y) >= x ? (x + y) : -1; \
+}
+
#define DEF_VEC_SAT_U_ADD_FMT_1(T)   \
void __attribute__((noinline))   \
vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
@@ -24,6 +31,7 @@ vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned 
limit) \
}
#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
new file mode 100644
index 000..4c73c7f8a21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0

[PATCH committed] Add additional option --param max-completely-peeled-insns=200 for power64*-*-*

2024-06-06 Thread liuhongt
gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr112325.c: Add additional option --param
max-completely-peeled-insns=200 for powerpc64*-*-*.
---
 gcc/testsuite/gcc.dg/vect/pr112325.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c 
b/gcc/testsuite/gcc.dg/vect/pr112325.c
index dea6cca3b86..143903beab2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr112325.c
+++ b/gcc/testsuite/gcc.dg/vect/pr112325.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target vect_int } */
 /* { dg-require-effective-target vect_shift } */
 /* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+/* { dg-additional-options "--param max-completely-peeled-insns=200" { target 
powerpc64*-*-* } } */
 
 typedef unsigned short ggml_fp16_t;
 static float table_f32_f16[1 << 16];
-- 
2.31.1



[PATCH v2] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-06 Thread pan2 . li
From: Pan Li 

Now that middle-end support for .SAT_SUB has been committed, implement the
unsigned scalar int .SAT_SUB for the riscv backend.  Consider the example
code below:

T __attribute__((noinline))\
sat_u_sub_##T##_fmt_1 (T x, T y)   \
{  \
  return (x - y) & (-(T)(x >= y)); \
}

T __attribute__((noinline))   \
sat_u_sub_##T##_fmt_2 (T x, T y)  \
{ \
  return (x - y) & (-(T)(x > y)); \
}

DEF_SAT_U_SUB_FMT_1(uint64_t);
DEF_SAT_U_SUB_FMT_2(uint64_t);

Before this patch:
sat_u_sub_uint64_t_fmt_1:
        bltu    a0,a1,.L2
        sub     a0,a0,a1
        ret
.L2:
        li      a0,0
        ret

After this patch:
sat_u_sub_uint64_t_fmt_1:
        sltu    a5,a0,a1
        addi    a5,a5,-1
        sub     a0,a0,a1
        and     a0,a5,a0
        ret

Please note that only the above 2 forms of .SAT_SUB are supported for now; we will
add more forms in the near future.
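
A minimal sketch of why the two forms are interchangeable, and how the branchless "after" sequence relates to them. The macro bodies are taken from the example above (without the noinline attribute); the #define lines, the helper sat_u_sub_u64_like_asm and the counter count_sat_sub_mismatches are assumptions made for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEF_SAT_U_SUB_FMT_1(T)          \
T sat_u_sub_##T##_fmt_1 (T x, T y)      \
{                                       \
  return (x - y) & (-(T)(x >= y));      \
}

#define DEF_SAT_U_SUB_FMT_2(T)          \
T sat_u_sub_##T##_fmt_2 (T x, T y)      \
{                                       \
  return (x - y) & (-(T)(x > y));       \
}

DEF_SAT_U_SUB_FMT_1 (uint8_t)
DEF_SAT_U_SUB_FMT_2 (uint8_t)
DEF_SAT_U_SUB_FMT_1 (uint64_t)

/* Hypothetical C rendering of the four-instruction sequence
   sltu / addi -1 / sub / and shown under "After this patch".  */
uint64_t
sat_u_sub_u64_like_asm (uint64_t x, uint64_t y)
{
  uint64_t mask = (uint64_t) (x < y) - 1; /* 0 if x < y, else all ones.  */
  return (x - y) & mask;
}

/* Count uint8_t pairs where either form differs from the obvious
   "x > y ? x - y : 0" definition.  */
unsigned
count_sat_sub_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t want = x > y ? (uint8_t) (x - y) : 0;
        if (sat_u_sub_uint8_t_fmt_1 ((uint8_t) x, (uint8_t) y) != want
            || sat_u_sub_uint8_t_fmt_2 ((uint8_t) x, (uint8_t) y) != want)
          bad++;
      }
  return bad;
}
```

The only input on which `x >= y` and `x > y` differ is x == y, and there x - y is already 0, so the choice of mask does not matter.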

The test suites below pass with this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_ussub): Add new func
decl for ussub expanding.
* config/riscv/riscv.cc (riscv_expand_ussub): Ditto but for impl.
* config/riscv/riscv.md (ussub3): Add new pattern ussub
for scalar modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test macros and comments.
* gcc.target/riscv/sat_u_sub-1.c: New test.
* gcc.target/riscv/sat_u_sub-2.c: New test.
* gcc.target/riscv/sat_u_sub-3.c: New test.
* gcc.target/riscv/sat_u_sub-4.c: New test.
* gcc.target/riscv/sat_u_sub-5.c: New test.
* gcc.target/riscv/sat_u_sub-6.c: New test.
* gcc.target/riscv/sat_u_sub-7.c: New test.
* gcc.target/riscv/sat_u_sub-8.c: New test.
* gcc.target/riscv/sat_u_sub-run-1.c: New test.
* gcc.target/riscv/sat_u_sub-run-2.c: New test.
* gcc.target/riscv/sat_u_sub-run-3.c: New test.
* gcc.target/riscv/sat_u_sub-run-4.c: New test.
* gcc.target/riscv/sat_u_sub-run-5.c: New test.
* gcc.target/riscv/sat_u_sub-run-6.c: New test.
* gcc.target/riscv/sat_u_sub-run-7.c: New test.
* gcc.target/riscv/sat_u_sub-run-8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv.cc | 39 +++
 gcc/config/riscv/riscv.md | 11 ++
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 23 +++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-1.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-2.c  | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-3.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-4.c  | 17 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-5.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-6.c  | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-7.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-8.c  | 17 
 .../gcc.target/riscv/sat_u_sub-run-1.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-2.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-3.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-4.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-5.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-6.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-7.c| 25 
 .../gcc.target/riscv/sat_u_sub-run-8.c| 25 
 20 files changed, 418 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0704968561b..09eb3a574e3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -134,6 +134,7 @@ extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine

Re:[PATCH] haifa-sched: Avoid the fusion priority of the fused insn to affect the subsequent insn sequence.

2024-06-06 Thread Jin Ma

I am very sorry that I did not check the commit information carefully. The 
statement is somewhat inaccurate.

> When the insn 1 and 2, 3 and 4 can be fusioned, then there is the
> following sequence:
> 
> ;;    insn |
> ;;      1  | sp=sp-0x18
> ;;  +   2  | [sp+0x10]=ra
> ;;      3  | [sp+0x8]=s0
> ;;      4  | [sp+0x0]=s1

> The fusion priority of the insn 2, 3, and 4 are the same. According to
> the current algorithm, since abs(0x10-0x8) < abs(0x10-0x0), the insn 2
> is followed by the insn 3. It is obviously unreasonable to do so.
> 
> Therefore, when we issue the insn 3 and 4, we should consider the fusion
> priority of the insn 1 instead of the insn 2. And the final instruction
> sequence is as follows:

> ;;    insn |
> ;;      1  | sp=sp-0x18
> ;;  +   2  | [sp+0x10]=ra
> ;;      4  | [sp+0x8]=s1
> ;;  +   3  | [sp+0x0]=s0
> 
> gcc/ChangeLog:

>  * haifa-sched.cc (rank_for_schedule): Likewise.

When insns 1 and 2, and insns 4 and 3, can be fused, we have the
following sequence:

;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      3  | [sp+0x8]=s0
;;      4  | [sp+0x0]=s1

The fusion priority of the insn 2, 3, and 4 are the same. According to
the current algorithm, since abs(0x10-0x8)

[PATCH] [libstdc++] drop workaround for clang<=7 (was: [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__)

2024-06-06 Thread Alexandre Oliva
On May 31, 2024, Alexandre Oliva  wrote:

>> I think we could drop this kluge entirely, clang 7 is old now, we
>> generally only support the most recent 3 or 4 clang versions.

> Fine with me, but I'd do that in a separate later patch, so that this
> goes in, and if it gets backported, it will cover this change, rather
> than miss it.  Though, as you say, it doesn't matter much either way.

In response to a request in the review of the patch that introduced
_GLIBCXX_CLANG, this patch removes from std/variant an obsolete
workaround for clang 7-.

Regstrapping on x86_64-linux-gnu.  Ok to install?


for  libstdc++-v3/ChangeLog

* include/std/variant: Drop obsolete workaround.
---
 libstdc++-v3/include/std/variant |5 -
 1 file changed, 5 deletions(-)

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 51aaa62085170..13ea1dd384965 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1758,11 +1758,6 @@ namespace __detail::__variant
  }, __rhs);
   }
 
-#if defined(_GLIBCXX_CLANG) && __clang_major__ <= 7
-public:
-  using _Base::_M_u; // See https://bugs.llvm.org/show_bug.cgi?id=31852
-#endif
-
 private:
   template
friend constexpr decltype(auto)


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] [testsuite] [arm] test board cflags in multilib.exp

2024-06-06 Thread Alexandre Oliva


multilib.exp tests for multilib-altering flags in a board's
multilib_flags and skips the test, but if such flags appear in the
board's cflags, with the same distorting effects on tested multilibs,
we fail to skip the test.

Extend the skipping logic to board's cflags as well.

Regstrapping on x86_64-linux-gnu.  Already tested on arm-eabi (gcc-13
and trunk).  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/arm/multilib.exp: Skip based on board cflags too.
---
 gcc/testsuite/gcc.target/arm/multilib.exp |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp 
b/gcc/testsuite/gcc.target/arm/multilib.exp
index 4442d5d754bd6..12c93bc89d222 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -18,13 +18,15 @@ load_lib gcc-dg.exp
 
 dg-init
 
-if { [board_info [target_info name] exists multilib_flags] 
- && [regexp {(-marm|-mthumb|-march=.*|-mcpu=.*|-mfpu=.*|-mfloat=abi=.*)\y} 
[board_info [target_info name] multilib_flags]] } {
+foreach flagsvar {multilib_flags cflags} {
+  if { [board_info [target_info name] exists $flagsvar] 
+ && [regexp {(-marm|-mthumb|-march=.*|-mcpu=.*|-mfpu=.*|-mfloat=abi=.*)\y} 
[board_info [target_info name] $flagsvar]] } {

 # Multilib flags override anything we can apply to a test, so
 # skip if any of the above options are set there.
-verbose "skipping multilib tests due to multilib_flags setting" 1
+verbose "skipping multilib tests due to $flagsvar setting" 1
 return
+  }
 }
 
 # We don't want to run this test multiple times in a parallel make check.



Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus

Hi Gerald,

Gerald Pfeifer wrote:

+++ b/htdocs/gcc-15/changes.html
+
+  https://gcc.gnu.org/projects/gomp/";>OpenMP

Can you please make this a relative link, i.e. "../projects/gomp/"?


Good point. I thought such links should be absolute because of 
(www.)GNU.org, i.e.


https://www.gnu.org/software/gcc/releases.html

... but also that page has https://www.gnu.org/software/gcc/projects/gomp/

GNU.org does not have the documentation, but going to 
https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302 
temporary redirect) to the GCC website. Likewise for '../git' but for 
'../wiki' it has a HTTP 404 not found; fortunately, ../wiki/ works.


I think there are plenty of links which could be relative ones but are 
absolute ones.


Looks like a janitorial task to fix the absolute links, possibly 
excluding those with /git, /onlinedocs, /wiki – or assuming that the 
main page is GCC.gnu.org, relying on the redirects.


In any case, those links are probably broken on GNU.org:

htdocs/gcc-14/porting_to.html:href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma 
GCC diagnostic warning


htdocs/gcc-5/changes.html:    A href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual


* * *


+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+

I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?


Loop transformation happens at the end. But e.g. "(#pragma omp) unroll 
full" is a directive and, e.g.

#pragma omp unroll partial(2)
for (int i=0; i < n; i++)
  a[i] = 5;

is a construct (= directive + structured block (if any) + end directive 
(if any)).

Tobias



Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-06-06 Thread Alexandre Oliva
On May 28, 2024, Jason Merrill  wrote:

> On 5/25/24 08:12, Alexandre Oliva wrote:
>> On Apr 27, 2023, Alexandre Oliva  wrote:
>>> On Apr 14, 2023, Alexandre Oliva  wrote:
 On Mar 23, 2023, Alexandre Oliva  wrote:
> This patch introduces infrastructure for targets to add an offset to
> the label issued after the call_insn to set the call_return_pc
> attribute.  This will be used on rs6000, that sometimes issues another
> instruction after the call proper as part of a call insn.
>> 
 Ping?
 https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614452.html
>> Ping?
>> Refreshed, retested on ppc64le-linux-gnu.  Ok to install?

> I wonder about adding this information to REG_CALL_ARG_LOCATION, but
> doing it this way also seems reasonable.  I'm interested in Jakub's 
> input, but the patch is OK in a week if he doesn't get to it.

Thanks, I'm putting it in, but I also look forward to Jakub's feedback.

As for REG_CALL_ARG_LOCATION, I suppose that would be a decent place to
hold it for the new hook to get at it, but since it can usually be
computed directly, possibly with help of an attribute, adding extra rtl
to call insns is probably unnecessary and undesirable.

>> for  gcc/ChangeLog
>> * target.def (call_offset_return_label): New hook.
>> * gcc/doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
>> placeholder.
>> * gcc/doc/tm.texi: Rebuild.
>> * dwarf2out.cc (struct call_arg_loc_node): Record call_insn
>> instead of call_arg_loc_note.
>> (add_AT_lbl_id): Add optional offset argument.
>> (gen_call_site_die): Compute and pass on a return pc offset.
>> (gen_subprogram_die): Move call_arg_loc_note computation...
>> (dwarf2out_var_location): ... from here.  Set call_insn.



[PATCH] libstdc++: Optimize std::gcd

2024-06-06 Thread Stephen Face
This patch is to optimize the runtime execution of gcd. Mathematically,
it computes with the same algorithm as before, but subtractions and
branches are rearranged to encourage generation of code that can use
flags from the subtractions for conditional moves. Additionally, most
pairs of integers are coprime, so this patch also includes a check for
one of the integers to be equal to 1, and then it will exit the loop
early in this case.

libstdc++-v3/ChangeLog:

* include/std/numeric (__gcd): Optimize.
---
I have tested this on x86_64-linux and aarch64-linux. I have tested the
timing with random distributions of small inputs and large inputs on a
couple of machines with -O2 and found decreases in execution time from
20% to 60% depending on the machine and distribution of inputs.

 libstdc++-v3/include/std/numeric | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index c912db4a519..3c9e8387a0e 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -148,19 +148,20 @@ namespace __detail
 
   while (true)
{
- if (__m > __n)
-   {
- _Tp __tmp = __m;
- __m = __n;
- __n = __tmp;
-   }
+ _Tp __m_minus_n = __m - __n;
+ if (__m_minus_n == 0)
+   return __m << __k;
 
- __n -= __m;
+ _Tp __next_m = __m < __n ? __m : __n;
 
- if (__n == 0)
-   return __m << __k;
+ if (__next_m == 1)
+   return __next_m << __k;
+
+ _Tp __n_minus_m = __n - __m;
+ __n = __n < __m ? __m_minus_n : __n_minus_m;
+ __m = __next_m;
 
- __n >>= std::__countr_zero(__n);
+ __n >>= std::__countr_zero(__m_minus_n);
}
 }
 } // namespace __detail
-- 
2.45.2
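
For readers who want to try the rearrangement outside libstdc++, here is a rough C transcription of the loop before and after the diff above, fixed to uint64_t. The preamble that strips common factors of two is not shown in the diff, so strip_twos is an assumption about the surrounding code; ctz64 is a portable stand-in for std::__countr_zero, and gcd_euclid and count_gcd_mismatches are hypothetical helpers used only to compare results.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of trailing zero bits; x must be nonzero.  */
static int
ctz64 (uint64_t x)
{
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Assumed preamble: make both values odd, return the shared power of two.  */
static int
strip_twos (uint64_t *m, uint64_t *n)
{
  int i = ctz64 (*m), j = ctz64 (*n);
  *m >>= i;
  *n >>= j;
  return i < j ? i : j;
}

/* Loop before the patch: swap so that m <= n, then subtract.  */
uint64_t
gcd_before (uint64_t m, uint64_t n)
{
  if (m == 0) return n;
  if (n == 0) return m;
  int k = strip_twos (&m, &n);
  for (;;)
    {
      if (m > n) { uint64_t tmp = m; m = n; n = tmp; }
      n -= m;
      if (n == 0) return m << k;
      n >>= ctz64 (n);
    }
}

/* Loop after the patch: both differences are computed, the smaller
   value becomes m, and a minimum of 1 exits early.  */
uint64_t
gcd_after (uint64_t m, uint64_t n)
{
  if (m == 0) return n;
  if (n == 0) return m;
  int k = strip_twos (&m, &n);
  for (;;)
    {
      uint64_t m_minus_n = m - n;
      if (m_minus_n == 0) return m << k;
      uint64_t next_m = m < n ? m : n;
      if (next_m == 1) return next_m << k;
      uint64_t n_minus_m = n - m;
      n = n < m ? m_minus_n : n_minus_m;
      m = next_m;
      /* m - n and n - m have the same number of trailing zeros.  */
      n >>= ctz64 (m_minus_n);
    }
}

/* Plain Euclid, used only as a reference.  */
static uint64_t
gcd_euclid (uint64_t m, uint64_t n)
{
  while (n != 0) { uint64_t r = m % n; m = n; n = r; }
  return m;
}

/* Count pairs below LIMIT where either loop disagrees with Euclid.  */
unsigned
count_gcd_mismatches (uint64_t limit)
{
  unsigned bad = 0;
  for (uint64_t m = 0; m < limit; m++)
    for (uint64_t n = 0; n < limit; n++)
      if (gcd_before (m, n) != gcd_euclid (m, n)
          || gcd_after (m, n) != gcd_euclid (m, n))
        bad++;
  return bad;
}
```

The shift by the trailing-zero count of m - n, rather than of the new n, is safe because negating a two's-complement value keeps its lowest set bit in place. This sketch says nothing about the timing claims; those depend on the code the compiler generates for the real template.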



Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v3)

2024-06-06 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 6:07 PM Roger Sayle  wrote:
>
>
> Hi Hongtao,
> Here's the third revision of my improved ternlog handling patch for x86.
> This addresses the previously discovered problems, adding a check for
> memory_operand, and adds four new test cases, to confirm that the
> appropriate functionality is being triggered/covered, including a test
> case for the example you reported requiring the memory_operand fix.
> [Thanks to Alexander Monakov for suggesting I use my ternlog benchmark
> as a coverage testcase.]
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
Ok.

BTW, with -march=cascadelake I notice there are new failures; most of
them can be fixed by adjusting ix86_rtx_cost (to recognize
ix86_ternlog_operand_p).

gcc: gcc.target/i386/avx2-pr98461.c scan-assembler-times \tnotl\t 6
gcc: gcc.target/i386/avx512f-copysign.c scan-assembler-times
vpternlog[dq][ \\t]+\\$(?:216|228|0xd8|0xe4), 5
gcc: gcc.target/i386/pr101989-broadcast-1.c scan-assembler-times \\{1to4\\} 4
gcc: gcc.target/i386/sse2-v1ti-vne.c scan-assembler-times pcmpeq 6
unix/-m32: gcc: gcc.target/i386/avx2-pr98461.c scan-assembler-times \tnotl\t 6
unix/-m32: gcc: gcc.target/i386/pr101989-broadcast-1.c
scan-assembler-times \\{1to4\\} 4


New tests that FAIL (6 tests):

gcc: gcc.target/i386/avx512f-vpternlogd-3.c scan-assembler-times
vpternlogd[ \\t] 694
gcc: gcc.target/i386/avx512f-vpternlogd-4.c scan-assembler-times
vpternlogd[ \\t] 694
unix/-m32: gcc: gcc.target/i386/avx512f-vpternlogd-3.c
scan-assembler-times vpternlogd[ \\t] 694
unix/-m32: gcc: gcc.target/i386/avx512f-vpternlogd-4.c
scan-assembler-times vpternlogd[ \\t] 694
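
For context on the immediates in the avx512f-copysign.c pattern above (216/0xd8 and 228/0xe4): a ternlog immediate is the 8-entry truth table of a three-input bitwise function, and it can be derived by evaluating the function on the constants 0xF0, 0xCC and 0xAA standing in for the three operands. The sketch below is a scalar model under that convention. The names ternlog32, imm_select_c_b_a and imm_select_c_a_b are hypothetical, and the mapping of these letters to the actual instruction operands in the test is an assumption, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table columns for the three inputs.  */
enum { TERNLOG_A = 0xF0, TERNLOG_B = 0xCC, TERNLOG_C = 0xAA };

/* Scalar model: for each bit position, the bits of a, b and c form a
   3-bit index (a is the high bit) into the 8-bit immediate.  */
uint32_t
ternlog32 (uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((a >> bit) & 1) << 2)
                     | (((b >> bit) & 1) << 1)
                     | ((c >> bit) & 1);
      r |= (uint32_t) ((imm8 >> idx) & 1) << bit;
    }
  return r;
}

/* Immediate for the bitwise select "c ? b : a".  */
uint8_t
imm_select_c_b_a (void)
{
  return (uint8_t) ((TERNLOG_C & TERNLOG_B) | (~TERNLOG_C & TERNLOG_A));
}

/* Immediate for the bitwise select "c ? a : b".  */
uint8_t
imm_select_c_a_b (void)
{
  return (uint8_t) ((TERNLOG_C & TERNLOG_A) | (~TERNLOG_C & TERNLOG_B));
}
```

Under this model the two selects evaluate to 0xd8 and 0xe4, which is presumably why the scan pattern accepts either value: a copysign is a bitwise select on the sign mask, and the immediate changes with the order in which the two data operands are presented.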

>
>
> 2024-06-06  Roger Sayle  
> Hongtao Liu  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
> fixup_modeless_constant before testing predicates.  Only call
> copy_to_mode_reg on memory operands (after the first one).
> (ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
> into a VEC_DUPLICATE if possible.
> (ix86_ternlog_idx):  Convert an RTX expression into a ternlog
> index between 0 and 255, recording the operands in ARGS, if
> possible or return -1 if this is not possible/valid.
> (ix86_ternlog_leaf_p): Helper function to identify "leaves"
> of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
> (ix86_ternlog_operand_p): Test whether an expression is suitable
> for and preferred as an UNSPEC_TERNLOG.
> (ix86_expand_ternlog_binop): Helper function to construct the
> binary operation corresponding to a sufficiently simple ternlog.
> (ix86_expand_ternlog_andnot): Helper function to construct an
> ANDN operation corresponding to a sufficiently simple ternlog.
> (ix86_expand_ternlog): Expand a 3-operand ternary logic
> expression, constructing either an UNSPEC_TERNLOG or simpler
> rtx expression.  Called from builtin expanders and pre-reload
> splitters.
> * config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
> (ix86_ternlog_operand_p): Likewise.
> (ix86_expand_ternlog): Likewise.
> * config/i386/predicates.md (ternlog_operand): New predicate
> that calls ix86_ternlog_operand_p.
> * config/i386/sse.md (_vpternlog_0): New
> define_insn_and_split that recognizes a SET_SRC of ternlog_operand
> and expands it via ix86_expand_ternlog pre-reload.
> (_vternlog_mask): Convert from define_insn to
> define_expand.  Use ix86_expand_ternlog if the mask operand is
> ~0 (or 255 or -1).
> (*_vternlog_mask): define_insn renamed from above.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/avx512f-vpternlogd-1.c: Update test case.
> * gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
> * gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
> * gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
> * gcc.target/i386/pr100711-4.c: Likewise.
> * gcc.target/i386/pr100711-5.c: Likewise.
>
> * gcc.target/i386/avx512f-vpternlogd-3.c: New 128-bit test case.
> * gcc.target/i386/avx512f-vpternlogd-4.c: New 256-bit test case.
> * gcc.target/i386/avx512f-vpternlogd-5.c: New 512-bit test case.
> * gcc.target/i386/avx512f-vpternlogq-3.c: New test case.
>
>
> Thanks in advance,
> Roger
>
> > -Original Message-
> > From: Hongtao Liu 
> > On Mon, May 27, 2024 at 2:48 PM Hongtao Liu  wrote:
> > >
> > > On Sat, May 18, 2024 at 4:10 AM Roger Sayle 
> > wrote:
> > > >
> > > >
> > > > Hi Hongtao,
> > > > Many thanks for the review, bug fixes and suggestions for improvements.
> > > > This revised version of the patch, implements all of your
> > > > corrections.  In theory the "ternlog idx" should guarantee that 

Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-06 Thread Paul Richard Thomas
Hi Andre,

I apologise for the slow response. It's been something of a heavy week...

This is good for mainline.

Thanks

Paul

PS That's good news about the funding. Maybe we will get to see "built in"
coarrays soon?


On Tue, 4 Jun 2024 at 11:25, Andre Vehreschild  wrote:

> Hi all,
>
> the attached patch fixes a memory leak when a user-defined function returns a
> polymorphic type/class. The issue was that the polymorphic type was not
> detected correctly and therefore the len-field was not transferred
> correctly.
>
> Regtests ok x86_64-linux/Fedora 39. Ok for master?
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>