[pushed] doc: Update Cygwin web link

2024-06-12 Thread Gerald Pfeifer
gcc:
PR target/69374
* doc/install.texi (Specific) <*-*-cygwin>: Update web link.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 621c874d268..165d48c02f8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -5193,7 +5193,7 @@ UWIN support has been removed due to a lack of maintenance.
 @anchor{x-x-cygwin}
 @heading *-*-cygwin
 Ports of GCC are included with the
-@uref{http://www.cygwin.com/,,Cygwin environment}.
+@uref{https://cygwin.com,,Cygwin environment}.
 
 GCC will build under Cygwin without modification; it does not build
 with Microsoft's C++ compiler and there are no plans to make it do so.
-- 
2.45.2


Re: [PATCH] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-12 Thread Richard Biener
On Tue, 11 Jun 2024, Richard Sandiford wrote:

> Don't think it makes any difference, but:
> 
> Richard Biener  writes:
> > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, 
> > stmt_vec_info stmt_info,
> >  access excess elements.
> >  ???  Enhancements include peeling multiple iterations
> >  or using masked loads with a static mask.  */
> > - || (group_size * cvf) % cnunits + group_size - gap < cnunits))
> > + || ((group_size * cvf) % cnunits + group_size - gap < cnunits
> > + /* But peeling a single scalar iteration is enough if
> > +we can use the next power-of-two sized partial
> > +access.  */
> > + && ((cremain = (group_size * cvf - gap) % cnunits), true
> 
> ...this might be less surprising as:
> 
> && ((cremain = (group_size * cvf - gap) % cnunits, true)
> 
> in terms of how the &&s line up.

Yeah - I'll fix before pushing.
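
As an aside, both spellings are semantically identical; they only move
the parenthesis around the comma operator.  A minimal standalone
illustration, with hypothetical names and not taken from the patch:

  static unsigned
  demo (unsigned x, unsigned n)
  {
    unsigned r;
    /* The comma operator performs the assignment, then yields true so
       the && chain can continue; only the grouping differs.  */
    if (n != 0
        && ((r = x % n), true)   /* assignment parenthesized, comma outside */
        && r > 1)
      return r;
    if (n != 0
        && (r = x % n, true)     /* comma inside a single pair of parens */
        && r > 1)
      return r;
    return 0;
  }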

Thanks,
Richard.

> Thanks,
> Richard
> 
> > + && ((cpart_size = (1 << ceil_log2 (cremain)))
> > + != cnunits)
> > + && vector_vector_composition_type
> > +  (vectype, cnunits / cpart_size,
> > +   &half_vtype) == NULL_TREE
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -11599,6 +11608,27 @@ vectorizable_load (vec_info *vinfo,
> >   gcc_assert (new_vtype
> >   || LOOP_VINFO_PEELING_FOR_GAPS
> >(loop_vinfo));
> > +   /* But still reduce the access size to the next
> > +  required power-of-two so peeling a single
> > +  scalar iteration is sufficient.  */
> > +   unsigned HOST_WIDE_INT cremain;
> > +   if (remain.is_constant (&cremain))
> > + {
> > +   unsigned HOST_WIDE_INT cpart_size
> > + = 1 << ceil_log2 (cremain);
> > +   if (known_gt (nunits, cpart_size)
> > +   && constant_multiple_p (nunits, cpart_size,
> > +   &num))
> > + {
> > +   tree ptype;
> > +   new_vtype
> > + = vector_vector_composition_type (vectype,
> > +   num,
> > +   &ptype);
> > +   if (new_vtype)
> > + ltype = ptype;
> > + }
> > + }
> >   }
> >   }
> > tree offset
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] [libstdc++] [testsuite] xfail double-prec from_chars for float128_t

2024-06-12 Thread Jonathan Wakely
On Wed, 12 Jun 2024, 02:14 Alexandre Oliva,  wrote:

>
> Tests involving float128_t were xfailed or otherwise worked around for
> vxworks on aarch64.  The same issue came up on rtems.  This patch
> adjusts them similarly.
>
> Regstrapping on x86_64-linux-gnu.  Also tested with gcc-13 on
> aarch64-rtems6.  Ok to install?
>

OK


> (I'd have expected the fast_float limitation to come up with aarch64-elf
> and any other aarch64 targets, but since I haven't observed it there,
> I'm avoiding aarch64-*-*.)
>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/20_util/from_chars/8.cc: Skip float128_t testing
> on aarch64-rtems*.
> * testsuite/20_util/to_chars/float128_c++23.cc: Xfail run on
> aarch64-rtems*.
> ---
>  libstdc++-v3/testsuite/20_util/from_chars/8.cc |2 +-
>  .../testsuite/20_util/to_chars/float128_c++23.cc   |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> index a6343422c5a91..bacad89943b5f 100644
> --- a/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> +++ b/libstdc++-v3/testsuite/20_util/from_chars/8.cc
> @@ -17,7 +17,7 @@
>
>  // { dg-do run { target c++23 } }
>  // { dg-add-options ieee }
> -// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target
> aarch64-*-vxworks* } }
> +// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target
> aarch64-*-vxworks* aarch64-*-rtems* } }
>
>  #include 
>  #include 
> diff --git a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> index ca00761ee7c98..6cb9cadcd2041 100644
> --- a/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> +++ b/libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc
> @@ -19,7 +19,7 @@
>  // { dg-require-effective-target ieee_floats }
>  // { dg-require-effective-target size32plus }
>  // { dg-add-options ieee }
> -// { dg-xfail-run-if "from_chars limited to double-precision" {
> aarch64-*-vxworks* } }
> +// { dg-xfail-run-if "from_chars limited to double-precision" {
> aarch64-*-vxworks* aarch64-*-rtems* } }
>
>  #include 
>  #include 
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>    Free Software Activist       GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
>


Re: [PATCH] [libstdc++] [testsuite] require cmath for c++23 cmath tests

2024-06-12 Thread Jonathan Wakely
On Wed, 12 Jun 2024, 02:17 Alexandre Oliva,  wrote:

>
> Some c++23 tests fail on targets that don't satisfy dg-require-cmath,
> because referenced math functions don't get declared in std.


Are they present on the target at all? Is not declaring them in std the
underlying bug here?


>   Add the
> missing requirement.
>
> Regstrapping on x86_64-linux-gnu.  Already successfully tested with
> gcc-13 on aarch64-rtems, where it avoids the errors that come up because
> math.h doesn't meet the cmath requirements there.  Ok to install?
>

OK


>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc:
> Require cmath.
> * testsuite/26_numerics/headers/cmath/functions_std_c++23.cc:
> Likewise.
> * testsuite/26_numerics/headers/cmath/nextafter_std_c++23.cc:
> Likewise.
> ---
>  .../headers/cmath/constexpr_std_c++23.cc   |1 +
>  .../headers/cmath/functions_std_c++23.cc   |1 +
>  .../26_numerics/headers/cmath/nextafter_c++23.cc   |1 +
>  3 files changed, 3 insertions(+)
>
> diff --git
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc
> index 0e3d112fe2e80..3c2377fd6987b 100644
> ---
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc
> +++
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/constexpr_std_c++23.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do link { target c++23 } }
> +// { dg-require-cmath "" }
>
>  #include 
>  #include 
> diff --git
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/functions_std_c++23.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/functions_std_c++23.cc
> index 000cebf364aaa..ea68ac5da7551 100644
> ---
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/functions_std_c++23.cc
> +++
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/functions_std_c++23.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do link { target c++23 } }
> +// { dg-require-cmath "" }
>
>  #include 
>  #include 
> diff --git
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc
> index 7d7e10bd8aea3..91767d22cc3f2 100644
> --- a/libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc
> +++ b/libstdc++-v3/testsuite/26_numerics/headers/cmath/nextafter_c++23.cc
> @@ -16,6 +16,7 @@
>  // .
>
>  // { dg-do run { target c++23 } }
> +// { dg-require-cmath "" }
>
>  #include 
>  #include 
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>    Free Software Activist       GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
>


Re: [PATCH 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-12 Thread Robin Dapp
Hi Edwin,

this LGTM but I just remembered I intended to turn the assert
into a more descriptive error.

The attached patch has been sitting on my local branch for a
while.  Maybe we should rather fold yours into it?

Regards
 Robin

From d164403ef577917f905c1690f2199fab330f05e2 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Fri, 31 May 2024 14:51:17 +0200
Subject: [PATCH] RISC-V: Use descriptive errors instead of asserts.

In emit_insn we forestall possible ICEs in maybe_legitimize_operand by
asserting.  This patch replaces the assertions by more descriptive
internal errors.
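
Reduced to a self-contained sketch, the pattern looks like this
(hypothetical stand-ins for illustration only; GCC's internal_error
additionally prints the usual ICE banner and backtrace):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static void
  check_mode (const char *expected, const char *got)
  {
    /* A bare assert (strcmp (expected, got) == 0) would only report
       the failing expression; the descriptive error names both modes.  */
    if (strcmp (expected, got) != 0)
      {
        fprintf (stderr, "internal error: expected mode %s but got mode %s\n",
                 expected, got);
        abort ();
      }
  }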

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Replace asserts by internal errors.
---
 gcc/config/riscv/riscv-v.cc | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8911f5783c8..810203b8ba5 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -50,6 +50,7 @@
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 #include "predict.h"
+#include "errors.h"
 
 using namespace riscv_vector;
 
@@ -291,10 +292,20 @@ public:
if (mode == VOIDmode)
  mode = Pmode;
else
- /* Early assertion ensures same mode since maybe_legitimize_operand
-will check this.  */
- gcc_assert (GET_MODE (ops[opno]) == VOIDmode
- || GET_MODE (ops[opno]) == mode);
+ {
+   /* Early assertion ensures same mode since maybe_legitimize_operand
+  will check this.  */
+   machine_mode required_mode = GET_MODE (ops[opno]);
+   if (required_mode != VOIDmode && required_mode != mode)
+ {
+   internal_error ("expected mode %s for operand %d of "
+   "insn %s but got mode %s.\n",
+   GET_MODE_NAME (mode),
+   opno,
+   insn_data[(int) icode].name,
+   GET_MODE_NAME (required_mode));
+ }
+ }
 
add_input_operand (ops[opno], mode);
   }
@@ -346,7 +357,13 @@ public:
 else if (m_insn_flags & VXRM_RDN_P)
   add_rounding_mode_operand (VXRM_RDN);
 
-gcc_assert (insn_data[(int) icode].n_operands == m_opno);
+
+if (insn_data[(int) icode].n_operands != m_opno)
+  internal_error ("invalid number of operands for insn %s, "
+ "expected %d but got %d.\n",
+ insn_data[(int) icode].name,
+ insn_data[(int) icode].n_operands, m_opno);
+
 expand (icode, any_mem_p);
   }
 
-- 
2.45.1



Re: [PATCH 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-12 Thread Robin Dapp
Hi Edwin,

this is OK but did you check if we can get rid of the subreg
condition now that we have gen_lowpart?

Regards
 Robin


[pushed] wwwdocs: news: Remove reference to /java

2024-06-12 Thread Gerald Pfeifer
This is a left-over that redirects to our main page; we removed all
our Java material years ago.
---
 htdocs/news.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index de92bdf6..4a24a4ad 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -1799,7 +1799,6 @@ that tracks checkins to the egcs webpages CVS repository.
 Cygnus announces the first public
 release of libgcj, the runtime component of the GNU compiler for Java.
 Read the release announcement.
-Goto the libgcj homepage.
 
 
 April 6, 1999
-- 
2.45.2


Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2024-06-12 Thread Kewen.Lin
Hi,

on 2023/2/10 10:59, Xionghu Luo wrote:
> Resend this patch...
> 
> v4: Update per comments.
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
> 
> The native RTL expression for vec_mrghw should be the same for BE and LE,
> as it is register- and endian-independent.  So both BE and LE need to
> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>  (subreg:V4SI (reg:V16QI 139) 0)
>  (subreg:V4SI (reg:V16QI 140) 0))
>  [const_int 0 4 1 5]))
> 
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check is needed only once, at final ASM generation.
> The resulting ASM is also better, since the nested vec_select is
> simplified to a simple scalar load.
> 
> Regression tested, passing on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
> Linux.

As the recent PR115355 shows, this issue can also affect the
behavior when users enable vectorization; IMHO we should get
this landed as soon as possible.

The culprit commit r12-4496 changes the expanders for vector
merge {high/h,low/l} {byte/b, halfword/h, word/w}, which are
mainly for built-in function vec_merge{h,l} expanding (also
used as gen function in some internal uses).  As PVIPR defines,
vec_mergeh "Merges the first halves (in element order) of two
vectors" and vec_mergel "Merges the last halves (in element
order) of two vectors", so both of them clearly have endian
considerations.  Taking define_expand "altivec_vmrghb" as
example, before commit r12-4496 it generates below RTL
pattern for both BE and LE:

// from (define_insn "*altivec_vmrghb_internal"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
   (vec_concat:V32QI
 (match_operand:V16QI 1 "register_operand" "v")
 (match_operand:V16QI 2 "register_operand" "v"))
   (parallel [(const_int 0) (const_int 16)
  (const_int 1) (const_int 17)
  (const_int 2) (const_int 18)
  (const_int 3) (const_int 19)
  (const_int 4) (const_int 20)
  (const_int 5) (const_int 21)
  (const_int 6) (const_int 22)
  (const_int 7) (const_int 23)])))]

and which matches hardware insn "vmrghb %0,%1,%2" on BE while
"vmrglb %0,%2,%1" on LE.

After commit r12-4496, on BE it generates RTL pattern

// from (define_insn "altivec_vmrghb_direct"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
   (vec_concat:V32QI
 (match_operand:V16QI 1 "register_operand" "v")
 (match_operand:V16QI 2 "register_operand" "v"))
   (parallel [(const_int 0) (const_int 16)
  (const_int 1) (const_int 17)
  (const_int 2) (const_int 18)
  (const_int 3) (const_int 19)
  (const_int 4) (const_int 20)
  (const_int 5) (const_int 21)
  (const_int 6) (const_int 22)
  (const_int 7) (const_int 23)])))]

and matches hw insn "vmrghb %0,%1,%2" which is consistent
with the previous.  However, on LE it generates pattern

// from (define_insn "altivec_vmrglb_direct"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
   (vec_concat:V32QI
 (match_operand:V16QI 2 "register_operand" "v")
 (match_operand:V16QI 1 "register_operand" "v"))
   (parallel [(const_int  8) (const_int 24)
  (const_int  9) (const_int 25)
  (const_int 10) (const_int 26)
  (const_int 11) (const_int 27)
  (const_int 12) (const_int 28)
  (const_int 13) (const_int 29)
  (const_int 14) (const_int 30)
  (const_int 15) (const_int 31)])))]

Note that it's adjusted to account for the effect of
std::swap on the operands.  It matches hw insn "vmrglb %0,%1,%2"
which is the same as before (as swapping operands), but its
associated RTL pattern is totally changed, which is wrong.
As long as optimization passes leave this pattern alone, even though
the pattern doesn't represent its hw insn, things still work; that's
why simple testing of the bif doesn't expose this issue.  But once
some optimization pass such as 

Re: [PATCH-1v3] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-12 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi,
>   This patch replaces rtx_cost with insn_cost in forward propagation.
> In the PR, one constant vector should be propagated and replace a
> pseudo in a store insn if we know it's a duplicated constant vector.
> It reduces the insn cost but not rtx cost. In this case, the cost is
> determined by destination operand (memory or pseudo). Unfortunately,
> rtx cost can't help.
>
>   The test case is added in the second rs6000 specific patch.
>
>   Compared to previous version, the main changes are:
> 1. Invoke change_is_worthwhile to judge if the cost is reduced and
> the replacement is worthwhile.
> 2. Invalidate recog data before getting the insn cost for the new
> rtl as insn cost might call extract_constrain_insn_cached and
> extract_insn_cached to cache the recog data. The cache data is
> invalid for the new rtl and it causes ICE.
> 3. Check if the insn cost of new rtl is zero which means unknown
> cost. The replacement should be rejected at this situation.
>
> Previous version
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651233.html
>
>   The patch causes a regression cases on i386 as the pattern cost
> regulation has a bug. Please refer the patch and discussion here.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651363.html
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> ChangeLog
> fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile
>
> gcc/
>   * fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile
>   to judge if a replacement is worthwhile.
>   * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Invalidate
>   recog data before getting the insn cost for the new rtl.  Check if
>   the insn cost of new rtl is unknown and fail the replacement.
>
> patch.diff
> diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
> index de543923b92..975de0eec7f 100644
> --- a/gcc/fwprop.cc
> +++ b/gcc/fwprop.cc
> @@ -471,29 +471,19 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, 
> insn_change &use_change,
>redo_changes (0);
>  }
>
> -  /* ??? In theory, it should be better to use insn costs rather than
> - set_src_costs here.  That would involve replacing this code with
> - change_is_worthwhile.  */
>bool ok = recog (attempt, use_change);
> -  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> -if (rtx use_set = single_set (use_rtl))
> -  {
> - bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl));
> - temporarily_undo_changes (0);
> - auto old_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - redo_changes (0);
> - auto new_cost = set_src_cost (SET_SRC (use_set),
> -   GET_MODE (SET_DEST (use_set)), speed);
> - if (new_cost > old_cost
> - || (new_cost == old_cost && !prop.likely_profitable_p ()))
> -   {
> - if (dump_file)
> -   fprintf (dump_file, "change not profitable"
> -" (cost %d -> cost %d)\n", old_cost, new_cost);
> - ok = false;
> -   }
> -  }
> +  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()
> +  && single_set (use_rtl))
> +{
> +  if (!change_is_worthwhile (use_change, false)
> +   || (!prop.likely_profitable_p ()
> +   && !change_is_worthwhile (use_change, true)))
> + {
> +   if (dump_file)
> + fprintf (dump_file, "change not profitable");
> +   ok = false;
> + }
> +}

It should only be necessary to call change_is_worthwhile once,
with strict == !prop.likely_profitable_p ()

So something like:

  bool ok = recog (attempt, use_change);
  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
{
  bool strict_p = !prop.likely_profitable_p ();
  if (!change_is_worthwhile (use_change, strict_p))
{
  if (dump_file)
fprintf (dump_file, "change not profitable");
  ok = false;
}
}

> diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
> index 11639e81bb7..9bad6c2070c 100644
> --- a/gcc/rtl-ssa/changes.cc
> +++ b/gcc/rtl-ssa/changes.cc
> @@ -185,7 +185,18 @@ rtl_ssa::changes_are_worthwhile (array_slice *const> changes,
> * change->old_cost ());
>if (!change->is_deletion ())
>   {
> +   /* Invalidate recog data as insn_cost may call
> +  extract_insn_cached.  */
> +   INSN_CODE (change->rtl ()) = -1;

The:

  bool ok = recog (attempt, use_change);

should leave INSN_CODE set to the result of the successful recog.
Why isn't that true in the example you hit?

I wondered whether we might be trying to cost a NOOP_MOVE_INSN_CODE,
since I couldn't see anything in the current code to stop that.
But if so, that's a bug.  NOOP_MOVE_INSN_CODE should have zero cost,
and shouldn't go through insn_cost.
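
Schematically, the guard being described would be something like
(a sketch, not an actual patch):

  /* A no-op move is free by definition; don't ask insn_cost about it.  */
  int cost = (INSN_CODE (change->rtl ()) == NOOP_MOVE_INSN_CODE
              ? 0
              : insn_cost (change->rtl (), speed));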

Thanks,
Richard

> change->new_

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2024-06-12 Thread Robin Dapp
> Hmm, ok.  The bit that confused me most was:
> 
>   if (last_needs_comparison != -1)
> {
>   end_sequence ();
>   start_sequence ();
>   ...
> }
> 
> which implied that the second attempt was made conditionally.
> It seems like it's always used and is an inherent part of the
> algorithm.
> 
> If the problem is tracking liveness, wouldn't it be better to
> iterate over the "then" block in reverse order?  We would start
> with the liveness set for the join block and update as we move
> backwards through the "then" block.  This liveness set would
> tell us whether the current instruction needs to preserve a
> particular register.  That should make it possible to do the
> transformation in one step, and so avoid the risk that the
> second attempt does something that is unexpectedly different
> from the first attempt.

I agree that the current approach is rather cumbersome.  Indeed
the second attempt was conditional at first and I changed it to
be unconditional after some patch iterations.
Your reverse-order idea sounds like it should work.  To further
clean up the algorithm we could also make it more explicit
that a "cmov" depends on either the condition or the CC and
basically track two separate paths through the block, one CC
path and one "condition" path.

I can surely do that as a follow up.  It might conflict with
Manolis's changes, though, so his work should probably be in
first.
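
For reference, the reverse-order idea would look roughly like this
(a sketch on top of the df simulation helpers; note_preserved_regs is
a hypothetical placeholder):

  /* Start from the join block's live-in set and walk the "then" block
     backwards; a register must be preserved at INSN only if it is in
     LIVE at that point.  */
  auto_bitmap live;
  bitmap_copy (live, df_get_live_in (join_bb));
  for (rtx_insn *insn = BB_END (then_bb);
       insn && insn != PREV_INSN (BB_HEAD (then_bb));
       insn = PREV_INSN (insn))
    if (NONDEBUG_INSN_P (insn))
      {
        note_preserved_regs (insn, live);  /* hypothetical hook */
        /* Standard backward update: kill this insn's defs, add its uses.  */
        df_simulate_one_insn_backwards (then_bb, insn, live);
      }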

> FWIW, the reason for asking was that it seemed safer to pass
> use_cond_earliest back from noce_convert_multiple_sets_1
> to noce_convert_multiple_sets, as another parameter,
> and then do the adjustment around noce_convert_multiple_sets's
> call to targetm.noce_conversion_profitable_p.  That would avoid
> the new for a new if_info field, which in turn would make it
> less likely that stale information is carried over from one attempt
> to the next (e.g. if other ifcvt techniques end up using the same
> field in future).

Would something like the attached v4, which uses a parameter
instead, be OK (I mean without having refactored the full algorithm)?
At least I changed the comment before the second attempt to
hopefully cause a tiny bit less confusion :)
I haven't fully bootstrapped it yet.

Regards
 Robin

Before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs.  At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.

The code originally used BRANCH_COST for that but was then changed
to COSTS_N_INSNS (2) - a compare and a jump.

This patch computes the jump costs via
  insn_cost (if_info.jump, ...)
under the assumption that the target takes BRANCH_COST into account
when costing a jump instruction.

In noce_convert_multiple_sets, we keep track of the need for the initial
CC comparison.  If we needed it for the generated sequence we add its
cost before default_noce_conversion_profitable_p.

gcc/ChangeLog:

* ifcvt.cc (noce_convert_multiple_sets):  Define
use_cond_earliest and adjust original cost if needed.
(noce_convert_multiple_sets_1): Add param use_cond_earliest.
(noce_process_if_block): Do not subtract CC cost anymore.
(noce_find_if_block): Use insn_cost for costing jump insn.
---
 gcc/ifcvt.cc | 79 +---
 1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..2854eea7702 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -105,7 +105,8 @@ static bool noce_convert_multiple_sets_1 (struct noce_if_info *,
  hash_map *,
  auto_vec *,
  auto_vec *,
- auto_vec *, int *);
+ auto_vec *,
+ int *, bool *);
 
 /* Count the number of non-jump active insns in BB.  */
 
@@ -3502,30 +3503,28 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
   int last_needs_comparison = -1;
 
+  bool use_cond_earliest = false;
+
   bool ok = noce_convert_multiple_sets_1
 (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
- &unmodified_insns, &last_needs_comparison);
+ &unmodified_insns, &last_needs_comparison, &use_cond_earliest);
   if (!ok)
   return false;
 
-  /* If there are insns that overwrite part of the initial
- comparison, we can still omit creating temporaries for
- the last of them.
- As the second try will always create a less expensive,
- valid sequence, we do not need to compare and can discard
- the first one.  */
-  if (last_needs_comparison != -1)
-{
-  end_sequence ();
-  start_sequence ();
-  ok = noce_convert_multiple_sets_1
-   (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
-&unmodified_insns, &last_needs_comparison);
-  /* Actually

Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-06-12 Thread Uros Bizjak
On Tue, Jun 11, 2024 at 11:21 AM Arthur Cohen  wrote:
>
> Thanks Richi!
>
> Tested again and pushed on trunk.


This patch introduced a couple of errors during ./configure:

checking for library containing dlopen... none required
checking for library containing pthread_create... none required
/git/gcc/configure: line 8997: test: too many arguments
/git/gcc/configure: line 8999: test: too many arguments
/git/gcc/configure: line 9003: test: too many arguments
/git/gcc/configure: line 9005: test: =: unary operator expected

You have to wrap arguments of the test with double quotes.

Uros.

> Best,
>
> Arthur
>
> On 5/31/24 15:02, Richard Biener wrote:
> > On Fri, May 31, 2024 at 12:24 PM Arthur Cohen  
> > wrote:
> >>
> >> Hi Richard,
> >>
> >> On 4/30/24 09:55, Richard Biener wrote:
> >>> On Fri, Apr 19, 2024 at 11:49 AM Arthur Cohen  
> >>> wrote:
> 
>  Hi everyone,
> 
>  This patch checks for the presence of dlopen and pthread_create in libc. 
>  If that is not the
>  case, we check for the existence of -ldl and -lpthread, as these 
>  libraries are required to
>  link the Rust runtime to our Rust frontend.
> 
>  If these libs are not present on the system, then we disable the Rust 
>  frontend.
> 
>  This was tested on x86_64, in an environment with a recent GLIBC and in 
>  a container with GLIBC
>  2.27.
> 
>  Apologies for sending it in so late.
> >>>
> >>> For example GCC_ENABLE_PLUGINS simply does
> >>>
> >>># Check -ldl
> >>>saved_LIBS="$LIBS"
> >>>AC_SEARCH_LIBS([dlopen], [dl])
> >>>if test x"$ac_cv_search_dlopen" = x"-ldl"; then
> >>>  pluginlibs="$pluginlibs -ldl"
> >>>fi
> >>>LIBS="$saved_LIBS"
> >>>
> >>> which I guess would also work for pthread_create?  This would simplify
> >>> the code a bit.
> >>
> >> Thanks a lot for the review. I've udpated the patch's content in
> >> configure.ac per your suggestion. Tested similarly on x86_64 and in a
> >> container with libc 2.27
> >
> > LGTM.
> >
> > Thanks,
> > Richard.
> >
> >>   From 00669b600a75743523c358ee41ab999b6e9fa0f6 Mon Sep 17 00:00:00 2001
> >> From: Arthur Cohen 
> >> Date: Fri, 12 Apr 2024 13:52:18 +0200
> >> Subject: [PATCH] rust: Do not link with libdl and libpthread 
> >> unconditionally
> >>
> >> ChangeLog:
> >>
> >>  * Makefile.tpl: Add CRAB1_LIBS variable.
> >>  * Makefile.in: Regenerate.
> >>  * configure: Regenerate.
> >>  * configure.ac: Check if -ldl and -lpthread are needed, and if 
> >> so, add
> >>  them to CRAB1_LIBS.
> >>
> >> gcc/rust/ChangeLog:
> >>
> >>  * Make-lang.in: Remove overazealous LIBS = -ldl -lpthread line, 
> >> link
> >>  crab1 against CRAB1_LIBS.
> >> ---
> >>Makefile.in   |   3 +
> >>Makefile.tpl  |   3 +
> >>configure | 154 ++
> >>configure.ac  |  41 +++
> >>gcc/rust/Make-lang.in |   6 +-
> >>5 files changed, 203 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Makefile.in b/Makefile.in
> >> index edb0c8a9a42..1753fb6b862 100644
> >> --- a/Makefile.in
> >> +++ b/Makefile.in
> >> @@ -197,6 +197,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -450,6 +451,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/Makefile.tpl b/Makefile.tpl
> >> index adbcbdd1d57..4aeaad3c1a5 100644
> >> --- a/Makefile.tpl
> >> +++ b/Makefile.tpl
> >> @@ -200,6 +200,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -453,6 +454,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/configure b/configure
> >> index 02b435c1163..a9ea5258f0f 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -690,6 +690,7 @@ extra_host_zlib_configure_flags
> >>extra_host_libiberty_configure_flags
> >>stage1_languages
> >>host_libs_picflag
> >> +CRAB1_LIBS
> >>PICFLAG
> >>host_shared
> >>gcc_host_pie
> >> @@ -8826,6 +8827,139 @@ fi
> >>
> >>
> >>
> >> +# Rust req

Re: [PATCH v2] fix PowerPC < 7 w/ Altivec not to default to power7

2024-06-12 Thread René Rebe
Hey,

> On Jun 12, 2024, at 00:27, René Rebe  wrote:
> 
> Hi!
> 
>> On Jun 12, 2024, at 00:15, Segher Boessenkool  
>> wrote:
>> 
>> Hi!
>> 
>> What does "powerpc < 7" mean?  Something before POWER ISA 2.06?
> 
> PowerPC ISA level 7 or whatever you like to call it.
> 
>> On Tue, Jun 11, 2024 at 04:22:54PM +0200, Rene Rebe wrote:
>>> Glibc uses .machine to determine assembler optimizations to use.
>> 
>> What does this mean?
>> 
>> .machine is an *output* for glibc; nothing in glibc reads source code.
> 
> The glibc build with gcc since 2019 with -mcpu=g5, cell or anything before
> power7 w/ altivec will use assembly optimizations with instructions not
> supported by the CPU. I found out the hard way because the resulting
> binaries threw SIGILL.

Thanks to total recall, I actually debugged this live 4 years ago on
YouTube:

https://www.youtube.com/watch?v=0gU5n3XhGOw

It is actually glibc's preconfigure that explicitly greps for it to choose
the submachine assembler optimizations:

preconfigure:case "${machine}:${submachine}" in
preconfigure:  | grep -E "mcpu=|.machine" -m 1 \
preconfigure:  | sed -e "s/.*machine //" -e "s/.*mcpu=\(.*\)\"/\1/"`

While we could argue that the glibc configure code is also not particularly
stellar, gcc should define the correct .machine ISA level like it did before
the quoted change in 2019, and my patch, submitted nearly 4 years ago,
fixes that ;-)

You can also support the work I’m doing daily over at:

https://patreon.com/renerebe

Thank you so much,
René

>> Nothing the ".machine" directive does has anything to do with
>> optimisations.  Instead, it simply changes what architecture level is
>> used for the following code. what specific instructions are supported
>> mainly.
> 
> I could probably go grep the glibc sources again 4 years later for you.
> 
>>> --- a/gcc/testsuite/gcc.target/powerpc/pr97367.c.vanilla 2024-05-30 
>>> 18:26:29.839784279 +0200
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr97367.c 2024-10-06 
>>> 18:20:34.873818482 +0200
>>> @@ -0,0 +1,9 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mdejagnu-cpu=G5" } */
>>> +
>>> +int dummy ()
>>> +{
>>> +  return 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler "power4" } } */
>> 
>> Please explain (in the testcase, not here!) what this is meant to test!
>> 
>> You probably want to say {\mpower4\M} instead, btw.  Unless you want to
>> match ".machine spower436" as well?
> 
> That indeed sounds reasonable. I guess we can make it match .machine, too.
> Updated test-case welcome ;-)


-- 
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com



Re: [PATCH v2 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-06-12 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Changes in V2:
>
> As explained in patch v2 1/4, it has become clear that the current
> approach of querying assembler support for newer architectural
> extensions at compile time is undesirable from both a maintainability
> and a consistency standpoint - different compiled versions of
> Libatomic may have different features depending on the machine on
> which they were built.
>
> These issues make for difficult testing as the explosion in number of
> `#ifdef' guards makes maintenance error-prone and the dependence on
> binutils version means that, as well as deploying changes for testing
> in a variety of target configurations, testing must also involve
> compiling the library on an increasing number of host configurations,
> meaning that the chance of bugs going undetected increases (as was
> proved in the pre-commit CI which, due to the use of an older version
> of Binutils, picked up on a runtime-error that had hitherto gone
> unnoticed).
>
> We therefore do away with the use of all assembly instructions
> dependent on Binutils 2.42, choosing to replace them with `.inst's
> instead.  This eliminates the latent bug picked up by CI and will
> ensure consistent builds of Libatomic across all versions of Binutils.

Nice!  Thanks for doing this.  It seems much cleaner and more flexible
than the current approach.

Thanks also for the clear organisation of the series.

OK for trunk.  (For the record, I didn't hand-check the encodings of the
.insts ...)

Richard

> ---
>
> The recent introduction of the optional LSE128 and RCPC3 architectural
> extensions to AArch64 has further led to the increased flexibility of
> atomic support in the architecture, with many extensions providing
> support for distinct atomic operations, each with different potential
> applications in mind.
>
> This has led to maintenance difficulties in Libatomic, in particular
> regarding the way the ifunc selector is generated via a series of
> macro expansions at compile-time.
>
> Until now, irrespective of the atomic operation in question, all atomic
> functions for a particular operand size were expected to have the same
> number of ifunc alternatives, meaning that a one-size-fits-all
> approach could reasonably be taken for the selector.
>
> This meant that if, hypothetically, for a particular architecture and
> operand size one particular atomic operation was to have 3 different
> implementations associated with different extensions, libatomic would
> likewise be required to present three ifunc alternatives for all other
> atomic functions.
>
> The consequence of this design choice was the unnecessary use of
> function aliasing and the unwieldy code which resulted from this.
>
> This patch series attempts to remediate this issue by making the
> preprocessor macros defining the number of ifunc alternatives and
> their respective selection functions dependent on the file importing
> the ifunc selector-generating framework.
>
> All files are given `LAT_' macros, defined at the beginning
> and undef'd at the end of the file.  It is these macros that are
> subsequently used to fine-tune the behaviors of `libatomic_i.h' and
> `host-config.h'.
>
> In particular, the definition of the `IFUNC_NCOND(N)' and
> `IFUNC_COND_' macros in host-config.h can now be guarded behind
> these new file-specific macros, which ultimately control what the
> `GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
> these headers are imported once per file implementing some atomic
> operation, fine-tuned control is now possible.
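
Schematically, the shape is roughly as follows (illustrative macro names
and counts, not the exact libatomic code):

  /* In cas_n.c (and analogously in each operation file).  */
  #define LAT_CAS_N
  #include "libatomic_i.h"
  /* ... implementation ... */
  #undef LAT_CAS_N

  /* In host-config.h, guarded per importing file.  */
  #if defined (LAT_CAS_N)
  # define IFUNC_NCOND(N) 2  /* e.g. an LSE128 variant plus a baseline */
  #elif defined (LAT_EXCH_N)
  # define IFUNC_NCOND(N) 1
  #endif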
>
> Regtested with both `--enable-gnu-indirect-function' and
> `--disable-gnu-indirect-function' configurations on armv9.4-a target
> with LRCPC3 and LSE128 support and without.
>
> Victor Do Nascimento (4):
>   Libatomic: AArch64: Convert all lse128 assembly to .insn directives
>   Libatomic: Define per-file identifier macros
>   Libatomic: Make ifunc selector behavior contingent on importing file
>   Libatomic: Clean up AArch64 `atomic_16.S' implementation file
>
>  libatomic/acinclude.m4   |  18 -
>  libatomic/auto-config.h.in   |   3 -
>  libatomic/cas_n.c|   2 +
>  libatomic/config/linux/aarch64/atomic_16.S   | 511 +--
>  libatomic/config/linux/aarch64/host-config.h |  35 +-
>  libatomic/configure  |  43 --
>  libatomic/configure.ac   |   3 -
>  libatomic/exch_n.c   |   2 +
>  libatomic/fadd_n.c   |   2 +
>  libatomic/fand_n.c   |   2 +
>  libatomic/fence.c|   2 +
>  libatomic/fenv.c |   2 +
>  libatomic/fior_n.c   |   2 +
>  libatomic/flag.c |   2 +
>  libatomic/fnand_n.c  |   2 +
>  libatomic/fop_n.c|   2 +
>  libatomic/fsub_n.c  

Re: arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

2024-06-12 Thread Andre Vieira (lists)

On 06/06/2024 12:53, Richard Earnshaw (lists) wrote:
> On 05/06/2024 17:07, Andre Vieira (lists) wrote:
>> Hi,
>>
>> This patch adds missing assembly directives to the CMSE library wrapper
>> to call functions with attribute cmse_nonsecure_call.  Without the .type
>> directive the linker will fail to produce the correct veneer if a call
>> to this wrapper function is too far from the wrapper itself.  The .size
>> was added for completeness, though we don't necessarily have a usecase
>> for it.
>>
>> I did not add a testcase as I couldn't get dejagnu to disassemble the
>> linked binary to check we used an appropriate branch instruction; I did
>> however test it locally, and with this change the GNU linker now
>> generates an appropriate veneer and call to that veneer when
>> __gnu_cmse_nonsecure_call is too far.
>>
>> OK for trunk and backport to any release branches still in support
>> (after waiting a week or so)?
>>
>> libgcc/ChangeLog:
>>
>> 	PR target/115360
>> 	* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
>
> OK.
>
> R.

OK to backport? I was thinking of backporting it as far as gcc-11 (we
haven't done an 11.5 yet).


Kind Regards,
Andre


Re: [PATCH] aarch64: Use bitreverse rtl code instead of unspec [PR115176]

2024-06-12 Thread Richard Sandiford
Andrew Pinski  writes:
> Bitreverse rtl code was added with r14-1586-g6160572f8d243c. So let's
> use it instead of an unspec. This is mostly a small cleanup, but it does
> include one small fix: the rtx costs didn't handle vector modes
> correctly for the UNSPEC, and now they do.
> This is part of the first step in adding __builtin_bitreverse's builtins
> but it is independent of it though.
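
For reference, the scalar semantics that an rbit-style instruction (and
eventually __builtin_bitreverse) computes, written as portable C rather
than the builtin itself:

  static unsigned int
  bitreverse32 (unsigned int x)
  {
    unsigned int r = 0;
    for (int i = 0; i < 32; i++)
      {
        r = (r << 1) | (x & 1);  /* shift result up, append lowest bit */
        x >>= 1;
      }
    return r;
  }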

Nice cleanup.

> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
>   PR target/115176
>   * config/aarch64/aarch64-simd.md (aarch64_rbit): Use
>   bitreverse instead of unspec.
>   * config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to 
> using
>   rtx_code_function instead of unspec_based_function.
>   * config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
>   * config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like 
> BSWAP.
>   Remove UNSPEC_RBIT support.
>   * config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
>   (aarch64_rbit): Use bitreverse instead of unspec.
>   * config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
>   (optab): Likewise.
>   (sve_int_op): Likewise.
>   (SVE_INT_UNARY): Remove UNSPEC_RBIT.
>   (optab): Likewise.
>   (sve_int_op): Likewise.
>   (min_elem_bits): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64-simd.md  |  3 +--
>  gcc/config/aarch64/aarch64-sve-builtins-base.cc |  2 +-
>  gcc/config/aarch64/aarch64-sve.md   |  2 +-
>  gcc/config/aarch64/aarch64.cc   | 10 ++
>  gcc/config/aarch64/aarch64.md   |  3 +--
>  gcc/config/aarch64/iterators.md | 10 +-
>  6 files changed, 11 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index f644bd1731e..0bb39091a38 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -377,8 +377,7 @@ (define_insn "bswap2"
>  
>  (define_insn "aarch64_rbit"
>[(set (match_operand:VB 0 "register_operand" "=w")
> - (unspec:VB [(match_operand:VB 1 "register_operand" "w")]
> -UNSPEC_RBIT))]
> + (bitreverse:VB (match_operand:VB 1 "register_operand" "w")))]
>"TARGET_SIMD"
>"rbit\\t%0., %1."
>[(set_attr "type" "neon_rbit")]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..dea2f6e6bfc 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -3186,7 +3186,7 @@ FUNCTION (svqincp, svqdecp_svqincp_impl, (SS_PLUS, 
> US_PLUS))
>  FUNCTION (svqincw, svqinc_bhwd_impl, (SImode))
>  FUNCTION (svqincw_pat, svqinc_bhwd_impl, (SImode))
>  FUNCTION (svqsub, rtx_code_function, (SS_MINUS, US_MINUS, -1))
> -FUNCTION (svrbit, unspec_based_function, (UNSPEC_RBIT, UNSPEC_RBIT, -1))
> +FUNCTION (svrbit, rtx_code_function, (BITREVERSE, BITREVERSE, -1))
>  FUNCTION (svrdffr, svrdffr_impl,)
>  FUNCTION (svrecpe, unspec_based_function, (-1, UNSPEC_URECPE, UNSPEC_FRECPE))
>  FUNCTION (svrecps, unspec_based_function, (-1, -1, UNSPEC_FRECPS))
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index d69db34016a..5331e7121d5 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3083,6 +3083,7 @@ (define_expand "vec_extract"
>  ;; - CLS (= clrsb)
>  ;; - CLZ
>  ;; - CNT (= popcount)
> +;; - RBIT (= bitreverse)
>  ;; - NEG
>  ;; - NOT
>  ;; -
> @@ -3171,7 +3172,6 @@ (define_insn "*cond__any"
>  ;;  [INT] General unary arithmetic corresponding to unspecs
>  ;; -
>  ;; Includes
> -;; - RBIT
>  ;; - REVB
>  ;; - REVH
>  ;; - REVW
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 13191ec8e34..0e9d7b1ec0f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14690,6 +14690,7 @@ cost_plus:
>   return true;
>}
>  
> +case BITREVERSE:
>  case BSWAP:
>*cost = COSTS_N_INSNS (1);
>  
> @@ -15339,16 +15340,9 @@ cost_plus:
>  
>return false;
>  }
> -
> -  if (XINT (x, 1) == UNSPEC_RBIT)
> -{
> -  if (speed)
> -*cost += extra_cost->alu.rev;
> -
> -  return false;
> -}
>break;
>  
> +

The extra blank line here seems unnecessary.

OK otherwise, thanks.

Richard

>  case TRUNCATE:
>  
>/* Decompose muldi3_highpart.  */
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index dd88fd891b5..69167ab0c04 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@

Re: [PATCH] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-12 Thread Richard Biener
On Wed, 12 Jun 2024, Richard Biener wrote:

> On Tue, 11 Jun 2024, Richard Sandiford wrote:
> 
> > Don't think it makes any difference, but:
> > 
> > Richard Biener  writes:
> > > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, 
> > > stmt_vec_info stmt_info,
> > >access excess elements.
> > >???  Enhancements include peeling multiple iterations
> > >or using masked loads with a static mask.  */
> > > -   || (group_size * cvf) % cnunits + group_size - gap < cnunits))
> > > +   || ((group_size * cvf) % cnunits + group_size - gap < cnunits
> > > +   /* But peeling a single scalar iteration is enough if
> > > +  we can use the next power-of-two sized partial
> > > +  access.  */
> > > +   && ((cremain = (group_size * cvf - gap) % cnunits), true
> > 
> > ...this might be less surprising as:
> > 
> >   && ((cremain = (group_size * cvf - gap) % cnunits, true)
> > 
> > in terms of how the &&s line up.
> 
> Yeah - I'll fix before pushing.

The aarch64 CI shows that a few testcases no longer use SVE
(gcc.target/aarch64/sve/slp_perm_{4,7,8}.c) because peeling
for gaps is deemed insufficient.  Formerly we had

  if (loop_vinfo
  && *memory_access_type == VMAT_CONTIGUOUS
  && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
  && !multiple_p (group_size * LOOP_VINFO_VECT_FACTOR 
(loop_vinfo),
  nunits))
{
  unsigned HOST_WIDE_INT cnunits, cvf;
  if (!can_overrun_p
  || !nunits.is_constant (&cnunits)
  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant 
(&cvf)
  /* Peeling for gaps assumes that a single scalar 
iteration
 is enough to make sure the last vector iteration 
doesn't
 access excess elements.
 ???  Enhancements include peeling multiple iterations
 or using masked loads with a static mask.  */
  || (group_size * cvf) % cnunits + group_size - gap < 
cnunits)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
vect_location,
 "peeling for gaps insufficient for "
 "access\n");

and in all cases multiple_p (group_size * LOOP_VINFO_VECT_FACTOR, nunits)
is true, so we didn't check whether peeling one iteration is
sufficient.  But after the refactoring the outer checks merely
indicate there's overrun (which is there already because gap != 0).

That is, we never verified, for the "regular" gap case, whether peeling
for a single iteration is sufficient.  But now of course we run into
the inner check which will always trigger if earlier checks didn't
work out to set overrun_p to false.

For slp_perm_8.c we have a group_size of two, nunits is {16, 16}
and VF is {8, 8} and gap is one.  Given we know the
multiple_p we know that (group_size * cvf) % cnunits is zero,
so what remains is group_size - gap < nunits but 1 is probably
always less than {16, 16}.

The new logic I added in the later patch that peeling a single
iteration is OK when we use a smaller, rounded-up to power-of-two
sized access is

  || ((group_size * cvf) % cnunits + group_size - gap < 
cnunits
  /* But peeling a single scalar iteration is enough 
if
 we can use the next power-of-two sized partial
 access.  */
  && (cremain = (group_size * cvf - gap) % cnunits, 
true)
  && (cpart_size = (1 << ceil_log2 (cremain))) != 
cnunits
  && vector_vector_composition_type 
   (vectype, cnunits / cpart_size, 
&half_vtype) == NULL_TREE)))

again knowing the multiple we know cremain is nunits - gap and with
gap == 1 rounding this size up will yield nunits and thus the existing
peeling is OK.  Something is inconsistent here and the pre-existing

  (group_size * cvf) % cnunits + group_size - gap < cnunits

check looks suspicious for a general check.

  (group_size * cvf - gap)

should be the number of elements we can access without touching
excess elements.  Peeling a single iteration will make sure
group_size * cvf + group_size - gap is accessed
(that's group_size * (cvf + 1) - gap).  The excess elements
touched in the vector loop are

  cnunits - (group_size * cvf - gap) % cnunits

I think that number needs to be less than or equal to group_size, so
the correct check should be

  (group_size * cvf - gap) % cnunits + group_size < cnunits
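
Plugging in the slp_perm_8.c numbers (group_size 2, cnunits 16, cvf 8,
gap 1) as a standalone sanity check of both formulas (constant-nunits
case only):

  #include <stdio.h>

  int
  main (void)
  {
    unsigned group_size = 2, cvf = 8, cnunits = 16, gap = 1;

    /* Pre-existing check; true means "peeling one iteration is not
       enough": (16 % 16) + 2 - 1 = 1 < 16 -> true.  */
    int old_insufficient
      = (group_size * cvf) % cnunits + group_size - gap < cnunits;

    /* Corrected check: (16 - 1) % 16 + 2 = 17 < 16 -> false, i.e.
       peeling a single scalar iteration is sufficient.  */
    int new_insufficient
      = (group_size * cvf - gap) % cnunits + group_size < cnunits;

    printf ("old: %d, new: %d\n", old_insufficient, new_insufficient);
    return 0;
  }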

for the SVE case that's (nunits - 1) + 2 < nunits which should
simplify to false.  Now the question is how to formulate this
with poly-ints in a way that it works out, for the case in
question doing the overr

[PATCH 1/3][v3] tree-optimization/114107 - avoid peeling for gaps in more cases

2024-06-12 Thread Richard Biener
The following refactors the code to detect necessary peeling for
gaps, in particular the PR103116 case when there is no gap but
the group size is smaller than the vector size.  The testcase in
PR114107 shows we fail to SLP

  for (int i=0; i

[PATCH 2/3][v3] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-12 Thread Richard Biener
The following makes peeling of a single scalar iteration handle more
gaps, including non-power-of-two cases.  This can be done by rounding
up the remaining access to the next power of two, which ensures that
the next scalar iteration will pick at least the number of excess
elements we access.
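
The rounding step itself is simple; a standalone sketch of the
computation (with a portable stand-in for GCC's ceil_log2):

  #include <stdio.h>

  /* Smallest power of two >= x, for x >= 1; stands in for
     1 << ceil_log2 (x) as used in the patch.  */
  static unsigned
  next_pow2 (unsigned x)
  {
    unsigned p = 1;
    while (p < x)
      p <<= 1;
    return p;
  }

  int
  main (void)
  {
    /* E.g. 5 remaining elements are covered by an 8-element partial
       access; the peeled scalar iteration covers the overrun.  */
    for (unsigned remain = 1; remain <= 8; remain++)
      printf ("remain %u -> partial access of %u elements\n",
              remain, next_pow2 (remain));
    return 0;
  }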

I've added a correctness testcase and an x86-specific one scanning for
the optimization.

PR tree-optimization/115385
* tree-vect-stmts.cc (get_group_load_store_type): Peeling
of a single scalar iteration is sufficient if we can narrow
the access to the next power of two of the bits in the last
access.
(vectorizable_load): Ensure that the last access is narrowed.

* gcc.dg/vect/pr115385.c: New testcase.
* gcc.target/i386/vect-pr115385.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr115385.c  | 88 +++
 gcc/testsuite/gcc.target/i386/vect-pr115385.c | 53 +++
 gcc/tree-vect-stmts.cc| 44 --
 3 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115385.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr115385.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c 
b/gcc/testsuite/gcc.dg/vect/pr115385.c
new file mode 100644
index 000..a18cd665d7d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -0,0 +1,88 @@
+/* { dg-require-effective-target mmap } */
+
+#include <sys/mman.h>
+#include <stdio.h>
+
+#define COUNT 511
+#define MMAP_SIZE 0x2
+#define ADDRESS 0x112200
+#define TYPE unsigned char
+
+#ifndef MAP_ANONYMOUS
+#define MAP_ANONYMOUS MAP_ANON
+#endif
+
+void __attribute__((noipa)) foo(TYPE * __restrict x,
+TYPE *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  x[16*i+0] = y[3*i+0];
+  x[16*i+1] = y[3*i+1];
+  x[16*i+2] = y[3*i+2];
+  x[16*i+3] = y[3*i+0];
+  x[16*i+4] = y[3*i+1];
+  x[16*i+5] = y[3*i+2];
+  x[16*i+6] = y[3*i+0];
+  x[16*i+7] = y[3*i+1];
+  x[16*i+8] = y[3*i+2];
+  x[16*i+9] = y[3*i+0];
+  x[16*i+10] = y[3*i+1];
+  x[16*i+11] = y[3*i+2];
+  x[16*i+12] = y[3*i+0];
+  x[16*i+13] = y[3*i+1];
+  x[16*i+14] = y[3*i+2];
+  x[16*i+15] = y[3*i+0];
+}
+}
+
+void __attribute__((noipa)) bar(TYPE * __restrict x,
+TYPE *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  x[16*i+0] = y[5*i+0];
+  x[16*i+1] = y[5*i+1];
+  x[16*i+2] = y[5*i+2];
+  x[16*i+3] = y[5*i+3];
+  x[16*i+4] = y[5*i+4];
+  x[16*i+5] = y[5*i+0];
+  x[16*i+6] = y[5*i+1];
+  x[16*i+7] = y[5*i+2];
+  x[16*i+8] = y[5*i+3];
+  x[16*i+9] = y[5*i+4];
+  x[16*i+10] = y[5*i+0];
+  x[16*i+11] = y[5*i+1];
+  x[16*i+12] = y[5*i+2];
+  x[16*i+13] = y[5*i+3];
+  x[16*i+14] = y[5*i+4];
+  x[16*i+15] = y[5*i+0];
+}
+}
+
+TYPE x[COUNT * 16];
+
+int
+main (void)
+{
+  void *y;
+  TYPE *end_y;
+
+  y = mmap ((void *) ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (y == MAP_FAILED)
+{
+  perror ("mmap");
+  return 1;
+}
+
+  end_y = (TYPE *) ((char *) y + MMAP_SIZE);
+
+  foo (x, end_y - COUNT * 3, COUNT);
+  bar (x, end_y - COUNT * 5, COUNT);
+
+  return 0;
+}
+
+/* We always require a scalar epilogue here but we don't know which
+   targets support vector composition this way.  */
diff --git a/gcc/testsuite/gcc.target/i386/vect-pr115385.c 
b/gcc/testsuite/gcc.target/i386/vect-pr115385.c
new file mode 100644
index 000..a6be9ce4e54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-pr115385.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse4.1 -mno-avx -fdump-tree-vect-details" } */
+
+void __attribute__((noipa)) foo(unsigned char * __restrict x,
+unsigned char *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  x[16*i+0] = y[3*i+0];
+  x[16*i+1] = y[3*i+1];
+  x[16*i+2] = y[3*i+2];
+  x[16*i+3] = y[3*i+0];
+  x[16*i+4] = y[3*i+1];
+  x[16*i+5] = y[3*i+2];
+  x[16*i+6] = y[3*i+0];
+  x[16*i+7] = y[3*i+1];
+  x[16*i+8] = y[3*i+2];
+  x[16*i+9] = y[3*i+0];
+  x[16*i+10] = y[3*i+1];
+  x[16*i+11] = y[3*i+2];
+  x[16*i+12] = y[3*i+0];
+  x[16*i+13] = y[3*i+1];
+  x[16*i+14] = y[3*i+2];
+  x[16*i+15] = y[3*i+0];
+}
+}
+
+void __attribute__((noipa)) bar(unsigned char * __restrict x,
+unsigned char *y, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  x[16*i+0] = y[5*i+0];
+  x[16*i+1] = y[5*i+1];
+  x[16*i+2] = y[5*i+2];
+  x[16*i+3] = y[5*i+3];
+  x[16*i+4] = y[5*i+4];
+  x[16*i+5] = y[5*i+0];
+  x[16*i+6] = y[5*i+1];
+  x[16*i+7] = y[5*i+2];
+  x[16*i+8] = y[5*i+3];
+  x[16*i+9] = y[5*i+4];
+  x[16*i+10] = y[5*i+0];
+  x[16*i+11] = y[5*i+1];
+  x[16*i+12] = y[5*i+2];
+  x[16*i+13] = y[5*i+3];
+  x[1

[PATCH 3/3][v3] Improve code generation of strided SLP loads

2024-06-12 Thread Richard Biener
This avoids falling back to elementwise accesses for strided SLP
loads when the group size is not a multiple of the vector element
size.  Instead we can use a smaller vector or integer type for the load.

For stores we can do the same though restrictions on stores we handle
and the fact that store-merging covers up makes this mostly effective
for cost modeling which shows for gcc.target/i386/vect-strided-3.c
which we now vectorize with V4SI vectors rather than just V2SI ones.

For all of this there's still the opportunity to use non-uniform
accesses, say for a 6-element group with a VF of two do
V4SI, { V2SI, V2SI }, V4SI.  But that's for a possible followup.
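
As a scalar illustration of composing a non-power-of-two group from
smaller pieces (hypothetical vector typedefs via GCC's vector_size
attribute, not vectorizer output):

  typedef int v4si __attribute__ ((vector_size (16)));
  typedef int v2si __attribute__ ((vector_size (8)));

  /* Load a 6-element group as one 4-element and one 2-element piece
     instead of six scalar loads.  */
  static void
  load_group6 (const int *p, v4si *lo, v2si *hi)
  {
    __builtin_memcpy (lo, p, sizeof *lo);      /* elements 0..3 */
    __builtin_memcpy (hi, p + 4, sizeof *hi);  /* elements 4..5 */
  }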

* gcc.target/i386/vect-strided-1.c: New testcase.
* gcc.target/i386/vect-strided-2.c: Likewise.
* gcc.target/i386/vect-strided-3.c: Likewise.
* gcc.target/i386/vect-strided-4.c: Likewise.
---
 .../gcc.target/i386/vect-strided-1.c  |  24 +
 .../gcc.target/i386/vect-strided-2.c  |  17 +++
 .../gcc.target/i386/vect-strided-3.c  |  20 
 .../gcc.target/i386/vect-strided-4.c  |  20 
 gcc/tree-vect-stmts.cc| 100 --
 5 files changed, 127 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vect-strided-4.c

diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-1.c 
b/gcc/testsuite/gcc.target/i386/vect-strided-1.c
new file mode 100644
index 000..db4a06711f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-1.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  for (int i = 0; i < 1024; ++i)
+{
+  a[8*i+0] = b[s*i+0];
+  a[8*i+1] = b[s*i+1];
+  a[8*i+2] = b[s*i+2];
+  a[8*i+3] = b[s*i+3];
+  a[8*i+4] = b[s*i+4];
+  a[8*i+5] = b[s*i+5];
+  a[8*i+6] = b[s*i+4];
+  a[8*i+7] = b[s*i+5];
+}
+}
+
+/* Three two-element loads, two four-element stores.  On ia32 we elide
+   a permute and perform a redundant load.  */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-times "movhps" 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "movhps" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movups" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-2.c 
b/gcc/testsuite/gcc.target/i386/vect-strided-2.c
new file mode 100644
index 000..6fd64e28cf0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  for (int i = 0; i < 1024; ++i)
+{
+  a[4*i+0] = b[s*i+0];
+  a[4*i+1] = b[s*i+1];
+  a[4*i+2] = b[s*i+0];
+  a[4*i+3] = b[s*i+1];
+}
+}
+
+/* One two-element load, one four-element store.  */
+/* { dg-final { scan-assembler-times "movq" 1 } } */
+/* { dg-final { scan-assembler-times "movups" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-3.c 
b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
new file mode 100644
index 000..b462701a0b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx -fno-tree-slp-vectorize" } */
+
+void foo (int * __restrict a, int *b, int s)
+{
+  if (s >= 6)
+for (int i = 0; i < 1024; ++i)
+  {
+   a[s*i+0] = b[4*i+0];
+   a[s*i+1] = b[4*i+1];
+   a[s*i+2] = b[4*i+2];
+   a[s*i+3] = b[4*i+3];
+   a[s*i+4] = b[4*i+0];
+   a[s*i+5] = b[4*i+1];
+  }
+}
+
+/* While the vectorizer generates 6 uint64 stores.  */
+/* { dg-final { scan-assembler-times "movq" 4 } } */
+/* { dg-final { scan-assembler-times "movhps" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-4.c b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
new file mode 100644
index 000..dd922926a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4.2 -mno-avx -fno-tree-slp-vectorize" } */
+
+void foo (int * __restrict a, int * __restrict b, int *c, int s)
+{
+  if (s >= 2)
+for (int i = 0; i < 1024; ++i)
+  {
+   a[s*i+0] = c[4*i+0];
+   a[s*i+1] = c[4*i+1];
+   b[s*i+0] = c[4*i+2];
+   b[s*i+1] = c[4*i+3];
+  }
+}
+
+/* Vectorization factor two, two two-element stores to a using movq
+   and two two-element stores to b via pextrq/movhps of the high part.  */
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-times "pextrq" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movhps" 2 { target { ia32 } } } } */
dif

Re: [PATCH 06/52] m2: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-12 Thread Kewen.Lin
Hi Gaius,

>>  static tree
>>  build_m2_short_real_node (void)
>>  {
>> -  tree c;
>> -
>> -  /* Define `REAL'.  */
>> -
>> -  c = make_node (REAL_TYPE);
>> -  TYPE_PRECISION (c) = FLOAT_TYPE_SIZE;
>> -  layout_type (c);
>> -  return c;
>> +  /* Define `SHORTREAL'.  */
>> +  layout_type (float_type_node);
> 
> It looks that float_type_node, double_type_node, float128_type_node and
> long_double_type_node have been called with layout_type when they are
> being initialized in function build_common_tree_nodes, maybe we can just
> assert their TYPE_SIZE.

I just noticed that the latest trunk still has {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in gcc/m2 and realized that my comment above was misleading, sorry about that.
It meant TYPE_SIZE (float_type_node) etc. instead of 
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
as this patch series would like to get rid of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE.

I adjusted them as below patch, does this look good to you?

BR,
Kewen
-

[PATCH] m2: Remove uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To be prepared for that, this
patch is to remove uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in m2.  Currently they are used for assertion and can be
replaced with TYPE_SIZE check on the corresponding type node,
since we dropped the call to layout_type which would early
return once TYPE_SIZE is set and this assertion ensures it's
safe to drop that call.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

gcc/m2/ChangeLog:

* gm2-gcc/m2type.cc (build_m2_short_real_node): Adjust assertion with
TYPE_SIZE check.
(build_m2_real_node): Likewise.
(build_m2_long_real_node): Add assertion with TYPE_SIZE check.
---
 gcc/m2/gm2-gcc/m2type.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/m2/gm2-gcc/m2type.cc b/gcc/m2/gm2-gcc/m2type.cc
index 5773a5cbd19..7ed184518cb 100644
--- a/gcc/m2/gm2-gcc/m2type.cc
+++ b/gcc/m2/gm2-gcc/m2type.cc
@@ -1416,7 +1416,7 @@ static tree
 build_m2_short_real_node (void)
 {
   /* Define `SHORTREAL'.  */
-  ASSERT_CONDITION (TYPE_PRECISION (float_type_node) == FLOAT_TYPE_SIZE);
+  ASSERT_CONDITION (TYPE_SIZE (float_type_node));
   return float_type_node;
 }

@@ -1424,7 +1424,7 @@ static tree
 build_m2_real_node (void)
 {
   /* Define `REAL'.  */
-  ASSERT_CONDITION (TYPE_PRECISION (double_type_node) == DOUBLE_TYPE_SIZE);
+  ASSERT_CONDITION (TYPE_SIZE (double_type_node));
   return double_type_node;
 }

@@ -1432,12 +1432,13 @@ static tree
 build_m2_long_real_node (void)
 {
   tree longreal;
-
+
   /* Define `LONGREAL'.  */
   if (M2Options_GetIEEELongDouble ())
 longreal = float128_type_node;
   else
 longreal = long_double_type_node;
+  ASSERT_CONDITION (TYPE_SIZE (longreal));
   return longreal;
 }

--
2.43.0


Re: [PATCH 04/52] go: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-12 Thread Kewen.Lin
Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653387.html

BR,
Kewen

on 2024/6/3 11:00, Kewen Lin wrote:
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> in go with TYPE_PRECISION of {float,{,long_}double}_type_node.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/go/ChangeLog:
> 
>   * go-gcc.cc (Gcc_backend::float_type): Use TYPE_PRECISION of
>   {float,double,long_double}_type_node to replace
>   {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>   (Gcc_backend::complex_type): Likewise.
> ---
>  gcc/go/go-gcc.cc | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> index bc9732c3db3..6aa751f9f30 100644
> --- a/gcc/go/go-gcc.cc
> +++ b/gcc/go/go-gcc.cc
> @@ -993,11 +993,11 @@ Btype*
>  Gcc_backend::float_type(int bits)
>  {
>tree type;
> -  if (bits == FLOAT_TYPE_SIZE)
> +  if (bits == TYPE_PRECISION (float_type_node))
>  type = float_type_node;
> -  else if (bits == DOUBLE_TYPE_SIZE)
> +  else if (bits == TYPE_PRECISION (double_type_node))
>  type = double_type_node;
> -  else if (bits == LONG_DOUBLE_TYPE_SIZE)
> +  else if (bits == TYPE_PRECISION (long_double_type_node))
>  type = long_double_type_node;
>else
>  {
> @@ -1014,11 +1014,11 @@ Btype*
>  Gcc_backend::complex_type(int bits)
>  {
>tree type;
> -  if (bits == FLOAT_TYPE_SIZE * 2)
> +  if (bits == TYPE_PRECISION (float_type_node) * 2)
>  type = complex_float_type_node;
> -  else if (bits == DOUBLE_TYPE_SIZE * 2)
> +  else if (bits == TYPE_PRECISION (double_type_node) * 2)
>  type = complex_double_type_node;
> -  else if (bits == LONG_DOUBLE_TYPE_SIZE * 2)
> +  else if (bits == TYPE_PRECISION (long_double_type_node) * 2)
>  type = complex_long_double_type_node;
>else
>  {



PING^1 [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-12 Thread Kewen.Lin
Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653339.html

BR,
Kewen

on 2024/6/3 11:00, Kewen Lin wrote:
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
> in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/rust/ChangeLog:
> 
>   * rust-gcc.cc (float_type): Use TYPE_PRECISION of
>   {float,double,long_double}_type_node to replace
>   {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
> ---
>  gcc/rust/rust-gcc.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/rust/rust-gcc.cc b/gcc/rust/rust-gcc.cc
> index f17e19a2dfc..38169c08985 100644
> --- a/gcc/rust/rust-gcc.cc
> +++ b/gcc/rust/rust-gcc.cc
> @@ -411,11 +411,11 @@ tree
>  float_type (int bits)
>  {
>tree type;
> -  if (bits == FLOAT_TYPE_SIZE)
> +  if (bits == TYPE_PRECISION (float_type_node))
>  type = float_type_node;
> -  else if (bits == DOUBLE_TYPE_SIZE)
> +  else if (bits == TYPE_PRECISION (double_type_node))
>  type = double_type_node;
> -  else if (bits == LONG_DOUBLE_TYPE_SIZE)
> +  else if (bits == TYPE_PRECISION (long_double_type_node))
>  type = long_double_type_node;
>else
>  {



PING^1 [PATCH 08/52] vms: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-12 Thread Kewen.Lin
Hi,

Gentle ping:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653342.html

BR,
Kewen

on 2024/6/3 11:00, Kewen Lin wrote:
> Joseph pointed out "floating types should have their mode,
> not a poorly defined precision value" in the discussion[1],
> as he and Richi suggested, the existing macros
> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
> hook mode_for_floating_type.  To be prepared for that, this
> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in vms port
> with TYPE_PRECISION of long_double_type_node.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/ChangeLog:
> 
>   * config/vms/vms.cc (vms_patch_builtins): Use TYPE_PRECISION of
>   long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
> ---
>  gcc/config/vms/vms.cc | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/vms/vms.cc b/gcc/config/vms/vms.cc
> index d468c79e559..2fcc673c8a9 100644
> --- a/gcc/config/vms/vms.cc
> +++ b/gcc/config/vms/vms.cc
> @@ -141,6 +141,7 @@ vms_patch_builtins (void)
>if (builtin_decl_implicit_p (BUILT_IN_FWRITE_UNLOCKED))
>  set_builtin_decl_implicit_p (BUILT_IN_FWRITE_UNLOCKED, false);
>  
> +  unsigned long_double_type_size = TYPE_PRECISION (long_double_type_node);
>/* Define aliases for names.  */
>for (i = 0; i < NBR_CRTL_NAMES; i++)
>  {
> @@ -179,7 +180,7 @@ vms_patch_builtins (void)
> vms_add_crtl_xlat (alt, nlen + 1, res, rlen);
>  
> /* Long double version.  */
> -   res[rlen - 1] = (LONG_DOUBLE_TYPE_SIZE == 128 ? 'X' : 'T');
> +   res[rlen - 1] = (long_double_type_size == 128 ? 'X' : 'T');
> alt[nlen] = 'l';
> vms_add_crtl_xlat (alt, nlen + 1, res, rlen);
>  
> @@ -223,7 +224,7 @@ vms_patch_builtins (void)
>if (n->flags & VMS_CRTL_FLOAT64)
>  res[rlen++] = 't';
>  
> -  if ((n->flags & VMS_CRTL_FLOAT128) && LONG_DOUBLE_TYPE_SIZE == 128)
> +  if ((n->flags & VMS_CRTL_FLOAT128) && long_double_type_size == 128)
>  res[rlen++] = 'x';
>  
>memcpy (res + rlen, n->name, nlen);



Re: [PATCH] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-12 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 12 Jun 2024, Richard Biener wrote:
>
>> On Tue, 11 Jun 2024, Richard Sandiford wrote:
>> 
>> > Don't think it makes any difference, but:
>> > 
>> > Richard Biener  writes:
>> > > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, 
>> > > stmt_vec_info stmt_info,
>> > >   access excess elements.
>> > >   ???  Enhancements include peeling multiple 
>> > > iterations
>> > >   or using masked loads with a static mask.  */
>> > > -  || (group_size * cvf) % cnunits + group_size - gap < 
>> > > cnunits))
>> > > +  || ((group_size * cvf) % cnunits + group_size - gap < 
>> > > cnunits
>> > > +  /* But peeling a single scalar iteration is 
>> > > enough if
>> > > + we can use the next power-of-two sized partial
>> > > + access.  */
>> > > +  && ((cremain = (group_size * cvf - gap) % 
>> > > cnunits), true
>> > 
>> > ...this might be less surprising as:
>> > 
>> >  && ((cremain = (group_size * cvf - gap) % cnunits, true)
>> > 
>> > in terms of how the &&s line up.
>> 
>> Yeah - I'll fix before pushing.
>
> The aarch64 CI shows that a few testcases no longer use SVE
> (gcc.target/aarch64/sve/slp_perm_{4,7,8}.c) because peeling
> for gaps is deemed insufficient.  Formerly we had
>
>   if (loop_vinfo
>   && *memory_access_type == VMAT_CONTIGUOUS
>   && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
>   && !multiple_p (group_size * LOOP_VINFO_VECT_FACTOR 
> (loop_vinfo),
>   nunits))
> {
>   unsigned HOST_WIDE_INT cnunits, cvf;
>   if (!can_overrun_p
>   || !nunits.is_constant (&cnunits)
>   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant 
> (&cvf)
>   /* Peeling for gaps assumes that a single scalar 
> iteration
>  is enough to make sure the last vector iteration 
> doesn't
>  access excess elements.
>  ???  Enhancements include peeling multiple iterations
>  or using masked loads with a static mask.  */
>   || (group_size * cvf) % cnunits + group_size - gap < 
> cnunits)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
> vect_location,
>  "peeling for gaps insufficient for "
>  "access\n");
>
> and in all cases multiple_p (group_size * LOOP_VINFO_VECT_FACTOR, nunits)
> is true so we didn't check for whether peeling one iteration is
> sufficient.  But after the refactoring the outer checks merely
> indicate there's overrun (which is there already because gap != 0).
>
> That is, we never verified, for the "regular" gap case, whether peeling
> for a single iteration is sufficient.  But now of course we run into
> the inner check which will always trigger if earlier checks didn't
> work out to set overrun_p to false.
>
> For slp_perm_8.c we have a group_size of two, nunits is {16, 16}
> and VF is {8, 8} and gap is one.  Given we know the
> multiple_p we know that (group_size * cvf) % cnunits is zero,
> so what remains is group_size - gap < nunits but 1 is probably
> always less than {16, 16}.
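>
> (Spelled out with those constants: (2 * 8) % 16 + 2 - 1 = 0 + 1 = 1,
> and 1 is indeed always less than {16, 16}.)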

I thought the idea was that the size of the gap was immaterial
for VMAT_CONTIGUOUS, on the assumption that it would never be
bigger than a page.  That is, any gap loaded by the final
unpeeled iteration would belong to the same page as the non-gap
data from either the same vector iteration or the subsequent
peeled scalar iteration.

Will have to think more about this if that doesn't affect the
rest of the message, but FWIW...

> The new logic I added in the later patch that peeling a single
> iteration is OK when we use a smaller, rounded-up to power-of-two
> sized access is
>
>   || ((group_size * cvf) % cnunits + group_size - gap < 
> cnunits
>   /* But peeling a single scalar iteration is enough 
> if
>  we can use the next power-of-two sized partial
>  access.  */
>   && (cremain = (group_size * cvf - gap) % cnunits, 
> true)
>   && (cpart_size = (1 << ceil_log2 (cremain))) != 
> cnunits
>   && vector_vector_composition_type 
>(vectype, cnunits / cpart_size, 
> &half_vtype) == NULL_TREE)))
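>
> (Again with the slp_perm_8.c constants: cremain = (2 * 8 - 1) % 16 = 15
> and cpart_size = 1 << ceil_log2 (15) = 16 == cnunits, so this new
> condition is false there.)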
>
> again knowing the multiple we know cremain is nunits - gap and with
> gap == 1 rounding this size up will yield nunits and thus the existing
> peeling is OK.  Something is inconsistent here and the pre-existing
>
>   (group_size * cvf) % cnunits + group_size - gap < cnunits
>
> check looks suspicious for 

Re: [PATCH] LoongArch: Use bstrins for "value & (-1u << const)"

2024-06-12 Thread Lulu Cheng

LGTM!

Thanks!

On 2024/6/9 at 9:48 PM, Xi Ruoyao wrote:

A move/bstrins pair is as fast as a (addi.w|lu12i.w|lu32i.d|lu52i.d)/and
pair, and twice as fast as a srli/slli pair.  When the src reg and the dst
reg happen to be the same, the move instruction can be optimized away.
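
For example, for "a & -32" as in the bstrins-1.c test below (register
numbers follow the test's scan pattern), the shift pair

    srli.d    $r4, $r4, 5
    slli.d    $r4, $r4, 5

becomes a single instruction when src and dst coincide:

    bstrins.d $r4, $r0, 4, 0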

gcc/ChangeLog:

* config/loongarch/predicates.md (high_bitmask_operand): New
predicate.
* config/loongarch/constraints.md (Yy): New constraint.
* config/loongarch/loongarch.md (and<mode>3_align): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-1.c: New test.
* gcc.target/loongarch/bstrins-2.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/constraints.md|  5 +
  gcc/config/loongarch/loongarch.md  | 17 +
  gcc/config/loongarch/predicates.md |  4 
  gcc/testsuite/gcc.target/loongarch/bstrins-1.c |  9 +
  gcc/testsuite/gcc.target/loongarch/bstrins-2.c | 14 ++
  5 files changed, 49 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-1.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-2.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index f07d31650d2..12cf5e2924a 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -94,6 +94,7 @@
  ;;   "A constant @code{move_operand} that can be safely loaded using
  ;;  @code{la}."
  ;;"Yx"
+;;"Yy"
  ;; "Z" -
  ;;"ZC"
  ;;  "A memory operand whose address is formed by a base register and 
offset
@@ -291,6 +292,10 @@ (define_constraint "Yx"
 "@internal"
 (match_operand 0 "low_bitmask_operand"))
  
+(define_constraint "Yy"

+   "@internal"
+   (match_operand 0 "high_bitmask_operand"))
+
  (define_constraint "YI"
"@internal
 A replicated vector const in which the replicated value is in the range
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 5c80c169cbf..25c1d323ba0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1542,6 +1542,23 @@ (define_insn "and<mode>3_extended"
 [(set_attr "move_type" "pick_ins")
  (set_attr "mode" "<MODE>")])
 
+(define_insn_and_split "and<mode>3_align"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "high_bitmask_operand" "Yy")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (const_int 0))
+   (const_int 0))]
+{
+  int len;
+
+  len = low_bitmask_len (<MODE>mode, ~INTVAL (operands[2]));
+  operands[2] = GEN_INT (len);
+})
+
 (define_insn_and_split "*bstrins_<mode>_for_mask"
[(set (match_operand:GPR 0 "register_operand" "=r")
(and:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/config/loongarch/predicates.md 
b/gcc/config/loongarch/predicates.md
index eba7f246c84..58e406ea522 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -293,6 +293,10 @@ (define_predicate "low_bitmask_operand"
(and (match_code "const_int")
 (match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
  
+(define_predicate "high_bitmask_operand"

+  (and (match_code "const_int")
+   (match_test "low_bitmask_len (mode, ~INTVAL (op)) > 0")))
+
  (define_predicate "d_operand"
(and (match_code "reg")
 (match_test "GP_REG_P (REGNO (op))")))
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-1.c b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
new file mode 100644
index 000..7cb3a952322
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r4,\\\$r0,4,0" } } */
+
+long
+x (long a)
+{
+  return a & -32;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-2.c b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
new file mode 100644
index 000..9777f502e5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r\[0-9\]+,\\\$r0,4,0" } } */
+
+struct aligned_buffer {
+  _Alignas(32) char x[1024];
+};
+
+extern int f(char *);
+int g(void)
+{
+  struct aligned_buffer buf;
+  return f(buf.x);
+}




Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
>
> Hi,
>
> For CTEST, we don't have conditional AND so there's no optimization
> opportunity to write a new ctest pattern. Emit ctest when ccmp did
> comparison to const 0 to save bytes.
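>
> Schematically, for a comparison against zero this turns (operands here
> are illustrative, not from the patch; <dfv> stands for the default
> flags value printed via %G4):
>
>   ccmpne <dfv>  $0, %eax
>
> into the shorter
>
>   ctestne <dfv>  %eax, %eax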
>
> Bootstrapped & regtested under x86-64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (@ccmp): Use ctestcc when
> operands[3] is const0_rtx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> * gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
> compare with 0.
> ---
>  gcc/config/i386/i386.md|  6 +-
>  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
>  3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index a64f2ad4f5f..014d48cddd6 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
>   [(match_operand:SI 4 "const_0_to_15_operand")]
>   UNSPEC_APX_DFV)))]
>   "TARGET_APX_CCMP"
> - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> + {
> +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> + return "ctest%C1{}\t%G4 %2, %2";
> +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> + }

This could be implemented as an alternative using "r,C" constraint as
the first constraint for operands[2,3]. Then the register allocator
will match the constraints for you.

Uros.

>   [(set_attr "type" "icmp")
>(set_attr "mode" "")
>(set_attr "length_immediate" "1")
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> index e4e112f07e0..a8b70576760 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> @@ -96,9 +96,11 @@ f15 (double a, double b, int c, int d)
>
>  /* { dg-final { scan-assembler-times "ccmpg" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmple" 2 } } */
> -/* { dg-final { scan-assembler-times "ccmpne" 4 } } */
> -/* { dg-final { scan-assembler-times "ccmpe" 3 } } */
> +/* { dg-final { scan-assembler-times "ccmpne" 2 } } */
> +/* { dg-final { scan-assembler-times "ccmpe" 1 } } */
>  /* { dg-final { scan-assembler-times "ccmpbe" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestne" 2 } } */
> +/* { dg-final { scan-assembler-times "cteste" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmpa" 1 } } */
> -/* { dg-final { scan-assembler-times "ccmpbl" 2 } } */
> -
> +/* { dg-final { scan-assembler-times "ccmpbl" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestbl" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> index 0123a686d2c..4a0784394c3 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> @@ -12,7 +12,7 @@ int foo_apx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> @@ -32,7 +32,7 @@ int foo_noapx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> --
> 2.31.1
>


Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak  wrote:
>
> On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For CTEST, we don't have conditional AND so there's no optimization
> > opportunity to write a new ctest pattern. Emit ctest when ccmp did
> > comparison to const 0 to save bytes.
> >
> > Bootstrapped & regtested under x86-64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (@ccmp): Use ctestcc when
> > operands[3] is const0_rtx.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> > * gcc.target/i386/apx-ccmp-2.c: Adjust some condition to
> > compare with 0.
> > ---
> >  gcc/config/i386/i386.md|  6 +-
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
> >  3 files changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index a64f2ad4f5f..014d48cddd6 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
> >   [(match_operand:SI 4 "const_0_to_15_operand")]
> >   UNSPEC_APX_DFV)))]
> >   "TARGET_APX_CCMP"
> > - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> > + {
> > +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> > + return "ctest%C1{}\t%G4 %2, %2";
> > +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> > + }
>
> This could be implemented as an alternative using "r,C" constraint as
> the first constraint for operands[2,3]. Then the register allocator
> will match the constraints for you.

Like in the attached (lightly tested) patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a64f2ad4f5f..14d4d8cddca 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1515,14 +1515,17 @@ (define_insn "@ccmp"
 (match_operator 1 "comparison_operator"
  [(reg:CC FLAGS_REG) (const_int 0)])
(compare:CC
- (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" "m,")
-(match_operand:SWI 3 "" ","))
+ (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" ",m,")
+(match_operand:SWI 3 "" 
"C,,"))
  (const_int 0))
(unspec:SI
  [(match_operand:SI 4 "const_0_to_15_operand")]
  UNSPEC_APX_DFV)))]
  "TARGET_APX_CCMP"
- "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
+ "@
+  ctest%C1{}\t%G4 %2, %2
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
  [(set_attr "type" "icmp")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")


[PATCH 0/3] Remove ia64*-*-linux from the list of obsolete targets

2024-06-12 Thread Rene Rebe
Hey there,

I wanted to come back to maintaining the ia64 port as discussed
previously the other month on the gcc list.

It has been some days, as we were busy releasing the biggest release of
our Embedded T2/Linux distribution [0], and we obviously did not want to
propose enabling LRA for IA-64 in the last days of the GCC 14
release process.

We used the time to further stability-test the LRA-enabled GCC built
in T2/Linux and to set up running the GCC testsuite accordingly.
Frank posted test results for reference, both from plain GCC git [1]
and with LRA enabled [2]; the latter shows only minimal changes, plus
some new testsuite passes. Due to the -j4 run I summed up the per-job
result files manually in LibreOffice:

gcc
35572, 31789
33273, 28492
37189, 36804
28735, 37634
sum 134769, 134719

g++
69349, 61058
61467, 63545
61614, 63752
56027, 60102
sum 248457, 248457

gfortran
18895, 17502
19329, 19051
13950, 17583
17442, 15482
sum 69616, 69618

objc
693, 783
760, 669
609, 649
716, 677
sum 2778, 2778

libstdc++
4495, 4635
4001, 3629
3958, 4580
4970, 4580
sum 17424, 17424

The Linux kernel and all user-land packages built with the LRA-enabled
GCC boot and function normally, too.

Instead of looking into random testsuite failures, I would first
rather try to allocate some time to look into some build failures for
more advanced real-world open source packages that I observed over
the last years and that already occurred unrelated to the LRA
enablement.

> > On Mar 7, 2024, at 20:08, Richard Biener  wrote:
> >> I saw the deprecation of ia64*-*-* scrolling by [1].
> >> 
> >> Which surprised me, as (minor bugs aside) gcc ia64*-*-linux just works for 
> >> us and
> >> we still actively support it as part of our T2 System Development 
> >> Environment [2].
> >> 
> >> For ia64 we are currently a team of three and also keep maintaining 
> >> linux-kernel and
> >> glibc git trees with ia64 restored and hope to bring back ia64 to linux 
> >> upstream the
> >> coming months as promised. [3]
> >> 
> >> Despite popular believe ia64 actually just works for all those projects 
> >> and we already
> >> fixed the few minor bugs we could find or reproduce.
> >> 
> >> Last week I also already patched and tested enabling LRA for ia64 in gcc 
> >> [4] and could
> >> -without any regression- compile a full ia64 T2/Linux release ISO that 
> >> boots and runs
> >> including an X desktop and Gtk applications. That was of course even with 
> >> the latest
> >> linux kernel and glibc versions with ia64 support restored respectively.
> >> 
> >> Given there are currently no other volunteers, I therefore with this email 
> >> step up and
> >> offer to become ia64 maintainer for GCC to keep the code compiling, tested 
> >> and
> >> un-deprecated for the next years and releases to come.
> > 
> > You’re welcome - we look forward to LRA enablement with ia64 and for it to 
> > get an
> > active maintainer.  Note maintainers are appointed by the Steering 
> > Committee.

[0] https://t2sde.org/
[1] https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816346.html
[2] https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html

-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


Re: [PATCH v2] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-12 Thread Richard Earnshaw (lists)
On 11/06/2024 17:35, Wilco Dijkstra wrote:
> Hi Christophe,
> 
>>  PR target/115153
> I guess this is a typo (should be 115188)?
> 
> Correct.
> 
>> +/* { dg-options "-O2 -mthumb" } */
> -mthumb is included in arm_arch_v6m, so I think you don't need to add it
> here?
> 
> Indeed, it's not strictly necessary. Fixed in v2:
> 
> A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
> printed as LDR/STR with writeback in unified syntax, resulting in strange
> assembler errors if writeback is selected.  To work around this, use the 'Uw'
> constraint that blocks writeback.

Doing just this will mean that the register allocator will have to undo a 
pre/post memory operand that was accepted by the predicate (memory_operand).  I 
think we really need a tighter predicate (let's call it noautoinc_mem_op) here 
to avoid that.  Note that the existing uses of Uw also had another alternative 
that did permit 'm', so this wasn't previously practical, but they had 
alternative ways of being reloaded.
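
A rough sketch of what such a predicate could look like (the exact
test is illustrative only, not something proposed in this thread):

(define_predicate "noautoinc_mem_op"
  (and (match_code "mem")
       (match_test "GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) != RTX_AUTOINC")))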

R.

> 
> Passes bootstrap & regress, OK for commit and backport?
> 
> gcc:
> PR target/115188
> * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
> (arm_atomic_store): Likewise.
> 
> gcc/testsuite:
> PR target/115188
> * gcc.target/arm/pr115188.c: Add new test.
> 
> ---
> 
> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
> index df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a 100644
> --- a/gcc/config/arm/sync.md
> +++ b/gcc/config/arm/sync.md
> @@ -65,7 +65,7 @@
>  (define_insn "arm_atomic_load"
>[(set (match_operand:QHSI 0 "register_operand" "=r,l")
>  (unspec_volatile:QHSI
> -  [(match_operand:QHSI 1 "memory_operand" "m,m")]
> +  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
>VUNSPEC_LDR))]
>""
>"ldr\t%0, %1"
> @@ -81,7 +81,7 @@
>  )
> 
>  (define_insn "arm_atomic_store"
> -  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
>  (unspec_volatile:QHSI
>[(match_operand:QHSI 1 "register_operand" "r,l")]
>VUNSPEC_STR))]
> diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c b/gcc/testsuite/gcc.target/arm/pr115188.c
> new file mode 100644
> index ..9a4022b56796d6962bb3f22e40bac4b81eb78ccf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr115188.c
> @@ -0,0 +1,10 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_arch_v6m_ok }
> +/* { dg-options "-O2" } */
> +/* { dg-add-options arm_arch_v6m } */
> +
> +void init (int *p, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
> +}
> 



Re: [PATCH] tree-optimization/115385 - handle more gaps with peeling of a single iteration

2024-06-12 Thread Richard Biener
On Wed, 12 Jun 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 12 Jun 2024, Richard Biener wrote:
> >
> >> On Tue, 11 Jun 2024, Richard Sandiford wrote:
> >> 
> >> > Don't think it makes any difference, but:
> >> > 
> >> > Richard Biener  writes:
> >> > > @@ -2151,7 +2151,16 @@ get_group_load_store_type (vec_info *vinfo, 
> >> > > stmt_vec_info stmt_info,
> >> > > access excess elements.
> >> > > ???  Enhancements include peeling multiple 
> >> > > iterations
> >> > > or using masked loads with a static mask.  */
> >> > > -|| (group_size * cvf) % cnunits + group_size - gap < 
> >> > > cnunits))
> >> > > +|| ((group_size * cvf) % cnunits + group_size - gap < 
> >> > > cnunits
> >> > > +/* But peeling a single scalar iteration is 
> >> > > enough if
> >> > > +   we can use the next power-of-two sized partial
> >> > > +   access.  */
> >> > > +&& ((cremain = (group_size * cvf - gap) % 
> >> > > cnunits), true
> >> > 
> >> > ...this might be less surprising as:
> >> > 
> >> >&& ((cremain = (group_size * cvf - gap) % cnunits, true)
> >> > 
> >> > in terms of how the &&s line up.
> >> 
> >> Yeah - I'll fix before pushing.
> >
> > The aarch64 CI shows that a few testcases no longer use SVE
> > (gcc.target/aarch64/sve/slp_perm_{4,7,8}.c) because peeling
> > for gaps is deemed insufficient.  Formerly we had
> >
> >   if (loop_vinfo
> >   && *memory_access_type == VMAT_CONTIGUOUS
> >   && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
> >   && !multiple_p (group_size * LOOP_VINFO_VECT_FACTOR 
> > (loop_vinfo),
> >   nunits))
> > {
> >   unsigned HOST_WIDE_INT cnunits, cvf;
> >   if (!can_overrun_p
> >   || !nunits.is_constant (&cnunits)
> >   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant 
> > (&cvf)
> >   /* Peeling for gaps assumes that a single scalar 
> > iteration
> >  is enough to make sure the last vector iteration 
> > doesn't
> >  access excess elements.
> >  ???  Enhancements include peeling multiple iterations
> >  or using masked loads with a static mask.  */
> >   || (group_size * cvf) % cnunits + group_size - gap < 
> > cnunits)
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, 
> > vect_location,
> >  "peeling for gaps insufficient for "
> >  "access\n");
> >
> > and in all cases multiple_p (group_size * LOOP_VINFO_VECT_FACTOR, nunits)
> > is true so we didn't check for whether peeling one iteration is
> > sufficient.  But after the refactoring the outer checks merely
> > indicate there's overrun (which is there already because gap != 0).
> >
> > That is, we never verified, for the "regular" gap case, whether peeling
> > for a single iteration is sufficient.  But now of course we run into
> > the inner check which will always trigger if earlier checks didn't
> > work out to set overrun_p to false.
> >
> > For slp_perm_8.c we have a group_size of two, nunits is {16, 16}
> > and VF is {8, 8} and gap is one.  Given we know the
> > multiple_p we know that (group_size * cvf) % cnunits is zero,
> > so what remains is group_size - gap < nunits but 1 is probably
> > always less than {16, 16}.
> 
> I thought the idea was that the size of the gap was immaterial
> for VMAT_CONTIGUOUS, on the assumption that it would never be
> bigger than a page.  That is, any gap loaded by the final
> unpeeled iteration would belong to the same page as the non-gap
> data from either the same vector iteration or the subsequent
> peeled scalar iteration.

The subsequent scalar iteration might be on the same page as the
last vector iteration, but that last vector iteration might access
elements beyond those touched by the subsequent scalar iteration
(which could be on the next page).  That's what this is supposed to
check: that in fact the next scalar iteration covers all elements
accessed in excess in the last vector iteration.

So the size of the gap matters if it is larger than the group_size
or the size of a vector (the former happens with group_size being
smaller than the vector size which can happen with permutes).

> 
> Will have to think more about this if that doesn't affect the
> rest of the message, but FWIW...
> 
> > The new logic I added in the later patch that peeling a single
> > iteration is OK when we use a smaller, rounded-up to power-of-two
> > sized access is
> >
> >   || ((group_size * cvf) % cnunits + group_size - gap < 
> > cnunits
> >   /* But peeling a single scalar iteration is enough

[PATCH 1/3] Remove ia64*-*-linux from the list of obsolete targets

2024-06-12 Thread Rene Rebe
The following un-deprecates ia64*-*-linux for GCC 15, since we plan to
support this target for some years to come.

gcc/
* config.gcc: Only explicitly list ia64*-*-(hpux|vms|elf) in the
  list of obsoleted targets.

contrib/
* config-list.mk (LIST): no --enable-obsolete for ia64*-*-linux.
---
 contrib/config-list.mk | 4 ++--
 gcc/config.gcc | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index f282cd95c8d..b99573b1f5b 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -60,8 +60,8 @@ LIST = \
   i686-pc-linux-gnu i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.11 i686-wrs-vxworks \
   i686-wrs-vxworksae \
-  i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elfOPT-enable-obsolete \
-  ia64-linuxOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete \
+  i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-linux \
+  ia64-elfOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete \
   ia64-hp-vmsOPT-enable-obsolete iq2000-elf lm32-elf \
   lm32-rtems lm32-uclinux \
   loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index a37113bd00a..6d6ca6da7a0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -272,7 +272,7 @@ esac
 
 # Obsolete configurations.
 case ${target} in
- ia64*-*-* \
+ ia64*-*-hpux* | ia64*-*-*vms* | ia64*-*-elf*  \
| nios2*-*-*\
  )
 if test "x$enable_obsolete" != xyes; then
-- 
2.45.0


-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


[PATCH 3/3] MAINTAINERS: Add myself as IA-64 maintainer.

2024-06-12 Thread Rene Rebe
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e2870eef2ef..4328ca5f84c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -78,6 +78,7 @@ i386 port Jan Hubicka 
 i386 port  Uros Bizjak 
 i386 vector ISA extns  Kirill Yukhin   
 i386 vector ISA extns  Hongtao Liu 
+ia64 port  René Rebe   
 iq2000 portNick Clifton
 lm32 port  Sebastien Bourdeauducq  
 LoongArch port Chenghua Xu 
-- 
2.45.0


-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


[PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Rene Rebe
gcc/
* config/ia64/ia64.cc: Enable LRA for ia64.
* config/ia64/ia64.md: Likewise.
* config/ia64/predicates.md: Likewise.
---
 gcc/config/ia64/ia64.cc   | 7 ++-
 gcc/config/ia64/ia64.md   | 4 ++--
 gcc/config/ia64/predicates.md | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
index ac3d56073ac..d189bfb2cb4 100644
--- a/gcc/config/ia64/ia64.cc
+++ b/gcc/config/ia64/ia64.cc
@@ -618,9 +618,6 @@ static const scoped_attribute_specs *const ia64_attribute_table[] =
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P ia64_legitimate_address_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef TARGET_CANNOT_FORCE_CONST_MEM
 #define TARGET_CANNOT_FORCE_CONST_MEM ia64_cannot_force_const_mem
 
@@ -1329,7 +1326,7 @@ ia64_expand_move (rtx op0, rtx op1)
 {
   machine_mode mode = GET_MODE (op0);
 
-  if (!reload_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
+  if (!lra_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
 op1 = force_reg (mode, op1);
 
   if ((mode == Pmode || mode == ptr_mode) && symbolic_operand (op1, VOIDmode))
@@ -1776,7 +1773,7 @@ ia64_expand_movxf_movrf (machine_mode mode, rtx operands[])
}
 }
 
-  if (!reload_in_progress && !reload_completed)
+  if (!lra_in_progress && !reload_completed)
 {
   operands[1] = spill_xfmode_rfmode_operand (operands[1], 0, mode);
 
diff --git a/gcc/config/ia64/ia64.md b/gcc/config/ia64/ia64.md
index 698e302081e..d485acc0ea8 100644
--- a/gcc/config/ia64/ia64.md
+++ b/gcc/config/ia64/ia64.md
@@ -2318,7 +2318,7 @@
  (match_operand:DI 3 "register_operand" "f"))
 (match_operand:DI 4 "nonmemory_operand" "rI")))
(clobber (match_scratch:DI 5 "=f"))]
-  "reload_in_progress"
+  "lra_in_progress"
   "#"
   [(set_attr "itanium_class" "unknown")])
 
@@ -3407,7 +3407,7 @@
   (match_operand:DI 2 "shladd_operand" "n"))
  (match_operand:DI 3 "nonmemory_operand" "r"))
 (match_operand:DI 4 "nonmemory_operand" "rI")))]
-  "reload_in_progress"
+  "lra_in_progress"
   "* gcc_unreachable ();"
   "reload_completed"
   [(set (match_dup 0) (plus:DI (mult:DI (match_dup 1) (match_dup 2))
diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
index 01a4effd339..85f5380e734 100644
--- a/gcc/config/ia64/predicates.md
+++ b/gcc/config/ia64/predicates.md
@@ -347,7 +347,7 @@
   allows reload the opportunity to avoid spilling addresses to
   the stack, and instead simply substitute in the value from a
   REG_EQUIV.  We'll split this up again when splitting the insn.  */
-   if (reload_in_progress || reload_completed)
+   if (lra_in_progress || reload_completed)
  return true;
 
/* Some symbol types we allow to use with any offset.  */
-- 
2.45.0


-- 
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
  https://exactcode.com | https://t2sde.org | https://rene.rebe.de


Re: [PATCH v2] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-12 Thread Richard Earnshaw



On 12/06/2024 11:35, Richard Earnshaw (lists) wrote:
> On 11/06/2024 17:35, Wilco Dijkstra wrote:
>> Hi Christophe,
>>
>>>  PR target/115153
>> I guess this is a typo (should be 115188)?
>>
>> Correct.
>>
>>> +/* { dg-options "-O2 -mthumb" } */
>> -mthumb is included in arm_arch_v6m, so I think you don't need to add it
>> here?
>>
>> Indeed, it's not strictly necessary. Fixed in v2:
>>
>> A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
>> printed as LDR/STR with writeback in unified syntax, resulting in strange
>> assembler errors if writeback is selected.  To work around this, use the 'Uw'
>> constraint that blocks writeback.
> 
> Doing just this will mean that the register allocator will have to undo a 
> pre/post memory operand that was accepted by the predicate (memory_operand).  
> I think we really need a tighter predicate (lets call it noautoinc_mem_op) 
> here to avoid that.  Note that the existing uses of Uw also had another 
> alternative that did permit 'm', so this wasn't previously practical, but 
> they had alternative ways of being reloaded.

No, sorry that won't work; there's another 'm' alternative here as well.
The correct fix is to add alternatives for T1, I think, similar to the one in 
thumb1_movsi_insn.

Also, by observation I think there's a similar problem in the load operations.

R.

> 
> R.
> 
>>
>> Passes bootstrap & regress, OK for commit and backport?
>>
>> gcc:
>> PR target/115188
>> * config/arm/sync.md (arm_atomic_load): Use 'Uw' constraint.
>> (arm_atomic_store): Likewise.
>>
>> gcc/testsuite:
>> PR target/115188
>> * gcc.target/arm/pr115188.c: Add new test.
>>
>> ---
>>
>> diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
>> index df8dbe170cacb6b60d56a6f19aadd5a6c9c51f7a..e856ee51d9ae7b945c4d1e9d1f08afeedc95707a 100644
>> --- a/gcc/config/arm/sync.md
>> +++ b/gcc/config/arm/sync.md
>> @@ -65,7 +65,7 @@
>>  (define_insn "arm_atomic_load"
>>[(set (match_operand:QHSI 0 "register_operand" "=r,l")
>>  (unspec_volatile:QHSI
>> -  [(match_operand:QHSI 1 "memory_operand" "m,m")]
>> +  [(match_operand:QHSI 1 "memory_operand" "m,Uw")]
>>VUNSPEC_LDR))]
>>""
>>"ldr\t%0, %1"
>> @@ -81,7 +81,7 @@
>>  )
>>
>>  (define_insn "arm_atomic_store"
>> -  [(set (match_operand:QHSI 0 "memory_operand" "=m,m")
>> +  [(set (match_operand:QHSI 0 "memory_operand" "=m,Uw")
>>  (unspec_volatile:QHSI
>>[(match_operand:QHSI 1 "register_operand" "r,l")]
>>VUNSPEC_STR))]
>> diff --git a/gcc/testsuite/gcc.target/arm/pr115188.c b/gcc/testsuite/gcc.target/arm/pr115188.c
>> new file mode 100644
>> index ..9a4022b56796d6962bb3f22e40bac4b81eb78ccf
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/pr115188.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do assemble } */
>> +/* { dg-require-effective-target arm_arch_v6m_ok }
>> +/* { dg-options "-O2" } */
>> +/* { dg-add-options arm_arch_v6m } */
>> +
>> +void init (int *p, int n)
>> +{
>> +  for (int i = 0; i < n; i++)
>> +__atomic_store_4 (p + i, 0, __ATOMIC_RELAXED);
>> +}
>>
> 


Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Richard Biener
On Wed, 12 Jun 2024, Rene Rebe wrote:
>
> gcc/
> * config/ia64/ia64.cc: Enable LRA for ia64.
> * config/ia64/ia64.md: Likewise.
> * config/ia64/predicates.md: Likewise.

That looks simple enough.  I cannot find any copyright assignment on
file with the FSF so you probably want to contribute to GCC under
the DCO (see https://gcc.gnu.org/dco.html), in that case please post
patches with Signed-off-by: tags.

For this patch please state how you tested it; I assume you
bootstrapped GCC natively on ia64-linux and ran the testsuite.
I can find two gcc-testresults postings, one apparently with LRA
and one without, both from May:

https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html

Somehow the libstdc++ summaries, for example, were not merged; it
might be that you do not have a recent python installed on the system,
or that you didn't use contrib/test_summary to create those mails.  It
would be nice to see the difference between LRA and non-LRA in the
testresults; can you quote that?
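
FWIW, the documented way to generate those merged summaries is to run
contrib/test_summary from the build tree and pipe its output to sh,
e.g. (paths assumed):

  ${srcdir}/contrib/test_summary -p comments.txt \
    -m gcc-testresults@gcc.gnu.org | sh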

Thanks,
Richard.

> ---
>  gcc/config/ia64/ia64.cc   | 7 ++-
>  gcc/config/ia64/ia64.md   | 4 ++--
>  gcc/config/ia64/predicates.md | 2 +-
>  3 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
> index ac3d56073ac..d189bfb2cb4 100644
> --- a/gcc/config/ia64/ia64.cc
> +++ b/gcc/config/ia64/ia64.cc
> @@ -618,9 +618,6 @@ static const scoped_attribute_specs *const ia64_attribute_table[] =
>  #undef TARGET_LEGITIMATE_ADDRESS_P
>  #define TARGET_LEGITIMATE_ADDRESS_P ia64_legitimate_address_p
>  
> -#undef TARGET_LRA_P
> -#define TARGET_LRA_P hook_bool_void_false
> -
>  #undef TARGET_CANNOT_FORCE_CONST_MEM
>  #define TARGET_CANNOT_FORCE_CONST_MEM ia64_cannot_force_const_mem
>  
> @@ -1329,7 +1326,7 @@ ia64_expand_move (rtx op0, rtx op1)
>  {
>machine_mode mode = GET_MODE (op0);
>  
> -  if (!reload_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
> +  if (!lra_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
>  op1 = force_reg (mode, op1);
>  
>    if ((mode == Pmode || mode == ptr_mode) && symbolic_operand (op1, VOIDmode))
> @@ -1776,7 +1773,7 @@ ia64_expand_movxf_movrf (machine_mode mode, rtx operands[])
>   }
>  }
>  
> -  if (!reload_in_progress && !reload_completed)
> +  if (!lra_in_progress && !reload_completed)
>  {
>operands[1] = spill_xfmode_rfmode_operand (operands[1], 0, mode);
>  
> diff --git a/gcc/config/ia64/ia64.md b/gcc/config/ia64/ia64.md
> index 698e302081e..d485acc0ea8 100644
> --- a/gcc/config/ia64/ia64.md
> +++ b/gcc/config/ia64/ia64.md
> @@ -2318,7 +2318,7 @@
> (match_operand:DI 3 "register_operand" "f"))
>(match_operand:DI 4 "nonmemory_operand" "rI")))
> (clobber (match_scratch:DI 5 "=f"))]
> -  "reload_in_progress"
> +  "lra_in_progress"
>"#"
>[(set_attr "itanium_class" "unknown")])
>  
> @@ -3407,7 +3407,7 @@
>  (match_operand:DI 2 "shladd_operand" "n"))
> (match_operand:DI 3 "nonmemory_operand" "r"))
>(match_operand:DI 4 "nonmemory_operand" "rI")))]
> -  "reload_in_progress"
> +  "lra_in_progress"
>"* gcc_unreachable ();"
>"reload_completed"
>[(set (match_dup 0) (plus:DI (mult:DI (match_dup 1) (match_dup 2))
> diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
> index 01a4effd339..85f5380e734 100644
> --- a/gcc/config/ia64/predicates.md
> +++ b/gcc/config/ia64/predicates.md
> @@ -347,7 +347,7 @@
>  allows reload the opportunity to avoid spilling addresses to
>  the stack, and instead simply substitute in the value from a
>  REG_EQUIV.  We'll split this up again when splitting the insn.  */
> - if (reload_in_progress || reload_completed)
> + if (lra_in_progress || reload_completed)
> return true;
>  
>   /* Some symbol types we allow to use with any offset.  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Li, Pan2
Do we need to upgrade the binutils of the riscv-gnu-toolchain repo?  Otherwise
we may get an "unknown prefixed ISA extension `zaamo'" error when building.

Pan


-Original Message-
From: Patrick O'Neill  
Sent: Wednesday, June 12, 2024 1:08 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; e...@rivosinc.com; pal...@dabbelt.com; 
gnu-toolch...@rivosinc.com; and...@rivosinc.com
Subject: [Committed] RISC-V: Add basic Zaamo and Zalrsc support


On 6/10/24 21:33, Jeff Law wrote:
>
>
> On 6/10/24 3:46 PM, Patrick O'Neill wrote:
>> The A extension has been split into two parts: Zaamo and Zalrsc.
>> This patch adds basic support by making the A extension imply Zaamo and
>> Zalrsc.
>>
>> Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
>> Ratification: https://jira.riscv.org/browse/RVS-1995
>>
>> v2:
>> Rebased and updated some testcases that rely on the ISA string.
>>
>> v3:
>> Regex-ify temp registers in added testcases.
>> Remove unintentional whitespace changes.
>> Add riscv_{a|ztso|zaamo|zalrsc} docs to sourcebuild.texi (and move 
>> core-v bi
>> extension doc into appropriate section).
>>
>> Edwin Lu (1):
>>    RISC-V: Add basic Zaamo and Zalrsc support
>>
>> Patrick O'Neill (2):
>>    RISC-V: Add Zalrsc and Zaamo testsuite support
>>    RISC-V: Add Zalrsc amo-op patterns
>>
>>   gcc/common/config/riscv/riscv-common.cc   |  11 +-
>>   gcc/config/riscv/arch-canonicalize    |   1 +
>>   gcc/config/riscv/riscv.opt    |   6 +-
>>   gcc/config/riscv/sync.md  | 152 +++---
>>   gcc/doc/sourcebuild.texi  |  16 +-
>>   .../riscv/amo-table-a-6-amo-add-1.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-2.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-3.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-4.c   |   2 +-
>>   .../riscv/amo-table-a-6-amo-add-5.c   |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-1.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-2.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-3.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-4.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-5.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-6.c  |   2 +-
>>   .../riscv/amo-table-a-6-compare-exchange-7.c  |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-1.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-2.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-3.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-4.c   |   2 +-
>>   .../riscv/amo-table-a-6-subword-amo-add-5.c   |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-1.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-2.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-3.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-4.c  |   2 +-
>>   .../riscv/amo-table-ztso-amo-add-5.c  |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-1.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-2.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-3.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-4.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-5.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-6.c |   2 +-
>>   .../riscv/amo-table-ztso-compare-exchange-7.c |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-1.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-2.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-3.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-4.c  |   2 +-
>>   .../riscv/amo-table-ztso-subword-amo-add-5.c  |   2 +-
>>   .../riscv/amo-zaamo-preferred-over-zalrsc.c   |  17 ++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-1.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-2.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-3.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-4.c   |  19 +++
>>   .../gcc.target/riscv/amo-zalrsc-amo-add-5.c   |  19 +++
>>   gcc/testsuite/gcc.target/riscv/attribute-15.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-16.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-17.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/attribute-18.c |   2 +-
>>   gcc/testsuite/gcc.target/riscv/pr110696.c |   2 +-
>>   .../gcc.target/riscv/rvv/base/pr114352-1.c    |   4 +-
>>   .../gcc.target/riscv/rvv/base/pr114352-3.c    |   8 +-
>>   gcc/testsuite/lib/target-supports.exp |  48 +-
>>   53 files changed, 366 insertions(+), 70 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zaamo-preferred-over-zalrsc.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-1.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-2.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add-3.c
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/amo-zalrsc-amo-add

Re: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Andreas Schwab
On Jun 12 2024, Li, Pan2 wrote:

> Do we need to upgrade the binutils of the riscv-gnu-toolchain repo?  Otherwise
> we may get an "unknown prefixed ISA extension `zaamo'" error when building.

There needs to be a configure check if binutils can grok the extension.
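
A sketch of what such a check could look like in gcc/configure.ac,
modelled on the existing gcc_GAS_CHECK_FEATURE uses (the argument
layout and the HAVE_AS_* name here are illustrative assumptions):

  gcc_GAS_CHECK_FEATURE([zaamo extension support],
    gcc_cv_as_riscv_zaamo, [-march=rv64i_zaamo], [nop],,
    [AC_DEFINE(HAVE_AS_MARCH_ZAAMO, 1,
      [Define if the assembler understands -march=...zaamo.])])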

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Andrew Stubbs
The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.
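
For reference, on Linux the lockable-memory limit that the new message
refers to can be inspected and, with sufficient privilege, raised from
the shell:

  $ ulimit -l          # current RLIMIT_MEMLOCK limit, in KiB
  $ ulimit -l 65536    # raise it (or configure /etc/security/limits.conf)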

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
---
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 20 ++--
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 20 ++--
 2 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
index 4185accf2e6..672f2453a78 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-1.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 
-/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } */
 
 /* Test that pinned memory works.  */
 
@@ -19,7 +19,10 @@
   struct rlimit limit; \
   if (getrlimit (RLIMIT_MEMLOCK, &limit) \
   || limit.rlim_cur <= SIZE) \
-fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+{ \
+  fprintf (stderr, "insufficient lockable memory; please increase 
ulimit\n"); \
+  abort (); \
+} \
   }
 
 int
@@ -44,18 +47,7 @@ get_pinned_mem ()
   abort ();
 }
 #else
-#define PAGE_SIZE 1024 /* unknown */
-#define CHECK_SIZE(SIZE) { \
-  fprintf (stderr, "OS unsupported\n"); \
-  abort (); \
-  }
-#define EXPECT_OMP_NULL_ALLOCATOR
-
-int
-get_pinned_mem ()
-{
-  return 0;
-}
+#error "OS unsupported"
 #endif
 
 static void
diff --git a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
index 0b9c11d0315..b6d1d83fb6f 100644
--- a/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
+++ b/libgomp/testsuite/libgomp.c/alloc-pinned-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 
-/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu } } */
+/* { dg-skip-if "Pinning not implemented on this host" { ! *-*-linux-gnu* } } */
 
 /* Test that pinned memory works (pool_size code path).  */
 
@@ -19,7 +19,10 @@
   struct rlimit limit; \
   if (getrlimit (RLIMIT_MEMLOCK, &limit) \
   || limit.rlim_cur <= SIZE) \
-fprintf (stderr, "unsufficient lockable memory; please increase 
ulimit\n"); \
+{ \
+  fprintf (stderr, "insufficient lockable memory; please increase 
ulimit\n"); \
+  abort (); \
+} \
   }
 
 int
@@ -44,18 +47,7 @@ get_pinned_mem ()
   abort ();
 }
 #else
-#define PAGE_SIZE 1024 /* unknown */
-#define CHECK_SIZE(SIZE) { \
-  fprintf (stderr, "OS unsupported\n"); \
-  abort (); \
-  }
-#define EXPECT_OMP_NULL_ALLOCATOR
-
-int
-get_pinned_mem ()
-{
-  return 0;
-}
+#error "OS unsupported"
 #endif
 
 static void
-- 
2.41.0



[PATCH v5 0/6] libgomp: OpenMP pinned memory for omp_alloc

2024-06-12 Thread Andrew Stubbs
This patch series is a rework of the v4 series I posted in May:

https://patchwork.sourceware.org/project/gcc/list/?series=34587&state=%2A&archive=both

This adds a new patch (1/6) that addresses criticisms of the testcases
that were already committed. The same issues have been fixed in the new
testcases included in the rest of the series.

Otherwise, I've addressed comments regarding the enum values and
naming, and implemented previously missed cases in the environment
variables and parsers.

OK for mainline?

Andrew

Andrew Stubbs (6):
  libgomp: change alloc-pinned tests failure mode
  libgomp, openmp: Add ompx_gnu_pinned_mem_alloc
  openmp: Add -foffload-memory
  openmp: -foffload-memory=pinned
  libgomp, nvptx: Cuda pinned memory
  libgomp: fine-grained pinned memory allocator

 gcc/common.opt|  16 +
 gcc/coretypes.h   |   7 +
 gcc/doc/invoke.texi   |  16 +-
 gcc/fortran/openmp.cc |  11 +-
 gcc/omp-builtins.def  |   3 +
 gcc/omp-low.cc|  66 
 .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   7 +-
 libgomp/allocator.c   | 115 +--
 libgomp/config/linux/allocator.c  | 206 +--
 libgomp/env.c |   1 +
 libgomp/libgomp-plugin.h  |   2 +
 libgomp/libgomp.h |  14 +
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp.texi  |  18 +-
 libgomp/libgomp_g.h   |   1 +
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/omp_lib.h.in  |   2 +
 libgomp/plugin/plugin-nvptx.c |  42 +++
 libgomp/target.c  | 136 
 .../libgomp.c-c++-common/alloc-pinned-1.c |  28 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  |  46 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  |  46 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  |  45 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  |  44 ++-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 126 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 128 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c  | 122 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +
 libgomp/usmpin-allocator.c| 319 ++
 33 files changed, 1550 insertions(+), 118 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/usmpin-allocator.c

-- 
2.41.0



[PATCH v5 3/6] openmp: Add -foffload-memory

2024-06-12 Thread Andrew Stubbs
Add a new option.  It's inactive until I add some follow-up patches.
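
Once the follow-up patches activate it, a typical command line would look
something like the following (illustrative only; the file name is
hypothetical):

  gcc -fopenmp -foffload=nvptx-none -foffload-memory=pinned app.c -o app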

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2c078fdd1f8..e874e88d3e1 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2349,6 +2349,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) 
Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned] Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload 
memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 1ac6f0abea3..938cfa93753 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -218,6 +218,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 45115b5fbed..eb0f8b4a58d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted
 -flax-vector-conversions  -fms-extensions
--foffload=@var{arg}  -foffload-options=@var{arg}
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
 -fopenacc  -fopenacc-dim=@var{geom}
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
@@ -2786,6 +2786,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @opindex fopenacc
 @cindex OpenACC accelerator programming
 @item -fopenacc
-- 
2.41.0



[PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Andrew Stubbs
Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators has been moved (again) to 200
  (as 100 is already in use by another toolchain vendor and this seems
  like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these
  are specifically GNU extensions.
- The failure mode of the testcases has been modified, including adding
  an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator"
  clauses.

---

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
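
For reference, the new predefined allocator is meant to behave like this
explicit equivalent, written with the standard OpenMP allocator API (a
sketch, not the implementation):

  #include <omp.h>

  omp_allocator_handle_t
  make_pinned_allocator (void)
  {
    omp_alloctrait_t traits[] = {
      { omp_atk_pinned,   omp_atv_true },
      { omp_atk_fallback, omp_atv_null_fb }
    };
    /* omp_alloc with this allocator returns NULL if pinning fails,
       just like ompx_gnu_pinned_mem_alloc.  */
    return omp_init_allocator (omp_default_mem_space, 2, traits);
  }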

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
  incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/fortran/openmp.cc |  11 +-
 .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +++
 libgomp/allocator.c   | 115 +-
 libgomp/env.c |   1 +
 libgomp/libgomp.texi  |   7 +-
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/omp_lib.h.in  |   2 +
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 100 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 102 
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
 11 files changed, 336 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 5246647e6f8..a177afb4974 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -7352,8 +7352,9 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, 
gfc_namespace *ns,
 }
 
 /* Assume that a constant expression in the range 1 (omp_default_mem_alloc)
-   to 8 (omp_thread_mem_alloc) range is fine.  The original symbol name is
-   already lost during matching via gfc_match_expr.  */
+   to 8 (omp_thread_mem_alloc) range, or 200 (ompx_gnu_pinned_mem_alloc) is
+   fine.  The original symbol name is already lost during matching via
+   gfc_match_expr.  */
 static bool
 is_predefined_allocator (gfc_expr *expr)
 {
@@ -7362,8 +7363,10 @@ is_predefined_allocator (gfc_expr *expr)
  && expr->ts.type == BT_INTEGER
  && expr->ts.kind == gfc_c_intptr_kind
  && expr->expr_type == EXPR_CONSTANT
- && mpz_sgn (expr->value.integer) > 0
- && mpz_cmp_si (expr->value.integer, 8) <= 0);
+ && ((mpz_sgn (expr->value.integer) > 0
+  && mpz_cmp_si (expr->value.integer, 8) <= 0)
+ || (mpz_cmp_si (expr->value.integer, 200) >= 0
+ && mpz_cmp_si (expr->value.integer, 200) <= 0)));
 }
 
 /* Resolve declarative ALLOCATE statement. Note: Common block vars only appear
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
new fi

[PATCH v5 5/6] libgomp, nvptx: Cuda pinned memory

2024-06-12 Thread Andrew Stubbs
This patch was already approved in the v3 posting by Tobias Burnus
(with one caveat about initialization location), but wasn't committed at
that time as I didn't want to disentangle it from the textual
dependencies on the other patches in the series.

--

Use Cuda to pin memory, instead of Linux mlock, when available.

There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.

The design adds a device independent plugin API for allocating pinned memory,
and then implements it for NVPTX.  At present, the other supported devices do
not have equivalent capabilities (or requirements).
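
In outline, the NVPTX side of the new plugin API reduces to thin wrappers
around the CUDA driver API. A sketch (function names here are illustrative;
the real entry points are GOMP_OFFLOAD_page_locked_host_alloc/free and also
handle diagnostics and lazy initialization):

  #include <stddef.h>
  #include <stdbool.h>
  #include <cuda.h>

  void *
  nvptx_page_locked_alloc (size_t size)
  {
    void *p;
    /* cuMemHostAlloc returns page-locked host memory.  */
    if (cuMemHostAlloc (&p, size, 0) != CUDA_SUCCESS)
      return NULL;  /* Caller falls back to mmap + mlock.  */
    return p;
  }

  bool
  nvptx_page_locked_free (void *p)
  {
    return cuMemFreeHost (p) == CUDA_SUCCESS;
  }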

libgomp/ChangeLog:

* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp.texi: Adjust the docs for the pinned trait.
* libgomp_g.h (GOMP_enable_pinned_mode): New prototype.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/config/linux/allocator.c | 141 ++-
 libgomp/libgomp-plugin.h |   2 +
 libgomp/libgomp.h|   4 +
 libgomp/libgomp.texi |  11 +-
 libgomp/libgomp_g.h  |   1 +
 libgomp/plugin/plugin-nvptx.c|  42 ++
 libgomp/target.c | 136 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c |  45 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c |  44 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c |  26 
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c |  34 -
 13 files changed, 488 insertions(+), 50 deletions(-)

diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
index 7e09ba44b2f..063c46f972c 100644
--- a/libgomp/config/linux/allocator.c
+++ b/libgomp/config/linux/allocator.c
@@ -36,6 +36,11 @@
 
 /* Implement malloc routines that can handle pinned memory on Linux.

+   Given that pinned memory is typically used to help host <-> device memory
+   transfers, we attempt to allocate such memory using a device (really:
+   libgomp plugin), but fall back to mmap plus mlock if no suitable device is
+   available.
+
It's possible to use mlock on any heap memory, but using munlock is
problematic if there are multiple pinned allocations on the same page.
Tracking all that manually would be possible, but adds overhead. This may
@@ -49,6 +54,7 @@
 #define _GNU_SOURCE
 #include 
 #include 
+#include 
 #include "libgomp.h"
 #ifdef HAVE_INTTYPES_H
 # include   /* For PRIu64.  */
@@ -68,50 +74,92 @@ GOMP_enable_pinned_mode ()
 always_pinned_mode = true;
 }
 
+static int using_device_for_page_locked
+  = /* uninitialized */ -1;
+
 static void *
-linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
+linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin,
+ bool init0)
 {
-  (void)memspace;
+  gomp_debug (0, "%s: memspace=%llu, size=%llu, pin=%d, init0=%d\n",
+ __FUNCTION__, (unsigned long long) memspace,
+ (unsigned long long) size, pin, init0);
+
+  void *addr;
 
   /* Explicit pinning may not be required.  */
   pin = pin && !always_pinned_mode;
 
   if (pin)
 {
-  /* Note that mmap always returns zeroed memory and is therefore also a
-suitabl

[PATCH v5 4/6] openmp: -foffload-memory=pinned

2024-06-12 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.

In this mode the ompx_gnu_pinned_mem_alloc feature is disabled as it is not
needed and may conflict.
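
On the libgomp side this amounts to a one-time mlockall call; a minimal
sketch of what GOMP_enable_pinned_mode does on Linux (error handling and
diagnostics elided; the real code differs in detail):

  #include <stdbool.h>
  #include <sys/mman.h>

  /* Consulted by the allocator entry points to skip explicit mlock.  */
  static bool always_pinned_mode = false;

  void
  GOMP_enable_pinned_mode (void)
  {
    /* Lock all current and future pages of the process.  */
    if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
      always_pinned_mode = true;
  }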

gcc/ChangeLog:

* omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New.
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map: Add GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: New test.
---
 gcc/omp-builtins.def  |  3 +
 gcc/omp-low.cc| 66 +++
 libgomp/config/linux/allocator.c  | 26 
 libgomp/libgomp.map   |  1 +
 .../libgomp.c-c++-common/alloc-pinned-1.c | 28 
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 63 ++
 6 files changed, 187 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 044d5d087b6..aefc52e5f9f 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -476,3 +476,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning",
  BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error",
  BT_FN_VOID_CONST_PTR_SIZE, 
ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE,
+ "GOMP_enable_pinned_mode",
+ BT_FN_VOID, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 4d003f42098..cf3f57748d8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14596,6 +14596,68 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+return;
+  visited = true;
+
+  /* Create a new function like this:
+ 
+   static void __attribute__((constructor))
+   __set_pinned_mode ()
+   {
+ GOMP_enable_pinned_mode ();
+   }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+ NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+  void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14652,6 +14714,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
 finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/libgomp/config/linux/allocator.c b/libgomp/config/linux/allocator.c
inde

[PATCH v5 6/6] libgomp: fine-grained pinned memory allocator

2024-06-12 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available).  In future,
this allocator will also be used for Unified Shared Memory.  Both memories
are incompatible with the system malloc because allocated memory cannot
share a page with memory allocated for other purposes.

This means that small allocations will no longer consume an entire page of
pinned memory.  Unfortunately, it also means that pinned memory pages will
never be unmapped (although they may be reused).

The implementation is not perfect; there are various corner cases (especially
related to extending onto new pages) where allocations and reallocations may
be sub-optimal, but it should still be a step forward in support for small
allocations.

I have considered using libmemkind's "fixed" memory but rejected it for three
reasons: 1) libmemkind may not always be present at runtime, 2) there's no
currently documented means to extend a "fixed" kind one page at a time
(although the code appears to have an undocumented function that may do the
job, and/or extending libmemkind to support the MAP_LOCKED mmap flag with its
regular kinds would be straightforward), 3) USM benefits from having the
metadata located in different memory and using an external implementation makes
it hard to guarantee this.
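
For illustration only, here is the kind of layout such an allocator implies:
each pinned page is carved into chunks tracked by small headers, so that
several allocations can share one locked page (a hypothetical sketch; the
actual usmpin-allocator.c data structures differ):

  #include <stddef.h>

  /* Hypothetical free-chunk descriptor for one pinned region.  */
  struct chunk
  {
    size_t size;          /* Usable bytes in this chunk.  */
    struct chunk *next;   /* Next free chunk in the same context.  */
  };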

libgomp/ChangeLog:

* Makefile.am (libgomp_la_SOURCES): Add usmpin-allocator.c.
* Makefile.in: Regenerate.
* config/linux/allocator.c: Include unistd.h.
(pin_ctx): New variable.
(ctxlock): New variable.
(linux_init_pin_ctx): New function.
(linux_memspace_alloc): Use usmpin-allocator for pinned memory.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.h (usmpin_init_context): New prototype.
(usmpin_register_memory): New prototype.
(usmpin_alloc): New prototype.
(usmpin_free): New prototype.
(usmpin_realloc): New prototype.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust for new behaviour.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-8.c: New test.
* usmpin-allocator.c: New file.
---
 libgomp/Makefile.am  |   2 +-
 libgomp/Makefile.in  |   7 +-
 libgomp/config/linux/allocator.c |  97 --
 libgomp/libgomp.h|  10 +
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c | 122 +++
 libgomp/usmpin-allocator.c   | 319 +++
 6 files changed, 522 insertions(+), 35 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/usmpin-allocator.c

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 855f0affddf..73c21699332 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -70,7 +70,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c target-indirect.c
+   oacc-target.c target-indirect.c usmpin-allocator.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index da902f3daca..b74e39a1c2a 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,8 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-   oacc-target.lo target-indirect.lo $(am__objects_1)
+   oacc-target.lo target-indirect.lo usmpin-allocator.lo \
+   $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +553,8 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c target-indirect.c $(am__append_3)
+   oacc-target.c target-indirect.c usmpin-allocator.c \
+   $(am__append_3)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info 
$(libtool_VERSION)
@@ -786,6 +788,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/team.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/teams.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/time.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ 
@am__quote@./$(DEPDIR)/usmpin-allocat

Re: [PATCH] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-12 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch improves GCC’s vectorization of __builtin_popcount for aarch64 
> target
> by adding popcount patterns for vector modes besides QImode, i.e., HImode,
> SImode and DImode.
>
> With this patch, we now generate the following for HImode:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>
> For SImode, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>
> For V2DI, we generate:
>   cnt v1.16b, v.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>   uaddlp  v4.2d, v3.4s
>
> gcc/ChangeLog:
>
>   PR target/113859
>   * config/aarch64/aarch64-simd.md (popcount<mode>2): New define_expand.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/113859
>   * gcc.target/aarch64/popcnt-vec.c: New test.
>
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-simd.md| 40 
>  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 48 +++
>  2 files changed, 88 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index f8bb973a278..093c32ee8ff 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3540,6 +3540,46 @@ (define_insn "popcount<mode>2"
>[(set_attr "type" "neon_cnt")]
>  )
>  
> +(define_expand "popcount2"
> +  [(set (match_operand:VQN 0 "register_operand" "=w")
> +(popcount:VQN (match_operand:VQN 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +  {
> +rtx v = gen_reg_rtx (V16QImode);
> +rtx v1 = gen_reg_rtx (V16QImode);
> +emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
> +emit_insn (gen_popcountv16qi2 (v1, v));
> +if (<MODE>mode == V8HImode)
> +  {
> +/* For V8HI, we generate:
> +cnt v1.16b, v.16b
> +uaddlp  v2.8h, v1.16b */
> +emit_insn (gen_aarch64_uaddlpv16qi (operands[0], v1));
> +DONE;
> +  }
> +rtx v2 = gen_reg_rtx (V8HImode);
> +emit_insn (gen_aarch64_uaddlpv16qi (v2, v1));
> +if (<MODE>mode == V4SImode)
> +  {
> +/* For V4SI, we generate:
> +cnt v1.16b, v.16b
> +uaddlp  v2.8h, v1.16b
> +uaddlp  v3.4s, v2.8h */
> +emit_insn (gen_aarch64_uaddlpv8hi (operands[0], v2));
> +DONE;
> +  }
> +/* For V2DI, we generate:
> +cnt v1.16b, v.16b
> +uaddlp  v2.8h, v1.16b
> +uaddlp  v3.4s, v2.8h
> +uaddlp  v4.2d, v3.4s */
> +rtx v3 = gen_reg_rtx (V4SImode);
> +emit_insn (gen_aarch64_uaddlpv8hi (v3, v2));
> +emit_insn (gen_aarch64_uaddlpv4si (operands[0], v3));
> +DONE;
> +  }
> +)
> +

Could you add support for V4HI and V2SI at the same time?

I think it's possible to handle all 5 modes iteratively, like so:

(define_expand "popcount2"
  [(set (match_operand:VDQHSD 0 "register_operand")
(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
  "TARGET_SIMD"
{
  /* Generate a byte popcount.  */
  machine_mode mode =  == 64 ? V8QImode : V16QImode;
  rtx tmp = gen_reg_rtx (mode);
  auto icode = optab_handler (popcount_optab, mode);
  emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));

  /* Use a sequence of UADDLPs to accumulate the counts.  Each step doubles
 the element size and halves the number of elements.  */
  do
{
  auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
  mode = insn_data[icode].operand[0].mode;
  rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
  emit_insn (GEN_FCN (icode) (dest, tmp));
  tmp = dest;
}
  while (mode != mode);
  DONE;
})

(only lightly tested).  This requires changing:

(define_expand "aarch64_addlp"

to:

(define_expand "@aarch64_addlp"

Thanks,
Richard

>  ;; 'across lanes' max and min ops.
>  
>  ;; Template for outputting a scalar, so we can create __builtins which can be
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> new file mode 100644
> index 000..4c9a1b95990
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> @@ -0,0 +1,48 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* This function should produce cnt v.16b. */
> +void
> +bar (unsigned char *__restrict b, unsigned char *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i]);
> +}
> +
> +/* This function should produce cnt v.16b and uaddlp (Add Long Pairwise). */
> +void
> +bar1 (unsigned short *__restrict b, unsigned short *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i]);
> +}
> +
> +/* This function should produce cnt v.16b and 2 uaddlp (Add Long Pairwise). 
> */
> +void
> +bar2 (unsigned int *__restrict b, unsigned int *__restrict d)
> +{
> +  for (int i = 0; i < 1024; i++)
> +d[i] = __builtin_popcount (b[i])

Re: [PATCH v3 2/2] testsuite: Fix expand-return CMSE test for Armv8.1-M [PR115253]

2024-06-12 Thread Torbjorn SVENSSON




On 2024-06-11 16:00, Richard Earnshaw (lists) wrote:

On 10/06/2024 15:04, Torbjörn SVENSSON wrote:

For Armv8.1-M, the clearing of the registers is handled differently than
for Armv8-M, so update the test case accordingly.

gcc/testsuite/ChangeLog:

PR target/115253
* gcc.target/arm/cmse/extend-return.c: Update test case
condition for Armv8.1-M.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
  .../gcc.target/arm/cmse/extend-return.c   | 62 +--
  1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c 
b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
index 081de0d699f..2288d166bd3 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/extend-return.c
@@ -1,5 +1,7 @@
  /* { dg-do compile } */
  /* { dg-options "-mcmse -fshort-enums" } */
+/* ARMv8-M expectation with target { ! arm_cmse_clear_ok }.  */
+/* ARMv8.1-M expectation with target arm_cmse_clear_ok.  */
  /* { dg-final { check-function-bodies "**" "" "" } } */
  
  #include 

@@ -20,7 +22,15 @@ typedef enum offset __attribute__ ((cmse_nonsecure_call)) 
ns_enum_foo_t (void);
  typedef bool __attribute__ ((cmse_nonsecure_call)) ns_bool_foo_t (void);
  
  /*

-**unsignNonsecure0:
+**unsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**unsignNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **uxtbr0, r0
@@ -32,7 +42,15 @@ unsigned char unsignNonsecure0 (ns_unsign_foo_t * ns_foo_p)
  }
  
  /*

-**signNonsecure0:
+**signNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxtbr0, r0
+** ...
+*/
+/*
+**signNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **sxtbr0, r0
@@ -44,7 +62,15 @@ signed char signNonsecure0 (ns_sign_foo_t * ns_foo_p)
  }
  
  /*

-**shortUnsignNonsecure0:
+**shortUnsignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxthr0, r0
+** ...
+*/
+/*
+**shortUnsignNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **uxthr0, r0
@@ -56,7 +82,15 @@ unsigned short shortUnsignNonsecure0 (ns_short_unsign_foo_t 
* ns_foo_p)
  }
  
  /*

-**shortSignNonsecure0:
+**shortSignNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** sxthr0, r0
+** ...
+*/
+/*
+**shortSignNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **sxthr0, r0
@@ -68,7 +102,15 @@ signed short shortSignNonsecure0 (ns_short_sign_foo_t * 
ns_foo_p)
  }
  
  /*

-**enumNonsecure0:
+**enumNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**enumNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **uxtbr0, r0
@@ -80,7 +122,15 @@ unsigned char __attribute__((noipa)) enumNonsecure0 
(ns_enum_foo_t * ns_foo_p)
  }
  
  /*

-**boolNonsecure0:
+**boolNonsecure0:  { target arm_cmse_clear_ok }
+** ...
+** blxns   r[0-3]
+** ...
+** uxtbr0, r0
+** ...
+*/
+/*
+**boolNonsecure0: { target { ! arm_cmse_clear_ok } }
  **...
  **bl  __gnu_cmse_nonsecure_call
  **uxtbr0, r0


OK when the nits in the first patch are sorted.

R.


Pushed as:

basepoints/gcc-15-1201-gcf5f9171bae
releases/gcc-14.1.0-134-g9100e78ba28
releases/gcc-13.3.0-64-gdfab6851eb5
releases/gcc-12.3.0-1035-g3d9e4eedb6b
releases/gcc-11.4.0-650-gbf9c877c4c9

Kind regards,
Torbjörn


Re: [PATCH v3 1/2] arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]

2024-06-12 Thread Torbjorn SVENSSON




On 2024-06-11 15:59, Richard Earnshaw (lists) wrote:

On 10/06/2024 15:04, Torbjörn SVENSSON wrote:

Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilog processing of sign extension.

This patch addresses CVE-2024-0151 for Armv8-M.baseline.
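
For reference, a minimal example of the kind of code affected (a sketch;
any sub-word integral return across the non-secure boundary takes this
path, and building it needs e.g. -mcmse -march=armv8-m.base):

  #include <arm_cmse.h>

  typedef signed char __attribute__ ((cmse_nonsecure_call)) ns_fn (void);

  signed char
  call_ns (ns_fn *fn)
  {
    /* The caller cannot trust the non-secure callee to have widened the
       result, so the compiler sign-extends r0 after the call; on
       Armv8-M.baseline this previously caused an ICE.  */
    return fn ();
  }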

gcc/ChangeLog:

PR target/115253
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Sign extend for Thumb1.
(thumb1_expand_prologue): Add zero/sign extend.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
  gcc/config/arm/arm.cc | 71 ++-
  1 file changed, 63 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index ea0c963a4d6..e7b4caf1083 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -19220,17 +19220,22 @@ cmse_nonsecure_call_inline_register_clear (void)
  || TREE_CODE (ret_type) == BOOLEAN_TYPE)
  && known_lt (GET_MODE_SIZE (TYPE_MODE (ret_type)), 4))
{
- machine_mode ret_mode = TYPE_MODE (ret_type);
+ rtx ret_reg = gen_rtx_REG (TYPE_MODE (ret_type), R0_REGNUM);
+ rtx si_reg = gen_rtx_REG (SImode, R0_REGNUM);
  rtx extend;
  if (TYPE_UNSIGNED (ret_type))
-   extend = gen_rtx_ZERO_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
+   extend = gen_rtx_SET (si_reg, gen_rtx_ZERO_EXTEND (SImode,
+  ret_reg));
  else
-   extend = gen_rtx_SIGN_EXTEND (SImode,
- gen_rtx_REG (ret_mode, 
R0_REGNUM));
- emit_insn_after (gen_rtx_SET (gen_rtx_REG (SImode, R0_REGNUM),
-extend), insn);
-
+   /* Signed-extension is a special case because of
+  thumb1_extendhisi2.  */
+   if (TARGET_THUMB1


You effectively have an 'else if' split across a comment here, and the 
indentation looks weird.  Either write 'else if' on one line (and re-indent 
accordingly) or put this entire block inside braces.


+   && known_ge (GET_MODE_SIZE (TYPE_MODE (ret_type)), 2))


You can use known_eq here.  We'll never have any value other than 2, given the 
known_le (4) above and anyway it doesn't make sense to call extendhisi with any 
other size.


+ extend = gen_thumb1_extendhisi2 (si_reg, ret_reg);
+   else
+ extend = gen_rtx_SET (si_reg, gen_rtx_SIGN_EXTEND (SImode,
+ret_reg));
+ emit_insn_after (extend, insn);
}
  
  
@@ -27250,6 +27255,56 @@ thumb1_expand_prologue (void)

live_regs_mask = offsets->saved_regs_mask;
lr_needs_saving = live_regs_mask & (1 << LR_REGNUM);
  


Similar comments to above apply to the hunk below.


+  /* The AAPCS requires the callee to widen integral types narrower
+ than 32 bits to the full width of the register; but when handling
+ calls to non-secure space, we cannot trust the callee to have
+ correctly done so.  So forcibly re-widen the result here.  */
+  if (IS_CMSE_ENTRY (func_type))
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type;
+  tree fndecl = current_function_decl;
+  tree fntype = TREE_TYPE (fndecl);
+  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (&args_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+   {
+ rtx arg_rtx;
+
+ if (VOID_TYPE_P (arg_type))
+   break;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ if (!first_param)
+   /* We should advance after processing the argument and pass
+  the argument we're advancing past.  */
+   arm_function_arg_advance (args_so_far, arg);
+ first_param = false;
+ arg_rtx = arm_function_arg (args_so_far, arg);
+ gcc_assert (REG_P (arg_rtx));
+ if ((TREE_CODE (arg_type) == INTEGER_TYPE
+ || TREE_CODE (arg_type) == ENUMERAL_TYPE
+ || TREE_CODE (arg_type) == BOOLEAN_TYPE)
+ && known_lt (GET_MODE_SIZE (GET_MODE (arg_rtx)), 4))
+   {
+ rtx res_reg = gen_rtx_REG (SImode, REGNO (arg_rtx));
+ if (TYPE_UNSIGNED (arg_type))
+   emit_set_insn (res_reg, gen_rtx_ZERO_EXTEND (SImode, arg_rtx));
+ else
+   /* Signed-extension is a special case because of
+  thumb1_extendhisi2.  */
+   if (known_ge (GET_MO

Re: Frontend access to target features (was Re: [PATCH] libgccjit: Add ability to get CPU features)

2024-06-12 Thread Antoni Boucher

David: Ping.

Le 2024-04-26 à 09 h 51, Antoni Boucher a écrit :
Now that we have a more general way to check if target-dependent types 
are supported (see this commit: 
https://github.com/rust-lang/gcc/commit/1c9a9b2f1fd914cad911467ec1d29f158643c2ce#diff-018089519ab2b14a34313ded0ae1a2f9fcab5f7bcb2fa31f147e1dc757bbdd7aR4016), perhaps we should remove gcc_jit_target_info_supports_128bit_int from this patch, or change it to include the more general way.


David, what are your thoughts on this?

Le 2024-04-19 à 08 h 34, Antoni Boucher a écrit :

David: Ping.

Le 2024-04-09 à 09 h 21, Antoni Boucher a écrit :

David: Ping.

Le 2024-04-01 à 08 h 20, Antoni Boucher a écrit :

David: Ping.

Le 2024-03-19 à 07 h 03, Arthur Cohen a écrit :

Hi,

On 3/5/24 16:09, David Malcolm wrote:

On Thu, 2023-11-09 at 19:33 -0500, Antoni Boucher wrote:

Hi.
See answers below.

On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:

On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:

Hi.
This patch adds support for getting the CPU features in libgccjit
(bug
112466)

There's a TODO in the test:
I'm not sure how to test that gcc_jit_target_info_arch returns
the
correct value since it is dependant on the CPU.
Any idea on how to improve this?

Also, I created a CStringHash to be able to have a
std::unordered_set<const char *>. Is there any built-in way of
doing
this?


Thanks for the patch.

Some high-level questions:

Is this specifically about detecting capabilities of the host that
libgccjit is currently running on? or how the target was configured
when libgccjit was built?


I'm less sure about this part. I'll need to do more tests.



One of the benefits of libgccjit is that, in theory, we support all
of
the targets that GCC already supports.  Does this patch change
that,
or
is this more about giving client code the ability to determine
capabilities of the specific host being compiled for?


This should not change that. If it does, this is a bug.



I'm nervous about having per-target jit code.  Presumably there's a
reason that we can't reuse existing target logic here - can you
please
describe what the problem is.  I see that the ChangeLog has:


 * config/i386/i386-jit.cc: New file.


where i386-jit.cc has almost 200 lines of nontrivial code.  Where
did
this come from?  Did you base it on existing code in our source
tree,
making modifications to fit the new internal API, or did you write
it
from scratch?  In either case, how onerous would this be for other
targets?


This was mostly copied from the same code done for the Rust and D
frontends.
See this commit and the following:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b1c06fd9723453dd2b2ec306684cb806dc2b4fbb
The equivalent to i386-jit.cc is there:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=22e3557e2d52f129f2bbfdc98688b945dba28dc9


[CCing Iain and Arthur re those patches; for reference, the patch 
being

discussed is attached to :
https://gcc.gnu.org/pipermail/jit/2024q1/001792.html ]

One of my concerns about this patch is that we seem to be gaining 
code
that's per-(frontend x config) which seems to be copied and pasted 
with

a search and replace, which could lead to an M*N explosion.


I think this is definitely already the case, and it would be worth 
investigating if C/C++/Rust/jit can reuse a similar set of target 
files, or how to factor them together. I imagine that all of these 
components share similar needs for the targets they support.




Is there any real difference between the per-config code for the
different frontends, or should there be a general "enumerate all
features of the target" hook that's independent of the frontend? (but
perhaps calls into it).

Am I right in thinking that (rustc with default LLVM backend) has 
some

set of feature strings that both (rustc with rustc_codegen_gcc) and
gccrs are trying to emulate?  If so, is it presumably a goal that
libgccjit gives identical results to gccrs?  If so, would it be crazy
for libgccjit to consume e.g. config/i386/i386-rust.cc ?


I think this would definitely make sense, and it could probably be 
extended to other frontends. For the time being I think it makes 
sense to try it out for gccrs and jit. But finding a fitting name 
will be hard :)


Best,

Arthur



Dave





I'm not at expert at target hooks (or at the i386 backend), so if
we
do
go with this approach I'd want someone else to review those parts
of
the patch.

Have you verified that GCC builds with this patch with jit *not*
enabled in the enabled languages?


I will do.



[...snip...]

A nitpick:


+.. function:: const char * \
+  gcc_jit_target_info_arch (gcc_jit_target_info
*info)
+
+   Get the architecture of the currently running CPU.


What does this string look like?
How long does the pointer remain valid?


It's the march string, like "znver2", for instance.
It remains valid until we free the gcc_jit_target_info object.



Thanks again; hope the above makes sense
Dave







Re: [PATCH 04/52] go: Replace uses of {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE

2024-06-12 Thread Ian Lance Taylor
"Kewen.Lin"  writes:

> Hi,
>
> Gentle ping:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653387.html
>
> BR,
> Kewen
>
> on 2024/6/3 11:00, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in go with TYPE_PRECISION of {float,{,long_}double}_type_node.
>> 
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>> 
>> gcc/go/ChangeLog:
>> 
>>  * go-gcc.cc (Gcc_backend::float_type): Use TYPE_PRECISION of
>>  {float,double,long_double}_type_node to replace
>>  {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>>  (Gcc_backend::complex_type): Likewise.

This is fine if the other parts of the patch are accepted.

Thanks.

Ian
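
For context, the change described boils down to replacing the target macros
with queries on the type nodes themselves; roughly this pattern inside GCC's
internal API (a sketch, not the exact go-gcc.cc code):

  /* Select the scalar float type for a given bit width.  */
  tree
  float_type_for_bits (int bits)
  {
    if (bits == TYPE_PRECISION (float_type_node))
      return float_type_node;
    else if (bits == TYPE_PRECISION (double_type_node))
      return double_type_node;
    else if (bits == TYPE_PRECISION (long_double_type_node))
      return long_double_type_node;
    return NULL_TREE;  /* Unsupported width.  */
  }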


[PATCH v2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-06-12 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.
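
As a concrete example of what ends up routed through these entry points
(a sketch; the exact resolution depends on the HWCAPs the ifunc resolver
sees at load time):

  #include <stdatomic.h>

  __int128
  load16_acquire (_Atomic __int128 *p)
  {
    /* On AArch64 this compiles to a libatomic call which may resolve to
       the new LRCPC3 LDIAPP variant when LSE2 and RCPC3 are present.  */
    return atomic_load_explicit (p, memory_order_acquire);
  }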

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* config/linux/aarch64/host-config.h (HWCAP2_LRCPC3): New.
(LSE2_LRCPC3_ATOP): Previously LSE2_ATOP.  New ifuncs guarded
under it.
(has_rcpc3): New.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 46 +++-
 libatomic/config/linux/aarch64/host-config.h | 34 +--
 2 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index c44c31c6418..5767fba5c03 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -35,16 +35,21 @@
writes, this will be true when using atomics in actual code.
 
The libat_<op>_16 entry points are ARMv8.0.
-   The libat_<op>_16_i1 entry points are used when LSE128 is available.
+   The libat_<op>_16_i1 entry points are used when LSE128 or LRCPC3 is available.
The libat_<op>_16_i2 entry points are used when LSE2 is available.  */
 
 #include "auto-config.h"
 
.arch   armv8-a+lse
 
+/* There is overlap in atomic instructions implemented in RCPC3 and LSE2.
+   Consequently, both _i1 and _i2 suffixes are needed for functions using 
these.
+   Elsewhere, all extension-specific implementations are mapped to _i1.  */
+
+#define LRCPC3(NAME)   libat_##NAME##_i1
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE(NAME)  libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i2
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
@@ -513,6 +518,43 @@ END (test_and_set_16)
 /* ifunc implementations: Carries run-time dependence on the presence of 
further
architectural extensions.  */
 
+ENTRY_FEAT (load_16, LRCPC3)
+   cbnzw1, 1f
+
+   /* RELAXED.  */
+   ldp res0, res1, [x0]
+   ret
+1:
+   cmp w1, SEQ_CST
+   b.eq2f
+
+   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+   /* ldiapp res0, res1, [x0]  */
+   .inst   0xd9411800
+   ret
+
+   /* SEQ_CST.  */
+2: ldartmp0, [x0]  /* Block reordering with Store-Release instr.  
*/
+   /* ldiapp res0, res1, [x0]  */
+   .inst   0xd9411800
+   ret
+END_FEAT (load_16, LRCPC3)
+
+
+ENTRY_FEAT (store_16, LRCPC3)
+   cbnzw4, 1f
+
+   /* RELAXED.  */
+   stp in0, in1, [x0]
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+1: /* stilp in0, in1, [x0]  */
+   .inst   0xd9031802
+   ret
+END_FEAT (store_16, LRCPC3)
+
+
 ENTRY_FEAT (exchange_16, LSE128)
mov tmp0, x0
mov res0, in0
diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index d05e9eb628f..8adf0563001 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -33,6 +33,9 @@
 #ifndef HWCAP_USCAT
 # define HWCAP_USCAT   (1 << 25)
 #endif
+#ifndef HWCAP2_LRCPC3
+# define HWCAP2_LRCPC3 (1UL << 46)
+#endif
 #ifndef HWCAP2_LSE128
 # define HWCAP2_LSE128 (1UL << 47)
 #endif
@@ -54,7 +57,7 @@ typedef struct __ifunc_arg_t {
 #if defined (LAT_CAS_N)
 # define LSE_ATOP
 #elif defined (LAT_LOAD_N) || defined (LAT_STORE_N)
-# define LSE2_ATOP
+# define LSE2_LRCPC3_ATOP
 #elif defined (LAT_EXCH_N) || defined (LAT_FIOR_N) || defined (LAT_FAND_N)
 # define LSE128_ATOP
 #endif
@@ -63,9 +66,10 @@ typedef struct __ifunc_arg_t {
 #  if defined (LSE_ATOP)
 #   define IFUNC_NCOND(N)  1
 #   define IFUNC_COND_1(hwcap & HWCAP_ATOMICS)
-#  elif defined (LSE2_ATOP)
-#   define IFUNC_NCOND(N)  1
-#   define IFUNC_COND_1(has_lse2 (hwcap, features))
+#  elif defined (LSE2_LRCPC3_ATOP)
+#   define IFUNC_NCOND(N)  2
+#   define IFUNC_COND_1(has_rcpc3 (hwcap, features))
+#   define IFUNC_COND_2(has_lse2 (hwcap, features))
 #  elif defined (LSE128_ATOP)
 #   define IFUNC_NCOND(N)  1
 #   define IFUNC_COND_1(has_lse128 (hwcap, features))
@@ -131,6 +135,28 @@ has_lse128 (unsigned long hwcap, const __ifunc_arg_t 
*features)
   return false;
 }
 
+/* LRCPC atomic support encoded in ID_AA64ISAR1_EL1.Atomic, bits[23:20].  The
+   expected value is 0b0011.  Check that.  */
+
+static inline bool
+has_rcpc3 (unsigned long hwcap, const __ifunc_arg_t *features)
+{
+  if (hwcap & _IFUNC_ARG_HWCAP
+  && features-

[PATCH v1] Match: Support more forms for the scalar unsigned .SAT_SUB

2024-06-12 Thread pan2 . li
From: Pan Li 

After we added support for the scalar unsigned forms 1 and 2, we would like
to introduce more forms, including both branch and branchless variants.
Forms 3-10 are listed below:

Form 3:
  #define SAT_SUB_U_3(T) \
  T sat_sub_u_3_##T (T x, T y) \
  { \
return x > y ? x - y : 0; \
  }

Form 4:
  #define SAT_SUB_U_4(T) \
  T sat_sub_u_4_##T (T x, T y) \
  { \
return x >= y ? x - y : 0; \
  }

Form 5:
  #define SAT_SUB_U_5(T) \
  T sat_sub_u_5_##T (T x, T y) \
  { \
return x < y ? 0 : x - y; \
  }

Form 6:
  #define SAT_SUB_U_6(T) \
  T sat_sub_u_6_##T (T x, T y) \
  { \
return x <= y ? 0 : x - y; \
  }

Form 7:
  #define SAT_SUB_U_7(T) \
  T sat_sub_u_7_##T (T x, T y) \
  { \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
return ret & (T)(overflow - 1); \
  }

Form 8:
  #define SAT_SUB_U_8(T) \
  T sat_sub_u_8_##T (T x, T y) \
  { \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
return ret & (T)-(!overflow); \
  }

Form 9:
  #define SAT_SUB_U_9(T) \
  T sat_sub_u_9_##T (T x, T y) \
  { \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
return overflow ? 0 : ret; \
  }

Form 10:
  #define SAT_SUB_U_10(T) \
  T sat_sub_u_10_##T (T x, T y) \
  { \
T ret; \
T overflow = __builtin_sub_overflow (x, y, &ret); \
return !overflow ? ret : 0; \
  }

Take form 10 as an example:

SAT_SUB_U_10(uint8_t);

Before this patch:
uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
{
  unsigned char _1;
  unsigned char _2;
  uint8_t _3;
  __complex__ unsigned char _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SUB_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]
;;succ:   3
;;4

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   2
;;3
  # _3 = PHI <0(2), _1(3)>
  return _3;
;;succ:   EXIT

}

After this patch:
uint8_t sat_sub_u_10_uint8_t (uint8_t x, uint8_t y)
{
  uint8_t _3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _3 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _3;
;;succ:   EXIT

}

The below test suites are passed for this patch:
1. The rv64gcv fully regression test with newlib.
2. The rv64gcv build with glibc.
3. The x86 bootstrap test.
4. The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Add more match for unsigned sat_sub.
* tree-ssa-math-opts.cc (match_unsigned_saturation_sub): Add new
func impl to match phi node for .SAT_SUB.
(math_opts_dom_walker::after_dom_children): Try match .SAT_SUB
for the phi node, MULT_EXPR, BIT_XOR_EXPR and BIT_AND_EXPR.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 25 +++--
 gcc/tree-ssa-math-opts.cc | 33 +
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 5cfe81e80b3..66e411b3359 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3140,14 +3140,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
- (cond (gt @0 @1) (minus @0 @1) integer_zerop)
+ (cond^ (gt @0 @1) (minus @0 @1) integer_zerop)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
 /* Unsigned saturation sub, case 2 (branch with ge):
SAT_U_SUB = X >= Y ? X - Y : 0.  */
 (match (unsigned_integer_sat_sub @0 @1)
- (cond (ge @0 @1) (minus @0 @1) integer_zerop)
+ (cond^ (ge @0 @1) (minus @0 @1) integer_zerop)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
@@ -3165,6 +3165,27 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Unsigned saturation sub, case 5 (branchless bit_and with .SUB_OVERFLOW.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (bit_and:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
+  (plus:c (imagpart @2) integer_minus_onep))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
+/* Unsigned saturation sub, case 6 (branchless mult with .SUB_OVERFLOW.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (mult:c (realpart (IFN_SUB_OVERFLOW@2 @0 @1))
+  (bit_xor:c (imagpart @2) integer_onep))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
+/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
+  (realpart @2) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 

Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required

2024-06-12 Thread Maciej W. Rozycki
On Thu, 5 May 2022, Alexandre Oliva via Gcc-patches wrote:

> [PR100106] Reject unaligned subregs when strict alignment is required
> 
> From: Alexandre Oliva 
> 
> The testcase for pr100106, compiled with optimization for 32-bit
> powerpc -mcpu=604 with -mstrict-align expands the initialization of a
> union from a float _Complex value into a load from an SCmode
> constant pool entry, aligned to 4 bytes, into a DImode pseudo,
> requiring 8-byte alignment.

 This has regressed building the `alpha-linux-gnu' target, in libada, as 
from commit d6b756447cd5 including GCC 14 and up to current GCC 15 trunk:

during RTL pass: ira
+===GNAT BUG DETECTED==+
| 15.0.0 20240610 (experimental) (alpha-linux-gnu) GCC error:  |
| in gen_rtx_SUBREG, at emit-rtl.cc:1032   |
| Error detected around g-debpoo.adb:1896:8|
| Compiling g-debpoo.adb   |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+

I have filed PR #115459.

  Maciej


Re: [PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Tobias Burnus

Andrew Stubbs wrote:

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.


LGTM. Thanks!

Tobias



Re: [PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Tobias Burnus

Andrew Stubbs wrote:

Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators have been moved (again) to 200
   (as 100 is already in use by another toolchain vendor and this seems
   like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these
   are specifically GNU extensions.
- The failure mode of the testcases had been modified, including adding
   an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator"
   clauses.

---

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.


The patch LGTM.

Thanks!

Tobias

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
  incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge
---
  gcc/fortran/openmp.cc |  11 +-
  .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +++
  libgomp/allocator.c   | 115 +-
  libgomp/env.c |   1 +
  libgomp/libgomp.texi  |   7 +-
  libgomp/omp.h.in  |   1 +
  libgomp/omp_lib.f90.in|   2 +
  libgomp/omp_lib.h.in  |   2 +
  libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 100 +++
  libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 102 
  .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
  11 files changed, 336 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
  create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
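
A rough usage sketch of the new allocator, and of the explicit custom
allocator the description says it is equivalent to (my own illustration
with an assumed buffer size; the patch's testcases are not reproduced
here):

```
#include <omp.h>

int main (void)
{
  /* The new predefined allocator: pinned memory, NULL on failure.  */
  double *buf = omp_alloc (1024 * sizeof *buf, ompx_gnu_pinned_mem_alloc);
  if (buf == NULL)   /* null fallback trait: no abort, just NULL */
    return 1;
  omp_free (buf, ompx_gnu_pinned_mem_alloc);

  /* The equivalent custom allocator, spelled out via traits.  */
  omp_alloctrait_t traits[] = {
    { omp_atk_pinned,   omp_atv_true },
    { omp_atk_fallback, omp_atv_null_fb },
  };
  omp_allocator_handle_t a
    = omp_init_allocator (omp_default_mem_space, 2, traits);
  buf = omp_alloc (1024 * sizeof *buf, a);
  omp_free (buf, a);
  omp_destroy_allocator (a);
  return 0;
}
```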


Re: [PATCH v2] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-06-12 Thread Richard Biener
On Tue, Jun 11, 2024 at 11:46 AM Victor Do Nascimento
 wrote:
>
> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; i++){
> a[i] = a[i] + b[i];
> __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>function call due to the possibility of side effects, the
>autovectorizer does not attempt to vectorize loops which make such
>calls.
>
>Given the memory reference passed to `__builtin_prefetch', in the
>absence of assurances about its effect on the passed memory
>location the compiler deems the function unsafe to vectorize,
>marking it as clobbering memory in `vect_find_stmt_data_reference'.
>This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>would be classed as `vect_unused_in_scope', the loop invariant that
>is used in the address of the prefetch is the scalar loop's and not
>the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>because the instruction wasn't vectorized, causing DCE to think the
>value is live, such that we now have both the vector and scalar loop
>invariant actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>not generate faults if its address argument is invalid and does not
>write to memory.  Therefore, it does not alter the internal state
>of the program or its control flow under any circumstance.  As
>such, it is reasonable that the function be marked as not affecting
>memory contents.
>
>To achieve this, we add the necessary logic to
>    `get_references_in_stmt' to ensure that builtin functions are given
>    the same treatment as internal functions.  If the gimple call
>is to a builtin function and its function code is
>`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedence in the way clobber statements are handled,
>whereby the vectorizer drops these from both the scalar and
>vectorized versions of a given loop, we choose to drop prefetch
>hints in a similar fashion.  This seems appropriate given how
>software prefetch hints are typically ignored by processors across
>architectures, as they seldom lead to performance gain over their
>hardware counterparts.

OK.

Thanks,
Richard.

>PR tree-optimization/114061
>
> gcc/ChangeLog:
>
> * tree-data-ref.cc (get_references_in_stmt): set
> `clobbers_memory' to false for __builtin_prefetch.
> * tree-vect-loop.cc (vect_transform_loop): Drop all
> __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-prefetch-drop.c: New test.
> * gcc.target/aarch64/vect-prefetch-drop.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c  | 12 
>  .../gcc.target/aarch64/vect-prefetch-drop.c | 13 +
>  gcc/tree-data-ref.cc|  2 ++
>  gcc/tree-vect-loop.cc   |  6 --
>  4 files changed, 31 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c 
> b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..7a8915eb716
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +void foo(int * restrict a, int * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; i++){
> +a[i] = a[i] + b[i];
> +__builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> new file mode 100644
> index 000..e654b99fde8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-prefetch-drop.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv9.2-a+sve --std=c99" { target { 
> aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; i++){
> +a[i] = a[i] + b[i];
> +__builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, 
> z\[0-9\]+.d" } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index 7b5f2d16238..bd61069b6

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread René Rebe
Hi,

> On Jun 12, 2024, at 13:01, Richard Biener  wrote:
> 
> On Wed, 12 Jun 2024, Rene Rebe wrote:
>> 
>> gcc/
>>* config/ia64/ia64.cc: Enable LRA for ia64.
>>* config/ia64/ia64.md: Likewise.
>>* config/ia64/predicates.md: Likewise.
> 
> That looks simple enough.  I cannot find any copyright assignment on
> file with the FSF so you probably want to contribute to GCC under
> the DCO (see https://gcc.gnu.org/dco.html), in that case please post
> patches with Signed-off-by: tags.

If it helps for the future, I can apply for copyright assignment, too.

> For this patch please state how you tested it, I assume you
> bootstrapped GCC natively on ia64-linux and ran the testsuite.
> I can find two gcc-testresult postings, one appearantly with LRA
> and one without?  Both from May:
> 
> https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
> https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html

Yes, that are the two I quoted in the patch cover letter.

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654321.html

> somehow for example libstdc++ summaries were not merged, it might
> be you do not have recent python installed on the system?  Or you
> didn't use contrib/test_summary to create those mails.  It would be
> nice to see the difference between LRA and not LRA in the testresults,
> can you quote that?

We usually cross-compile gcc, but also ran natively for the testsuite.
Given the tests run quite long natively on the hardware we currently
have, I summed the results up in the cover letter. I would assume
that should be enough to include, with a note that the resulting kernel and
user-space world booted and worked without issues?

If so, I’ll just resend with the additional information added.

Thank you so much,
René

> Thanks,
> Richard.
> 
>> ---
>> gcc/config/ia64/ia64.cc   | 7 ++-
>> gcc/config/ia64/ia64.md   | 4 ++--
>> gcc/config/ia64/predicates.md | 2 +-
>> 3 files changed, 5 insertions(+), 8 deletions(-)
>> 
>> diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
>> index ac3d56073ac..d189bfb2cb4 100644
>> --- a/gcc/config/ia64/ia64.cc
>> +++ b/gcc/config/ia64/ia64.cc
>> @@ -618,9 +618,6 @@ static const scoped_attribute_specs *const 
>> ia64_attribute_table[] =
>> #undef TARGET_LEGITIMATE_ADDRESS_P
>> #define TARGET_LEGITIMATE_ADDRESS_P ia64_legitimate_address_p
>> 
>> -#undef TARGET_LRA_P
>> -#define TARGET_LRA_P hook_bool_void_false
>> -
>> #undef TARGET_CANNOT_FORCE_CONST_MEM
>> #define TARGET_CANNOT_FORCE_CONST_MEM ia64_cannot_force_const_mem
>> 
>> @@ -1329,7 +1326,7 @@ ia64_expand_move (rtx op0, rtx op1)
>> {
>>   machine_mode mode = GET_MODE (op0);
>> 
>> -  if (!reload_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
>> +  if (!lra_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
>> op1 = force_reg (mode, op1);
>> 
>>   if ((mode == Pmode || mode == ptr_mode) && symbolic_operand (op1, 
>> VOIDmode))
>> @@ -1776,7 +1773,7 @@ ia64_expand_movxf_movrf (machine_mode mode, rtx 
>> operands[])
>> }
>> }
>> 
>> -  if (!reload_in_progress && !reload_completed)
>> +  if (!lra_in_progress && !reload_completed)
>> {
>>   operands[1] = spill_xfmode_rfmode_operand (operands[1], 0, mode);
>> 
>> diff --git a/gcc/config/ia64/ia64.md b/gcc/config/ia64/ia64.md
>> index 698e302081e..d485acc0ea8 100644
>> --- a/gcc/config/ia64/ia64.md
>> +++ b/gcc/config/ia64/ia64.md
>> @@ -2318,7 +2318,7 @@
>>  (match_operand:DI 3 "register_operand" "f"))
>> (match_operand:DI 4 "nonmemory_operand" "rI")))
>>(clobber (match_scratch:DI 5 "=f"))]
>> -  "reload_in_progress"
>> +  "lra_in_progress"
>>   "#"
>>   [(set_attr "itanium_class" "unknown")])
>> 
>> @@ -3407,7 +3407,7 @@
>>   (match_operand:DI 2 "shladd_operand" "n"))
>>  (match_operand:DI 3 "nonmemory_operand" "r"))
>> (match_operand:DI 4 "nonmemory_operand" "rI")))]
>> -  "reload_in_progress"
>> +  "lra_in_progress"
>>   "* gcc_unreachable ();"
>>   "reload_completed"
>>   [(set (match_dup 0) (plus:DI (mult:DI (match_dup 1) (match_dup 2))
>> diff --git a/gcc/config/ia64/predicates.md b/gcc/config/ia64/predicates.md
>> index 01a4effd339..85f5380e734 100644
>> --- a/gcc/config/ia64/predicates.md
>> +++ b/gcc/config/ia64/predicates.md
>> @@ -347,7 +347,7 @@
>>   allows reload the opportunity to avoid spilling addresses to
>>   the stack, and instead simply substitute in the value from a
>>   REG_EQUIV.  We'll split this up again when splitting the insn.  */
>> - if (reload_in_progress || reload_completed)
>> + if (lra_in_progress || reload_completed)
>>  return true;
>> 
>> /* Some symbol types we allow to use with any offset.  */
>> 
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

-- 
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | htt

Re: [PATCH] match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR115449]

2024-06-12 Thread Richard Biener
On Wed, Jun 12, 2024 at 6:39 AM Andrew Pinski  wrote:
>
> As mentioned by Jeff in r15-831-g05daf617ea22e1d818295ed2d037456937e23530, we 
> don't handle
> `(X | Y) & ~Y` -> `X & ~Y` on the gimple level when there are some different 
> signed
> (but same precision) types dealing with matching `~Y` with the `Y` part. This
> improves both gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p to
> be able to say `(truncate)a` and `(truncate)a` are bitwise_equal and
> that `~(truncate)a` and `(truncate)a` are bitwise_invert_equal.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.

> PR tree-optimization/115449
>
> gcc/ChangeLog:
>
> * gimple-match-head.cc (gimple_maybe_truncate): New declaration.
> (gimple_bitwise_equal_p): Match truncations that differ only
> in types with the same precision.
> (gimple_bitwise_inverted_equal_p): For matching after bit_not_with_nop
> call gimple_bitwise_equal_p.
> * match.pd (maybe_truncate): New match pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-10.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-match-head.cc  | 17 +---
>  gcc/match.pd  |  7 +
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c | 34 +++
>  3 files changed, 48 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index e26fa0860ee..924d3f1e710 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -243,6 +243,7 @@ optimize_successive_divisions_p (tree divisor, tree 
> inner_div)
>gimple_bitwise_equal_p (expr1, expr2, valueize)
>
>  bool gimple_nop_convert (tree, tree *, tree (*) (tree));
> +bool gimple_maybe_truncate (tree, tree *, tree (*) (tree));
>
>  /* Helper function for bitwise_equal_p macro.  */
>
> @@ -271,6 +272,10 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree 
> (*valueize) (tree))
>  }
>if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
>  return true;
> +  if (gimple_maybe_truncate (expr3, &expr3, valueize)
> +  && gimple_maybe_truncate (expr4, &expr4, valueize)
> +  && operand_equal_p (expr3, expr4, 0))
> +return true;
>return false;
>  }
>
> @@ -318,21 +323,13 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree 
> expr2, bool &wascmp, tree (*va
>/* Try if EXPR1 was defined as ~EXPR2. */
>if (gimple_bit_not_with_nop (expr1, &other, valueize))
>  {
> -  if (operand_equal_p (other, expr2, 0))
> -   return true;
> -  tree expr4;
> -  if (gimple_nop_convert (expr2, &expr4, valueize)
> - && operand_equal_p (other, expr4, 0))
> +  if (gimple_bitwise_equal_p (other, expr2, valueize))
> return true;
>  }
>/* Try if EXPR2 was defined as ~EXPR1. */
>if (gimple_bit_not_with_nop (expr2, &other, valueize))
>  {
> -  if (operand_equal_p (other, expr1, 0))
> -   return true;
> -  tree expr3;
> -  if (gimple_nop_convert (expr1, &expr3, valueize)
> - && operand_equal_p (other, expr3, 0))
> +  if (gimple_bitwise_equal_p (other, expr1, valueize))
> return true;
>  }
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5cfe81e80b3..3204cf41538 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -200,6 +200,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (maybe_bit_not @0)
>   (bit_xor_cst@0 @1 @2))
>
> +#if GIMPLE
> +(match (maybe_truncate @0)
> + (convert @0)
> + (if (INTEGRAL_TYPE_P (type)
> +  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))))
> +#endif
> +
>  /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
> ABSU_EXPR returns unsigned absolute value of the operand and the operand
> of the ABSU_EXPR will have the corresponding signed type.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
> new file mode 100644
> index 000..000c5aef237
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/115449 */
> +
> +void setBit_un(unsigned char *a, int b) {
> +   unsigned char c = 0x1UL << b;
> +   *a &= ~c;
> +   *a |= c;
> +}
> +
> +void setBit_sign(signed char *a, int b) {
> +   signed char c = 0x1UL << b;
> +   *a &= ~c;
> +   *a |= c;
> +}
> +
> +void setBit(char *a, int b) {
> +   char c = 0x1UL << b;
> +   *a &= ~c;
> +   *a |= c;
> +}
> +/*
> +   All three should produce:
> +_1 = 1 << b_4(D);
> +c_5 = (cast) _1;
> +_2 = *a_7(D);
> +_3 = _2 | c_5;
> +*a_7(D) = _3;
> +   Removing the `&~c` as we are matching `(~x & y) | x` -> `x | y`
> +   match pattern even with extra casts are being involved. */
> +
> +/* { dg-final { scan-tree-dump-not "bit_not_expr, 

Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Richard Biener
On Wed, 12 Jun 2024, René Rebe wrote:

> Hi,
> 
> > On Jun 12, 2024, at 13:01, Richard Biener  wrote:
> > 
> > On Wed, 12 Jun 2024, Rene Rebe wrote:
> >> 
> >> gcc/
> >>* config/ia64/ia64.cc: Enable LRA for ia64.
> >>* config/ia64/ia64.md: Likewise.
> >>* config/ia64/predicates.md: Likewise.
> > 
> > That looks simple enough.  I cannot find any copyright assignment on
> > file with the FSF so you probably want to contribute to GCC under
> > the DCO (see https://gcc.gnu.org/dco.html), in that case please post
> > patches with Signed-off-by: tags.
> 
> If it helps for the future, I can apply for copyright assignment, too.

It's not a requirement - you as contributor get the choice under
which legal framework you contribute to GCC, for the DCO there's
the formal requirement of Signed-off-by: tags.

> > For this patch please state how you tested it, I assume you
> > bootstrapped GCC natively on ia64-linux and ran the testsuite.
> > I can find two gcc-testresult postings, one apparently with LRA
> > and one without?  Both from May:
> > 
> > https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
> > https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html
> 
> Yes, that are the two I quoted in the patch cover letter.
> 
>   https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654321.html
> 
> > somehow for example libstdc++ summaries were not merged, it might
> > be you do not have recent python installed on the system?  Or you
> > didn't use contrib/test_summary to create those mails.  It would be
> > nice to see the difference between LRA and not LRA in the testresults,
> > can you quote that?
> 
> We usually cross-compile gcc, but also ran natively for the testsuite.
> Given the tests run quite long natively on the hardware we currently
> have, I summed the results up in the cover letter. I would assume
> that should be enough to include, with a note that the resulting kernel and
> user-space world booted and worked without issues?

I specifically wondered if bootstrap with LRA enabled succeeds.
That needs either native or emulated hardware.  I think we consider
ia64-linux a host platform and not only a cross compiler target.

> If so, I’ll just resend with the additional information added.

For the LRA enablement patch the requirement is that patches should
state how they were tested - usually you'll see sth like

Boostrapped and tested on x86_64-unknown-linux-gnu.

In your case it was

Cross-built from x86_64-linux(?) to ia64-linux, natively tested

not sure how you exactly did this though?  I've never tried
testing of a canadian-cross tree - did you copy the whole build
tree over from the x86 to the ia64 machine?

Thanks,
Richard.

> Thank you so much,
>   René
> 
> > Thanks,
> > Richard.
> > 
> >> ---
> >> gcc/config/ia64/ia64.cc   | 7 ++-
> >> gcc/config/ia64/ia64.md   | 4 ++--
> >> gcc/config/ia64/predicates.md | 2 +-
> >> 3 files changed, 5 insertions(+), 8 deletions(-)
> >> 
> >> diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
> >> index ac3d56073ac..d189bfb2cb4 100644
> >> --- a/gcc/config/ia64/ia64.cc
> >> +++ b/gcc/config/ia64/ia64.cc
> >> @@ -618,9 +618,6 @@ static const scoped_attribute_specs *const 
> >> ia64_attribute_table[] =
> >> #undef TARGET_LEGITIMATE_ADDRESS_P
> >> #define TARGET_LEGITIMATE_ADDRESS_P ia64_legitimate_address_p
> >> 
> >> -#undef TARGET_LRA_P
> >> -#define TARGET_LRA_P hook_bool_void_false
> >> -
> >> #undef TARGET_CANNOT_FORCE_CONST_MEM
> >> #define TARGET_CANNOT_FORCE_CONST_MEM ia64_cannot_force_const_mem
> >> 
> >> @@ -1329,7 +1326,7 @@ ia64_expand_move (rtx op0, rtx op1)
> >> {
> >>   machine_mode mode = GET_MODE (op0);
> >> 
> >> -  if (!reload_in_progress && !reload_completed && !ia64_move_ok (op0, 
> >> op1))
> >> +  if (!lra_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
> >> op1 = force_reg (mode, op1);
> >> 
> >>   if ((mode == Pmode || mode == ptr_mode) && symbolic_operand (op1, 
> >> VOIDmode))
> >> @@ -1776,7 +1773,7 @@ ia64_expand_movxf_movrf (machine_mode mode, rtx 
> >> operands[])
> >> }
> >> }
> >> 
> >> -  if (!reload_in_progress && !reload_completed)
> >> +  if (!lra_in_progress && !reload_completed)
> >> {
> >>   operands[1] = spill_xfmode_rfmode_operand (operands[1], 0, mode);
> >> 
> >> diff --git a/gcc/config/ia64/ia64.md b/gcc/config/ia64/ia64.md
> >> index 698e302081e..d485acc0ea8 100644
> >> --- a/gcc/config/ia64/ia64.md
> >> +++ b/gcc/config/ia64/ia64.md
> >> @@ -2318,7 +2318,7 @@
> >>  (match_operand:DI 3 "register_operand" "f"))
> >> (match_operand:DI 4 "nonmemory_operand" "rI")))
> >>(clobber (match_scratch:DI 5 "=f"))]
> >> -  "reload_in_progress"
> >> +  "lra_in_progress"
> >>   "#"
> >>   [(set_attr "itanium_class" "unknown")])
> >> 
> >> @@ -3407,7 +3407,7 @@
> >>   (match_operand:DI 2 "shladd_operand" "n"))
> >>  (match_operand:DI 3 "nonmemory_operand" "r"))
> >> (match_operand:DI 4 "nonmemory_operand" "rI")))]
> >

Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-12 Thread Manolis Tsamis
On Mon, Jun 10, 2024 at 9:27 PM Philipp Tomsich
 wrote:
>
> On Mon, 10 Jun 2024 at 20:03, Jeff Law  wrote:
> >
> >
> >
> > On 6/10/24 1:55 AM, Manolis Tsamis wrote:
> >
> > >>
> > > There was an older submission of a load-pair specific pass but this is
> > > a complete reimplementation and indeed significantly more general.
> > > Apart from being target independant, it addresses a number of
> > > important restrictions and can handle multiple store forwardings per
> > > load.
> > > It should be noted that it cannot handle the load-pair cases as these
> > > need special handling, but that's something we're planning to do in
> > > the future by reusing this infrastructure.
> > ACK.  Thanks for the additional background.
> >
> >
> > >
> > >>
> > >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > >>> index 4e8967fd8ab..c769744d178 100644
> > >>> --- a/gcc/doc/invoke.texi
> > >>> +++ b/gcc/doc/invoke.texi
> > >>> @@ -12657,6 +12657,15 @@ loop unrolling.
> > >>>This option is enabled by default at optimization levels 
> > >>> @option{-O1},
> > >>>@option{-O2}, @option{-O3}, @option{-Os}.
> > >>>
> > >>> +@opindex favoid-store-forwarding
> > >>> +@item -favoid-store-forwarding
> > >>> +@itemx -fno-avoid-store-forwarding
> > >>> +Many CPUs will stall for many cycles when a load partially depends on 
> > >>> previous
> > >>> +smaller stores.  This pass tries to detect such cases and avoid the 
> > >>> penalty by
> > >>> +changing the order of the load and store and then fixing up the loaded 
> > >>> value.
> > >>> +
> > >>> +Disabled by default.
> > >> Is there any particular reason why this would be off by default at -O1
> > >> or higher?  It would seem to me that on modern cores that this
> > >> transformation should easily be a win.  Even on an old in-order core,
> > >> avoiding the load with the bit insert is likely profitable, just not as
> > >> much so.
> > >>
> > > I don't have a strong opinion for that but I believe Richard's
> > > suggestion to decide this on a per-target basis also makes a lot of
> > > sense.
> > > Deciding whether the transformation is profitable is tightly tied to
> > > the architecture in question (i.e. how large the stall is and what
> > > sort of bit-insert instructions are available).
> > > In order to make this more widely applicable, I think we'll need a
> > > target hook that decides in which case the forwarded stores incur a
> > > penalty and thus the transformation makes sense.
> > You and Richi are probably right.   I'm not a big fan of passes being
> > enabled/disabled on particular targets, but it may make sense here.
> >
> >
> >
> > > Afaik, for each CPU there may be cases that store forwarding is
> > > handled efficiently.
> > Absolutely.   But forwarding from a smaller store to a wider load is
> > painful from a hardware standpoint and if we can avoid it from a codegen
> > standpoint, we should.
>
> This change is what I briefly hinted as "the complete solution" that
> we had on the drawing board when we briefly talked last November in
> Santa Clara.
>
> > Did y'all look at spec2017 at all for this patch?  I've got our hardware
> > guys to expose a signal for this case so that we can (in a month or so)
> > get some hard data on how often it's happening in spec2017 and evaluate
> > how this patch helps the most affected workloads.  But if y'all already
> > have some data we can use it as a starting point.
>
>  We have looked at all of SPEC2017, especially for coverage (i.e.,
> making sure we see a significant number of uses of the transformation)
> and correctness.  The gcc_r and parest_r components triggered in a
> number of "interesting" ways (e.g., motivating the case of
> load-elimination).  If it helps, we could share the statistics for how
> often the pass triggers on compiling each of the SPEC2017 components?
>
Below is a table with 1) the number of successful
store-forwarding-avoidance transformations and 2) the number of these
where the load was also eliminated.
This is SPEC2017 intrate and fprate; the omitted benchmarks in each
case have zero transformations.

Following Richard's comment I did two runs: One with the pass placed
just after cse1 (this patch) and one with it placed after sched1
(Richard's suggestion). I see an increased number of transformations
after sched1 and also in some testcases we get improved code
generation so it looks promising. The original motivation was that
placing this early will enable subsequent passes to optimize the
transformed code, but it looks like this is not a major issue and the
load/store placement is more important.

I plan to follow up and provide some rough performance metrics and
example assembly code of the transformation for these benchmarks.
I also tried increasing the maximum distance from 10 to 15 and saw only
a small increase in transformation count. From what I've seen most
cases that we care about are usually close.

The data (benchmark, # transformed, # load elimination):

After cse1:
  500: 
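
For a concrete picture of what is being counted, the kind of sequence the
pass rewrites is a narrow store followed by a wider overlapping load (a
hand-written illustration, not code from the benchmarks):

```
#include <string.h>

/* The byte store is still in flight when the 4-byte load executes and
   the load only partially overlaps it, so store-to-load forwarding
   fails and many cores stall.  The pass instead keeps the wide load
   away from the store and bit-inserts the stored byte into the loaded
   value, possibly eliminating the load entirely.  */
int narrow_store_wide_load (unsigned char *p, unsigned char v)
{
  int x;
  p[0] = v;                   /* narrow store            */
  memcpy (&x, p, sizeof x);   /* wider, overlapping load */
  return x;
}
```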

[PATCH] configure: adjustments for building with in-tree binutils

2024-06-12 Thread Jan Beulich
For one, setting ld_ver in a conditional (no in-tree ld) when it's used,
for x86 at least, in unconditional ways can't be quite right. And then
prefixing relative paths to binaries with ${objdir}/, when ${objdir}
nowadays resolves to just .libs, can at best be a leftover that wasn't
properly cleaned up at some earlier point.

gcc/

* configure.ac: Drop ${objdir}/ from NM and AR. Move setting of
  ld_ver out of conditional.
* configure: Re-generate.

--- a/gcc/configure
+++ b/gcc/configure
@@ -9066,7 +9066,7 @@ fi
 # NM
 if test x${build} = x${host} && test -f $srcdir/../binutils/nm.c \
   && test -d ../binutils ; then
-  NM='${objdir}/../binutils/nm-new'
+  NM='../binutils/nm-new'
 else
   # Extract the first word of "nm", so it can be a program name with args.
 set dummy nm; ac_word=$2
@@ -9111,7 +9111,7 @@ fi
 # AR
 if test x${build} = x${host} && test -f $srcdir/../binutils/ar.c \
   && test -d ../binutils ; then
-  AR='${objdir}/../binutils/ar'
+  AR='../binutils/ar'
 else
   # Extract the first word of "ar", so it can be a program name with args.
 set dummy ar; ac_word=$2
@@ -25919,8 +25919,8 @@ _ACEOF
 
 
 
+ld_ver=`$gcc_cv_ld --version 2>/dev/null | sed 1q`
 if test $in_tree_ld != yes ; then
-  ld_ver=`$gcc_cv_ld --version 2>/dev/null | sed 1q`
   if echo "$ld_ver" | grep GNU > /dev/null; then
 if test x"$ld_is_gold" = xyes; then
   # GNU gold --version looks like this:
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1320,7 +1320,7 @@ AC_SUBST(HAVE_PYTHON)
 # NM
 if test x${build} = x${host} && test -f $srcdir/../binutils/nm.c \
   && test -d ../binutils ; then
-  NM='${objdir}/../binutils/nm-new'
+  NM='../binutils/nm-new'
 else
   AC_CHECK_PROG(NM, nm, nm, ${CONFIG_SHELL-/bin/sh} ${srcdir}/../missing nm)
 fi
@@ -1328,7 +1328,7 @@ fi
 # AR
 if test x${build} = x${host} && test -f $srcdir/../binutils/ar.c \
   && test -d ../binutils ; then
-  AR='${objdir}/../binutils/ar'
+  AR='../binutils/ar'
 else
   AC_CHECK_PROG(AR, ar, ar, ${CONFIG_SHELL-/bin/sh} ${srcdir}/../missing ar)
 fi
@@ -3108,8 +3138,8 @@ AC_DEFINE_UNQUOTED(HAVE_GNU_INDIRECT_FUN
 
 
 changequote(,)dnl
+ld_ver=`$gcc_cv_ld --version 2>/dev/null | sed 1q`
 if test $in_tree_ld != yes ; then
-  ld_ver=`$gcc_cv_ld --version 2>/dev/null | sed 1q`
   if echo "$ld_ver" | grep GNU > /dev/null; then
 if test x"$ld_is_gold" = xyes; then
   # GNU gold --version looks like this:


Re: [patch, rs6000, middle-end 0/1] v1: Add implementation for different targets for pair mem fusion

2024-06-12 Thread Ajit Agarwal
Hello Richard:

On 12/06/24 3:02 am, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello Richard:
>>
>> On 11/06/24 9:41 pm, Richard Sandiford wrote:
>>> Ajit Agarwal  writes:
>> Thanks a lot. Can I know what should we be doing with neg (fma)
>> correctness failures with load fusion.
>
> I think it would involve:
>
> - describing lxvp and stxvp as unspec patterns, as I mentioned
>   in the previous reply
>
> - making plain movoo split loads and stores into individual
>   lxv and stxvs.  (Or, alternative, it could use lxvp and stxvp,
>   but internally swap the registers after load and before store.)
>   That is, movoo should load the lower-numbered register from the
>   lower address and the higher-numbered register from the higher
>   address, and likewise for stores.
>

 Would you mind elaborating the above.
>>>
>>> I think movoo should use rs6000_split_multireg_move for all alternatives,
>>> like movxo does.  movoo should split into 2 V1TI loads/stores and movxo
>>> should split into 4 V1TI loads/stores.  lxvp and stxvp would be
>>> independent patterns of the form:
>>>
>>>   (set ...
>>>(unspec [...] UNSPEC_FOO))
>>>
>>> ---
>>>
>>
>> In load fusion pass I generate the above pattern for adjacent merge
>> pairs.
>>
>>> rs6000_split_multireg_move has:
>>>
>>>   /* The __vector_pair and __vector_quad modes are multi-register
>>>  modes, so if we have to load or store the registers, we have to be
>>>  careful to properly swap them if we're in little endian mode
>>>  below.  This means the last register gets the first memory
>>>  location.  We also need to be careful of using the right register
>>>  numbers if we are splitting XO to OO.  */
>>>
>>> But I don't see how this can work reliably if we allow the kind of
>>> subregs that you want to create here.  The register order is the opposite
>>> from the one that GCC expects.
>>>
>>> This is more a question for the PowerPC maintainers though.
>>>
>>
>> With the above unspec pattern generated, and the movoo pattern modified to
>> accept it, it goes through rs6000_split_multireg_move: it splits into 2 V1TI
>> loads and generates consecutive loads with sequential registers. In the
>> load_fusion pass I generate the subregs along with the load results:
>> subreg (reg OO R) 16 and subreg (reg OO R) 0.
>>
>> But it doesn't generate the lxvp instruction. If I take the above unspec
>> instruction pattern and write a separate pattern in the md file to generate
>> lxvp instead of normal movoo, then it won't go through
>> rs6000_split_multireg_move
> 
> I don't understand the last bit, sorry.  Under the scheme I described,
> lxvp should be generated only through an unspec (and no other way).
> Same for stxvp.  The fusion pass should generate those unspecs.
> 
> If the fusion pass has generated the code correctly, the lxvp unspec
> will remain throughout compilation, unless all uses of it are later
> deleted as dead.
> 
> The movoo rtl pattern should continue to be:
> 
>   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
>   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
> 
> But movoo should generate individual loads, stores and moves.  By design,
> it should never generate lxvp or stxvp.
> 
> This means that, if a fused load is spilled, the sequence will be
> something like:
> 
>   lxvp ...   // original fused load (unspec)
>   ...
>   stxv ...   // store one half to the stack (split from movoo)
>   stxv ...   // store the other half to the stack (split from movoo)
> 
> Then insns that use the pair will load whichever half they need
> from the stack.
> 
> I realise that isn't great, but it should at least be correct.
> 

Thanks a lot. It worked.

> Thanks,
> Richard

Thanks & Regards
Ajit
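
For reference, the unspec-only lxvp/stxvp patterns being discussed would
look roughly like this in rs6000.md (a sketch: the UNSPEC names and the
TARGET_MMA condition are my guesses, while the constraints are taken from
the movoo pattern quoted above):

```
(define_insn "*lxvp"
  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
        (unspec:OO [(match_operand:OO 1 "memory_operand" "ZwO")]
                   UNSPEC_LXVP))]
  "TARGET_MMA"
  "lxvp %x0,%1")

(define_insn "*stxvp"
  [(set (match_operand:OO 0 "memory_operand" "=ZwO")
        (unspec:OO [(match_operand:OO 1 "vsx_register_operand" "wa")]
                   UNSPEC_STXVP))]
  "TARGET_MMA"
  "stxvp %x1,%0")
```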


Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread René Rebe

On Jun 12, 2024, at 15:00, Richard Biener  wrote:
> 
> On Wed, 12 Jun 2024, René Rebe wrote:
> 
>> Hi,
>> 
>>> On Jun 12, 2024, at 13:01, Richard Biener  wrote:
>>> 
>>> On Wed, 12 Jun 2024, Rene Rebe wrote:
 
 gcc/
   * config/ia64/ia64.cc: Enable LRA for ia64.
   * config/ia64/ia64.md: Likewise.
   * config/ia64/predicates.md: Likewise.
>>> 
>>> That looks simple enough.  I cannot find any copyright assignment on
>>> file with the FSF so you probably want to contribute to GCC under
>>> the DCO (see https://gcc.gnu.org/dco.html), in that case please post
>>> patches with Signed-off-by: tags.
>> 
>> If it helps for the future, I can apply for copyright assignment, too.
> 
> It's not a requirement - you as contributor get the choice under
> which legal framework you contribute to GCC, for the DCO there's
> the formal requirement of Signed-off-by: tags.
> 
>>> For this patch please state how you tested it, I assume you
>>> bootstrapped GCC natively on ia64-linux and ran the testsuite.
>>> I can find two gcc-testresult postings, one apparently with LRA
>>> and one without?  Both from May:
>>> 
>>> https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
>>> https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html
>> 
>> Yes, that are the two I quoted in the patch cover letter.
>> 
>>  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654321.html
>> 
>>> somehow for example libstdc++ summaries were not merged, it might
>>> be you do not have recent python installed on the system?  Or you
>>> didn't use contrib/test_summary to create those mails.  It would be
>>> nice to see the difference between LRA and not LRA in the testresults,
>>> can you quote that?
>> 
>> We usually cross-compile gcc, but also ran natively for the testsuite.
>> Given the tests run quite long natively on the hardware we currently
>> have, I summed the results up in the cover letter. I would assume
>> that should be enough to include, with a note that the resulting kernel and
>> user-space world booted and worked without issues?
> 
> I specifically wondered if bootstrap with LRA enabled succeeds.
> That needs either native or emulated hardware.  I think we consider
> ia64-linux a host platform and not only a cross compiler target.

With “also ran” I meant to say we did both: our T2 framework usually
bootstraps everything cross-compiled, but also supports native in-system
compilation. Frank also manually bootstrapped gcc natively and ran the
testsuite natively on ia64.

>> If so, I’ll just resend with the additional information added.
> 
> For the LRA enablement patch the requirement is that patches should
> state how they were tested - usually you'll see sth like
> 
> Boostrapped and tested on x86_64-unknown-linux-gnu.
> 
> In your case it was
> 
> Cross-built from x86_64-linux(?) to ia64-linux, natively tested

OK - I include the details in v2.

> not sure how you exactly did this though?  I've never tried
> testing of a canadian-cross tree - did you copy the whole build
> tree over from the x86 to the ia64 machine?

IIRC the testsuite did not just work when copying the canadian-cross tree.
I ran the testsuite from the cross-compiled gcc using an ssh-based
dejagnu board config, but Frank also did a full native bootstrap and
ran the testsuite natively.

Thanks,
René

> Thanks,
> Richard.
> 
>> Thank you so much,
>>  René
>> 
>>> Thanks,
>>> Richard.
>>> 
 ---
 gcc/config/ia64/ia64.cc   | 7 ++-
 gcc/config/ia64/ia64.md   | 4 ++--
 gcc/config/ia64/predicates.md | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)
 
 diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
 index ac3d56073ac..d189bfb2cb4 100644
 --- a/gcc/config/ia64/ia64.cc
 +++ b/gcc/config/ia64/ia64.cc
 @@ -618,9 +618,6 @@ static const scoped_attribute_specs *const 
 ia64_attribute_table[] =
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P ia64_legitimate_address_p
 
 -#undef TARGET_LRA_P
 -#define TARGET_LRA_P hook_bool_void_false
 -
 #undef TARGET_CANNOT_FORCE_CONST_MEM
 #define TARGET_CANNOT_FORCE_CONST_MEM ia64_cannot_force_const_mem
 
 @@ -1329,7 +1326,7 @@ ia64_expand_move (rtx op0, rtx op1)
 {
  machine_mode mode = GET_MODE (op0);
 
 -  if (!reload_in_progress && !reload_completed && !ia64_move_ok (op0, 
 op1))
 +  if (!lra_in_progress && !reload_completed && !ia64_move_ok (op0, op1))
op1 = force_reg (mode, op1);
 
  if ((mode == Pmode || mode == ptr_mode) && symbolic_operand (op1, 
 VOIDmode))
 @@ -1776,7 +1773,7 @@ ia64_expand_movxf_movrf (machine_mode mode, rtx 
 operands[])
 }
}
 
 -  if (!reload_in_progress && !reload_completed)
 +  if (!lra_in_progress && !reload_completed)
{
  operands[1] = spill_xfmode_rfmode_operand (operands[1], 0, mode);
 
 diff --git a/gcc/c

Re: arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]

2024-06-12 Thread Richard Earnshaw (lists)
On 12/06/2024 09:53, Andre Vieira (lists) wrote:
> 
> 
> On 06/06/2024 12:53, Richard Earnshaw (lists) wrote:
>> On 05/06/2024 17:07, Andre Vieira (lists) wrote:
>>> Hi,
>>>
>>> This patch adds missing assembly directives to the CMSE library wrapper to 
>>> call functions with attribute cmse_nonsecure_call.  Without the .type 
>>> directive the linker will fail to produce the correct veneer if a call to 
>>> this wrapper function is too far from the wrapper itself.  The .size was 
>>> added for completeness, though we don't necessarily have a usecase for it.
>>>
>>> I did not add a testcase as I couldn't get dejagnu to disassemble the 
>>> linked binary to check we used an appropriate branch instruction, I did 
>>> however test it locally and with this change the GNU linker now generates 
>>> an appropriate veneer and call to that veneer when 
>>> __gnu_cmse_nonsecure_call is too far.
>>>
>>> OK for trunk and backport to any release branches still in support (after 
>>> waiting a week or so)?
>>>
>>> libgcc/ChangeLog:
>>>
>>>  PR target/115360
>>>  * config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
>>
>> OK.
>>
>> R.
> 
> OK to backport? I was thinking of backporting it as far back as gcc-11 (we 
> haven't done an 11.5 yet).
> 
> Kind Regards,
> Andre

Yes.

R.
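
The .S hunk itself is not quoted in this thread; going by the ChangeLog
entry, the change has roughly this shape (a reconstruction, with the
wrapper body elided):

```
	.global	__gnu_cmse_nonsecure_call
	.type	__gnu_cmse_nonsecure_call, %function	@ new: lets ld emit the
						@ right veneer for far calls
__gnu_cmse_nonsecure_call:
	...					@ existing wrapper body
	.size	__gnu_cmse_nonsecure_call, . - __gnu_cmse_nonsecure_call	@ new
```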


[pushed 3/3] pretty_printer: convert chunk_info into a class

2024-06-12 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1210-g1cae1a5ce088c1.

gcc/cp/ChangeLog:
* error.cc (append_formatted_chunk): Move part of body into
chunk_info::append_formatted_chunk.

gcc/ChangeLog:
* dumpfile.cc (dump_pretty_printer::emit_items): Update for
changes to chunk_info.
* pretty-print.cc (chunk_info::append_formatted_chunk): New, based
on code in cp/error.cc's append_formatted_chunk.
(chunk_info::pop_from_output_buffer): New, based on code in
pp_output_formatted_text and dump_pretty_printer::emit_items.
(on_begin_quote): Convert to...
(chunk_info::on_begin_quote): ...this.
(on_end_quote): Convert to...
(chunk_info::on_end_quote): ...this.
(pretty_printer::format): Update for chunk_info becoming a class
and its fields gaining "m_" prefixes.  Update for on_begin_quote
and on_end_quote moving to chunk_info.
(quoting_info::handle_phase_3): Update for changes to chunk_info.
(pp_output_formatted_text): Likewise.  Move cleanup code to
chunk_info::pop_from_output_buffer.
* pretty-print.h (class output_buffer): New forward decl.
(class urlifier): New forward decl.
(struct chunk_info): Convert to...
(class chunk_info): ...this.  Add friend class pretty_printer.
(chunk_info::get_args): New accessor.
(chunk_info::get_quoting_info): New accessor.
(chunk_info::append_formatted_chunk): New decl.
(chunk_info::pop_from_output_buffer): New decl.
(chunk_info::on_begin_quote): New decl.
(chunk_info::on_end_quote): New decl.
(chunk_info::prev): Rename to...
(chunk_info::m_prev): ...this.
(chunk_info::args): Rename to...
(chunk_info::m_args): ...this.
(output_buffer::cur_chunk_array): Drop "struct" from decl.

Signed-off-by: David Malcolm 
---
 gcc/cp/error.cc | 10 +
 gcc/dumpfile.cc |  9 ++---
 gcc/pretty-print.cc | 96 -
 gcc/pretty-print.h  | 30 --
 4 files changed, 90 insertions(+), 55 deletions(-)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 01ad794df8e3..171a352c85fd 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -4307,14 +4307,8 @@ static void
 append_formatted_chunk (pretty_printer *pp, const char *content)
 {
   output_buffer *buffer = pp_buffer (pp);
-  struct chunk_info *chunk_array = buffer->cur_chunk_array;
-  const char **args = chunk_array->args;
-
-  unsigned int chunk_idx;
-  for (chunk_idx = 0; args[chunk_idx]; chunk_idx++)
-;
-  args[chunk_idx++] = content;
-  args[chunk_idx] = NULL;
+  chunk_info *chunk_array = buffer->cur_chunk_array;
+  chunk_array->append_formatted_chunk (content);
 }
 
 /* Create a copy of CONTENT, with quotes added, and,
diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
index 097f9bcfff21..82bd8b06bebf 100644
--- a/gcc/dumpfile.cc
+++ b/gcc/dumpfile.cc
@@ -819,8 +819,8 @@ void
 dump_pretty_printer::emit_items (optinfo *dest)
 {
   output_buffer *buffer = pp_buffer (this);
-  struct chunk_info *chunk_array = buffer->cur_chunk_array;
-  const char **args = chunk_array->args;
+  chunk_info *chunk_array = buffer->cur_chunk_array;
+  const char * const *args = chunk_array->get_args ();
 
   gcc_assert (buffer->obstack == &buffer->formatted_obstack);
   gcc_assert (buffer->line_length == 0);
@@ -847,10 +847,7 @@ dump_pretty_printer::emit_items (optinfo *dest)
   /* Ensure that we consumed all of stashed_items.  */
   gcc_assert (stashed_item_idx == m_stashed_items.length ());
 
-  /* Deallocate the chunk structure and everything after it (i.e. the
- associated series of formatted strings).  */
-  buffer->cur_chunk_array = chunk_array->prev;
-  obstack_free (&buffer->chunk_obstack, chunk_array);
+  chunk_array->pop_from_output_buffer (*buffer);
 }
 
 /* Subroutine of dump_pretty_printer::emit_items
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 271cd650c4d1..639e2b881586 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -1239,29 +1239,53 @@ private:
   std::vector m_phase_3_quotes;
 };
 
-static void
-on_begin_quote (const output_buffer &buf,
-   unsigned chunk_idx,
-   const urlifier *urlifier)
+/* Adds a chunk to the end of formatted output, so that it
+   will be printed by pp_output_formatted_text.  */
+
+void
+chunk_info::append_formatted_chunk (const char *content)
+{
+  unsigned int chunk_idx;
+  for (chunk_idx = 0; m_args[chunk_idx]; chunk_idx++)
+;
+  m_args[chunk_idx++] = content;
+  m_args[chunk_idx] = nullptr;
+}
+
+/* Deallocate the current chunk structure and everything after it (i.e. the
+   associated series of formatted strings).  */
+
+void
+chunk_info::pop_from_output_buffer (output_buffer &buf)
+{
+  delete m_quotes;
+  buf.cu

[pushed 2/3] pretty_printer: make all fields private

2024-06-12 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1209-gc5e3be456888aa.

gcc/analyzer/ChangeLog:
* access-diagram.cc (access_range::dump): Update for fields of
pretty_printer becoming private.
* call-details.cc (call_details::dump): Likewise.
* call-summary.cc (call_summary::dump): Likewise.
(call_summary_replay::dump): Likewise.
* checker-event.cc (checker_event::debug): Likewise.
* constraint-manager.cc (range::dump): Likewise.
(bounded_range::dump): Likewise.
(constraint_manager::dump): Likewise.
* engine.cc (exploded_node::dump): Likewise.
(exploded_path::dump): Likewise.
(exploded_path::dump_to_file): Likewise.
* feasible-graph.cc (feasible_graph::dump_feasible_path): Likewise.
* program-point.cc (program_point::dump): Likewise.
* program-state.cc (extrinsic_state::dump_to_file): Likewise.
(sm_state_map::dump): Likewise.
(program_state::dump_to_file): Likewise.
* ranges.cc (symbolic_byte_offset::dump): Likewise.
(symbolic_byte_range::dump): Likewise.
* record-layout.cc (record_layout::dump): Likewise.
* region-model-reachability.cc (reachable_regions::dump): Likewise.
* region-model.cc (region_to_value_map::dump): Likewise.
(region_model::dump): Likewise.
(model_merger::dump): Likewise.
* region-model.h (one_way_id_map::dump): Likewise.
* region.cc (region_offset::dump): Likewise.
(region::dump): Likewise.
* sm-malloc.cc (deallocator_set::dump): Likewise.
* store.cc (uncertainty_t::dump): Likewise.
(binding_key::dump): Likewise.
(bit_range::dump): Likewise.
(byte_range::dump): Likewise.
(binding_map::dump): Likewise.
(binding_cluster::dump): Likewise.
(store::dump): Likewise.
* supergraph.cc (supergraph::dump_dot_to_file): Likewise.
(superedge::dump): Likewise.
* svalue.cc (svalue::dump): Likewise.

gcc/c-family/ChangeLog:
* c-ada-spec.cc (dump_ads): Update for fields of pretty_printer
becoming private.
* c-pretty-print.cc: Likewise throughout.

gcc/c/ChangeLog:
* c-objc-common.cc (print_type): Update for fields of
pretty_printer becoming private.
(c_tree_printer): Likewise.

gcc/cp/ChangeLog:
* cxx-pretty-print.cc: Update throughout for fields of
pretty_printer becoming private.
* error.cc: Likewise.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::urls_init): Update for fields
of pretty_printer becoming private.
(diagnostic_context::print_any_cwe): Likewise.
(diagnostic_context::print_any_rules): Likewise.
(diagnostic_context::print_option_information): Likewise.
* diagnostic.h (diagnostic_format_decoder): Likewise.
(diagnostic_prefixing_rule): Likewise, fixing typo.
* digraph.cc (test_dump_to_dot): Likewise.
* digraph.h (digraph::dump_dot_to_file): Likewise.
* dumpfile.cc
(dump_pretty_printer::emit_any_pending_textual_chunks): Likewise.
* gimple-pretty-print.cc (print_gimple_stmt): Likewise.
(print_gimple_expr): Likewise.
(print_gimple_seq): Likewise.
(dump_ssaname_info_to_file): Likewise.
(gimple_dump_bb): Likewise.
* graph.cc (print_graph_cfg): Likewise.
(start_graph_dump): Likewise.
* langhooks.cc (lhd_print_error_function): Likewise.
* lto-wrapper.cc (print_lto_docs_link): Likewise.
* pretty-print.cc (pp_set_real_maximum_length): Convert to...
(pretty_printer::set_real_maximum_length): ...this.
(pp_clear_state): Convert to...
(pretty_printer::clear_state): ...this.
(pp_wrap_text): Update for pp_remaining_character_count_for_line
becoming a member function.
(urlify_quoted_string): Update for fields of pretty_printer becoming
private.
(pp_format): Convert to...
(pretty_printer::format): ...this.  Reduce the scope of local
variables "old_line_length" and "old_wrapping_mode" and make
const.  Reduce the scope of locals "args", "new_chunk_array",
"curarg", "any_unnumbered", and "any_numbered".
(pp_output_formatted_text): Update for fields of pretty_printer
becoming private.
(pp_flush): Likewise.
(pp_really_flush): Likewise.
(pp_set_line_maximum_length): Likewise.
(pp_set_prefix): Convert to...
(pretty_printer::set_prefix): ...this.
(pp_take_prefix): Update for fields of pretty_printer gaining
"m_" prefixes.
(pp_destroy_prefix): Likewise.
(pp_emit_prefix): Convert to...
(pretty_printer::emit_prefix): ...this.
(pretty_printer::pretty_


[PATCH] Move cexpr_str tree string build into utility function

2024-06-12 Thread Andi Kleen
No semantic changes.

gcc/cp/ChangeLog:

* cp-tree.h (extract): Add new overload to return tree.
* parser.cc (cp_parser_asm_string_expression): Use tree extract.
* semantics.cc (cexpr_str::extract): Add new overload to return
  tree.
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/parser.cc|  5 +
 gcc/cp/semantics.cc | 14 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 62718ff126a2..416c60b7311e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -9026,6 +9026,7 @@ public:
 
   bool type_check (location_t location);
   bool extract (location_t location, const char * & msg, int &len);
+  bool extract (location_t location, tree &str);
   tree message;
 private:
   tree message_data = NULL_TREE;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6cd7274046da..de5f0483c120 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22862,12 +22862,9 @@ cp_parser_asm_string_expression (cp_parser *parser)
   cexpr_str cstr (string);
   if (!cstr.type_check (tok->location))
return error_mark_node;
-  const char *msg;
-  int len;
-  if (!cstr.extract (tok->location, msg, len))
+  if (!cstr.extract (tok->location, string))
return error_mark_node;
   parens.require_close (parser);
-  string = build_string (len, msg);
   return string;
 }
   else if (!cp_parser_is_string_literal (tok))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 20f4675833e2..08f5f245e7d1 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11728,6 +11728,20 @@ cexpr_str::type_check (location_t location)
   return true;
 }
 
+/* Extract constant string at LOCATION into output string STR.
+   Returns true if successful, otherwise false.  */
+
+bool
+cexpr_str::extract (location_t location, tree &str)
+{
+  const char *msg;
+  int len;
+  if (!extract (location, msg, len))
+return false;
+  str = build_string (len, msg);
+  return true;
+}
+
 /* Extract constant string at LOCATION into output string MSG with LEN.
Returns true if successful, otherwise false.  */
 
-- 
2.45.1



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Frank Scheiner

Hi all,

On 12.06.24 15:19, René Rebe wrote:

On Jun 12, 2024, at 15:00, Richard Biener  wrote:

On Wed, 12 Jun 2024, René Rebe wrote:

On Jun 12, 2024, at 13:01, Richard Biener  wrote:
On Wed, 12 Jun 2024, Rene Rebe wrote:

not sure how you exactly did this though?  I've never tried
testing of a canadian-cross tree - did you copy the whole build
tree over from the x86 to the ia64 machine?


IIRC the testsuite did not just work copying the canadian-cross.
I did run the testsuite from the cross-compiled gcc using a ssh
based dejagnu board config, but Frank also did fully bootstrap and
ran the testsuite natively.


Exactly, the results I posted are both based on natively bootstrapped 
GCC binaries (took ca. 5 hours each on my rx2800 i2). The post titles 
include the exact commit hash they are based on:


1. [ia64] Results for 15.0.0 20240528 (experimental) [master revision 
236116068151bbc72aaaf53d0f223fe06f7e3bac] (GCC) testsuite on 
ia64-t2-linux-gnu ([1])


2. [ia64] Results for 15.0.0 20240528 (experimental) [master revision 
236116068151bbc72aaaf53d0f223fe06f7e3bac] (GCC) w/LRA testsuite on 
ia64-t2-linux-gnu ([2])


[1]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816346.html

[2]: https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html

I tried to save time during the testsuite runs (it still took more than 
7 hours on my rx2800 i2 running in tmpfs) by manually running multiple 
testsuites in parallel in the following fashion:


```
gcc       | libstdc++
          |
          |
----------|
g++       |
          |
          |
----------|
gfortran  |
          |
          |
----------|
libgomp   |----------
          | libatomic
          |----------
          | objc
```
... and also using multiple jobs per testsuite where it saved time (e.g. 
it does not for the libgomp testsuite). This is the reason that the 
output is somewhat split up.


[1] and [2] each also list the commands used to run the testsuites and 
timing data. For reference these were produced on a:


rx2800 i2 w/1 x Itanium 2 9320 running @1.33 GHz w/SMT enabled

Cheers,
Frank



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Frank Scheiner

Dear Richard,

On 12.06.24 13:01, Richard Biener wrote:

[...]
I can find two gcc-testresult postings, one apparently with LRA
and one without?  Both from May:

https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html

somehow for example libstdc++ summaries were not merged, it might
be you do not have recent python installed on the system?  Or you
didn't use contrib/test_summary to create those mails.


No, I did not use contrib/test_summary. But I still have tarballs of
both testsuite runs, so could still produce these summaries - I hope?

Do I need to run this on the host that did the testing or can I run it
on my NFS server where the tarballs are actually located, too?

Architectures are different though, the NFS server is 32-bit ARM.

Cheers,
Frank


Re: [PATCH v3 2/2] C++: Support constexpr strings for asm statements

2024-06-12 Thread Jason Merrill

On 6/11/24 23:53, Andi Kleen wrote:


Sorry I must have misunderstood you. I thought the patch was already
approved earlier and I did commit. I can revert or do additional
changes.


I only meant to approve the refactoring patch, but no worries.


On Tue, Jun 11, 2024 at 04:31:30PM -0400, Jason Merrill wrote:

+  if (tok->type == CPP_OPEN_PAREN)
+{
+  matching_parens parens;
+  parens.consume_open (parser);
+  tree string = cp_parser_constant_expression (parser);
+  if (string != error_mark_node)
+   string = cxx_constant_value (string, tf_error);
+  if (TREE_CODE (string) == NOP_EXPR)
+   string = TREE_OPERAND (string, 0);
+  if (TREE_CODE (string) == ADDR_EXPR
+ && TREE_CODE (TREE_OPERAND (string, 0)) == STRING_CST)
+   string = TREE_OPERAND (string, 0);
+  if (TREE_CODE (string) == VIEW_CONVERT_EXPR)
+   string = TREE_OPERAND (string, 0);


What in the testcases needs this wrapper stripping?


Without the stripping I get

/home/ak/gcc/gcc/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C: In
function 'void f()':
/home/ak/gcc/gcc/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C:27:16:
error: request for member 'size' in '(const char*)"foo %1,%0"', which is
of non-class type 'const char*'
/home/ak/gcc/gcc/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C:27:7:
error: constexpr string must be a string literal or object with 'size'
and 'data' members
/home/ak/gcc/gcc/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C:27:19:
error: expected primary-expression before ':' token
compiler exited with status 1

presumably because it fails this test:

   if (TREE_CODE (message) != STRING_CST
 && !type_dependent_expression_p (message))


Ah, yes, because you want to use a function returning const char *, 
which is not allowed for static_assert; rather, static_assert wants the 
function to return a wrapper class like std::string_view.  Only the 
return type needs to change; the return statement can still return a 
string-literal.
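
(For illustration, a minimal sketch of that shape; only the return type
differs from the original testcase:)

```
#include <string_view>

// the wrapper-class return type satisfies the size/data requirement;
// the body can still return a plain string-literal
constexpr std::string_view genfoo () { return "foo %1,%0"; }
```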


This extension relative to static_assert seems unnecessary to me.


+   "expected string-literal or constexpr in brackets");


parentheses, not brackets.

Jason



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-12 Thread Jeff Law




On 6/12/24 12:47 AM, Richard Biener wrote:



One of the points I wanted to make is that sched1 can make quite a
difference as to the relative distance of the store and load and
we have the instruction window the pass considers when scanning
(possibly driven by target uarch details).  So doing the rewriting
before sched1 might not be ideal (but I don't know how much cleanup
work the pass leaves behind - there's nothing between sched1 and RA).

ACK.  I guess I'm just skeptical about how much separation we can get in
practice from scheduling.


As far as cleanup opportunity, it likely comes down to how clean the 
initial codegen is for the bitfield insertion step.






On the hardware side I always wondered whether a failed store-to-load
forward results in the load uop stalling (because the hardware actually
_did_ see the conflict with an in-flight store) or whether this gets
caught later as the hardware speculates a load from L1 (with the
wrong value) but has to roll back because of the conflict.  I would
imagine the latter is cheaper to implement but worse in case of
conflict.
I wouldn't be surprised to see both approaches being used and I suspect 
it really depends on how speculative your uarch is.  At some point 
there's enough speculation going on that you can't detect the violation 
early enough and you have to implement a replay/rollback scheme.


jeff


Re: [PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-06-12 Thread Maciej W. Rozycki
On Mon, 3 Jun 2024, David Malcolm wrote:

> >  Thank you for fixing this up.  Is this a new requirement now for
> > .opt 
> > file changes?  
> 
> Yes, as of GCC 14.
> 
> I posted the objectives here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636060.html

 Thank you for the pointer.

> > Why does it have to be called by hand then rather than
> > being a make dependency?
> 
> IIRC, the idea is:
> (a) to avoid requiring Python 3 as a build dependency, and 
> (b) to avoid requiring the HTML docs to be built before gcc's code can
> be built
> 
> since missing a few URLs is a relatively minor issue that we don't want
> to complicate the build for.

 Understood.  My overall experience with Python over the years has been 
very bad, with things breaking randomly depending on the phase of the 
moon, the tide, etc., so I think avoiding it as a build dependency in the 
field is a good idea.

> Hence we decided to check for it in CI instead.
> 
> Hope the trade-off sounds reasonable

 I have reviewed the thread referred and I note that a concern such as 
mine has already been raised in response to which you have added the 
`regenerate-opt-urls' make target (thanks!).

 I think it's a good direction given the circumstances, but I also think a 
more generic make target such as `generated-sources' or suchlike would be 
good to have as an umbrella for any future additions we may make, and as 
one that might be easier to remember.  Perhaps it could be a dependency 
for `all' even, in the maintainer mode (which I confess to be guilty of 
not regularly using though), along with any autoconf stuff, etc.

  Maciej


Re: [PATCH] Move cexpr_stree tree string build into utility function

2024-06-12 Thread Jason Merrill

On 6/12/24 10:02, Andi Kleen wrote:

No semantics changes.

gcc/cp/ChangeLog:

* cp-tree.h (extract): Add new overload to return tree.
* parser.cc (cp_parser_asm_string_expression): Use tree extract.
* semantics.cc (cexpr_str::extract): Add new overload to return
  tree.


OK.

Jason


---
  gcc/cp/cp-tree.h|  1 +
  gcc/cp/parser.cc|  5 +
  gcc/cp/semantics.cc | 14 ++
  3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 62718ff126a2..416c60b7311e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -9026,6 +9026,7 @@ public:
  
bool type_check (location_t location);

bool extract (location_t location, const char * & msg, int &len);
+  bool extract (location_t location, tree &str);
tree message;
  private:
tree message_data = NULL_TREE;
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6cd7274046da..de5f0483c120 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22862,12 +22862,9 @@ cp_parser_asm_string_expression (cp_parser *parser)
cexpr_str cstr (string);
if (!cstr.type_check (tok->location))
return error_mark_node;
-  const char *msg;
-  int len;
-  if (!cstr.extract (tok->location, msg, len))
+  if (!cstr.extract (tok->location, string))
return error_mark_node;
parens.require_close (parser);
-  string = build_string (len, msg);
return string;
  }
else if (!cp_parser_is_string_literal (tok))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 20f4675833e2..08f5f245e7d1 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11728,6 +11728,20 @@ cexpr_str::type_check (location_t location)
return true;
  }
  
+/* Extract constant string at LOCATION into output string STR.
+   Returns true if successful, otherwise false.  */
+
+bool
+cexpr_str::extract (location_t location, tree &str)
+{
+  const char *msg;
+  int len;
+  if (!extract (location, msg, len))
+return false;
+  str = build_string (len, msg);
+  return true;
+}
+
  /* Extract constant string at LOCATION into output string MSG with LEN.
 Returns true if successful, otherwise false.  */
  




[committed] libstdc++: Do not use memset in _Hashtable::clear()

2024-06-12 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Using memset is incorrect if the __bucket_ptr type is non-trivial, or
does not use an all-zero bit pattern for its null value.

Replace the three uses of memset with std::fill_n to set the pointers to
nullptr.
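
For illustration (this sketch is not part of the patch), the failure mode
with a hypothetical fancy pointer type whose null value is not all-zero
bits:

```
#include <algorithm>
#include <cstddef>

// Hypothetical fancy pointer: null is encoded as -1, not as zero bytes.
struct offset_ptr
{
  std::ptrdiff_t off = -1;
  offset_ptr& operator=(std::nullptr_t) { off = -1; return *this; }
};

void reset_buckets(offset_ptr* buckets, std::size_t n)
{
  // __builtin_memset(buckets, 0, n * sizeof(offset_ptr)); // wrong: off == 0
  std::fill_n(buckets, n, nullptr);  // correct: runs the assignment operator
}
```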

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable::clear): Do not use
memset to zero out bucket pointers.
(_Hashtable::_M_assign_elements): Likewise.
---
 libstdc++-v3/include/bits/hashtable.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 6e78cb7d9c0..983aa909d6c 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -34,6 +34,7 @@
 
 #include 
 #include 
+#include  // fill_n
 #include  // __has_is_transparent_t
 #if __cplusplus > 201402L
 # include 
@@ -1376,8 +1377,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_bucket_count = __ht._M_bucket_count;
  }
else
- __builtin_memset(_M_buckets, 0,
-  _M_bucket_count * sizeof(__node_base_ptr));
+ std::fill_n(_M_buckets, _M_bucket_count, nullptr);
 
__try
  {
@@ -1400,8 +1400,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_buckets = __former_buckets;
_M_bucket_count = __former_bucket_count;
  }
-   __builtin_memset(_M_buckets, 0,
-_M_bucket_count * sizeof(__node_base_ptr));
+   std::fill_n(_M_buckets, _M_bucket_count, nullptr);
__throw_exception_again;
  }
   }
@@ -2582,8 +2581,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 clear() noexcept
 {
   this->_M_deallocate_nodes(_M_begin());
-  __builtin_memset(_M_buckets, 0,
-  _M_bucket_count * sizeof(__node_base_ptr));
+  std::fill_n(_M_buckets, _M_bucket_count, nullptr);
   _M_element_count = 0;
   _M_before_begin._M_nxt = nullptr;
 }
-- 
2.45.1



[committed] libstdc++: Fix std::tr2::dynamic_bitset shift operations [PR115399]

2024-06-12 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The shift operations for dynamic_bitset fail to zero out words where the
non-zero bits were shifted to a completely different word.

For a right shift we don't need to sanitize the unused bits in the high
word, because we know they were already clear and a right shift doesn't
change that.

libstdc++-v3/ChangeLog:

PR libstdc++/115399
* include/tr2/dynamic_bitset (operator>>=): Remove redundant
call to _M_do_sanitize.
* include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out
low bits in words that should no longer be populated.
(_M_do_right_shift): Likewise for high bits.
* testsuite/tr2/dynamic_bitset/pr115399.cc: New test.
---
 libstdc++-v3/include/tr2/dynamic_bitset   |  5 +--
 libstdc++-v3/include/tr2/dynamic_bitset.tcc   |  6 +--
 .../testsuite/tr2/dynamic_bitset/pr115399.cc  | 37 +++
 3 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/tr2/dynamic_bitset/pr115399.cc

diff --git a/libstdc++-v3/include/tr2/dynamic_bitset 
b/libstdc++-v3/include/tr2/dynamic_bitset
index 0e4e8894287..274c4f6a138 100644
--- a/libstdc++-v3/include/tr2/dynamic_bitset
+++ b/libstdc++-v3/include/tr2/dynamic_bitset
@@ -815,10 +815,7 @@ namespace tr2
   operator>>=(size_type __pos)
   {
if (__builtin_expect(__pos < this->_M_Nb, 1))
- {
-   this->_M_do_right_shift(__pos);
-   this->_M_do_sanitize();
- }
+ this->_M_do_right_shift(__pos);
else
  this->_M_do_reset();
return *this;
diff --git a/libstdc++-v3/include/tr2/dynamic_bitset.tcc 
b/libstdc++-v3/include/tr2/dynamic_bitset.tcc
index 63ba6f285c7..5aac7d88ee3 100644
--- a/libstdc++-v3/include/tr2/dynamic_bitset.tcc
+++ b/libstdc++-v3/include/tr2/dynamic_bitset.tcc
@@ -60,8 +60,7 @@ namespace tr2
  this->_M_w[__wshift] = this->_M_w[0] << __offset;
}
 
-  std::fill(this->_M_w.begin(), this->_M_w.begin() + __wshift,
-   static_cast<_WordT>(0));
+ std::fill_n(this->_M_w.begin(), __wshift, _WordT(0));
}
 }
 
@@ -88,8 +87,7 @@ namespace tr2
  this->_M_w[__limit] = this->_M_w[_M_w.size()-1] >> __offset;
}
 
- std::fill(this->_M_w.begin() + __limit + 1, this->_M_w.end(),
-   static_cast<_WordT>(0));
+ std::fill_n(this->_M_w.end() - __wshift, __wshift, _WordT(0));
}
 }
 
diff --git a/libstdc++-v3/testsuite/tr2/dynamic_bitset/pr115399.cc 
b/libstdc++-v3/testsuite/tr2/dynamic_bitset/pr115399.cc
new file mode 100644
index 000..e626e4a5d15
--- /dev/null
+++ b/libstdc++-v3/testsuite/tr2/dynamic_bitset/pr115399.cc
@@ -0,0 +1,37 @@
+// { dg-do run { target c++11 } }
+
+// PR libstdc++/115399
+// std::tr2::dynamic_bitset shift behaves differently from std::bitset
+
+#include 
+#include 
+
+void
+test_left_shift()
+{
+  std::tr2::dynamic_bitset<> b(65);
+  b[0] = 1;
+  auto b2 = b << 64;
+  VERIFY(b2[64] == 1);
+  VERIFY(b2[0] == 0);
+  b <<= 64;
+  VERIFY( b2 == b );
+}
+
+void
+test_right_shift()
+{
+  std::tr2::dynamic_bitset<> b(65);
+  b[64] = 1;
+  auto b2 = b >> 64;
+  VERIFY(b2[64] == 0);
+  VERIFY(b2[0] == 1);
+  b >>= 64;
+  VERIFY( b2 == b );
+}
+
+int main()
+{
+  test_left_shift();
+  test_right_shift();
+}
-- 
2.45.1



Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-06-12 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, May 10, 2024 at 4:25 AM HAO CHEN GUI  wrote:
>>
>> Hi,
>> The cost returned from set_src_cost might be zero.  Zero for
>> pattern_cost means unknown cost. So the regularization converts the zero
>> to COSTS_N_INSNS (1).
>>
>>// pattern_cost
>>cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
>>return cost > 0 ? cost : COSTS_N_INSNS (1);
>>
>> But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's
>> untouched and just returned by pattern_cost.  Thus a "zero" from
>> set_src_cost ends up costed higher than a "one" from set_src_cost.
>>
>>   For instance, i386 returns cost "one" for zero_extend op.
>> //ix86_rtx_costs
>> case ZERO_EXTEND:
>>   /* The zero extensions is often completely free on x86_64, so make
>>  it as cheap as possible.  */
>>   if (TARGET_64BIT && mode == DImode
>>   && GET_MODE (XEXP (x, 0)) == SImode)
>> *total = 1;
>>
>>   This patch fixes the problem by converting all costs which are less than
>> COSTS_N_INSNS (1) to COSTS_N_INSNS (1).
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>
> But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
> longer meaningful.  So shouldn't it instead be
>
>   return cost > 0 ? cost : 1;
>
> ?  Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost
> is invalid and thus the target is at fault (I do think that making zero the
> unknown value is quite bad since that makes it impossible to have zero
> as cost represented).
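
(For reference, the regularization proposed above amounts to a clamp along
these lines; a sketch only, not the actual rtlanal.cc code:)

```
/* Sketch only (assumes GCC's rtl.h for COSTS_N_INSNS): round every
   result below one instruction, including the "unknown" zero, up to
   COSTS_N_INSNS (1), so zero can no longer rank cheaper than e.g.
   i386's 1 for zero_extend.  */
static int
regularize_pattern_cost (int cost)
{
  return cost < COSTS_N_INSNS (1) ? COSTS_N_INSNS (1) : cost;
}
```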

I agree zero is an unfortunate choice.  No-op moves should really have
zero cost, without having to be special-cased by callers.  And it came
as a surprise to me that we had this rule.

But like Segher says, it seems to have been around for a long time
(since 2004 by the looks of it, r0-59417).  Which just goes to show,
every day is a learning day. :)

IMO it would be nice to change it.  But then it would be even nicer
to get rid of pattern_cost and move everything to insn_cost.  And that's
going to be a lot of work to do together.

Maybe a compromise would be to open-code pattern_cost into insn_cost
and change the return value for insn_cost only?  That would still mean
auditing all current uses of insn_cost and all current target definitions
of the insn_cost hook, but at least it would be isolated from the work
of removing pattern_cost.

Thanks,
Richard


Re: [PATCH v4 5/6] bpf,btf: enable BTF pruning by default for BPF

2024-06-12 Thread Jose E. Marchesi


Hi Faust.
Thanks for the patch.
Please see a question below.

> This patch enables -gprune-btf by default in the BPF backend when
> generating BTF information, and fixes BPF CO-RE generation when using
> -gprune-btf.
>
> When generating BPF CO-RE information, we must ensure that types used
in CO-RE relocations always have sufficient BTF information emitted so
> that the CO-RE relocations can be processed by a BPF loader.  The BTF
> pruning algorithm on its own does not have sufficient information to
> determine which types are used in a BPF CO-RE relocation, so this
> information must be supplied by the BPF backend, using a new
> btf_mark_type_used function.
>
> Co-authored-by: Cupertino Miranda 
>
> gcc/
>   * btfout.cc (btf_mark_type_used): New.
>   * ctfc.h (btf_mark_type_used): Declare it here.
>   * config/bpf/bpf.cc (bpf_option_override): Enable -gprune-btf
>   by default if -gbtf is enabled.
>   * config/bpf/core-builtins.cc (extra_fn): New typedef.
>   (compute_field_expr): Add callback parameter, and call it if supplied.
>   Fix computation for MEM_REF.
>   (mark_component_type_as_used): New.
>   (bpf_mark_types_as_used): Likewise.
>   (bpf_expand_core_builtin): Call here.
>   * doc/invoke.texi (Debugging Options): Note that -gprune-btf is
>   enabled by default for BPF target when generating BTF.
>
> gcc/testsuite/
>   * gcc.dg/debug/btf/btf-variables-5.c: Adjust one test for bpf-*-*
>   target.
> ---
>  gcc/btfout.cc | 22 ++
>  gcc/config/bpf/bpf.cc |  5 ++
>  gcc/config/bpf/core-builtins.cc   | 71 +--
>  gcc/ctfc.h|  1 +
>  gcc/doc/invoke.texi   |  3 +
>  .../gcc.dg/debug/btf/btf-variables-5.c|  6 +-
>  6 files changed, 100 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 34d8cec0a2e3..083ca48d6279 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -1503,6 +1503,28 @@ btf_assign_datasec_ids (ctf_container_ref ctfc)
>  }
>  }
>  
> +
> +/* Manually mark that type T is used to ensure it will not be pruned.
> +   Used by the BPF backend when generating BPF CO-RE to mark types used
> +   in CO-RE relocations.  */
> +
> +void
> +btf_mark_type_used (tree t)
> +{
> +  /* If we are not going to prune anyway, this is a no-op.  */
> +  if (!debug_prune_btf)
> +return;
> +
> +  gcc_assert (TYPE_P (t));
> +  ctf_container_ref ctfc = ctf_get_tu_ctfc ();
> +  ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, t);
> +
> +  if (!dtd)
> +return;
> +
> +  btf_add_used_type (ctfc, dtd, false, false, true);
> +}
> +
>  /* Callback used for assembling the only-used-types list.  Note that this is
> the same as btf_type_list_cb above, but the hash_set traverse requires a
> different function signature.  */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index dd1bfe38d29b..c62af7a6efa7 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -221,6 +221,11 @@ bpf_option_override (void)
>&& !(target_flags_explicit & MASK_BPF_CORE))
>  target_flags |= MASK_BPF_CORE;
>  
> +  /* -gbtf implies -gprune-btf for BPF target.  */
> +  if (btf_debuginfo_p ())
> +SET_OPTION_IF_UNSET (&global_options, &global_options_set,
> +  debug_prune_btf, true);
> +
>/* Determine available features from ISA setting (-mcpu=).  */
>if (bpf_has_jmpext == -1)
>  bpf_has_jmpext = (bpf_isa >= ISA_V2);
> diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
> index 232bebcadbd5..86e2e9d6e39f 100644
> --- a/gcc/config/bpf/core-builtins.cc
> +++ b/gcc/config/bpf/core-builtins.cc
> @@ -624,13 +624,20 @@ bpf_core_get_index (const tree node, bool *valid)
>  
> ALLOW_ENTRY_CAST is an input arguments and specifies if the function 
> should
> consider as valid expressions in which NODE entry is a cast expression (or
> -   tree code nop_expr).  */
> +   tree code nop_expr).
> +
> +   EXTRA_FN is a callback function to allow extra functionality with this
> +   function traversal.  Currently used for marking used type during expand
> +   pass.  */
> +
> +typedef void (*extra_fn) (tree);
>  
>  static unsigned char
>  compute_field_expr (tree node, unsigned int *accessors,
>   bool *valid,
>   tree *access_node,
> - bool allow_entry_cast = true)
> + bool allow_entry_cast = true,
> + extra_fn callback = NULL)
>  {
>unsigned char n = 0;
>unsigned int fake_accessors[MAX_NR_ACCESSORS];
> @@ -647,6 +654,9 @@ compute_field_expr (tree node, unsigned int *accessors,
>  
>*access_node = node;
>  
> +  if (callback != NULL)
> +callback (node);
> +
>switch (TREE_CODE (node))
>  {
>  case INDIRECT_REF:
> @@ -664,17 +674,19 @@ compute_field_expr (tree node, unsigned int *accessors,
>  

Re: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Patrick O'Neill



On 6/12/24 04:21, Andreas Schwab wrote:

On Jun 12 2024, Li, Pan2 wrote:


Do we need to upgrade the binutils of the riscv-gnu-toolchain repo?  Otherwise
we may get an "unknown prefixed ISA extension `zaamo'" error when building.

There needs to be a configure check if binutils can grok the extension.


Ack. I'll make a patch for that.

In the meantime bumping binutils to tip-of-tree will resolve the build 
issue.
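
For reference, the probe could amount to something along these lines (an
untested sketch; the real patch will use gcc's existing assembler feature
checks in configure.ac, and the exact arch string here is illustrative):

```
# does the configured assembler accept the zaamo extension string?
echo '.attribute arch, "rv64i_zaamo1p0"' > conftest.s
if $gcc_cv_as conftest.s -o conftest.o >/dev/null 2>&1; then
  echo "assembler groks zaamo"
fi
```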


Patrick



Re: [Committed] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-12 Thread Palmer Dabbelt

On Wed, 12 Jun 2024 10:09:06 PDT (-0700), Patrick O'Neill wrote:


On 6/12/24 04:21, Andreas Schwab wrote:

On Jun 12 2024, Li, Pan2 wrote:


Do we need to upgrade the binutils of the riscv-gnu-toolchain repo?  Otherwise
we may get an "unknown prefixed ISA extension `zaamo'" error when building.

There needs to be a configure check if binutils can grok the extension.


Ack. I'll make a patch for that.


Thanks.  That's how we usually handle this stuff, it keeps the world 
building.




In the meantime bumping binutils to tip-of-tree will resolve the build
issue.

Patrick


[PATCH 1/3] Remove const char * support for asm constexpr

2024-06-12 Thread Andi Kleen
asm constexpr now only accepts the same string types as C++26 assert,
e.g. string_view and string. Adjust test suite and documentation.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Remove support
  for const char * for asm constexpr.

gcc/ChangeLog:

* doc/extend.texi: Use std::string_view in asm constexpr
example.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-asm-1.C: Use std::string_view.
* g++.dg/cpp1z/constexpr-asm-3.C: Ditto.
---
 gcc/cp/parser.cc |  7 ---
 gcc/doc/extend.texi  |  3 ++-
 gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C | 12 +++-
 gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C | 12 +++-
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index de5f0483c120..98e8ca10ac40 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22852,13 +22852,6 @@ cp_parser_asm_string_expression (cp_parser *parser)
   tree string = cp_parser_constant_expression (parser);
   if (string != error_mark_node)
string = cxx_constant_value (string, tf_error);
-  if (TREE_CODE (string) == NOP_EXPR)
-   string = TREE_OPERAND (string, 0);
-  if (TREE_CODE (string) == ADDR_EXPR
- && TREE_CODE (TREE_OPERAND (string, 0)) == STRING_CST)
-   string = TREE_OPERAND (string, 0);
-  if (TREE_CODE (string) == VIEW_CONVERT_EXPR)
-   string = TREE_OPERAND (string, 0);
   cexpr_str cstr (string);
   if (!cstr.type_check (tok->location))
return error_mark_node;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 17e26c5004c1..ee3644a52645 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10716,7 +10716,8 @@ message. Any string is converted to the character set 
of the source code.
 When this feature is available the @code{__GXX_CONSTEXPR_ASM__} cpp symbol is 
defined.
 
 @example
-constexpr const char *genfoo() @{ return "foo"; @}
+#include 
+constexpr std::string_view genfoo() @{ return "foo"; @}
 
 void function()
 @{
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C
index 7cc6b37d6208..311209acb43b 100644
--- a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-1.C
@@ -1,22 +1,24 @@
 /* { dg-do compile } */
-/* { dg-options "-std=gnu++11" } */
+/* { dg-options "-std=gnu++17" } */
 
-constexpr const char *genfoo ()
+#include 
+
+constexpr std::string_view genfoo ()
 {
   return "foo %1,%0";
 }
 
-constexpr const char *genoutput ()
+constexpr std::string_view genoutput ()
 {
   return "=r";
 }
 
-constexpr const char *geninput ()
+constexpr std::string_view geninput ()
 {
   return "r";
 }
 
-constexpr const char *genclobber ()
+constexpr std::string_view genclobber ()
 {
   return "memory";
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
index d33631876bdc..ef8a35a0b3ba 100644
--- a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
@@ -1,22 +1,24 @@
 /* { dg-do compile } */
-/* { dg-options "-std=gnu++11" } */
+/* { dg-options "-std=gnu++17" } */
 
-constexpr const char *genfoo ()
+#include 
+
+constexpr std::string_view genfoo ()
 {
   return "foo %1,%0";
 }
 
-constexpr const char *genoutput ()
+constexpr std::string_view genoutput ()
 {
   return "=r";
 }
 
-constexpr const char *geninput ()
+constexpr std::string_view geninput ()
 {
   return "r";
 }
 
-constexpr const char *genclobber ()
+constexpr std::string_view genclobber ()
 {
   return "memory";
 }
-- 
2.45.1



[PATCH 2/3] Parse close paren even when constexpr extraction fails

2024-06-12 Thread Andi Kleen
To get better error recovery.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Parse close
paren when constexpr extraction fails.
---
 gcc/cp/parser.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 98e8ca10ac40..adc4e6fc1aee 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22856,7 +22856,7 @@ cp_parser_asm_string_expression (cp_parser *parser)
   if (!cstr.type_check (tok->location))
return error_mark_node;
   if (!cstr.extract (tok->location, string))
-   return error_mark_node;
+   string = error_mark_node;
   parens.require_close (parser);
   return string;
 }
-- 
2.45.1



[PATCH 3/3] Fix error message

2024-06-12 Thread Andi Kleen
gcc/cp/ChangeLog:

* parser.cc (cp_parser_asm_string_expression): Use correct error
message.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-asm-3.C: Adjust for new message.
---
 gcc/cp/parser.cc | 2 +-
 gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index adc4e6fc1aee..01a19080d6c7 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22863,7 +22863,7 @@ cp_parser_asm_string_expression (cp_parser *parser)
   else if (!cp_parser_is_string_literal (tok))
 {
   error_at (tok->location,
-   "expected string-literal or constexpr in brackets");
+   "expected string-literal or constexpr in parentheses");
   return error_mark_node;
 }
   return cp_parser_string_literal (parser, false, false);
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
index ef8a35a0b3ba..0cf8940e109c 100644
--- a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-3.C
@@ -26,7 +26,7 @@ constexpr std::string_view genclobber ()
 void f()
 {
   int a;
-  asm(genfoo () : /* { dg-error "expected string-literal or constexpr in 
brackets" } */
+  asm(genfoo () : /* { dg-error "expected string-literal or constexpr in 
parentheses" } */
   genoutput() (a) :
   geninput() (1) :
   genclobber());
-- 
2.45.1



Re: [PATCH v4 5/6] bpf,btf: enable BTF pruning by default for BPF

2024-06-12 Thread David Faust



On 6/12/24 09:55, Jose E. Marchesi wrote:
> 
> Hi Faust.
> Thanks for the patch.
> Please see a question below.
> 
>> This patch enables -gprune-btf by default in the BPF backend when
>> generating BTF information, and fixes BPF CO-RE generation when using
>> -gprune-btf.
>>
>> When generating BPF CO-RE information, we must ensure that types used
>> in CO-RE relocations always have sufficient BTF information emited so
>> that the CO-RE relocations can be processed by a BPF loader.  The BTF
>> pruning algorithm on its own does not have sufficient information to
>> determine which types are used in a BPF CO-RE relocation, so this
>> information must be supplied by the BPF backend, using a new
>> btf_mark_type_used function.
>>
>> Co-authored-by: Cupertino Miranda 
>>
>> gcc/
>>  * btfout.cc (btf_mark_type_used): New.
>>  * ctfc.h (btf_mark_type_used): Declare it here.
>>  * config/bpf/bpf.cc (bpf_option_override): Enable -gprune-btf
>>  by default if -gbtf is enabled.
>>  * config/bpf/core-builtins.cc (extra_fn): New typedef.
>>  (compute_field_expr): Add callback parameter, and call it if supplied.
>>  Fix computation for MEM_REF.
>>  (mark_component_type_as_used): New.
>>  (bpf_mark_types_as_used): Likewise.
>>  (bpf_expand_core_builtin): Call here.
>>  * doc/invoke.texi (Debugging Options): Note that -gprune-btf is
>>  enabled by default for BPF target when generating BTF.
>>
>> gcc/testsuite/
>>  * gcc.dg/debug/btf/btf-variables-5.c: Adjust one test for bpf-*-*
>>  target.
>> ---
>>  gcc/btfout.cc | 22 ++
>>  gcc/config/bpf/bpf.cc |  5 ++
>>  gcc/config/bpf/core-builtins.cc   | 71 +--
>>  gcc/ctfc.h|  1 +
>>  gcc/doc/invoke.texi   |  3 +
>>  .../gcc.dg/debug/btf/btf-variables-5.c|  6 +-
>>  6 files changed, 100 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>> index 34d8cec0a2e3..083ca48d6279 100644
>> --- a/gcc/btfout.cc
>> +++ b/gcc/btfout.cc
>> @@ -1503,6 +1503,28 @@ btf_assign_datasec_ids (ctf_container_ref ctfc)
>>  }
>>  }
>>  
>> +
>> +/* Manually mark that type T is used to ensure it will not be pruned.
>> +   Used by the BPF backend when generating BPF CO-RE to mark types used
>> +   in CO-RE relocations.  */
>> +
>> +void
>> +btf_mark_type_used (tree t)
>> +{
>> +  /* If we are not going to prune anyway, this is a no-op.  */
>> +  if (!debug_prune_btf)
>> +return;
>> +
>> +  gcc_assert (TYPE_P (t));
>> +  ctf_container_ref ctfc = ctf_get_tu_ctfc ();
>> +  ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, t);
>> +
>> +  if (!dtd)
>> +return;
>> +
>> +  btf_add_used_type (ctfc, dtd, false, false, true);
>> +}
>> +
>>  /* Callback used for assembling the only-used-types list.  Note that this is
>> the same as btf_type_list_cb above, but the hash_set traverse requires a
>> different function signature.  */
>> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
>> index dd1bfe38d29b..c62af7a6efa7 100644
>> --- a/gcc/config/bpf/bpf.cc
>> +++ b/gcc/config/bpf/bpf.cc
>> @@ -221,6 +221,11 @@ bpf_option_override (void)
>>&& !(target_flags_explicit & MASK_BPF_CORE))
>>  target_flags |= MASK_BPF_CORE;
>>  
>> +  /* -gbtf implies -gprune-btf for BPF target.  */
>> +  if (btf_debuginfo_p ())
>> +SET_OPTION_IF_UNSET (&global_options, &global_options_set,
>> + debug_prune_btf, true);
>> +
>>/* Determine available features from ISA setting (-mcpu=).  */
>>if (bpf_has_jmpext == -1)
>>  bpf_has_jmpext = (bpf_isa >= ISA_V2);
>> diff --git a/gcc/config/bpf/core-builtins.cc 
>> b/gcc/config/bpf/core-builtins.cc
>> index 232bebcadbd5..86e2e9d6e39f 100644
>> --- a/gcc/config/bpf/core-builtins.cc
>> +++ b/gcc/config/bpf/core-builtins.cc
>> @@ -624,13 +624,20 @@ bpf_core_get_index (const tree node, bool *valid)
>>  
>> ALLOW_ENTRY_CAST is an input arguments and specifies if the function 
>> should
>> consider as valid expressions in which NODE entry is a cast expression 
>> (or
>> -   tree code nop_expr).  */
>> +   tree code nop_expr).
>> +
>> +   EXTRA_FN is a callback function to allow extra functionality with this
>> +   function traversal.  Currently used for marking used type during expand
>> +   pass.  */
>> +
>> +typedef void (*extra_fn) (tree);
>>  
>>  static unsigned char
>>  compute_field_expr (tree node, unsigned int *accessors,
>>  bool *valid,
>>  tree *access_node,
>> -bool allow_entry_cast = true)
>> +bool allow_entry_cast = true,
>> +extra_fn callback = NULL)
>>  {
>>unsigned char n = 0;
>>unsigned int fake_accessors[MAX_NR_ACCESSORS];
>> @@ -647,6 +654,9 @@ compute_field_expr (tree node, unsigned int *accessors,
>>  
>>*access_node = node;
>>  
>> +  if (callback != NULL)
>> +callback 

[PATCH] c++: ICE w/ ambig and non-strictly-viable cands [PR115239]

2024-06-12 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/14?

-- >8 --

Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we have held on to ever since r14-6522.  These latter candidates have
an empty third arg conversion since the second arg conversion was deemed
bad.  This ends up causing an ICE during joust for #3 and #4 due to this
empty arg conversion.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates, taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable.

PR c++/115239

gcc/cp/ChangeLog:

* call.cc (tourney): Don't consider a non-strictly viable
candidate as the champ if there was ambiguity between two
strictly viable candidates.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error7.C: New test.
---
 gcc/cp/call.cc |  4 +++-
 gcc/testsuite/g++.dg/overload/error7.C | 10 ++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/overload/error7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index ed68eb3c568..82c70f5c39f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13484,9 +13484,11 @@ tourney (struct z_candidate *candidates, 
tsubst_flags_t complain)
}
   else
{
+ z_candidate *prev_champ = *champ;
  previous_worse_champ = nullptr;
  champ = &(*challenger)->next;
- if (!*champ || !(*champ)->viable)
+ if (!*champ || !(*champ)->viable
+ || (prev_champ->viable == 1 && (*champ)->viable == -1))
{
  champ = nullptr;
  break;
diff --git a/gcc/testsuite/g++.dg/overload/error7.C 
b/gcc/testsuite/g++.dg/overload/error7.C
new file mode 100644
index 000..68aaa236de4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/error7.C
@@ -0,0 +1,10 @@
+// PR c++/115239
+
+bool foo(const char *, char *, long); // #1, strictly viable, ambig with #2
+bool foo(const char *, char *, unsigned); // #2, strictly viable, ambig with #1
+bool foo(char, char, long);   // #3, non-strictly viable
+bool foo(char, char, unsigned);   // #4, non-strictly viable
+
+int main() {
+  foo((char *)0, (char *)0, 0); // { dg-error "ambiguous" }
+}
-- 
2.45.2.457.g8d94cfb545



Re: [PATCH] c++: ICE w/ ambig and non-strictly-viable cands [PR115239]

2024-06-12 Thread Patrick Palka
On Wed, 12 Jun 2024, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/14?
> 
> -- >8 --
> 
> Here during overload resolution we have two strictly viable ambiguous
> candidates #1 and #2, and two non-strictly viable candidates #3 and #4
> which we have held on to ever since r14-6522.  These latter candidates have
> an empty third arg conversion since the second arg conversion was deemed
> bad.  This ends up causing an ICE during joust for #3 and #4 due to this
> empty arg conversion.
> 
> We can fix this by making joust robust to empty arg conversions, but in
> this situation we shouldn't need to compare #3 and #4 at all given that
> we have a strictly viable candidate.  To that end, this patch makes
> tourney shortcut considering non-strictly viable candidates upon
> encountering ambiguity between two strictly viable candidates, taking
> advantage of the fact that the candidates list is sorted according to
> viability via splice_viable.
> 
>   PR c++/115239
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (tourney): Don't consider a non-strictly viable
>   candidate as the champ if there was ambiguity between two
>   strictly viable candidates.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/overload/error7.C: New test.
> ---
>  gcc/cp/call.cc |  4 +++-
>  gcc/testsuite/g++.dg/overload/error7.C | 10 ++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/overload/error7.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index ed68eb3c568..82c70f5c39f 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -13484,9 +13484,11 @@ tourney (struct z_candidate *candidates, 
> tsubst_flags_t complain)
>   }
>else
>   {
> +   z_candidate *prev_champ = *champ;
> previous_worse_champ = nullptr;
> champ = &(*challenger)->next;
> -   if (!*champ || !(*champ)->viable)
> +   if (!*champ || !(*champ)->viable
> +   || (prev_champ->viable == 1 && (*champ)->viable == -1))
>   {
> champ = nullptr;
> break;
> diff --git a/gcc/testsuite/g++.dg/overload/error7.C 
> b/gcc/testsuite/g++.dg/overload/error7.C
> new file mode 100644
> index 000..68aaa236de4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/overload/error7.C
> @@ -0,0 +1,10 @@
> +// PR c++/115239
> +
> +bool foo(const char *, char *, long); // #1, strictly viable, ambig with 
> #2
> +bool foo(const char *, char *, unsigned); // #2, strictly viable, ambig with 
> #1
> +bool foo(char, char, long);   // #3, non-strictly viable
> +bool foo(char, char, unsigned);   // #4, non-strictly viable
> +
> +int main() {
> +  foo((char *)0, (char *)0, 0); // { dg-error "ambiguous" }
> +}

FWIW I just realized this testcase can be simplified to:

  // PR c++/115239

  bool foo(char *, long); // #1, strictly viable, ambig with #2
  bool foo(char *, unsigned); // #2, strictly viable, ambig with #1
  bool foo(char, long);   // #3, non-strictly viable
  bool foo(char, unsigned);   // #4, non-strictly viable

  int main() {
foo((char *)0, 0); // { dg-error "ambiguous" }
  }



Re: [PATCH v2] Test: Move target independent test cases to gcc.dg/torture

2024-06-12 Thread Jeff Law




On 6/11/24 8:53 AM, pan2...@intel.com wrote:

From: Pan Li 

The test cases of pr115387 are target independent,  at least x86
and riscv are able to reproduce.  Thus,  move these cases to
the gcc.dg/torture.

The below test suites are passed.
1. The rv64gcv fully regression test.
2. The x86 fully regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115387-1.c: Move to...
* gcc.dg/torture/pr115387-1.c: ...here.
* gcc.target/riscv/pr115387-2.c: Move to...
* gcc.dg/torture/pr115387-2.c: ...here.

OK
jeff



Re: [PATCH 0/3] RISC-V: Amo testsuite cleanup

2024-06-12 Thread Jeff Law




On 6/11/24 12:03 PM, Patrick O'Neill wrote:

This series moves the atomic-related riscv testcases into their own folder and
fixes some minor bugs/rigidity of existing testcases.

This series is OK.
jeff



Re: [PATCH v2 2/3] RISC-V: Add Zalrsc and Zaamo testsuite support

2024-06-12 Thread Jeff Law




On 6/11/24 12:21 PM, Patrick O'Neill wrote:



I made the whitespace cleanup patch (trailing whitespace, leading groups
of 8 spaces -> tabs) for target-supports.exp and got a diff of 584 lines.

Is this still worth doing or will it be too disruptive for rebasing/
other people's development?

I don't think it's overly disruptive.  This stuff doesn't have a lot of
churn.  It'd be different if you were reformatting the whole tree :-)


Consider those fixes pre-approved.

jeff



Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-06-12 Thread Qing Zhao
An update on this task.  (Also, more suggestions are welcome :-)

I have an initial implemenation locally, with this gcc, I can get the following 
message with the testing case in PR109071:

[ 109071]$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
  if (*index >= 4)
warn();
  *regs = val;
}
struct nums {int vals[4];};

void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int *val = &sg->vals[index];

  assign(0,ptr, &index);
  assign(*val, ptr, &index);
}

[109071]$ sh t
/home/opc/Install/latest-d/bin/gcc -O2 -Warray-bounds=1 -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
  event 1
|
|4 |   if (*index >= 4)
|  |  ^
|  |  |
|  |  (1) when the condition is evaluated to true
|
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

1. Is the above diagnostic message good enough? Any more suggestion?
2. It’s hard for me to come up with a more complicated test case that has one
basic block copied multiple times by jump threading; do you have any pointers
to such test cases?

Thanks a lot for any help.

Qing

> On Jun 7, 2024, at 15:13, Qing Zhao  wrote:
> 
> Hi, Richard,
> 
>> On Jun 5, 2024, at 13:58, Qing Zhao  wrote:
>>> 
>> Like this?
>> 
>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
>> index e6e2b0897572..ee344f91333b 100644
>> --- a/libcpp/include/line-map.h
>> +++ b/libcpp/include/line-map.h
>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
>> struct GTY(()) location_adhoc_data {
>> location_t locus;
>> source_range src_range;
>> -  void * GTY((skip)) data;
>> unsigned discriminator;
>> +  void * GTY((skip)) data;
>> +  void * GTY((skip)) copy_history;
>> };
>> struct htab;
> 
> Yes.
> 
>> How about the copy_history? Do we need a new data structure (like
>> the following, or other suggestion) for this? Where should I add
>> this new data structure?
> 
> As it needs to be managed by libcpp it should be in this very same
> file.
> 
>> struct copy_history {
>> location_t condition;
>> bool is_true_path;
>> }
> 
> I think we want a pointer to the previous copy-of state as well in
> case a stmt
> was copied twice.  We'll see whether a single (condition) location
> plus edge flag
> is sufficient.  I'd say we should plan for an enum to indicate the
> duplication
> reason at least (jump threading, unswitching, unrolling come to my
> mind).  For
> jump threading being able to say "when  is true/false" is
> probably
> good enough, though it might not always be easy to identify a single
> condition
> here given a threading path starts at an incoming edge to a CFG merge
> and
> will usually end with the outgoing edge of a condition that we can
> statically
> evaluate.  The condition controlling the path entry might not be
> determined
> fully by a single condition location.
> 
> Possibly building a full "diagnostic path" object at threading time
> might be
> the only way to record all the facts, but that's also going to be
> more
> expensive.
 
 Note that a diagnostic_path represents a path through some kind of
 graph, whereas it sounds like you want to be storing the *nodes* in the
 graph, and later generating the diagnostic_path from that graph when we
 need it (which is trivial if the graph is actually just a tree: just
 follow the parent links backwards, then reverse it).
>>> 
>>> I think we are mixing two things - one is that a single transform like 
>>> jump
>>> threading produces a stmt copy and when we emit a diagnostic on that
>>> copied statement we want to tell the user the condition under which the
>>> copy is executed.  That "condition" can be actually a sequence of
>>> conditionals.  I wanted to point out that a diagnostic_path instance 
>>> could
>>> be used to describe such complex condition.
>>> 
>>> But then the other thing I wanted to address with the link to a previous
>>> copy_history - that's when a statement gets copied twice, for example
>>> by two distinct jump threading optimizations.  Like when dumping
>>> the inlining decisions for diagnostics we could dump the logical "and"
>>> of the conditions of the two threadings.  Since we have a single
>>> location per GIMPLE stmt we'd have to ke
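
(For reference, combining the pieces sketched in this thread, the record
might look roughly like the following; every name here is illustrative
only, none of this exists in GCC:)

```
/* Hypothetical sketch only, per the discussion above; location_t as in
   libcpp's line-map.h.  */
enum copy_reason { COPY_JUMP_THREADING, COPY_UNSWITCHING, COPY_UNROLLING };

struct copy_history
{
  location_t condition;      /* location of the controlling conditional */
  bool is_true_path;         /* which arm of the condition the copy is on */
  enum copy_reason reason;   /* the transform that duplicated the stmt */
  struct copy_history *prev; /* earlier duplication state, if copied again */
};
```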

[Committed] RISC-V: Amo testsuite cleanup

2024-06-12 Thread Patrick O'Neill



On 6/12/24 11:12, Jeff Law wrote:



On 6/11/24 12:03 PM, Patrick O'Neill wrote:
This series moves the atomic-related riscv testcases into their own
folder and fixes some minor bugs/rigidity of existing testcases.

This series is OK.
jeff


Committed, thanks.

Patrick



[PATCH] c++: undeclared identifier in requires-clause [PR99678]

2024-06-12 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

Since the terms of a requires-clause are grammatically primary-expressions
rather than e.g. postfix-expressions, it seems we need to explicitly
handle and diagnose the case where a term parses to a bare unresolved
unqualified-id, like cp_parser_postfix_expression does, since
cp_parser_primary_expression doesn't do so itself.  Otherwise we fail to
reject the first three of the requires-clauses below.

Note that the only users of primary-expression in the C++ grammar are
indeed postfix-expression and constraint-logical-and-expression, so it
seems not so surprising that we need this special handling here.

PR c++/99678

gcc/cp/ChangeLog:

* parser.cc (cp_parser_constraint_primary_expression): Reject
a bare unresolved unqualified-id.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires38.C: New test.
---
 gcc/cp/parser.cc |  2 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-requires38.C | 14 ++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires38.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index bc4a2359153..0d59f7d2690 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -31496,6 +31496,8 @@ cp_parser_constraint_primary_expression (cp_parser 
*parser, bool lambda_p)
 }
   if (pce == pce_ok)
 {
+  if (idk == CP_ID_KIND_UNQUALIFIED && identifier_p (expr))
+   expr = unqualified_name_lookup_error (expr);
   cp_lexer_commit_tokens (parser->lexer);
   return finish_constraint_primary_expr (expr);
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires38.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-requires38.C
new file mode 100644
index 000..663195b79cc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires38.C
@@ -0,0 +1,14 @@
+// PR c++/99678
+// { dg-do compile { target c++20 } }
+
+template
+void f1() requires undeclared_identifier; // { dg-error "not declared" }
+
+template
+void f2() requires true && undeclared_identifier; // { dg-error "not declared" 
}
+
+template
+void f3() requires false || undeclared_identifier; // { dg-error "not 
declared" }
+
+template
+void f4() requires undeclared_identifier(T{}); // { dg-error "must be enclosed 
in parentheses" }
-- 
2.45.2.457.g8d94cfb545



[pushed] pretty_printer: unbreak build on aarch64 [PR115465]

2024-06-12 Thread David Malcolm
I missed this target-specific usage of pretty_printer::buffer when
making the fields private in r15-1209-gc5e3be456888aa; sorry.

Verified that this fixes the build breakage with
--target=aarch64-unknown-linux-gnu.

Pushed as r15-1220-ge35f4eab68773b.

gcc/ChangeLog:
PR bootstrap/115465
* config/aarch64/aarch64-early-ra.cc (early_ra::process_block):
Update for fields of pretty_printer becoming private in
r15-1209-gc5e3be456888aa.

Signed-off-by: David Malcolm 
---
 gcc/config/aarch64/aarch64-early-ra.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-early-ra.cc 
b/gcc/config/aarch64/aarch64-early-ra.cc
index 1e2c823cb2eb..99324423ee5a 100644
--- a/gcc/config/aarch64/aarch64-early-ra.cc
+++ b/gcc/config/aarch64/aarch64-early-ra.cc
@@ -3446,7 +3446,7 @@ early_ra::process_block (basic_block bb, bool is_isolated)
fprintf (dump_file, "\nBlock %d:\n", bb->index);
  fprintf (dump_file, "%6d:", m_current_point);
  pretty_printer rtl_slim_pp;
- rtl_slim_pp.buffer->stream = dump_file;
+ rtl_slim_pp.set_output_stream (dump_file);
  print_insn (&rtl_slim_pp, insn, 1);
  pp_flush (&rtl_slim_pp);
  fprintf (dump_file, "\n");
-- 
2.26.3



Re: [PATCH 1/3] Remove ia64*-*-linux from the list of obsolete targets

2024-06-12 Thread Jonathan Wakely

On 12/06/24 12:42 +0200, Rene Rebe wrote:

The following un-deprecates ia64*-*-linux for GCC 15, since we plan to
support this for some years to come.

gcc/
   * config.gcc: Only explicitly list ia64*-*-(hpux|vms|elf) in the
 list of obsoleted targets.

contrib/
   * config-list.mk (LIST): No --enable-obsolete for ia64*-*-linux.
---
contrib/config-list.mk | 4 ++--
gcc/config.gcc | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index f282cd95c8d..b99573b1f5b 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -60,8 +60,8 @@ LIST = \
  i686-pc-linux-gnu i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
  i686-rtems i686-solaris2.11 i686-wrs-vxworks \
  i686-wrs-vxworksae \
-  i686-cygwinOPT-enable-threads=yes i686-mingw32crt 
ia64-elfOPT-enable-obsolete \
-  ia64-linuxOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete \
+  i686-cygwinOPT-enable-threads=yes i686-mingw32crt linux-ia64 \


Shouldn't this be ia64-linux? And why reorder the entries?

I would expect the change to be simply
s/ia64-linuxOPT-enable-obsolete/ia64-linux/


+  ia64-elfOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete
  ia64-hp-vmsOPT-enable-obsolete iq2000-elf lm32-elf \
  lm32-rtems lm32-uclinux \
  loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index a37113bd00a..6d6ca6da7a0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -272,7 +272,7 @@ esac

# Obsolete configurations.
case ${target} in
- ia64*-*-* \
+ ia64*-*-hpux* | ia64*-*-*vms* | ia64*-*-elf*  \
   | nios2*-*-* \
 )
if test "x$enable_obsolete" != xyes; then
--
2.45.0


--
 René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 https://exactcode.com | https://t2sde.org | https://rene.rebe.de




Re: [PATCH 0/3] Remove ia64*-*-linux from the list of obsolete targets

2024-06-12 Thread Jonathan Wakely

On 12/06/24 12:33 +0200, Rene Rebe wrote:

Hey there,

I wanted to come back to maintaining the ia64 port as discussed
previously the other month on the gcc list.

It has been some days, as we were busy releasing the biggest release of
our Embedded T2/Linux distribution [0], and we obviously did not want to
propose enabling LRA for IA-64 in the last days of the gcc 14
release process.

We used the time to further stability-test the LRA-enabled GCC built
in T2/Linux and to set up running the GCC testsuite accordingly, for
which Frank posted test results for reference, without [1] and with [2]
LRA enabled, with only minimal changes but also some new testsuite
passes.  Due to the -j4 run I summed up the text-file results manually
in LibreOffice:


But the .sum files already combine all the results into one file.



gcc
35572, 31789
33273, 28492
37189, 36804
28735, 37634
sum 134769, 134719

g++
69349, 61058
61467, 63545
61614, 63752
56027, 60102
sum 248457, 248457

gfortran
18895, 17502
19329, 19051
13950, 17583
17442, 15482
sum 69616, 69618

objc
693, 783
760, 669
609, 649
716, 677
sum 2778, 2778

libstdc++
4495, 4635
4001, 3629
3958, 4580
4970, 4580
sum 17424, 17424

The Linux kernel and all user-land packages built with the LRA-enabled
GCC boot and function normally, too.

Instead of looking into random test suite failures, I would first
rather try to allocate some time to look into some build failures of
more advanced real-world open source packages that I observed over
the last years and which already occurred unrelated to the LRA enablement.


> On Mar 7, 2024, at 20:08, Richard Biener  wrote:
>> I saw the deprecation of ia64*-*-* scrolling by [1].
>>
>> Which surprised me, as (minor bugs aside) gcc ia64*-*-linux just works for 
us and
>> we still actively support it as part of our T2 System Development 
Environment [2].
>>
>> For ia64 we are currently a team of three and also keep maintaining 
linux-kernel and
>> glibc git trees with ia64 restored and hope to bring back ia64 to linux 
upstream in the
>> coming months as promised. [3]
>>
>> Despite popular belief ia64 actually just works for all those projects and 
we already
>> fixed the few minor bugs we could find or reproduce.
>>
>> Last week I also already patched and tested enabling LRA for ia64 in gcc [4] 
and could
>> -without any regression- compile a full ia64 T2/Linux release ISO that boots 
and runs
>> including an X desktop and Gtk applications. That was of course even with 
the latest
>> linux kernel and glibc versions with ia64 support restored respectively.
>>
>> Given there are currently no other volunteers, I therefore with this email 
step up and
>> offer to become ia64 maintainer for GCC to keep the code compiling, tested 
and
>> un-deprecated for the next years and releases to come.
>
> You’re welcome - we look forward to LRA enablement with ia64 and for it to 
get an
> active maintainer.  Note maintainers are appointed by the Steering Committee.


[0] https://t2sde.org/
[1] https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816346.html
[2] https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html

--
 René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
 https://exactcode.com | https://t2sde.org | https://rene.rebe.de




Re: [PATCH v4 5/6] bpf,btf: enable BTF pruning by default for BPF

2024-06-12 Thread Jose E. Marchesi


> On 6/12/24 09:55, Jose E. Marchesi wrote:
>> 
>> Hi Faust.
>> Thanks for the patch.
>> Please see a question below.
>> 
>>> This patch enables -gprune-btf by default in the BPF backend when
>>> generating BTF information, and fixes BPF CO-RE generation when using
>>> -gprune-btf.
>>>
>>> When generating BPF CO-RE information, we must ensure that types used
>>> in CO-RE relocations always have sufficient BTF information emitted so
>>> that the CO-RE relocations can be processed by a BPF loader.  The BTF
>>> pruning algorithm on its own does not have sufficient information to
>>> determine which types are used in a BPF CO-RE relocation, so this
>>> information must be supplied by the BPF backend, using a new
>>> btf_mark_type_used function.
>>>
>>> Co-authored-by: Cupertino Miranda 
>>>
>>> gcc/
>>> * btfout.cc (btf_mark_type_used): New.
>>> * ctfc.h (btf_mark_type_used): Declare it here.
>>> * config/bpf/bpf.cc (bpf_option_override): Enable -gprune-btf
>>> by default if -gbtf is enabled.
>>> * config/bpf/core-builtins.cc (extra_fn): New typedef.
>>> (compute_field_expr): Add callback parameter, and call it if supplied.
>>> Fix computation for MEM_REF.
>>> (mark_component_type_as_used): New.
>>> (bpf_mark_types_as_used): Likewise.
>>> (bpf_expand_core_builtin): Call here.
>>> * doc/invoke.texi (Debugging Options): Note that -gprune-btf is
>>> enabled by default for BPF target when generating BTF.
>>>
>>> gcc/testsuite/
>>> * gcc.dg/debug/btf/btf-variables-5.c: Adjust one test for bpf-*-*
>>> target.
>>> ---
>>>  gcc/btfout.cc | 22 ++
>>>  gcc/config/bpf/bpf.cc |  5 ++
>>>  gcc/config/bpf/core-builtins.cc   | 71 +--
>>>  gcc/ctfc.h|  1 +
>>>  gcc/doc/invoke.texi   |  3 +
>>>  .../gcc.dg/debug/btf/btf-variables-5.c|  6 +-
>>>  6 files changed, 100 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
>>> index 34d8cec0a2e3..083ca48d6279 100644
>>> --- a/gcc/btfout.cc
>>> +++ b/gcc/btfout.cc
>>> @@ -1503,6 +1503,28 @@ btf_assign_datasec_ids (ctf_container_ref ctfc)
>>>  }
>>>  }
>>>  
>>> +
>>> +/* Manually mark that type T is used to ensure it will not be pruned.
>>> +   Used by the BPF backend when generating BPF CO-RE to mark types used
>>> +   in CO-RE relocations.  */
>>> +
>>> +void
>>> +btf_mark_type_used (tree t)
>>> +{
>>> +  /* If we are not going to prune anyway, this is a no-op.  */
>>> +  if (!debug_prune_btf)
>>> +return;
>>> +
>>> +  gcc_assert (TYPE_P (t));
>>> +  ctf_container_ref ctfc = ctf_get_tu_ctfc ();
>>> +  ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, t);
>>> +
>>> +  if (!dtd)
>>> +return;
>>> +
>>> +  btf_add_used_type (ctfc, dtd, false, false, true);
>>> +}
>>> +
>>>  /* Callback used for assembling the only-used-types list.  Note that this 
>>> is
>>> the same as btf_type_list_cb above, but the hash_set traverse requires a
>>> different function signature.  */
>>> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
>>> index dd1bfe38d29b..c62af7a6efa7 100644
>>> --- a/gcc/config/bpf/bpf.cc
>>> +++ b/gcc/config/bpf/bpf.cc
>>> @@ -221,6 +221,11 @@ bpf_option_override (void)
>>>&& !(target_flags_explicit & MASK_BPF_CORE))
>>>  target_flags |= MASK_BPF_CORE;
>>>  
>>> +  /* -gbtf implies -gprune-btf for BPF target.  */
>>> +  if (btf_debuginfo_p ())
>>> +SET_OPTION_IF_UNSET (&global_options, &global_options_set,
>>> +debug_prune_btf, true);
>>> +
>>>/* Determine available features from ISA setting (-mcpu=).  */
>>>if (bpf_has_jmpext == -1)
>>>  bpf_has_jmpext = (bpf_isa >= ISA_V2);
>>> diff --git a/gcc/config/bpf/core-builtins.cc 
>>> b/gcc/config/bpf/core-builtins.cc
>>> index 232bebcadbd5..86e2e9d6e39f 100644
>>> --- a/gcc/config/bpf/core-builtins.cc
>>> +++ b/gcc/config/bpf/core-builtins.cc
>>> @@ -624,13 +624,20 @@ bpf_core_get_index (const tree node, bool *valid)
>>>  
>>> ALLOW_ENTRY_CAST is an input arguments and specifies if the function 
>>> should
>>> consider as valid expressions in which NODE entry is a cast expression 
>>> (or
>>> -   tree code nop_expr).  */
>>> +   tree code nop_expr).
>>> +
>>> +   EXTRA_FN is a callback function to allow extra functionality with this
>>> +   function traversal.  Currently used for marking used type during expand
>>> +   pass.  */
>>> +
>>> +typedef void (*extra_fn) (tree);
>>>  
>>>  static unsigned char
>>>  compute_field_expr (tree node, unsigned int *accessors,
>>> bool *valid,
>>> tree *access_node,
>>> -   bool allow_entry_cast = true)
>>> +   bool allow_entry_cast = true,
>>> +   extra_fn callback = NULL)
>>>  {
>>>unsigned char n = 0;
>>>unsigned int fake_accessors[MAX_NR_ACCESSORS];
>>> @@ -647,6 +654,9 @@ compute_field_expr (tree no

[PATCH] Whitespace cleanup for target-supports.exp

2024-06-12 Thread Patrick O'Neill
This patch removes trailing whitespace and replaces leading groups of 8-16
spaces with tabs.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Cleanup whitespace.

Signed-off-by: Patrick O'Neill 
---
Pre-approved here: 
https://inbox.sourceware.org/gcc-patches/3312c6a8-8f34-43f0-8562-99d64d502...@gmail.com/
I'll wait a half hour or so before committing.
---
 gcc/testsuite/lib/target-supports.exp | 1168 -
 1 file changed, 584 insertions(+), 584 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e862a893244..e307f4e69ef 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -28,7 +28,7 @@
 # If ARGS is not empty, its first element is a string that
 # should be added to the command line.
 #
-# Assume by default that CONTENTS is C code.  
+# Assume by default that CONTENTS is C code.
 # Otherwise, code should contain:
 # "/* Assembly" for assembly code,
 # "// C++" for c++,
@@ -39,12 +39,12 @@
 # "// Go" for Go
 # "// Rust" for Rust
 # and "(* Modula-2" for Modula-2
-# If the tool is ObjC/ObjC++ then we overide the extension to .m/.mm to 
+# If the tool is ObjC/ObjC++ then we overide the extension to .m/.mm to
 # allow for ObjC/ObjC++ specific flags.
 
 proc check_compile {basename type contents args} {
 global tool
-verbose "check_compile tool: $tool for $basename" 
+verbose "check_compile tool: $tool for $basename"
 
 # Save additional_sources to avoid compiling testsuite's sources
 # against check_compile's source.
@@ -100,7 +100,7 @@ proc check_compile {basename type contents args} {
 global compiler_flags
 set save_compiler_flags $compiler_flags
 set lines [${tool}_target_compile $src $output $compile_type "$options"]
-set compiler_flags $save_compiler_flags 
+set compiler_flags $save_compiler_flags
 file delete $src
 
 set scan_output $output
@@ -280,8 +280,8 @@ proc check_configured_with { pattern } {
 set options [list "additional_flags=-v"]
 set gcc_output [${tool}_target_compile "" "" "none" $options]
 if { [ regexp "Configured with: \[^\n\]*$pattern" $gcc_output ] } {
-verbose "Matched: $pattern" 2
-return 1
+   verbose "Matched: $pattern" 2
+   return 1
 }
 
 verbose "Failed to match: $pattern" 2
@@ -301,19 +301,19 @@ proc check_weak_available { } {
 # All mips targets should support it
 
 if { [ string first "mips" $target_cpu ] >= 0 } {
-return 1
+   return 1
 }
 
 # All AIX targets should support it
 
 if { [istarget *-*-aix*] } {
-return 1
+   return 1
 }
 
 # All solaris2 targets should support it
 
 if { [istarget *-*-solaris2*] } {
-return 1
+   return 1
 }
 
 # Windows targets Cygwin and MingW32 support it
@@ -346,13 +346,13 @@ proc check_weak_available { } {
 set objformat [gcc_target_object_format]
 
 switch $objformat {
-elf  { return 1 }
-ecoff{ return 1 }
-a.out{ return 1 }
+   elf  { return 1 }
+   ecoff{ return 1 }
+   a.out{ return 1 }
mach-o   { return 1 }
som  { return 1 }
-unknown  { return -1 }
-default  { return 0 }
+   unknown  { return -1 }
+   default  { return 0 }
 }
 }
 
@@ -414,31 +414,31 @@ proc check_effective_target_vma_equals_lma { } {
if [string match "" $lines] then {
# No error messages
 
-set objdump_name [find_binutils_prog objdump]
-set output [remote_exec host "$objdump_name" "--section-headers --section=.data $exe"]
-set output [lindex $output 1]
-
-remote_file build delete $exe
-
-# Example output of objdump:
-#vma_equals_lma9059.exe: file format elf32-littlearm
-#
-#Sections:
-#Idx Name  Size  VMA   LMA   File off  Algn
-#  6 .data 0558  2000  08002658  0002  2**3
-#  CONTENTS, ALLOC, LOAD, DATA
-
-# Capture LMA and VMA columns for .data section
-if ![ regexp {\d*\d+\s+\.data\s+\d+\s+(\d+)\s+(\d+)} $output dummy vma lma ] {
-verbose "Could not parse objdump output" 2
-return 0
-} else {
-return [string equal $vma $lma]
-}
+   set objdump_name [find_binutils_prog objdump]
+   set output [remote_exec host "$objdump_name" "--section-headers --section=.data $exe"]
+   set output [lindex $output 1]
+
+   remote_file build delete $exe
+
+   # Example output of objdump:
+   #vma_equals_lma9059.exe: file format elf32-littlearm
+   #
+   #Sections:
+   #Idx Name  Size  VMA   LMA   File off  Algn
+   #  6 .data 0558  2000  08002658  0002  2**3

Re: [PATCH 1/3] Remove ia64*-*-linux from the list of obsolete targets

2024-06-12 Thread Jonathan Wakely

On 12/06/24 19:40 +0100, Jonathan Wakely wrote:

On 12/06/24 12:42 +0200, Rene Rebe wrote:

The following un-deprecates ia64*-*-linux for GCC 15, since we plan to
support this for some years to come.

gcc/
  * config.gcc: Only exlicitly list ia64*-*-(hpux|vms|elf) in the


"exlicitly"


list of obsoleted targets.

contrib/
  * config-list.mk (LIST): no --enable-obsolete for ia64*-*-linux.
---
contrib/config-list.mk | 4 ++--
gcc/config.gcc | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index f282cd95c8d..b99573b1f5b 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -60,8 +60,8 @@ LIST = \
 i686-pc-linux-gnu i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
 i686-rtems i686-solaris2.11 i686-wrs-vxworks \
 i686-wrs-vxworksae \
-  i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elfOPT-enable-obsolete \
-  ia64-linuxOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete \
+  i686-cygwinOPT-enable-threads=yes i686-mingw32crt linux-ia64 \


Shouldn't this be ia64-linux? And why reorder the entries?

I would expect the change to be simply
s/ia64-linuxOPT-enable-obsolete/ia64-linux/


+  ia64-elfOPT-enable-obsolete ia64-hpuxOPT-enable-obsolete
 ia64-hp-vmsOPT-enable-obsolete iq2000-elf lm32-elf \
 lm32-rtems lm32-uclinux \
 loongarch64-linux-gnuf64 loongarch64-linux-gnuf32 loongarch64-linux-gnusf \
diff --git a/gcc/config.gcc b/gcc/config.gcc
index a37113bd00a..6d6ca6da7a0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -272,7 +272,7 @@ esac

# Obsolete configurations.
case ${target} in
- ia64*-*-* \
+ ia64*-*-hpux* | ia64*-*-*vms* | ia64*-*-elf*  \
  | nios2*-*-*  \
)
   if test "x$enable_obsolete" != xyes; then
--
2.45.0


--
René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
https://exactcode.com | https://t2sde.org | https://rene.rebe.de




[PATCH] c++: visibility wrt concept-id as targ [PR115283]

2024-06-12 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

It seems we don't maintain visibility flags for concepts either, so
min_vis_expr_r should ignore them for now; otherwise, after r14-6789, we
would incorrectly give internal linkage to function templates that use a
concept-id in their signature.
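
For instance (a hypothetical reduction with made-up names, mirroring the
new test below):

  template<class T>
  concept always_true = true;   // hypothetical concept

  template<bool> struct W { };

  // Before this fix, f was wrongly given internal linkage merely because
  // its signature mentions the concept-id always_true<T>; the explicit
  // instantiation below should emit a weak/global symbol.
  template<class T> void f(W<always_true<T>>) { }

  template void f<int>(W<true>);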

PR c++/115283

gcc/cp/ChangeLog:

* decl2.cc (min_vis_expr_r) : Ignore
concepts.

gcc/testsuite/ChangeLog:

* g++.dg/template/linkage5.C: New test.
---
 gcc/cp/decl2.cc  |  5 +++--
 gcc/testsuite/g++.dg/template/linkage5.C | 14 ++
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/linkage5.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 7baff46a192..88e87ad60c6 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2723,9 +2723,10 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
   break;
 
 case TEMPLATE_DECL:
-  if (DECL_ALIAS_TEMPLATE_P (t))
+  if (DECL_ALIAS_TEMPLATE_P (t) || concept_definition_p (t))
/* FIXME: We don't maintain TREE_PUBLIC / DECL_VISIBILITY for
-  alias templates so we can't trust it here (PR107906).  */
+  alias templates so we can't trust it here (PR107906).  Ditto
+  for concepts.  */
break;
   t = DECL_TEMPLATE_RESULT (t);
   /* Fall through.  */
diff --git a/gcc/testsuite/g++.dg/template/linkage5.C b/gcc/testsuite/g++.dg/template/linkage5.C
new file mode 100644
index 000..1dbb0beb5ea
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/linkage5.C
@@ -0,0 +1,14 @@
+// PR c++/115283
+// { dg-final { scan-assembler "(weak|glob)\[^\n\]*_Z1fIiEv1AIX4sameIT_EEE" } }
+// { dg-do compile { target c++20 } }
+
+template<class T>
+concept same = true;
+
+template<bool B>
+struct A { };
+
+template<class T>
+void f(A<same<T>>) { }
+
+template void f<int>(A<true>);
-- 
2.45.2.457.g8d94cfb545



Re: [PATCH 2/3] Enabled LRA for ia64.

2024-06-12 Thread Jonathan Wakely

On 12/06/24 16:09 +0200, Frank Scheiner wrote:

Dear Richard,

On 12.06.24 13:01, Richard Biener wrote:

[...]
I can find two gcc-testresult postings, one apparently with LRA
and one without?  Both from May:

https://sourceware.org/pipermail/gcc-testresults/2024-May/816422.html
https://sourceware.org/pipermail/gcc-testresults/2024-May/816346.html

somehow the libstdc++ summaries, for example, were not merged; it might
be that you do not have a recent python installed on the system, or you
didn't use contrib/test_summary to create those mails.


No, I did not use contrib/test_summary. But I still have tarballs of
both testsuite runs, so could still produce these summaries - I hope?


It looks like the results at
https://gcc.gnu.org/pipermail/gcc-testresults/2024-May/816422.html are
just what's printed on standard out, including output from 'make -j4',
so they're not combined into one set of results.

It would certainly be better to either get the results from the .sum
files, or just use the contrib/test_summary script to do that for you.


Do I need to run this on the host that did the testing or can I run it
on my NFS server where the tarballs are actually located, too?


I don't think the script cares where it's run, it just looks at text
files which should work on any host.


Architectures are different though, the NFS server is 32-bit ARM.


Text is text.



Re: [PATCH 3/3] Add power11 tests

2024-06-12 Thread Michael Meissner
On Wed, Jun 12, 2024 at 02:23:18PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> > +# Return 1 if this is a PowerPC target supporting -mcpu=power11.
> > +
> > +proc check_effective_target_power11_ok { } {
> > +if { ([istarget powerpc*-*-*]) } {
> > +   return [check_no_compiler_messages power11_ok object {
> > +   int main (void) {
> > +   #ifndef _ARCH_PWR11
> > +   #error "-mcpu=power11 is not supported"
> > +   #endif
> > +   return 0;
> > +   }
> > +   } "-mcpu=power11"]
> > +} else {
> > +   return 0
> > +}
> > +}
> > +
> 
> I guess the previous comment in [1] escaped your radar, as it was just
> one nit on a test case; re-posted below. :)
> 
> > Sorry that I didn't catch this before, this effective target looks useless
> > since its users power11-[123].c are all for compiling and the compilation
> > doesn't rely on assembler behavior.  power11-1.c has checked for 
> > _ARCH_PWR11,
> > maybe we want some cases with "dg-do assemble" to adopt this?
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648943.html
> 
> BR,
> Kewen

Using 'object' in the 'check_effective_target_power11_ok' test means it has to
assemble the object.  Thus, if you have an older assembler that does not
support ".machine power11", the testsuite will skip these tests.

I just built a GCC using the system assembler instead of a recent binutils that
includes ".machine power11" support, and the 3 power11 tests are not run
because they are reported as UNSUPPORTED.
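
Following Kewen's suggestion in [1], a "dg-do assemble" case along these
lines (a hypothetical sketch, not one of the posted power11-[123].c tests)
would actually exercise the assembler path that the effective target
guards:

  /* { dg-do assemble { target power11_ok } } */
  /* { dg-options "-mcpu=power11" } */

  #ifndef _ARCH_PWR11
  #error "-mcpu=power11 is not supported"
  #endif

  int
  main (void)
  {
    return 0;
  }

With an old assembler the power11_ok probe fails to assemble, so such a
test would be skipped as UNSUPPORTED rather than fail.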

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

