Re: [PATCH] [testsuite] [ppc] block-cmp-8 should require powerpc64

2025-04-15 Thread Alexandre Oliva
On Apr 15, 2025, Peter Bergner  wrote:

> On 4/14/25 11:35 PM, Alexandre Oliva wrote:
>>> That said, that should be done in a separate patch.
>> 
>> *nod*.  Do you mean you're going to make that change, that I should, or
>> that you hope someone else will?  I'd rather avoid duplication, and this
>> is likely a somewhat involved change, since the string powerpc64 appears
>> all over gcc/testsuite/, with various meanings other than a dejagnu
>> effective target.

> Sorry.  I meant to say I'll have someone from my team do this follow-on patch.
> Yes, it will probably hit quite a few test cases.  No need for you to worry
> about that.

I see, thanks for the clarification.

Since that sort of broad change will presumably not make gcc-15 (it
wouldn't fix a regression, not even the problem addressed by the
upthread patch), may I understand your initial response in this thread
as approval of that patch?  That wasn't clear either.

(Sorry if that comes across as asking something obvious; I've noticed
misalignments between my expectations of obviousness and those of other
ppc maintainers before, so I've learned to be extra cautious)

-- 
Alexandre Oliva, happy hacker   https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving "normal" is *not* inclusive!


RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Tuesday, April 15, 2025 10:52 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > [PR119351]
> > 
> > Tamar Christina  writes:
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1c
> > e591f7c38b76d07f 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> > loop_vinfo,
> > >return false;
> > >  }
> > >
> > > +  /* With early break vectorization we don't know whether the accesses 
> > > will stay
> > > + inside the loop or not.  TODO: The early break adjustment code can 
> > > be
> > > + implemented the same way for vectorizable_linear_induction.  
> > > However we
> > > + can't test this today so reject it.  */
> > > +  if (niters_skip != NULL_TREE
> > > +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > +{
> > > +  if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > +  "Peeling for alignement using masking is not supported"
> > > +  " for nonlinear induction when using early breaks.\n");
> > > +  return false;
> > > +}
> > > +
> > >return true;
> > >  }
> > 
> > FTR, I was wondering here whether we should predict this in advance and
> > instead drop down to peeling for alignment without masks.  It probably
> > isn't worth the effort though.
> 
> We could move the check into vect_use_loop_mask_for_alignment_p where
> rejecting it there would get it to fall back to scalar peeling.  That seems
> simple enough if that's preferable.

The above is preferable IMO (short of fixing up that case, but with
a testcase).

Richard.

> Cheers,
> Tamar
> > 
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index
> > 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0
> > 057d2fa46245d04d9 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info
> > loop_vinfo,
> > >  LOOP_VINFO_MASK_SKIP_NITERS
> > (loop_vinfo));
> > > peel_mul = gimple_build_vector_from_val (&init_stmts,
> > >  step_vectype, peel_mul);
> > > +
> > > +   /* If early break then we have to create a new PHI which we can use as
> > > + an offset to adjust the induction reduction in early exits.  */
> > > +   if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > + {
> > > +   auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> > > +   tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > +   tree break_lhs_phi = NULL_TREE;
> > > +   break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > +  vect_scalar_var,
> > > +  "pfa_iv_offset");
> > > +   gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > +   add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > +   add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > > +loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > > +
> > > +   /* Rewrite all the early exit usages.  */
> > > +   tree phi_lhs = PHI_RESULT (phi);
> > > +   imm_use_iterator iter;
> > > +   use_operand_p use_p;
> > > +   gimple *use_stmt;
> > > +
> > > +   FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > > + {
> > > +   use_stmt = USE_STMT (use_p);
> > > +   if (!flow_bb_inside_loop_p (iv_loop, gimple_bb (use_stmt))
> > > +   && is_a  (use_stmt))
> > > + {
> > > +   auto gsi = gsi_last_bb (use_stmt->bb);
> > > +   for (auto exit : get_loop_exit_edges (iv_loop))
> > > + if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
> > > + && bb == exit->src)
> > > +   {
> > > + /* Now create the PHI for the outside loop usage to
> > > +retrieve the value for the offset counter.  */
> > > + tree rphi_lhs = make_ssa_name (ty_skip_niters);
> > > + gphi *rphi
> > > +   = create_phi_node (rphi_lhs, use_stmt->bb);
> > > + for (unsigned i = 0; i < gimple_phi_num_args (rphi);
> > > +  i++)
> > > +   SET_PHI_ARG_DEF (rphi, i, PHI_RESULT (nphi));
> > > +
> > > + tree tmp = make_ssa_name (TREE_TYPE (phi_lhs));

Re: [PATCH] tailc: Fix up musttail calls vs. -fsanitize=thread [PR119801]

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Jakub Jelinek wrote:

> Hi!
> 
> Calls with musttail attribute don't really work with -fsanitize=thread in
> GCC.  The problem is that TSan instrumentation adds
>   __tsan_func_entry (__builtin_return_address (0));
> calls at the start of each instrumented function and
>   __tsan_func_exit ();
> call at the end of those and the latter stands in a way of normal tail calls
> as well as musttail tail calls.
> 
> Looking at what LLVM does, for normal calls -fsanitize=thread also prevents
> tail calls like in GCC (well, the __tsan_func_exit () call itself can be
> tail called in GCC (and from what I see not in clang)).
> But for [[clang::musttail]] calls it arranges to move the
> __tsan_func_exit () before the musttail call instead of after it.
> 
> The following patch handles it similarly.  If we for -fsanitize=thread
> instrumented function detect __builtin_tsan_func_exit () call, we process
> it normally (so that the call can be tail called in function returning void)
> but set a flag that the builtin has been seen (only for cfun->has_musttail
> in the diag_musttail phase).  And then let tree_optimize_tail_calls_1
> call find_tail_calls again in a new mode where the __tsan_func_exit ()
> call is ignored and so we are able to find calls before it, but only
> accept that if the call before it is actually a musttail.  For C++ it needs
> to verify that EH cleanup if any also has the __tsan_func_exit () call
> and if all goes well, the musttail call is registered for tailcalling with
> a flag that it has __tsan_func_exit () after it and when optimizing that
> we emit __tsan_func_exit (); call before the musttail tail call (or musttail
> tail recursion).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.
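
For illustration, a minimal example of the situation described above (a
sketch, not the testcase added by the patch):

  /* Compile with -fsanitize=thread.  GCC instruments foo with
     __tsan_func_entry (__builtin_return_address (0)) at entry and
     __tsan_func_exit () at exit; with the patch, __tsan_func_exit ()
     is emitted before the musttail call instead of after it, so the
     mandatory tail call remains possible.  */

  __attribute__((noinline)) int
  bar (int x)
  {
    return x;
  }

  int
  foo (int x)
  {
    [[gnu::musttail]] return bar (x + 1);  /* C23 attribute syntax.  */
  }

  int
  main (void)
  {
    return foo (41) == 42 ? 0 : 1;
  }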

> 2025-04-15  Jakub Jelinek  
> 
>   PR sanitizer/119801
>   * sanitizer.def (BUILT_IN_TSAN_FUNC_EXIT): Use BT_FN_VOID rather
>   than BT_FN_VOID_PTR.
>   * tree-tailcall.cc: Include attribs.h and asan.h.
>   (struct tailcall): Add has_tsan_func_exit member.
>   (empty_eh_cleanup): Add eh_has_tsan_func_exit argument, set what
>   it points to to 1 if there is exactly one __tsan_func_exit call
>   and ignore that call otherwise.  Adjust recursive call.
>   (find_tail_calls): Add RETRY_TSAN_FUNC_EXIT argument, pass it
>   to recursive calls.  When seeing __tsan_func_exit call with
>   RETRY_TSAN_FUNC_EXIT 0, set it to -1.  If RETRY_TSAN_FUNC_EXIT
>   is 1, initially ignore __tsan_func_exit calls.  Adjust
>   empty_eh_cleanup caller.  When looking through stmts after the call,
>   ignore exactly one __tsan_func_exit call but remember it in
>   t->has_tsan_func_exit.  Diagnose if EH cleanups didn't have
>   __tsan_func_exit and normal path did or vice versa.
>   (optimize_tail_call): Emit __tsan_func_exit before the tail call
>   or tail recursion.
>   (tree_optimize_tail_calls_1): Adjust find_tail_calls callers.  If
>   find_tail_calls changes retry_tsan_func_exit to -1, set it to 1
>   and call it again with otherwise the same arguments.
> 
>   * c-c++-common/tsan/pr119801.c: New test.
> 
> --- gcc/sanitizer.def.jj  2025-04-14 19:30:31.804837079 +0200
> +++ gcc/sanitizer.def 2025-04-15 09:48:23.752349037 +0200
> @@ -247,7 +247,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_ENTRY, "__tsan_func_entry",
> BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_EXIT, "__tsan_func_exit",
> -   BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
> +   BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_VPTR_UPDATE, "__tsan_vptr_update",
> BT_FN_VOID_PTR_PTR, ATTR_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_READ1, "__tsan_read1",
> --- gcc/tree-tailcall.cc.jj   2025-04-14 19:30:31.976834786 +0200
> +++ gcc/tree-tailcall.cc  2025-04-15 10:12:48.879501238 +0200
> @@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.
>  #include "symbol-summary.h"
>  #include "ipa-cp.h"
>  #include "ipa-prop.h"
> +#include "attribs.h"
> +#include "asan.h"
>  
>  /* The file implements the tail recursion elimination.  It is also used to
> analyze the tail calls in general, passing the results to the rtl level
> @@ -122,6 +124,9 @@ struct tailcall
>/* True if it is a call to the current function.  */
>bool tail_recursion;
>  
> +  /* True if there is __tsan_func_exit call after the call.  */
> +  bool has_tsan_func_exit;
> +
>/* The return value of the caller is mult * f + add, where f is the return
>   value of the call.  */
>tree mult, add;
> @@ -504,7 +509,7 @@ maybe_error_musttail (gcall *call, const
> Search at most CNT basic blocks (so that we don't need to do trivial
> loop discovery).  */
>  static bool
> -empty_eh_cleanup (basic_block bb, int cnt)
> +empty_eh_cleanup (basic_block bb,

Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Jakub Jelinek wrote:

> On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> > This patch just introduces a form of dumping of widest ints that only
> > have zeros in the lowest 128 bits so that instead of printing
> > thousands of f's the output looks like:
> > 
> >Bits: value = 0x, mask = all ones folled by 
> > 0x
> > 
> > and then makes sure we use the function not only to print bits but
> > also to print masks where values like these can also occur.
> 
> Shouldn't that be followed by instead?
> And the widest_int checks seems to be quite expensive (especially for
> large widest_ints), I think for the first one we can just == -1
> and for the second one wi::arshift (value, 128) == -1 and the zero extension
> by using wi::zext.
> 
> Anyway, I wonder if it wouldn't be better to use something shorter,
> the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
> number (maybe we could also special case the even more common bits 64+
> are all ones case).  Or it could be 0xf*f prefix.  Or printing such
> numbers as -0x prefixed negative, though that is not a good idea for masks.

I'd accept 0xf..f as reasonable, possibly 0xff
when there are more than sizeof("repeated N times") fs inbetween.
It does make matching up masks more difficult when tracking changes
(from my experience with bit-CCP debugging, where such large masks
appear as well).  So IMO we can live with large 0xff but for
all-ones we could print -1 if that's the common noisy thing.

Richard.


RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, April 15, 2025 12:50 PM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org;
> > nd 
> > Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > [PR119351]
> > 
> > On Tue, 15 Apr 2025, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Sandiford 
> > > > Sent: Tuesday, April 15, 2025 10:52 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> > > > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > > > [PR119351]
> > > >
> > > > Tamar Christina  writes:
> > > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > > index
> > > >
> > 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1c
> > > > e591f7c38b76d07f 100644
> > > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> > > > loop_vinfo,
> > > > >return false;
> > > > >  }
> > > > >
> > > > > +  /* With early break vectorization we don't know whether the 
> > > > > accesses will
> > stay
> > > > > + inside the loop or not.  TODO: The early break adjustment code 
> > > > > can be
> > > > > + implemented the same way for vectorizable_linear_induction.  
> > > > > However
> > we
> > > > > + can't test this today so reject it.  */
> > > > > +  if (niters_skip != NULL_TREE
> > > > > +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > > > +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > > > +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > +{
> > > > > +  if (dump_enabled_p ())
> > > > > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > > +  "Peeling for alignement using masking is not 
> > > > > supported"
> > > > > +  " for nonlinear induction when using early 
> > > > > breaks.\n");
> > > > > +  return false;
> > > > > +}
> > > > > +
> > > > >return true;
> > > > >  }
> > > >
> > > > FTR, I was wondering here whether we should predict this in advance and
> > > > instead drop down to peeling for alignment without masks.  It probably
> > > > isn't worth the effort though.
> > >
> > > We could move the check into vect_use_loop_mask_for_alignment_p where
> > > rejecting it there would get it to fall back to scalar peeling.  That
> > > seems simple enough if that's preferable.
> > 
> > The above is preferable IMO (short of fixing up that case, but with
> > a testcase).
> > 
> 
> I wasn't able to make a testcase before as any non-linear induction feeding a
> load becomes a gather load, which we block outright way before getting here
> though.  I couldn't think of an example where it wouldn't be, even a gapped
> load e.g. +=2 became one.

Well, a classic one would be

 int x = 2;
 for (i = 0; i < n; ++i)
   {
 if (y[i])
   break;
 x *= 3;
   }
 return x;

or the negate case

 int x = 2;
 for (i = 0; i < n; ++i)
   { 
 if (y[i]) 
   break;
 x = -x;
   }
 return x;

possibly we mark those as unhandled for early exits somewhere already.
There does seem to be code handling PFA with masks for them.

Richard.


> Thanks,
> Tamar
> 
> > Richard.
> > 
> > > Cheers,
> > > Tamar
> > > >
> > > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > > index
> > > >
> > 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0
> > > > 057d2fa46245d04d9 100644
> > > > > --- a/gcc/tree-vect-loop.cc
> > > > > +++ b/gcc/tree-vect-loop.cc
> > > > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info
> > > > loop_vinfo,
> > > > >  LOOP_VINFO_MASK_SKIP_NITERS
> > > > (loop_vinfo));
> > > > > peel_mul = gimple_build_vector_from_val (&init_stmts,
> > > > >  step_vectype, 
> > > > > peel_mul);
> > > > > +
> > > > > +   /* If early break then we have to create a new PHI which we 
> > > > > can use as
> > > > > + an offset to adjust the induction reduction in early exits. 
> > > > >  */
> > > > > +   if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > > + {
> > > > > +   auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS 
> > > > > (loop_vinfo);
> > > > > +   tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > > > +   tree break_lhs_phi = NULL_TREE;
> > > > > +   break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > > > +  vect_scalar_var,
> > > > > +  "pfa_iv_offset");
> > > > > +   gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > > > +   add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > > > +

Re: [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-15 Thread Tejas Belagod

On 4/14/25 7:44 PM, Kyrylo Tkachov wrote:

Hi Tejas,


On 14 Apr 2025, at 16:04, Tejas Belagod  wrote:

The operand order to gen_vcond_mask call in the vec_extract pattern is wrong.
Fix the order where predicate is operand 3.

Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?

gcc/ChangeLog

* config/aarch64/aarch64-sve.md (vec_extract): Fix operand
order to gen_vcond_mask_*.
---
gcc/config/aarch64/aarch64-sve.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 3dbd65986ec..d4af3706294 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3133,9 +3133,9 @@
   "TARGET_SVE"
   {
 rtx tmp = gen_reg_rtx (mode);
-emit_insn (gen_vcond_mask_ (tmp, operands[1],
- CONST1_RTX (mode),
- CONST0_RTX (mode)));
+emit_insn (gen_vcond_mask_ (tmp, CONST1_RTX (mode),
+ CONST0_RTX (mode),
+ operands[1]));
 emit_insn (gen_vec_extract (operands[0], tmp, operands[2]));
 DONE;


Looks like a correct fix, is there a test case where this causes a problem?


No, I discovered it while working on making [] work on svbool_t, so it 
needs new changes for this code path to be taken.  With HEAD, I'm not 
able to get vec_extract to trigger on a boolean vector as I can't use [] 
on svbool_t. None of the existing intrinsics seem to help either.



Does it need back porting?


Probably not as it needs a new feature.

Thanks,
Tejas.



Thanks,
Kyrill



   }
--
2.25.1







Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Tamar Christina wrote:

> Hi All,
> 
> The following example:
> 
> #define N 512
> #define START 2
> #define END 505
> 
> int x[N] __attribute__((aligned(32)));
> 
> int __attribute__((noipa))
> foo (void)
> {
>   for (signed int i = START; i < END; ++i)
> {
>   if (x[i] == 0)
> return i;
> }
>   return -1;
> }
> 
> generates incorrect code with fixed length SVE because for early break we need
> to know which value to start the scalar loop with if we take an early exit.
> 
> Historically this means that we take the first element of every induction.
> This is because there's an assumption in place that even with masked loops
> the masks come from a whilel* instruction.
> 
> As such we reduce using a BIT_FIELD_REF <, 0>.
> 
> When PFA was added this assumption was correct for non-masked loop, however we
> assumed that PFA for VLA wouldn't work for now, and disabled it using the
> alignment requirement checks.  We also expected VLS to PFA using scalar loops.
> 
> However as this PR shows, for VLS the vectorizer can, and does in some
> circumstances choose to peel using masks by masking the first iteration of the
> loop with an additional alignment mask.
> 
> When this is done, the first elements of the predicate can be inactive.  In
> this example element 1 is inactive based on the calculated misalignment.
> Hence the -1 value in the first vector IV element.
> 
> When we reduce using BIT_FIELD_REF we get the wrong value.
> 
> This patch updates it by creating a new scalar PHI that keeps track of whether
> we are the first iteration of the loop (with the additional masking) or 
> whether
> we have taken a loop iteration already.
> 
> The generated sequence:
> 
> pre-header:
>   bb1:
> i_1 = 
> 
> header:
>   bb2:
> i_2 = PHI 
> …
> 
> early-exit:
>   bb3:
> i_3 = iv_step * i_2 + PHI
> 
> Which eliminates the need to do an expensive mask based reduction.
> 
> This fixes gromacs with one OpenMP thread. But with > 1 there is still an 
> issue.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/119351
>   * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Reject PFA
>   with masking with non-linear IVs.
>   * tree-vect-loop.cc (vectorizable_induction): Support PFA for masking.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/119351
>   * gcc.target/aarch64/sve/peel_ind_10.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_5.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_6.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_7.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_8.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_9.c: New test.
>   * gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
> new file mode 100644
> index 
> ..b7a7bc5cb0cfdfdb74adb120c54ba15019832cf1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
> @@ -0,0 +1,24 @@
> +/* Fix for PR119351 alignment peeling with vectors and VLS.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -msve-vector-bits=256 --param 
> aarch64-autovec-preference=sve-only -fdump-tree-vect-details" } */
> +
> +#define N 512
> +#define START 0
> +#define END 505
> + 
> +int x[N] __attribute__((aligned(32)));
> +
> +int __attribute__((noipa))
> +foo (int start)
> +{
> +  for (unsigned int i = start; i < END; ++i)
> +{
> +  if (x[i] == 0)
> +return i;
> +}
> +  return -1;
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-tree-dump "pfa_iv_offset" "vect" } } */
> +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" 
> "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
> new file mode 100644
> index 
> ..6169aebcc40cc1553f30c1af61ccec91b51cdb42
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
> @@ -0,0 +1,17 @@
> +/* Fix for PR119351 alignment peeling with vectors and VLS.  */
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-Ofast --param aarch64-autovec-preference=sve-only" } */
> +/* { dg-additional-options "-msve-vector-bits=256" { target 
> aarch64_sve256_hw } } */
> +/* { dg-additional-options "-msve-vec

Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Martin Jambor
Hi,

On Tue, Apr 15 2025, Jakub Jelinek wrote:
> On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
>> This patch just introduces a form of dumping of widest ints that only
>> have zeros in the lowest 128 bits so that instead of printing
>> thousands of f's the output looks like:
>> 
>>Bits: value = 0x, mask = all ones folled by 
>> 0x
>> 
>> and then makes sure we use the function not only to print bits but
>> also to print masks where values like these can also occur.
>
> Shouldn't that be followed by instead?

Yes, of course.

> And the widest_int checks seems to be quite expensive (especially for
> large widest_ints), I think for the first one we can just == -1
> and for the second one wi::arshift (value, 128) == -1 and the zero extension
> by using wi::zext.
>
> Anyway, I wonder if it wouldn't be better to use something shorter,
> the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
> number (maybe we could also special case the even more common bits 64+
> are all ones case).  Or it could be 0xf*f prefix.  Or printing such
> numbers as -0x prefixed negative, though that is not a good idea for masks.

I was not sure myself what the best way would be and so proposed the
simplest variant I could think of.  I am fine with anything that does
not print thousands of f's which could be the case before.

So if you like the second variant more, I and I guess I do as well, by
all means go ahead and commit it.

Thanks,

Martin


>
>   Jakub
> 2025-04-15  Jakub Jelinek  
>
>   * ipa-cp.cc (ipcp_print_widest_int): Fix a typo, folled -> followed.
>   Simplify wide_int check for -1 or all ones above least significant
>   128 bits.
>
> --- gcc/ipa-cp.cc.jj  2025-04-15 07:55:18.369479825 +0200
> +++ gcc/ipa-cp.cc 2025-04-15 11:37:03.059964475 +0200
> @@ -313,14 +313,12 @@ ipcp_lattice::print (FILE * f,
>  static void
>  ipcp_print_widest_int (FILE *f, const widest_int &value)
>  {
> -  if (wi::eq_p (wi::bit_not (value), 0))
> +  if (value == -1)
>  fprintf (f, "-1");
> -  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
> -   wi::sub (wi::lshift (1, 128),
> -1))), 0))
> +  else if (wi::arshift (value, 128) == -1)
>  {
> -  fprintf (f, "all ones folled by ");
> -  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
> +  fprintf (f, "all ones followed by ");
> +  print_hex (wi::zext (value, 128), f);
>  }
>else
>  print_hex (value, f);
> 2025-04-15  Jakub Jelinek  
>
>   * ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in
>   bits 128+ with "0xf..f" prefix instead of "all ones folled by ".
>   Simplify wide_int check for -1 or all ones above least significant
>   128 bits.
>
> --- gcc/ipa-cp.cc.jj  2025-04-15 07:55:18.369479825 +0200
> +++ gcc/ipa-cp.cc 2025-04-15 11:54:45.369704056 +0200
> @@ -313,14 +313,20 @@ ipcp_lattice::print (FILE * f,
>  static void
>  ipcp_print_widest_int (FILE *f, const widest_int &value)
>  {
> -  if (wi::eq_p (wi::bit_not (value), 0))
> +  if (value == -1)
>  fprintf (f, "-1");
> -  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
> -   wi::sub (wi::lshift (1, 128),
> -1))), 0))
> +  else if (wi::arshift (value, 128) == -1)
>  {
> -  fprintf (f, "all ones folled by ");
> -  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
> +  char buf[35];
> +  widest_int v = wi::zext (value, 128);
> +  size_t len;
> +  print_hex (v, buf);
> +  len = strlen (buf + 2);
> +  if (len == 32)
> + fprintf (f, "0xf..f");
> +  else
> + fprintf (f, "0xf..f%0*d", (int) (32 - len), 0);
> +  fputs (buf + 2, f);
>  }
>else
>  print_hex (value, f);


Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-15 Thread Ard Biesheuvel
On Tue, 15 Apr 2025 at 09:48, Uros Bizjak  wrote:
>
> On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel  wrote:
> >
> > From: Ard Biesheuvel 
> >
> > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
> > __fentry__") updated the logic that emits mcount() / __fentry__() calls
> > into function prologues when profiling is enabled, to avoid GOT-based
> > indirect calls when a direct call would suffice.
> >
> > There are two problems with that change:
> > - it relies on -mdirect-extern-access rather than -fno-plt to decide
> >   whether or not a direct [PLT based] call is appropriate;
> > - for the PLT case, it falls through to x86_print_call_or_nop(), which
> >   does not emit the @PLT suffix, resulting in the wrong relocation to be
> >   used (R_X86_64_PC32 instead of R_X86_64_PLT32)
> >
> > Fix this by testing flag_plt instead of ix86_direct_extern_access, and
> > updating x86_print_call_or_nop() to take flag_pic and flag_plt into
> > account. This also ensures that -mnop-mcount works as expected when
> > emitting the PLT based profiling calls.
> >
> > While at it, fix the 32-bit logic as well, and issue a PLT call unless
> > PLTs are explicitly disabled.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386
> >
> > Signed-off-by: Ard Biesheuvel 
> >
> > gcc/ChangeLog:
> >
> > PR target/119386
> > * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
> > where appropriate.
> > (x86_function_profiler): Fall through to x86_print_call_or_nop()
> > for PIC codegen when flag_plt is set.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/119386
> > * gcc.target/i386/pr119386-1.c: New test.
> > * gcc.target/i386/pr119386-2.c: New test.
>
> OK if there are no further comments in the next day or two.
>

Thanks

> BTW: Do you have commit rights?
>

No I do not.
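
For reference, a sketch of the scenario the patch covers (an invented
example, not one of the new pr119386 testcases):

  /* Built with "-O2 -fPIC -pg -mfentry" on x86_64, the profiling call in
     the prologue should now be emitted as
         call  __fentry__@PLT
     i.e. with an R_X86_64_PLT32 relocation, rather than a plain
     "call __fentry__" carrying R_X86_64_PC32, as described above.  */
  void
  f (void)
  {
  }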


Re: [PATCH] c++: prev declared hidden tmpl friend inst, cont [PR119807]

2025-04-15 Thread Jason Merrill

On 4/14/25 5:22 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk and later 14?


OK.


-- >8 --

When remapping existing specializations of a hidden template friend from
a previous declaration to the new definition, we must remap only those
specializations that match this new definition, but currently we
remap all specializations since they're all contained in the
DECL_TEMPLATE_INSTANTIATIONS list of the most general template of f.

Concretely, in the first testcase below, we form two specializations of
the friend A::f, one with arguments {{0},{bool}} and another with
arguments {{1},{bool}}.  Later when instantiating B, we need to remap
these specializations.  During the B<0> instantiation we only want to
remap the first specialization, and during the B<1> instantiation only
the second specialization, but currently we remap both specializations
twice.

tsubst_friend_function needs to determine if an existing specialization
matches the shape of the new definition, which is tricky in general,
e.g.  the outer template parameters may not match up.  Fortunately we
don't have to reinvent the wheel here since is_specialization_of_friend
seems to do exactly what we need.  We can check this unconditionally,
but I think it's only necessary when dealing with existing
specializations formed from a class template scope previous declaration,
hence TMPL_ARGS_HAVE_MULTIPLE_LEVELS check.

PR c++/119807
PR c++/112288

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Skip remapping an
existing specialization if it doesn't match the shape of
the new friend definition.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend86.C: New test.
* g++.dg/template/friend87.C: New test.
---
  gcc/cp/pt.cc |  4 +++
  gcc/testsuite/g++.dg/template/friend86.C | 25 ++
  gcc/testsuite/g++.dg/template/friend87.C | 42 
  3 files changed, 71 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/template/friend86.C
  create mode 100644 gcc/testsuite/g++.dg/template/friend87.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b7060b4c5aa6..4349b19119b7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11772,6 +11772,10 @@ tsubst_friend_function (tree decl, tree args)
  elt.args = DECL_TI_ARGS (spec);
  elt.spec = NULL_TREE;
  
+		  if (TMPL_ARGS_HAVE_MULTIPLE_LEVELS (DECL_TI_ARGS (spec))

+ && !is_specialization_of_friend (spec, new_template))
+   continue;
+
  decl_specializations->remove_elt (&elt);
  
  		  tree& spec_args = DECL_TI_ARGS (spec);

diff --git a/gcc/testsuite/g++.dg/template/friend86.C 
b/gcc/testsuite/g++.dg/template/friend86.C
new file mode 100644
index ..9e2c1afb351c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/friend86.C
@@ -0,0 +1,25 @@
+// PR c++/119807
+// { dg-do run }
+
+template
+struct A {
+  template friend int f(A, T);
+};
+
+template struct A<0>;
+template struct A<1>;
+
+int main() {
+  A<0> x;
+  A<1> y;
+  if (f(x, true) != 0) __builtin_abort();
+  if (f(y, true) != 1) __builtin_abort();
+}
+
+template
+struct B {
+  template friend int f(A, T) { return N; }
+};
+
+template struct B<0>;
+template struct B<1>;
diff --git a/gcc/testsuite/g++.dg/template/friend87.C 
b/gcc/testsuite/g++.dg/template/friend87.C
new file mode 100644
index ..94c0dfc52924
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/friend87.C
@@ -0,0 +1,42 @@
+// PR c++/119807
+// { dg-do compile { target c++20 } }
+
+using size_t = decltype(sizeof(0));
+
+template
+struct CounterReader {
+   template
+   friend auto counterFlag(CounterReader) noexcept;
+};
+
+template
+struct CounterWriter {
+   static constexpr size_t value = current;
+
+   template
+   friend auto counterFlag(CounterReader) noexcept {}
+};
+
+template
+[[nodiscard]] constexpr size_t counterAdvance() noexcept {
+   if constexpr (!mask) {
+   return CounterWriter::value;
+   } else if constexpr (requires { counterFlag(CounterReader()); }) {
+   return counterAdvance> 
1)>();
+   }
+   else {
+   return counterAdvance> 1)>();
+   }
+}
+
+constexpr auto defaultCounterTag = [] {};
+
+template
+constexpr size_t counter() noexcept {
+   return counterAdvance();
+}
+
+int main() {
+   static_assert(counter() == 1);
+   static_assert(counter() == 2);
+}




Re: [PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Tomasz Kaminski
On Tue, Apr 15, 2025 at 2:35 PM Luc Grosheintz 
wrote:

> Thank you! I left two comments. Everything not commented on, I'll just
> incorporate into the next iteration.
> On 4/15/25 11:18 AM, Tomasz Kaminski wrote:
>
>
>
> On Tue, Apr 15, 2025 at 10:43 AM Luc Grosheintz 
> wrote:
>
>> This implements std::extents from  according to N4950 and
>> contains partial progress towards PR107761.
>>
>> If an extent changes its type, there's a precondition in the standard,
>> that the value is representable in the target integer type. This commit
>> uses direct initialization to perform the conversion, without any
>> additional checks.
>>
>> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
>> For extents this precondition is always violated and results in
>> calling __builtin_trap. For all other specializations it's checked via
>> __glibcxx_assert.
>>
>> PR libstdc++/107761
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/mdspan (extents): New class.
>> * src/c++23/std.cc.in: Add 'using std::extents'.
>>
>> Signed-off-by: Luc Grosheintz 
>>
> Looks really good, thanks. A bunch of small suggestions.
>
>> ---
>>  libstdc++-v3/include/std/mdspan  | 304 +++
>>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>>  2 files changed, 309 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/include/std/mdspan
>> b/libstdc++-v3/include/std/mdspan
>> index 4094a416d1e..72ca3445d15 100644
>> --- a/libstdc++-v3/include/std/mdspan
>> +++ b/libstdc++-v3/include/std/mdspan
>> @@ -33,6 +33,10 @@
>>  #pragma GCC system_header
>>  #endif
>>
>> +#include 
>> +#include 
>> +#include 
>> +
>>  #define __glibcxx_want_mdspan
>>  #include 
>>
>> @@ -41,6 +45,306 @@
>>  namespace std _GLIBCXX_VISIBILITY(default)
>>  {
>>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> +  namespace __mdspan
>> +  {
>> +template
>> +  class __array
>>
> Any reason for using __array here, instead of std::array.
> We already need to pull that header, because of the constructors.
>
> Yes, this is related to [[no_unique_address]]. If all extents are static,
> then there's no dynamic extents that need to be stored. A similar
> situation is encountered in span. Where the size of the span can be
> a compile time constant. The implementation of span uses
> [[no_unique_address]], which is why I wanted to do the same.
>
You could use __array_traits::_Type here, as it is either an empty
struct
with appropriate [] member or array.

> If I try to use [[no_unique_address]] with array, it has no effect.
> (I believe it's because array always contains a private member
> `_M_elems` and because array predates [[no_unique_address]] it
> can't use that attribute.) Here's an example of this on godbolt:
> https://godbolt.org/z/9vd5Gv31x
>
> This does raise a question about ABI, is it correct to use
> [[no_unique_address]]? When reading the standard, it's not clear to
> me whether using [[no_unique_address]] is permitted/prohibited.
>
It would be breaking to change the layout for the array, but I think it is
allowed,
and even desired from a QoI perspective, to put it there.
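
For illustration, a self-contained sketch of this point (not from the patch;
the struct names are invented, and the printed sizes are merely what a
typical LP64 libstdc++ build gives, not something the standard guarantees):

  // std::array<T, 0> is not an empty class (it still has a data member),
  // so [[no_unique_address]] cannot make it vanish, whereas a dedicated
  // empty specialization like __array<_Tp, 0> can.
  #include <array>
  #include <cstdio>

  struct Empty {};                // stand-in for the __array<_Tp, 0> case

  struct WithStdArray {
    [[no_unique_address]] std::array<int, 0> exts;
    int* data;
  };

  struct WithEmpty {
    [[no_unique_address]] Empty exts;
    int* data;
  };

  int main() {
    std::printf("%zu %zu\n", sizeof(WithStdArray), sizeof(WithEmpty));
    // typically prints "16 8": the std::array member still takes storage
  }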

> +  {
>> +  public:
>> +   constexpr _Tp&
>> + operator[](size_t __n) noexcept
>> + {
>> +   return _M_elems[__n];
>> + }
>> +
>> +   constexpr const _Tp&
>> + operator[](size_t __n) const noexcept
>> + {
>> +   return _M_elems[__n];
>> + }
>> +
>> +  private:
>> +   array<_Tp, _Nm> _M_elems;
>> +  };
>> +
>> +template
>> +  class __array<_Tp, 0>
>> +  {
>> +  public:
>> +   constexpr _Tp&
>> + operator[](size_t __n) noexcept
>> + {
>> +   __builtin_trap();
>> + }
>> +
>> +   constexpr const _Tp&
>> + operator[](size_t __n) const noexcept
>> + {
>> +   __builtin_trap();
>> + }
>> +  };
>> +
>> +template
>>
> I would use NTTP of the std::array here, and eliminate internal _S_exts.
> Your __array is not structural, because it has private members.
>
> There were two weak reasons. The first was that the following
> line is a bit lengthy and breaks awkwardly:
>
>   using _S_storage = __mdspan::_ExtentsStorage<_IndexType,
> array{_Extents...}>;
>
You could use the CTAD deduction array{ _Extents... } or
to_array({ _Extents... }).
I prefer the latter.
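
For illustration, a compile-only sketch of the NTTP idea (invented names;
the real patch may differ, and as noted below the CTAD form cannot work for
an empty pack).  Needs -std=c++20:

  #include <array>
  #include <cstddef>

  // class-type non-type template parameter, deduced via CTAD
  template<typename IndexType, std::array Exts>
    struct ExtentsStorageSketch
    {
      static constexpr std::size_t rank = Exts.size();
    };

  template<typename IndexType, std::size_t... Extents>
    struct ExtentsSketch
    {
      using Storage = ExtentsStorageSketch<IndexType, std::array{Extents...}>;
    };

  static_assert(ExtentsSketch<int, 2, 3>::Storage::rank == 2);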

> when sizeof...(_Extents) == 0, it can't deduce the template arguments, so I
> couldn't make the shorter <_IndexType, {_Extents...}> work. During the
> implementation it was sometimes useful to be able to manipulate the
> extents as a pack; but now there's no such cases left anymore. I'll change
> it.
>
> +  class _ExtentsStorage
>> +  {
>> +  public:
>> +   static constexpr bool
>> +   _M_is_dyn(size_t __ext) noexcept
>> +   { return __ext == dynamic_extent; }
>> +
>> +   template
>> + static constexpr _IndexType
>> + _M_int_cast(const _OIndexType& __other) noexce

Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread H.J. Lu
On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak  wrote:
>
> On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu  wrote:
> >
> > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
> > pushed in red-zone.  Since
> >
> > commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
> > Author: H.J. Lu 
> > Date:   Sun Apr 13 12:20:42 2025 -0700
> >
> > APX: Don't use red-zone with 32 GPRs and no caller-saved registers
> >
> > disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
> > 31 .cfi_restore directives.
>
> Hm, did you also account for RED_ZONE_RESERVE? The last 8-byte slot is
> reserved for internal use by the compiler.

There is no red-zone in this case.

> Uros.
>
> >
> > PR target/119784
> > * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
> > directives.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> > b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > index fefe2e6d6fc..fa1acc7a142 100644
> > --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > @@ -66,7 +66,7 @@ void foo (void *frame)
> >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
> >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
> >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
> > -/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
> > +/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
> >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } 
> > */
> >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } 
> > */
> >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } 
> > */
> > --
> > 2.49.0
> >



-- 
H.J.


Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 01:56:25PM +0200, Richard Biener wrote:
> On Tue, 15 Apr 2025, Jakub Jelinek wrote:
> 
> > On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> > > This patch just introduces a form of dumping of widest ints that only
> > > have zeros in the lowest 128 bits so that instead of printing
> > > thousands of f's the output looks like:
> > > 
> > >Bits: value = 0x, mask = all ones folled by 
> > > 0x
> > > 
> > > and then makes sure we use the function not only to print bits but
> > > also to print masks where values like these can also occur.
> > 
> > Shouldn't that be followed by instead?
> > And the widest_int checks seems to be quite expensive (especially for
> > large widest_ints), I think for the first one we can just == -1
> > and for the second one wi::arshift (value, 128) == -1 and the zero extension
> > by using wi::zext.
> > 
> > Anyway, I wonder if it wouldn't be better to use something shorter,
> > the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
> > number (maybe we could also special case the even more common bits 64+
> > are all ones case).  Or it could be 0xf*f prefix.  Or printing such
> > numbers as -0x prefixed negative, though that is not a good idea for masks.
> 
> I'd accept 0xf..f as reasonable, possibly 0xff
> when there are more than sizeof("repeated N times") fs inbetween.
> It does make matching up masks more difficult when tracking changes
> (from my experience with bit-CCP debugging, where such large masks
> appear as well).  So IMO we can live with large 0xff but for
> all-ones we could print -1 if that's the common noisy thing.

There is already -1 special case, the problem which Martin was trying
to solve is that the 0xf<32762 times f>f and similar cases were
still way too common.
The primary problem is using wrong type for it, IMHO both ipa-cp and
tree-ssa-ccp should be using wide_int with the precision of the particular
SSA_NAME etc., rather than widest_int.  But there is no chance rewriting
that now for GCC 15.

Jakub
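
(For a concrete, made-up illustration of the scale of the problem: an 8-bit
all-ones mask tracked as a widest_int is effectively sign-extended to the
full widest_int precision, so print_hex emits thousands of f's for it,
whereas a wide_int of precision 8 would simply print as 0xff.)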



Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Richard Biener
On Tue, 15 Apr 2025, Jakub Jelinek wrote:

> On Tue, Apr 15, 2025 at 01:56:25PM +0200, Richard Biener wrote:
> > On Tue, 15 Apr 2025, Jakub Jelinek wrote:
> > 
> > > On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> > > > This patch just introduces a form of dumping of widest ints that only
> > > > have zeros in the lowest 128 bits so that instead of printing
> > > > thousands of f's the output looks like:
> > > > 
> > > >Bits: value = 0x, mask = all ones folled by 
> > > > 0x
> > > > 
> > > > and then makes sure we use the function not only to print bits but
> > > > also to print masks where values like these can also occur.
> > > 
> > > Shouldn't that be followed by instead?
> > > And the widest_int checks seems to be quite expensive (especially for
> > > large widest_ints), I think for the first one we can just == -1
> > > and for the second one wi::arshift (value, 128) == -1 and the zero 
> > > extension
> > > by using wi::zext.
> > > 
> > > Anyway, I wonder if it wouldn't be better to use something shorter,
> > > the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
> > > number (maybe we could also special case the even more common bits 64+
> > > are all ones case).  Or it could be 0xf*f prefix.  Or printing such
> > > numbers as -0x prefixed negative, though that is not a good idea for 
> > > masks.
> > 
> > I'd accept 0xf..f as reasonable, possibly 0xff
> > when there are more than sizeof("repeated N times") fs inbetween.
> > It does make matching up masks more difficult when tracking changes
> > (from my experience with bit-CCP debugging, where such large masks
> > appear as well).  So IMO we can live with large 0xff but for
> > all-ones we could print -1 if that's the common noisy thing.
> 
> There is already -1 special case, the problem which Martin was trying
> to solve is that the 0xf<32762 times f>f and similar cases were
> still way too common.
> The primary problem is using wrong type for it, IMHO both ipa-cp and
> tree-ssa-ccp should be using wide_int with the precision of the particular
> SSA_NAME etc., rather than widest_int.  But there is no chance rewriting
> that now for GCC 15.

ISTR at least CCP _prints_ wide_int (or it's tree form), not widest_int
(but the lattice has widest_int indeed).  But CCP knows the precision
of the lattice entry (which is for an SSA name), possibly IPA CP
doesn't.

Richard.


Re: [PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Luc Grosheintz

Thank you! I left two comments. Everything not commented on, I'll just
incorporate into the next iteration.

On 4/15/25 11:18 AM, Tomasz Kaminski wrote:



On Tue, Apr 15, 2025 at 10:43 AM Luc Grosheintz 
 wrote:


This implements std::extents from  according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This
commit
uses direct initialization to perform the conversion, without any
additional checks.

The precondition for 'extents::{static_,}extent' is that '__r <
rank()'.
For extents this precondition is always violated and results in
calling __builtin_trap. For all other specializations it's checked via
__glibcxx_assert.

        PR libstdc++/107761

libstdc++-v3/ChangeLog:

        * include/std/mdspan (extents): New class.
        * src/c++23/std.cc.in : Add 'using
std::extents'.

Signed-off-by: Luc Grosheintz 

Looks really good, thanks. A bunch of small suggestions.

---
 libstdc++-v3/include/std/mdspan  | 304
+++
 libstdc++-v3/src/c++23/std.cc.in  |   6 +-
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index 4094a416d1e..72ca3445d15 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,10 @@
 #pragma GCC system_header
 #endif

+#include 
+#include 
+#include 
+
 #define __glibcxx_want_mdspan
 #include 

@@ -41,6 +45,306 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+    template
+      class __array

Any reason for using __array here, instead of std::array.
We already need to pull that header, because of the constructors.

Yes, this is related to [[no_unique_address]]. If all extents are static,
then there's no dynamic extents that need to be stored. A similar
situation is encountered in span. Where the size of the span can be
a compile time constant. The implementation of span uses
[[no_unique_address]], which is why I wanted to do the same.

If I try to use [[no_unique_address]] with array, it has no effect.
(I believe it's because array always contains a private member
`_M_elems` and because array predates [[no_unique_address]] it
can't use that attribute.) Here's an example of this on godbolt:
https://godbolt.org/z/9vd5Gv31x

This does raise a question about ABI, is it correct to use
[[no_unique_address]]? When reading the standard, it's not clear to
me whether using [[no_unique_address]] is permitted/prohibited.


+      {
+      public:
+       constexpr _Tp&
+         operator[](size_t __n) noexcept
+         {
+           return _M_elems[__n];
+         }
+
+       constexpr const _Tp&
+         operator[](size_t __n) const noexcept
+         {
+           return _M_elems[__n];
+         }
+
+      private:
+       array<_Tp, _Nm> _M_elems;
+      };
+
+    template
+      class __array<_Tp, 0>
+      {
+      public:
+       constexpr _Tp&
+         operator[](size_t __n) noexcept
+         {
+           __builtin_trap();
+         }
+
+       constexpr const _Tp&
+         operator[](size_t __n) const noexcept
+         {
+           __builtin_trap();
+         }
+      };
+
+    template

I would use NTTP of the std::array here, and eliminate internal _S_exts.
Your __array is not structural, because it has private members.


There were two weak reasons. The first was that the following
line is a bit lengthy and breaks awkwardly:

  using _S_storage = __mdspan::_ExtentsStorage<_IndexType, 
array{_Extents...}>;


when sizeof...(_Extents) == 0, it can't deduce the template arguments, so I
couldn't make the shorter <_IndexType, {_Extents...}> work. During the
implementation it was sometimes useful to be able to manipulate the
extents as a pack; but now there's no such cases left anymore. I'll change
it.


+      class _ExtentsStorage
+      {
+      public:
+       static constexpr bool
+       _M_is_dyn(size_t __ext) noexcept
+       { return __ext == dynamic_extent; }
+
+       template
+         static constexpr _IndexType
+         _M_int_cast(const _OIndexType& __other) noexcept
+         { return _IndexType(__other); }
+
+       static constexpr size_t _S_rank = sizeof...(_Extents);
+       static constexpr array _S_exts{_Extents...};
+
+       // For __r in [0, _S_rank], _S_dynamic_index[__r] is the
number
+       // of dynamic extents up to (and not including) __r.
+       //
+       // If __r 

Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 02:33:23PM +0200, Richard Biener wrote:
> ISTR at least CCP _prints_ wide_int (or it's tree form), not widest_int
> (but the lattice has widest_int indeed).  But CCP knows the precision
> of the lattice entry (which is for an SSA name), possibly IPA CP
> doesn't.

ipcp_bits_lattice that is being printed doesn't seem to know the precision.
If it used wide_int, it would know it trivially from get_precision ()
on the m_value and m_mask.

Jakub



Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 02:17:46PM +0200, Martin Jambor wrote:
> Hi,
> 
> On Tue, Apr 15 2025, Jakub Jelinek wrote:
> > On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> >> This patch just introduces a form of dumping of widest ints that only
> >> have zeros in the lowest 128 bits so that instead of printing
> >> thousands of f's the output looks like:
> >> 
> >>Bits: value = 0x, mask = all ones folled by 
> >> 0x
> >> 
> >> and then makes sure we use the function not only to print bits but
> >> also to print masks where values like these can also occur.
> >
> > Shouldn't that be followed by instead?
> 
> Yes, of course.
> 
> > And the widest_int checks seems to be quite expensive (especially for
> > large widest_ints), I think for the first one we can just == -1
> > and for the second one wi::arshift (value, 128) == -1 and the zero extension
> > by using wi::zext.
> >
> > Anyway, I wonder if it wouldn't be better to use something shorter,
> > the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
> > number (maybe we could also special case the even more common bits 64+
> > are all ones case).  Or it could be 0xf*f prefix.  Or printing such
> > numbers as -0x prefixed negative, though that is not a good idea for masks.
> 
> I was not sure myself what the best way would be and so proposed the
> simplest variant I could think of.  I am fine with anything that does
> not print thousands of f's which could be the case before.
> 
> So if you like the second variant more, I and I guess I do as well, by
> all means go ahead and commit it.

Here is perhaps an even better one which doesn't print e.g.
0xf..f
but just
0xf..f
(of course, for say mask of
0xf..f
it prints it like that, doesn't try to shorten the 0 digits.
But if the most significant bits aren't set, it will be just
0x

2025-04-15  Jakub Jelinek  

* ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in
bits 128+ with "0xf..f" prefix instead of "all ones folled by ".
Simplify wide_int check for -1 or all ones above least significant
128 bits.

--- gcc/ipa-cp.cc.jj2025-04-15 12:22:07.485558525 +0200
+++ gcc/ipa-cp.cc   2025-04-15 14:12:01.327407951 +0200
@@ -313,14 +313,24 @@ ipcp_lattice::print (FILE * f,
 static void
 ipcp_print_widest_int (FILE *f, const widest_int &value)
 {
-  if (wi::eq_p (wi::bit_not (value), 0))
+  if (value == -1)
 fprintf (f, "-1");
-  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
- wi::sub (wi::lshift (1, 128),
-  1))), 0))
+  else if (wi::arshift (value, 128) == -1)
 {
-  fprintf (f, "all ones folled by ");
-  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
+  char buf[35], *p = buf + 2;
+  widest_int v = wi::zext (value, 128);
+  size_t len;
+  print_hex (v, buf);
+  len = strlen (p);
+  if (len == 32)
+   {
+ fprintf (f, "0xf..f");
+ while (*p == 'f')
+   ++p;
+   }
+  else
+   fprintf (f, "0xf..f%0*d", (int) (32 - len), 0);
+  fputs (p, f);
 }
   else
 print_hex (value, f);


Jakub
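
(To illustrate the new output with a made-up example: a mask whose bits above
127 are all ones and whose low 128 bits are 28 f's followed by 1234 is now
dumped as 0xf..f1234, while one whose low 128 bits are just 0x1234 comes out
as 0xf..f followed by 28 zeros and then 1234 -- in both cases instead of
thousands of f's.)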



Re: [PATCH] Wbuiltin-declaration-mismatch-4.c: accept long long in warning for llp64

2025-04-15 Thread Jonathan Yong

On 4/13/25 2:46 AM, Jonathan Yong wrote:

Attached patch OK for master branch?
Will push soon if there are no objections.


Pushed to master branch.



Re: [PATCH] [testsuite] [ppc] block-cmp-8 should require powerpc64

2025-04-15 Thread Peter Bergner
On 4/14/25 11:35 PM, Alexandre Oliva wrote:
>> That said, that should be done in a separate patch.
> 
> *nod*.  Do you mean you're going to make that change, that I should, or
> that you hope someone else will?  I'd rather avoid duplication, and this
> is likely a somewhat involved change, since the string powerpc64 appears
> all over gcc/testsuite/, with various meanings other than a dejagnu
> effective target.

Sorry.  I meant to say I'll have someone from my team do this follow-on patch.
Yes, it will probably hit quite a few test cases.  No need for you to worry
about that.

Peter




Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-15 Thread Michael Matz
Hello,

On Mon, 14 Apr 2025, Bill Wendling wrote:

> Now, I don't think this will be necessarily confusing to the
> programmer, but it's inconsistent. In other words, either 'counted_by'
> *must* forward declare the in-structure identifier or neither must.

If that's your concern then both should require it, instead of neither, 
because ...

> 1. The syntax needs to be unambiguous.
> 2. Identifier lookup must be consistent between the two attribute forms.
> 3. The common use case should take the least amount of code to write.
> (More of a "nice to have".)
> 
> Therefore, I suggest the following rules, that are more-or-less the
> reverse of the current proposal's rules:
> 
> - All untagged identifiers are assumed to be within the structure. If
> they aren't found in the struct, it's an error.
> - All globals (i.e. identifiers not in the struct) must be referenced
> via a special tag or a builtin (e.g. __builtin_global_ref()). The tag
> or builtin follow current scoping rules---i.e. it may pick up a shadow
> variable rather than the global.

... with that proposal, you're basically reverting back to the hack to 
replicate the whole assignment-expression syntax tree down to 
primary-expr just so that you can apply a special lookup rule for 
the identifier->primary-expr production.  And you're using that replicated 
assignment-expression tree in just a single attribute.  And to top it off 
you then introduce another syntactic production to revert to the normal 
lookup behaviour within the to-be-determined tag/builtin.

As discussed multiple times in this thread this syntax isn't unambiguous
(you need special wording and a new scoping rule on top of the syntax to 
make it all unambiguous).  Even worse, under your new proposal this ...

> struct A {
>   int *buf __counted_by(len); // 'len' *must* be in the struct.
>   int len;
> };

... means that we would have to implement general delayed parsing for 
expressions in C parsers.  Doing that for a lone-ident only was always 
meh, but somewhat acceptable.  But the above means we have to do that 
always for the new expr-within-counted syntax tree.  That may be working 
fantastically if you're hacking all this into a c++ parser, but for other 
parsers that is something completely new.

I don't think this new proposal is new, it seems to be just the original 
Apple variant at its core, with new non-general scoping rule, weird escape 
mechanism, delayed parsing and basically everything that caused this long 
thread.


Ciao,
Michael.
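
For illustration, a small example of the lookup question being debated
(the counted_by syntax is the RFC under discussion, not something GCC
implements today, and __builtin_global_ref is just the placeholder name
used earlier in the thread):

  /* Expand the attribute to nothing so the example compiles today.  */
  #define __counted_by(expr)

  int len = 16;                  /* file-scope variable */

  struct A {
    /* Under the proposed rules 'len' here must denote the member declared
       below, so the argument can only be resolved (and, for a general
       expression like this, parsed) once the struct is complete -- the
       delayed parsing referred to above.  To mean the file-scope 'len'
       instead, the proposal would require something like
       __counted_by(__builtin_global_ref (len)).  */
    int *buf __counted_by(len * 2);
    int len;
  };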


[PATCH] ipa-bit-cp: Fix adjusting value according to mask (PR119803)

2025-04-15 Thread Martin Jambor
Hi,

In my fix for PR 119318 I put mask calculation in
ipcp_bits_lattice::meet_with_1 above a final fix to value so that all
the bits in the value which are meaningless according to mask have
value zero, which has tripped a validator in PR 119803.  This patch
fixes that by moving the adjustment down.

Even though the fix for PR 119318 did a similar thing in
ipcp_bits_lattice::meet_with, the same is not necessary because that
code path then feeds the new value and mask to
ipcp_bits_lattice::set_to_constant which does the final adjustment
correctly.

In both places, however, Jakub proposed a better way of calculating
cap_mask and so I have changed it accordingly.
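
To make the ordering point concrete, here is a narrow-width analogy in plain
C++ (uint32_t standing in for widest_int, precision 16; the constants are made
up and this is not the ipa-cp code itself):

#include <cassert>
#include <cstdint>

int main ()
{
  const unsigned precision = 16;
  /* All ones above the tracked precision; ~((1 << precision) - 1) is the old
     spelling, the wi::shifted_mask form in the patch is the new one.  */
  const std::uint32_t cap_mask = ~((std::uint32_t{1} << precision) - 1);

  std::uint32_t mask  = 0x00ff;      /* low 8 bits unknown */
  std::uint32_t value = 0x1234abcd;  /* bits above the precision are garbage */

  /* Old order: value is cleared before cap_mask is added to the mask,
     so the meaningless high bits of value survive.  */
  std::uint32_t bad_value = value & ~mask;        /* 0x1234ab00 */
  std::uint32_t full_mask = mask | cap_mask;      /* 0xffff00ff */
  assert ((bad_value & full_mask) != 0);          /* the PR119803 complaint */

  /* Fixed order: add cap_mask first, then clear value against the final mask.  */
  std::uint32_t good_value = value & ~full_mask;  /* 0x0000ab00 */
  assert ((good_value & full_mask) == 0);         /* invariant holds */
}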

Because this fix has been really proposed by Jakub in BZ before I
managed to even try the test-case myself, I think I can use my IPA
reviewer role to commit the patch which I plan to do later today or
early tomorrow.

Thanks,

Martin


gcc/ChangeLog:

2025-04-15  Martin Jambor  

PR ipa/119803
* ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move m_value adjustment
according to m_mask below the adjustment of the latter according to
cap_mask.  Optimize the calculation of cap_mask a bit.
(ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a
bit.

gcc/testsuite/ChangeLog:

2025-04-15  Martin Jambor  

PR ipa/119803
* gcc.dg/ipa/pr119803.c: New test.

Co-authored-by: Jakub Jelinek 
---
 gcc/ipa-cp.cc   |  6 +++---
 gcc/testsuite/gcc.dg/ipa/pr119803.c | 16 
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr119803.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 26b1496f29b..43f8c6922d0 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -923,13 +923,13 @@ ipcp_bits_lattice::meet_with_1 (widest_int value, 
widest_int mask,
   m_mask = (m_mask | mask) | (m_value ^ value);
   if (drop_all_ones)
 m_mask |= m_value;
-  m_value &= ~m_mask;
 
-  widest_int cap_mask = wi::bit_not (wi::sub (wi::lshift (1, precision), 1));
+  widest_int cap_mask = wi::shifted_mask <widest_int> (0, precision, true);
   m_mask |= cap_mask;
   if (wi::sext (m_mask, precision) == -1)
 return set_to_bottom ();
 
+  m_value &= ~m_mask;
   return m_mask != old_mask;
 }
 
@@ -1005,7 +1005,7 @@ ipcp_bits_lattice::meet_with (ipcp_bits_lattice& other, 
unsigned precision,
  adjusted_mask |= adjusted_value;
  adjusted_value &= ~adjusted_mask;
}
-  widest_int cap_mask = wi::bit_not (wi::sub (wi::lshift (1, precision), 
1));
+  widest_int cap_mask = wi::shifted_mask <widest_int> (0, precision, true);
   adjusted_mask |= cap_mask;
   if (wi::sext (adjusted_mask, precision) == -1)
return set_to_bottom ();
diff --git a/gcc/testsuite/gcc.dg/ipa/pr119803.c 
b/gcc/testsuite/gcc.dg/ipa/pr119803.c
new file mode 100644
index 000..1a7bfd25018
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr119803.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern void f(int p);
+int a, b;
+char c;
+static int d(int e) { return !e || a == 1 ? 0 : a / e; }
+static void h(short e) {
+  int g = d(e);
+  f(g);
+}
+void i() {
+  c = 128;
+  h(c);
+  b = d(65536);
+}
-- 
2.49.0



[committed] d: Fix ICE in dwarf2out_imported_module_or_decl, at dwarf2out.cc:27676 [PR119817]

2025-04-15 Thread Iain Buclaw
Hi,

This patch fixes the ICE in PR119817.

The ImportVisitor method for handling the importing of overload sets was
pushing NULL_TREE to the array of import decls, which in turn got passed
to `debug_hooks->imported_module_or_decl', triggering the observed
internal compiler error.

NULL_TREE is returned from `build_import_decl' when the symbol was
ignored for being non-trivial to represent in debug, for example,
template or tuple declarations.  So similarly "skip" adding the symbol
when this is the case for overload sets too.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, and
committed to mainline.  Will backport as necessary once gcc-14/13/12
release branches have been regtested too.

Regards,
Iain.

---
PR d/119817

gcc/d/ChangeLog:

* imports.cc (ImportVisitor::visit (OverloadSet *)): Don't push
NULL_TREE to vector of import symbols.

gcc/testsuite/ChangeLog:

* gdc.dg/debug/imports/m119817/a.d: New test.
* gdc.dg/debug/imports/m119817/b.d: New test.
* gdc.dg/debug/imports/m119817/package.d: New test.
* gdc.dg/debug/pr119817.d: New test.
---
 gcc/d/imports.cc | 6 +-
 gcc/testsuite/gdc.dg/debug/imports/m119817/a.d   | 2 ++
 gcc/testsuite/gdc.dg/debug/imports/m119817/b.d   | 2 ++
 gcc/testsuite/gdc.dg/debug/imports/m119817/package.d | 4 
 gcc/testsuite/gdc.dg/debug/pr119817.d| 6 ++
 5 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gdc.dg/debug/imports/m119817/a.d
 create mode 100644 gcc/testsuite/gdc.dg/debug/imports/m119817/b.d
 create mode 100644 gcc/testsuite/gdc.dg/debug/imports/m119817/package.d
 create mode 100644 gcc/testsuite/gdc.dg/debug/pr119817.d

diff --git a/gcc/d/imports.cc b/gcc/d/imports.cc
index 776caafd25c..16e4df69d65 100644
--- a/gcc/d/imports.cc
+++ b/gcc/d/imports.cc
@@ -182,7 +182,11 @@ public:
 vec_alloc (tset, d->a.length);
 
 for (size_t i = 0; i < d->a.length; i++)
-  vec_safe_push (tset, build_import_decl (d->a[i]));
+  {
+   tree overload = build_import_decl (d->a[i]);
+   if (overload != NULL_TREE)
+ vec_safe_push (tset, overload);
+  }
 
 this->result_ = build_tree_list_vec (tset);
 tset->truncate (0);
diff --git a/gcc/testsuite/gdc.dg/debug/imports/m119817/a.d 
b/gcc/testsuite/gdc.dg/debug/imports/m119817/a.d
new file mode 100644
index 000..a13747240c4
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/debug/imports/m119817/a.d
@@ -0,0 +1,2 @@
+module imports.m119817.a;
+void f119817()() { }
diff --git a/gcc/testsuite/gdc.dg/debug/imports/m119817/b.d 
b/gcc/testsuite/gdc.dg/debug/imports/m119817/b.d
new file mode 100644
index 000..aef0e373ca6
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/debug/imports/m119817/b.d
@@ -0,0 +1,2 @@
+module imports.m119817.b;
+void f119817() { }
diff --git a/gcc/testsuite/gdc.dg/debug/imports/m119817/package.d 
b/gcc/testsuite/gdc.dg/debug/imports/m119817/package.d
new file mode 100644
index 000..188827e669f
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/debug/imports/m119817/package.d
@@ -0,0 +1,4 @@
+module imports.m119817;
+public import
+imports.m119817.a,
+imports.m119817.b;
diff --git a/gcc/testsuite/gdc.dg/debug/pr119817.d 
b/gcc/testsuite/gdc.dg/debug/pr119817.d
new file mode 100644
index 000..3eea6ba9a90
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/debug/pr119817.d
@@ -0,0 +1,6 @@
+// { dg-do compile }
+// { dg-additional-sources "imports/m119817/package.d" }
+// { dg-additional-sources "imports/m119817/a.d" }
+// { dg-additional-sources "imports/m119817/b.d" }
+module pr119817;
+import imports.m119817 : f119817;
-- 
2.43.0



[committed] d: Fix internal compiler error: in visit, at d/decl.cc:838 [PR119799]

2025-04-15 Thread Iain Buclaw
Hi,

This patch fixes the ICE in PR119799.

This was caused by a check in the D front-end disallowing static
VAR_DECLs with a size `0'.

While empty structs in D are given the size `1', the same symbol coming
from ImportC modules does in fact have no size, so allow C variables to
pass the check as well as array objects.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, and
committed to mainline.  Will backport as necessary once gcc-14/13/12
release branches have been regtested too.

Regards,
Iain.

---
PR d/119799

gcc/d/ChangeLog:

* decl.cc (DeclVisitor::visit (VarDeclaration *)): Check front-end
type size before building the VAR_DECL.  Allow C symbols to have a
size of `0'.

gcc/testsuite/ChangeLog:

* gdc.dg/import-c/pr119799.d: New test.
* gdc.dg/import-c/pr119799c.c: New test.
---
 gcc/d/decl.cc | 15 ++-
 gcc/testsuite/gdc.dg/import-c/pr119799.d  |  2 ++
 gcc/testsuite/gdc.dg/import-c/pr119799c.c |  1 +
 3 files changed, 13 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/import-c/pr119799.d
 create mode 100644 gcc/testsuite/gdc.dg/import-c/pr119799c.c

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 136f78b32ff..9ddf7cf1540 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -791,6 +791,12 @@ public:
   }
 else if (d->isDataseg ())
   {
+   /* When the front-end type size is invalid, an error has already been
+  given for the declaration or type.  */
+   dinteger_t size = dmd::size (d->type, d->loc);
+   if (size == SIZE_INVALID)
+ return;
+
tree decl = get_symbol_decl (d);
 
/* Only need to build the VAR_DECL for extern declarations.  */
@@ -804,9 +810,7 @@ public:
  return;
 
/* How big a symbol can be should depend on back-end.  */
-   tree size = build_integer_cst (dmd::size (d->type, d->loc),
-  build_ctype (Type::tsize_t));
-   if (!valid_constant_size_p (size))
+   if (!valid_constant_size_p (build_integer_cst (size, size_type_node)))
  {
error_at (make_location_t (d->loc), "size is too large");
return;
@@ -835,8 +839,9 @@ public:
  }
 
/* Frontend should have already caught this.  */
-   gcc_assert (!integer_zerop (size)
-   || d->type->toBasetype ()->isTypeSArray ());
+   gcc_assert ((size != 0 && size != SIZE_INVALID)
+   || d->type->toBasetype ()->isTypeSArray ()
+   || d->isCsymbol ());
 
d_finish_decl (decl);
 
diff --git a/gcc/testsuite/gdc.dg/import-c/pr119799.d 
b/gcc/testsuite/gdc.dg/import-c/pr119799.d
new file mode 100644
index 000..d8b0fa22fe1
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/import-c/pr119799.d
@@ -0,0 +1,2 @@
+// { dg-do compile }
+import pr119799c;
diff --git a/gcc/testsuite/gdc.dg/import-c/pr119799c.c 
b/gcc/testsuite/gdc.dg/import-c/pr119799c.c
new file mode 100644
index 000..b80e856f75f
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/import-c/pr119799c.c
@@ -0,0 +1 @@
+static struct {} s119799;
-- 
2.43.0



Re: [PATCH] ipa-bit-cp: Fix adjusting value according to mask (PR119803)

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 03:14:49PM +0200, Martin Jambor wrote:
> In my fix for PR 119318 I put mask calculation in
> ipcp_bits_lattice::meet_with_1 above a final fix to value so that all
> the bits in the value which are meaningless according to mask have
> value zero, which has tripped a validator in PR 119803.  This patch
> fixes that by moving the adjustment down.
> 
> Even thought the fix for PR 119318 did a similar thing in
> ipcp_bits_lattice::meet_with, the same is not necessary because that
> code path then feeds the new value and mask to
> ipcp_bits_lattice::set_to_constant which does the final adjustment
> correctly.

You're right, set_to_constant handles that.

> In both places, however, Jakup proposed a better way of calculating
> cap_mask and so I have changed it accordingly.
> 
> Because this fix has been really proposed by Jakub in BZ before I
> managed to even run try the test-case myself, I think I can use my IPA
> reviewer role to commit the patch which I plan to do later to day or
> early tomorrow.

> gcc/ChangeLog:
> 
> 2025-04-15  Martin Jambor  
> 
>   PR ipa/119803
>   * ipa-cp.cc (ipcp_bits_lattice::meet_with_1): Move m_value adjustmed
>   according to m_mask below the adjustment of the latter according to
>   cap_mask.  Optimize the  calculation of cap_mask a bit.
>   (ipcp_bits_lattice::meet_with): Optimize the calculation of cap_mask a
>   bit.
> 
> gcc/testsuite/ChangeLog:
> 
> 2025-04-15  Martin Jambor  
> 
>   PR ipa/119803
>   * gcc.dg/ipa/pr119803.c: New test.
> 
> Co-authored-by: Jakub Jelinek 

LGTM.

Jakub



Re: [PATCH] c++: Prune lambda captures from more places [PR119755]

2025-04-15 Thread Nathaniel Shead
On Tue, Apr 15, 2025 at 11:39:21PM +1000, Nathaniel Shead wrote:
> On Tue, Apr 15, 2025 at 09:16:46AM -0400, Jason Merrill wrote:
> > On 4/15/25 2:56 AM, Nathaniel Shead wrote:
> > > On Mon, Apr 14, 2025 at 05:33:05PM -0400, Jason Merrill wrote:
> > > > On 4/13/25 6:32 AM, Nathaniel Shead wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > > 
> > > > > -- >8 --
> > > > > 
> > > > > Currently, pruned lambda captures are still leftover in the function's
> > > > > BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
> > > > > compilation, but does break modules streaming as we try to 
> > > > > reconstruct a
> > > > > FIELD_DECL that no longer exists on the type itself.
> > > > > 
> > > > >   PR c++/119755
> > > > > 
> > > > > gcc/cp/ChangeLog:
> > > > > 
> > > > >   * lambda.cc (prune_lambda_captures): Remove pruned capture from
> > > > >   function's BLOCK_VARS and BIND_EXPR_VARS.
> > > > > 
> > > > > gcc/testsuite/ChangeLog:
> > > > > 
> > > > >   * g++.dg/modules/lambda-10_a.H: New test.
> > > > >   * g++.dg/modules/lambda-10_b.C: New test.
> > > > > 
> > > > > Signed-off-by: Nathaniel Shead 
> > > > > ---
> > > > >gcc/cp/lambda.cc   | 22 
> > > > > ++
> > > > >gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
> > > > >gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
> > > > >3 files changed, 46 insertions(+)
> > > > >create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
> > > > >create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C
> > > > > 
> > > > > diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
> > > > > index f0a54b60275..d01bb04cd32 100644
> > > > > --- a/gcc/cp/lambda.cc
> > > > > +++ b/gcc/cp/lambda.cc
> > > > > @@ -1858,6 +1858,14 @@ prune_lambda_captures (tree body)
> > > > >  cp_walk_tree_without_duplicates (&body, mark_const_cap_r, 
> > > > > &const_vars);
> > > > > +  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function 
> > > > > (lam)));
> > > > > +  gcc_assert (bind_expr
> > > > > +   && (TREE_CODE (bind_expr) == BIND_EXPR
> > > > > +   /* FIXME: In a noexcept lambda we never prune captures
> > > > > +  (PR119764); when we do we need to handle this case
> > > > > +  for modules streaming.  */
> > > > 
> > > > The attached patch seems to fix that, with the result that your patch
> > > > crashes.
> > > > 
> > > 
> > > Thanks.  And yup, crashing was deliberate here as I wasn't 100% sure
> > > what the tree would look like for this case after an appropriate fix.
> > > 
> > > One quick question about your patch, since it could in theory affect ABI
> > > (the size of the lambdas change) should the pruning of such lambdas be
> > > dependent on an ABI version check?
> > 
> > Indeed, perhaps this is too late in the 15 cycle for such a change.
> > 
> > > Otherwise here's an updated patch that relies on your patch.
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk along
> > > with yours?  (Or if the potential ABI concerns mean that your change
> > > isn't appropriate for GCC15, would the old version of my patch still be
> > > OK for GCC15 to get 'import std' working again for C++26?)
> > 
> > For 15 please adjust this patch to be more fault-tolerant:
> > 
> > > -- >8 --
> > > 
> > > Currently, pruned lambda captures are still leftover in the function's
> > > BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
> > > compilation, but does break modules streaming as we try to reconstruct a
> > > FIELD_DECL that no longer exists on the type itself.
> > > 
> > >   PR c++/119755
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * lambda.cc (prune_lambda_captures): Remove pruned capture from
> > >   function's BLOCK_VARS and BIND_EXPR_VARS.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/modules/lambda-10_a.H: New test.
> > >   * g++.dg/modules/lambda-10_b.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > Reviewed-by: Jason Merrill 
> > > ---
> > >   gcc/cp/lambda.cc   | 19 +++
> > >   gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
> > >   gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
> > >   3 files changed, 43 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
> > >   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C
> > > 
> > > diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
> > > index c6308b941d3..7bb88a900d5 100644
> > > --- a/gcc/cp/lambda.cc
> > > +++ b/gcc/cp/lambda.cc
> > > @@ -1862,6 +1862,11 @@ prune_lambda_captures (tree body)
> > > cp_walk_tree_without_duplicates (&body, mark_const_cap_r, 
> > > &const_vars);
> > > +  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
> > > +  if (bind_expr && TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR)
> > > +  

Re: [PATCH] wwwdocs: Document changes to the D front-end

2025-04-15 Thread Iain Buclaw
Ping.

If no obvious grammar errors are flagged, I'll likely just commit this 
later today.

Thanks,
Iain.

Excerpts from Iain Buclaw's message of März 20, 2025 12:25 pm:
> Hi,
> 
> This patch adds D front-end section to the GCC changes pages. When
> inspecting this, I noticed that the previous two releases has been
> missed/forgot about as well.
> 
> Ran all pages through the w3c validator and got no reported errors.
> 
> OK?
> 
> Thanks,
> Iain.
> 
> ---
>   * htdocs/gcc-13/changes.html: Add D changes section.
>   * htdocs/gcc-14/changes.html: Likewise.
>   * htdocs/gcc-15/changes.html: Likewise.
> ---
>  htdocs/gcc-13/changes.html | 69 +-
>  htdocs/gcc-14/changes.html | 10 +-
>  htdocs/gcc-15/changes.html | 14 +++-
>  3 files changed, 90 insertions(+), 3 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 4860c500..aa7556a2 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -434,7 +434,74 @@ You may also want to check out our
>
>  
>  
> -
> +D
> +
> +  Support for the D programming language has been updated to version
> +2.103.1 of the language and run-time library. Full changelog for this
> +release and previous releases can be found on the
> +https://dlang.org/changelog/2.103.1.html";>dlang.org
> +website.
> +  
> +  The following GCC attributes are now recognized and available from
> +the gcc.attributes module with short-hand aliases for
> +convenience:
> +
> +  @attribute("no_sanitize", arguments) or
> +@no_sanitize(arguments).
> +  
> +  @attribute("register") or
> +@register.
> +  
> +  @attribute("simd") or @simd.
> +  @attribute("simd_clones", mask) or
> +@simd_clones(mask).
> +  
> +  @attribute("visibility", arguments) or
> +@visibility(arguments).
> +  
> +
> +  
> +  New aliases have been added to gcc.attributes for
> +compatibility with ldc.attributes.
> +
> +  The @hidden attribute is an alias for
> +@attribute("visibility", "hidden").
> +  
> +  The @noSanitize attribute is an alias for
> +@attribute("no_sanitize").
> +  
> +
> +  
> +  Vector operation intrinsics prefetch,
> +  loadUnaligned, storeUnaligned,
> +  shuffle, shufflevector,
> +  extractelement, insertelement,
> +  convertvector, and blendvector have been 
> added
> +  to the gcc.simd module.
> +  
> +  New warnings:
> +
> +   href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gdc/Warnings.html#index-Wno-builtin-declaration-mismatch";>-Wbuiltin-declaration-mismatch=
> + warns when a built-in function is declared with the wrong signature.
> +  
> +   href="https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gdc/Warnings.html#index-Wmismatched-special-enum";>-Wmismatched-special-enum
> + warns when a special enum is declared with the wrong base type.
> +  
> +
> +  
> +  New version identifier D_Optimized is now predefined when 
> the
> +-O option, or any higher optimization level is used.
> +  
> +  The predefinition of version D_Exceptions can now be
> +controlled by the option -fexception.
> +  
> +  The predefinition of version D_TypeInfo can now be
> +  controlled by the option -frtti.
> +  
> +  The -fdebug= and -fversion= compiler
> +switches no longer accept an integer argument.
> +  
> +
>  
>  Fortran
>  
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 8ff84582..df5ebc6e 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -602,7 +602,15 @@ You may also want to check out our
>
>  
>  
> -
> +D
> +
> +  Support for the D programming language has been updated to version
> +2.108.1 of the language and run-time library. Full changelog for this
> +release and previous releases can be found on the
> +https://dlang.org/changelog/2.108.1.html";>dlang.org
> +website.
> +  
> +
>  
>  Fortran
>  
> diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
> index 42b713a2..968d5c17 100644
> --- a/htdocs/gcc-15/changes.html
> +++ b/htdocs/gcc-15/changes.html
> @@ -423,7 +423,19 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
>
>  
>  
> -
> +D
> +
> +  Support for the D programming language has been updated to version
> +2.111.0 of the language and run-time library. Full changelog for this
> +release and previous releases can be found on the
> +https://dlang.org/changelog/2.111.0.html";>dlang.org
> +website.
> +  
> +  On supported targets, the version GNU_CET is now 
> predefined
> +when the option -fcf-protection is used. The protection 
> level
> +is also set in the traits key __traits(getTargetInfo, 
> "CET").
> +  
> +
>  
>  Fortran
>  
> -- 
> 2.43.0
> 
> 


Re: [PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Martin Jambor
Hi,

On Tue, Apr 15 2025, Jakub Jelinek wrote:
> On Tue, Apr 15, 2025 at 02:17:46PM +0200, Martin Jambor wrote:
>> Hi,
>> 
>> On Tue, Apr 15 2025, Jakub Jelinek wrote:
>> > On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
>> >> This patch just introduces a form of dumping of widest ints that only
>> >> have zeros in the lowest 128 bits so that instead of printing
>> >> thousands of f's the output looks like:
>> >> 
>> >>Bits: value = 0x, mask = all ones folled by 
>> >> 0x
>> >> 
>> >> and then makes sure we use the function not only to print bits but
>> >> also to print masks where values like these can also occur.
>> >
>> > Shouldn't that be followed by instead?
>> 
>> Yes, of course.
>> 
>> > And the widest_int checks seems to be quite expensive (especially for
>> > large widest_ints), I think for the first one we can just == -1
>> > and for the second one wi::arshift (value, 128) == -1 and the zero 
>> > extension
>> > by using wi::zext.
>> >
>> > Anyway, I wonder if it wouldn't be better to use something shorter,
>> > the variant patch uses 0xf..f prefix before the 128-bit hexadecimal
>> > number (maybe we could also special case the even more common bits 64+
>> > are all ones case).  Or it could be 0xf*f prefix.  Or printing such
>> > numbers as -0x prefixed negative, though that is not a good idea for masks.
>> 
>> I was not sure myself what the best way would be and so proposed the
>> simplest variant I could think of.  I am fine with anything that does
>> not print thousands of f's which could be the case before.
>> 
>> So if you like the second variant more, I and I guess I do as well, by
>> all means go ahead and commit it.
>
> Here is perhaps even better one which doesn't print e.g.
> 0xf..f
> but just
> 0xf..f
> (of course, for say mask of
> 0xf..f
> it prints it like that, doesn't try to shorten the 0 digits.
> But if the most significant bits aren't set, it will be just
> 0x
>
> 2025-04-15  Jakub Jelinek  
>
>   * ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in
>   bits 128+ with "0xf..f" prefix instead of "all ones folled by ".
>   Simplify wide_int check for -1 or all ones above least significant
>   128 bits.

That is great, thank you.

Martin


>
> --- gcc/ipa-cp.cc.jj  2025-04-15 12:22:07.485558525 +0200
> +++ gcc/ipa-cp.cc 2025-04-15 14:12:01.327407951 +0200
> @@ -313,14 +313,24 @@ ipcp_lattice::print (FILE * f,
>  static void
>  ipcp_print_widest_int (FILE *f, const widest_int &value)
>  {
> -  if (wi::eq_p (wi::bit_not (value), 0))
> +  if (value == -1)
>  fprintf (f, "-1");
> -  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
> -   wi::sub (wi::lshift (1, 128),
> -1))), 0))
> +  else if (wi::arshift (value, 128) == -1)
>  {
> -  fprintf (f, "all ones folled by ");
> -  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
> +  char buf[35], *p = buf + 2;
> +  widest_int v = wi::zext (value, 128);
> +  size_t len;
> +  print_hex (v, buf);
> +  len = strlen (p);
> +  if (len == 32)
> + {
> +   fprintf (f, "0xf..f");
> +   while (*p == 'f')
> + ++p;
> + }
> +  else
> + fprintf (f, "0xf..f%0*d", (int) (32 - len), 0);
> +  fputs (p, f);
>  }
>else
>  print_hex (value, f);
>
>
>   Jakub


Re: [PATCH v2] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Jonathan Wakely

On 15/04/25 15:05 +0100, Jonathan Wakely wrote:

A few spelling and grammar fixes, and whitespace tweaks, but the only
significant thing is to qualify some calls to prevent ADL ...


Doh, as pointed out in chat the ADL thing is irrelevant because that's
not std::format.

So I think everything else is just spelling or whitespace, and the
double spaces before "range" in the exception strings.

OK for trunk with those minor issues fixed then.



On 14/04/25 16:13 +0200, Tomasz Kamiński wrote:

This patch implements formatter specialization for input_ranges and
range_formatter class form P2286R8, as adjusted by P2585R1. The formatter


"form" should be "from"


for pair/tuple is not yet provided, making maps not formattable.

To indicate partial support we define __glibcxx_format_ranges macro
value 1, without defining __cpp_lib_format_ranges.


That was already pushed in an earlier commit, but this sounds like
it's done here.


This introduces an new _M_format_range member to internal __formatter_str,
that formats range as _CharT as string, according the to the format spec.


"the to the" should be "to the"


This function transform any contiguous range into basic_string_view direclty,


"directly"


by computing size if necessary. Otherwise, for ranges for which size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format it content.


Should be "its content"



In case when padding is specified, this is handled by firstly formatting
the content of the range to the temporary string object. However, this can be
only implemented if the iterator of the basic_format_context is internal
type-erased iterator used by implementation. Otherwise a new 
basic_format_context
would need to be created, which would require rebinding of handles stored in
the arguments: note that format spec for element type could retrive any format


"retrive" should be "retrieve"


argument from format context, visit and and user handle to format it.


"and and"


As basic_format_context provide no user-facing constructor, the user are not 
able


"the user are not" should be "users are not"


to cosntructor object of that type with arbitrally iterators.


"cosntructor" should be "construct"
"arbitrally" should be "arbitrary"



The signatures of the user-facing parse and format method of the provided


"method" should be "methods"


formatters deviate from the standard by constraining types of params:
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies last three of above as unconstrained types. This types


"This types" should be "These types"


are later passed to possibly user-provided formatter specializations, that are
required via formattable concept to only accept above types.

Finally, the formatter specialization is implemented
without using specialization of range-default-formatter exposition only
template as base class, while providing same functionality.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, 
_Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink) Use __Stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.
---
Adjusted the commit message and added test for result of formattable
check for ranges of types that are not formattable.

libstdc++-v3/include/std/format   | 511 --
libstdc++-v3/src/c++23/std.cc.in  |   6 +
.../23_containers/vector/bool/format.cc   |   6 +
.../testsuite/std/format/formatter/lwg3944.cc |   4 +-
.../std/format/formatter/requirements.cc  |  14 +-
.../std/format/ranges/format_kind.cc  |  94 
.../testsuite/std/format/r

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 15, 2025 12:49 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> [PR119351]
> 
> On Tue, 15 Apr 2025, Tamar Christina wrote:
> 
> > Hi All,
> >
> > The following example:
> >
> > #define N 512
> > #define START 2
> > #define END 505
> >
> > int x[N] __attribute__((aligned(32)));
> >
> > int __attribute__((noipa))
> > foo (void)
> > {
> >   for (signed int i = START; i < END; ++i)
> > {
> >   if (x[i] == 0)
> > return i;
> > }
> >   return -1;
> > }
> >
> > generates incorrect code with fixed length SVE because for early break we 
> > need
> > to know which value to start the scalar loop with if we take an early exit.
> >
> > Historically this means that we take the first element of every induction.
> > this is because there's an assumption in place, that even with masked loops 
> > the
> > masks come from a whilel* instruction.
> >
> > As such we reduce using a BIT_FIELD_REF <, 0>.
> >
> > When PFA was added this assumption was correct for non-masked loop,
> however we
> > assumed that PFA for VLA wouldn't work for now, and disabled it using the
> > alignment requirement checks.  We also expected VLS to PFA using scalar 
> > loops.
> >
> > However as this PR shows, for VLS the vectorizer can, and does in some
> > circumstances choose to peel using masks by masking the first iteration of 
> > the
> > loop with an additional alignment mask.
> >
> > When this is done, the first elements of the predicate can be inactive. In 
> > this
> > example element 1 is inactive based on the calculated misalignment.  hence 
> > the
> > -1 value in the first vector IV element.
> >
> > When we reduce using BIT_FIELD_REF we get the wrong value.
> >
> > This patch updates it by creating a new scalar PHI that keeps track of 
> > whether
> > we are the first iteration of the loop (with the additional masking) or 
> > whether
> > we have taken a loop iteration already.
> >
> > The generated sequence:
> >
> > pre-header:
> >   bb1:
> > i_1 = 
> >
> > header:
> >   bb2:
> > i_2 = PHI 
> > …
> >
> > early-exit:
> >   bb3:
> > i_3 = iv_step * i_2 + PHI
> >
> > Which eliminates the need to do an expensive mask based reduction.
> >
> > This fixes gromacs with one OpenMP thread. But with > 1 there is still an 
> > issue.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/119351
> > * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Reject PFA
> > with masking with non-linear IVs.
> > * tree-vect-loop.cc (vectorizable_induction): Support PFA for masking.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/119351
> > * gcc.target/aarch64/sve/peel_ind_10.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_5.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_6.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_7.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_8.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_9.c: New test.
> > * gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
> b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
> > new file mode 100644
> > index
> ..b7a7bc5cb0cfdfdb74adb12
> 0c54ba15019832cf1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
> > @@ -0,0 +1,24 @@
> > +/* Fix for PR119351 alignment peeling with vectors and VLS.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast -msve-vector-bits=256 --param aarch64-autovec-
> preference=sve-only -fdump-tree-vect-details" } */
> > +
> > +#define N 512
> > +#define START 0
> > +#define END 505
> > +
> > +int x[N] __attribute__((aligned(32)));
> > +
> > +int __attribute__((noipa))
> > +foo (int start)
> > +{
> > +  for (unsigned int i = start; i < END; ++i)
> > +{
> > +  if (x[i] == 0)
> > +return i;
> > +}
> > +  return -1;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +/* { dg-final { scan-tree-dump "pfa_iv_offset" "vect" } } */
> > +/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" 
> > "vect"
> } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
> b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
> > new file mode 100644
> > index
> 0

Re: [PATCH] [testsuite] [ppc] disable -mpowerpc64 for various ilp32 asm-out checks

2025-04-15 Thread Peter Bergner
On 4/15/25 12:01 AM, Alexandre Oliva wrote:
> On Apr 14, 2025, Peter Bergner  wrote:
> 
>> But -mcpu= should not enable -mpowerpc64 by default for -m32 compiles.
> 
> Oh, is that so?  It seems to have been the case for quite a long time.
> I can trivially see that GCC 9 already did that, but it may have been
> around for much longer than that.
> 
> And TBH it seemed sensible to me: if the selected CPU has 64-bit
> registers and instructions that use them, why not use them, even when
> using a 32-bit ABI?  (when one controls the whole system, that is)

To be pedantic, -m32 -mno-powerpc64 and -m32 -mpowerpc64 are different ABIs.
The question is, what ABI does the powerpc-elf target use?  The same 32-bit
ABI as powerpc-linux?  If so, that would require not enabling -mpowerpc64
by default for -m32 compiles.

You have to be careful using -m32 -mpowerpc64, as that is a specialized
use case.  You have to know that the kernel you are running on saves and
restores the full 64-bit register on context switches.  The 32-bit signal
contexts will not contain any of the upper 32-bits of the GPR regs.
Even calling a library function could corrupt the top part of the
callee-saved GPR regs if that library (eg, glibc) is compiled with
-m32 -mno-powerpc64.
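
Purely as an illustration of that hazard (hypothetical code, not from any real
library; libfoo_fn is made up):

/* Built with -m32 -mpowerpc64, the compiler may keep all 64 bits of 'x' in a
   single callee-saved GPR across the call.  If libfoo_fn comes from a library
   built -m32 -mno-powerpc64, its prologue/epilogue only saves and restores
   the low 32 bits of that register, so the high half of 'x' can be silently
   lost.  */
extern void libfoo_fn (void);

long long
use_across_call (long long x)
{
  libfoo_fn ();
  return x + 1;   /* upper 32 bits of x may have been clobbered */
}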

So what ABI does powerpc-elf use and what does it mandate?

Peter



Re: [PATCH] libgcobol: mark riscv64-*-linux* as supported target

2025-04-15 Thread Jeff Law




On 4/15/25 7:57 AM, Andreas Schwab wrote:

* configure.tgt: Set LIBGCOBOL_SUPPORTED for riscv64-*-linux* with
64-bit multilib.
Can't say I'm happy with the amount of Cobol related churn at this phase 
in our cycle.  But this should be exceedingly safe.  So OK.


jeff



[PATCH] ref-temp1.C: Enable some tests for PE targets

2025-04-15 Thread Jonathan Yong

Attached patch OK for master branch?
Will push soon if there are no objections.
From b329d899d07cda78bc44d88f81bdf10d5e2db302 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Tue, 15 Apr 2025 11:41:36 +
Subject: [PATCH] ref-temp1.C: Enable some tests for PE targets

Test for expected PE values.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>

gcc/testsuite/ChangeLog:

	* g++.dg/abi/ref-temp1.C: Replicate some test based on
	PE expectations.
	* lib/target-supports.exp: New check_effective_target_pe.
---
 gcc/testsuite/g++.dg/abi/ref-temp1.C  | 13 +
 gcc/testsuite/lib/target-supports.exp | 10 ++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/g++.dg/abi/ref-temp1.C b/gcc/testsuite/g++.dg/abi/ref-temp1.C
index 70c9a7a431c..b02dcf61042 100644
--- a/gcc/testsuite/g++.dg/abi/ref-temp1.C
+++ b/gcc/testsuite/g++.dg/abi/ref-temp1.C
@@ -7,11 +7,16 @@ struct B { const A (&x)[2]; };
 template  B &&b = { { { { 1, 2, 3 } }, { { 4, 5, 6 } } } };
 B &temp = b;
 
-// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE_" } }
-// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE0_" } }
-// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE1_" } }
-// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE2_" } }
+// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE_" { target { ! pe } } } }
+// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE0_" { target { ! pe } } } }
+// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE1_" { target { ! pe } } } }
+// { dg-final { scan-assembler ".weak\(_definition\)?\[ \t\]_?_ZGR1bIvE2_" { target { ! pe } } } }
 
+// { dg-final { scan-assembler "\.section\t\.data\\\$_ZGR1bIvE_,\"w\"\n\t\.linkonce same_size" { target pe } } }
+// { dg-final { scan-assembler "\.section\t\.rdata\\\$_ZGR1bIvE0_,\"dr\"\n\t\.linkonce same_size" { target pe } } }
+// { dg-final { scan-assembler "\.section\t\.rdata\\\$_ZGR1bIvE1_,\"dr\"\n\t\.linkonce same_size" { target pe } } }
+// { dg-final { scan-assembler "\.section\t\.rdata\\\$_ZGR1bIvE2_,\"dr\"\n\t\.linkonce same_size" { target pe } } }
+//
 // { dg-final { scan-assembler "_ZGR1bIvE_:\n\[^\n]+_ZGR1bIvE0_" } }
 // { dg-final { scan-assembler "_ZGR1bIvE0_:\n\[^\n]+_ZGR1bIvE1_" } }
 // { dg-final { scan-assembler "_ZGR1bIvE1_:\n\[^\n]+\[ \t\]1" } }
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a62f459ad7e..869d1501c38 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -562,6 +562,16 @@ proc check_effective_target_elf { } {
 }
 }
 
+# Returns 1 if the target uses the PE/COFF object format, 0 otherwise.
+
+proc check_effective_target_pe { } {
+if { [gcc_target_object_format] == "pe" } {
+return 1;
+} else {
+return 0;
+}
+}
+
 # Returns 1 if the target toolchain supports ifunc, 0 otherwise.
 
 proc check_ifunc_available { } {
-- 
2.49.0



Re: [PATCH] c++: Prune lambda captures from more places [PR119755]

2025-04-15 Thread Jason Merrill

On 4/15/25 9:39 AM, Nathaniel Shead wrote:

On Tue, Apr 15, 2025 at 09:16:46AM -0400, Jason Merrill wrote:

On 4/15/25 2:56 AM, Nathaniel Shead wrote:

On Mon, Apr 14, 2025 at 05:33:05PM -0400, Jason Merrill wrote:

On 4/13/25 6:32 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
gcc/cp/lambda.cc   | 22 ++
gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
3 files changed, 46 insertions(+)
create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index f0a54b60275..d01bb04cd32 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1858,6 +1858,14 @@ prune_lambda_captures (tree body)
  cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
+  gcc_assert (bind_expr
+ && (TREE_CODE (bind_expr) == BIND_EXPR
+ /* FIXME: In a noexcept lambda we never prune captures
+(PR119764); when we do we need to handle this case
+for modules streaming.  */


The attached patch seems to fix that, with the result that your patch
crashes.



Thanks.  And yup, crashing was deliberate here as I wasn't 100% sure
what the tree would look like for this case after an appropriate fix.

One quick question about your patch, since it could in theory affect ABI
(the size of the lambdas change) should the pruning of such lambdas be
dependent on an ABI version check?


Indeed, perhaps this is too late in the 15 cycle for such a change.


Otherwise here's an updated patch that relies on your patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk along
with yours?  (Or if the potential ABI concerns mean that your change
isn't appropriate for GCC15, would the old version of my patch still be
OK for GCC15 to get 'import std' working again for C++26?)


For 15 please adjust this patch to be more fault-tolerant:


-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
---
   gcc/cp/lambda.cc   | 19 +++
   gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
   gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
   3 files changed, 43 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index c6308b941d3..7bb88a900d5 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1862,6 +1862,11 @@ prune_lambda_captures (tree body)
 cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
+  if (bind_expr && TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR)
+bind_expr = expr_single (TREE_OPERAND (bind_expr, 0));
+  gcc_assert (bind_expr && TREE_CODE (bind_expr) == BIND_EXPR);


i.e. here clear bind_expr if it isn't a BIND_EXPR...


 tree *fieldp = &TYPE_FIELDS (LAMBDA_EXPR_CLOSURE (lam));
 for (tree *capp = &LAMBDA_EXPR_CAPTURE_LIST (lam); *capp; )
   {
@@ -1883,6 +1888,20 @@ prune_lambda_captures (tree body)
fieldp = &DECL_CHAIN (*fieldp);
  *fieldp = DECL_CHAIN (*fieldp);
+ /* And out of the bindings for the function.  */
+ tree *blockp = &BLOCK_VARS (current_binding_level->blocks);
+ while (*blockp != DECL_EXPR_DECL (**use))
+   blockp = &DECL_CHAIN (*blockp);
+ *blockp = DECL_CHAIN (*blockp);
+
+ /* And maybe out of the va

[PATCH v3] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Tomasz Kamiński
This patch implements formatter specialization for input_ranges and
range_formatter class from P2286R8, as adjusted by P2585R1. The formatter
for pair/tuple is not yet provided, making maps not formattable.

This introduces a new _M_format_range member to the internal __formatter_str,
that formats a range of _CharT as a string, according to the format spec.
This function transforms any contiguous range into basic_string_view directly,
by computing size if necessary. Otherwise, for ranges for which size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format its content.
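
A rough sketch of that dispatch (illustrative only, C++23: the names are
simplified, the 32-character buffer size is made up, and it returns a string
instead of writing to the sink; this is not the actual __formatter_str code):

#include <algorithm>
#include <concepts>
#include <cstddef>
#include <ranges>
#include <string>
#include <string_view>

template<typename CharT, std::ranges::input_range Rg>
  requires std::same_as<std::ranges::range_value_t<Rg>, CharT>
std::basic_string<CharT>
range_as_string(Rg&& rg)
{
  namespace R = std::ranges;
  if constexpr (R::contiguous_range<Rg> && R::sized_range<Rg>)
    // Contiguous storage: the characters can be used directly.
    return std::basic_string<CharT>(R::data(rg),
                                    static_cast<std::size_t>(R::size(rg)));
  else if constexpr (R::forward_range<Rg> || R::sized_range<Rg>)
    {
      // The size is computable up front: small ranges go through a stack buffer.
      auto n = static_cast<std::size_t>(R::distance(rg));
      if (n <= 32)
        {
          CharT buf[32];
          R::copy(rg, buf);
          return std::basic_string<CharT>(buf, n);
        }
    }
  // Fallback: materialise the whole range into a heap-allocated string.
  return std::basic_string<CharT>(std::from_range, std::forward<Rg>(rg));
}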

In case when padding is specified, this is handled by firstly formatting
the content of the range to a temporary string object. However, this can
only be implemented if the iterator of the basic_format_context is the internal
type-erased iterator used by the implementation. Otherwise a new 
basic_format_context
would need to be created, which would require rebinding of handles stored in
the arguments: note that format spec for element type could retrieve any format
argument from format context, visit and use handle to format it.
As basic_format_context provides no user-facing constructor, users are not
able to construct objects of that type with arbitrary iterators.

The signatures of the user-facing parse and format methods of the provided
formatters deviate from the standard by constraining types of params:
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies the last three of the above as unconstrained types. These types
are later passed to possibly user-provided formatter specializations, that are
required via formattable concept to only accept above types.

Finally, the formatter specialization is implemented
without using a specialization of the range-default-formatter exposition-only
template as a base class, while providing the same functionality.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, 
_Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink): Use __Stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.

Reviewed-by: Jonathan Wakely 
Signed-off-by: Tomasz Kamiński 
---
Applied the suggestions. I have made one additional change, to avoid
repeating error handling for only `?` in format specifier. Quoting it
here:
+if (*__first == '?')
+  {
+++__first;
+__spec._M_type = __format::_Pres_esc;
+if (__finished() || *__first != 's')
+  __throw_format_error("format error: '?' is allowed only in "
+   " combination with 's'");
+  }
+
+if (*__first == 's')

---
 libstdc++-v3/include/std/format   | 506 --
 libstdc++-v3/src/c++23/std.cc.in  |   6 +
 .../23_containers/vector/bool/format.cc   |   6 +
 .../testsuite/std/format/formatter/lwg3944.cc |   4 +-
 .../std/format/formatter/requirements.cc  |  14 +-
 .../std/format/ranges/format_kind.cc  |  94 
 .../testsuite/std/format/ranges/formatter.cc  | 145 +
 .../testsuite/std/format/ranges/sequence.cc   | 190 +++
 .../testsuite/std/format/ranges/string.cc | 226 
 9 files changed, 1126 insertions(+), 65 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/format_kind.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/formatter.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/sequence.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/string.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23f00970840..a63ef063fd5 1

Re: [PATCH v2] RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]

2025-04-15 Thread Jeff Law




On 4/9/25 2:53 PM, Vineet Gupta wrote:

Changes since v2
  - Elide abnormal edges before LCM not after LCM
---

vsetvl phase4 uses LCM guided info to insert VSETVL insns, including a
straggler loop for "mising vsetvls" on certain edges. Currently it
asserts on encountering EDGE_ABNORMAL.

When enabling go frontend with V enabled, libgo build hits the assert.

The solution is to filter out abnormal edges from getting into LCM at
all. Existing invalid_opt_bb_p () has such checks for BB predecessors
but not for successors which is what the patch adds.

Crucially, the ICE/fix also depends on avoiding vsetvl hoisting past
non-transparent blocks: That is taken care of by Robin's patch
"RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547]"
for a different yet related issue.

Reported-by: Heinrich Schuchardt 
Signed-off-by: Vineet Gupta 

PR target/119533

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (invalid_opt_bb_p): Check for
EDGE_ABNORMAL.
(pre_vsetvl::compute_lcm_local_properties): Dump skipped edge.

gcc/testsuite/ChangeLog:

* go.dg/pr119533-riscv.go: New test.
* go.dg/pr119533-riscv-2.go: New test.

FTR: This was acked in the patch review meeting this morning.

Thanks,
jeff



[PATCH] bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]

2025-04-15 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
  b_5 = 0;
  b.0_6 = b_5;
  b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
   = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
  bitint.4 = {};
  bitint.4 = bitint.4;
  bitint.2 = cast of bitint.4;
  bitint.4 = CLOBBER;
...
   = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition.  But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses.  I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmts that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var.  With
copy that is possible though.

So the following patch fixes it by being more careful when we set
m_single_use_names, don't set it if it is a has_single_use SSA_NAME
but SSA_NAME_DEF_STMT of it is a copy stmt with SSA_NAME rhs1 and that
rhs1 doesn't have single use, or has_single_use but SSA_NAME_DEF_STMT of it
is a copy stmt etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Just to make sure it doesn't change code generation too much, I've gathered
statistics how many times
  if (m_first
  && m_single_use_names
  && m_vars[p] != m_lhs
  && m_after_stmt
  && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
{
  tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
CLOBBER_STORAGE_END);
  g = gimple_build_assign (m_vars[p], clobber);
  gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
}
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 
RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 
dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c 
pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' 
dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* 
i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' 
vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.

2025-04-16  Jakub Jelinek  

PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.

* gcc.dg/bitint-121.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2025-04-12 13:13:47.543814860 +0200
+++ gcc/gimple-lower-bitint.cc  2025-04-15 21:00:32.779348865 +0200
@@ -6647,10 +6647,28 @@ gimple_lower_bitint (void)
  bitmap_set_bit (large_huge.m_names, SSA_NAME_VERSION (s));
  if (has_single_use (s))
{
- if (!large_huge.m_single_use_names)
-   large_huge.m_single_use_names = BITMAP_ALLOC (NULL);
- bitmap_set_bit (large_huge.m_single_use_names,
- SSA_NAME_VERSION (s));
+ tree s2 = s;
+ /* The coalescing hook special cases SSA_NAME copies.
+Make sure not to mark in m_single_use_names single
+use SSA_NAMEs copied from non-single use SSA_NAMEs.  */
+ while (gimple_assign_copy_p (SSA_NAME_DEF_STMT (s2)))
+   {
+ s2 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s2));
+ if (TREE_CODE (s2) != SSA_NAME)
+   break;
+ if (!has_single_use (s2))
+   {
+ s2 = NULL_TREE;
+ break;
+   }
+   }
+ if

Re: [PATCH] [testsuite] [ppc] compile [PR112822] with -mvsx

2025-04-15 Thread Alexandre Oliva
On Apr 15, 2025, Peter Bergner  wrote:

> On 4/14/25 11:30 PM, Alexandre Oliva wrote:
>> On Apr 14, 2025, Peter Bergner  wrote:
>> 
>>> This is an architecture independent test case, so I'm surprised this
>>> doesn't FAIL on non-powerpc targets since they don't know anything
>>> about altivec.
>> 
>> AFAICT we ignore attributes we don't know about.
>> 
>> I'd think the following fix should help them too.
>> 
>> I considered doing something like that, but I don't know whether the
>> modified test would trigger the original ICE.  It seemed fragile, and
>> the change could sort of invalidate the regression test.

> I have verified the modified test case ICEs with the exact same
> error as the original test case using the commit immediately
> before the commit the fixed the ICE.

Awesome, thanks!  I hereby withdraw the proposed patch, in favor of yours.

-- 
Alexandre Oliva, happy hackerhttps://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


[committed] testsuite: Add testcase for already fixed PR [PR116093]

2025-04-15 Thread Jakub Jelinek
Hi!

This testcase got fixed with r15-9397 PR119722 fix.

Tested on x86_64-linux -m32/-m64 with vanilla trunk as well
as with r15-9397 fix reverted (where it FAILs), committed
to trunk as obvious.

2025-04-16  Jakub Jelinek  

PR tree-optimization/116093
* gcc.dg/bitint-122.c: New test.

--- gcc/testsuite/gcc.dg/bitint-122.c.jj
+++ gcc/testsuite/gcc.dg/bitint-122.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/116093 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-Og -ftree-vrp -fno-tree-dce" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+char
+foo (int a, _BitInt (129) b, char c)
+{
+  return c << (5 / b % (0xdb75dbf5 | a));
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 129
+  if (foo (0, 6, 1) != 1)
+__builtin_abort ();
+#endif
+}

Jakub



Re: [PATCH] [testsuite] [ppc] disable -mpowerpc64 for various ilp32 asm-out checks

2025-04-15 Thread Alexandre Oliva
On Apr 15, 2025, Peter Bergner  wrote:

> On 4/15/25 9:36 AM, Peter Bergner wrote:
>> So what ABI does powerpc-elf use and what does it mandate?

That's not for me to decide, but to me the patch that introduced
OS_MISSING_POWERPC64 and the PR106680 conversation suggest that enabling
-mpowerpc64 with -m32 -mcpu=<64bitcapable> had long been intended,
*except* on systems that are not 64-bit-compatible when running 32-bit
mode programs.

I acknowledge it is a different ABI, but since call-saved registers are
handled in a way that makes the difference irrelevant, a system without
preemption (including interrupts and traps) would hit no trouble AFAICT.

Since powerpc-elf doesn't assume an underlying operating system, it
doesn't strike me as an unreasonable assumption that there won't be
preemption, or that, if the initialization code enables execution of
powerpc64 instructions (I'd expect that to be necessary to enable them,
but that's just an unchecked guess), then one could also count on
64-bit register saves at traps, interrupts, and context switches; as
long as there isn't something like async signals, that should suffice to
make -mpowerpc64 safe, useful and desirable in this target.

But I acknowledge that it's a bit of a risky proposition; I suppose it
would be more conservative to disable it uniformly on all targets, and
only enable it with -m32 when explicitly requested.  I.e., make
OS_MISSING_POWERPC64 the rule rather than the exception, and define it
to zero on targets where it is deemed safe.

I'm not sure what the setting should be for powerpc-elf.  Since
!OS_MISSING_POWERPC64 has been in effect for so long on it, my
inclination would be to leave it as is.

> It seems the behavior to add OPTION_MASK_POWERPC64 happened with
> Kewen's patch

No, it had been there *long* before, Kewen's patch only enabled systems
that would misbehave due to ABI concerns to declare the effective
incompatibility of enabling -mpowerpc64 in -m32.  IMHO it should be a
lot noisier than it is, possibly an error, when the incompatibility is
there, since any async interaction could be trouble.

> maybe you need the same fix rtmes added?

I don't see reason to consider that a fix for powerpc-elf; it would be a
feature regression for those who take legitimate advantage of 64-bit
registers and instructions on deployed hardware.  But my considerations
about its not being a conservatively safe feature apply: I wouldn't
stand in the way of flipping the global default, and *not* enabling
-mpowerpc64 implicitly any more on powerpc-elf, provided that it *could*
still be enabled when knowledge about the target environment makes it
safe.

>   https://gcc.gnu.org/PR106680

-- 
Alexandre Oliva, happy hacker   https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re: [committed] [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-15 Thread Tejas Belagod

On 4/15/25 1:56 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

The operand order in the gen_vcond_mask call in the vec_extract pattern is wrong.
Fix the order so that the predicate is operand 3.

Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?

gcc/ChangeLog

* config/aarch64/aarch64-sve.md (vec_extract): Fix operand
order to gen_vcond_mask_*.


Thanks, LGTM too.



Thanks, now applied to trunk as 31e16c8b75b.

Tejas.


Richard


---
  gcc/config/aarch64/aarch64-sve.md | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 3dbd65986ec..d4af3706294 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3133,9 +3133,9 @@
"TARGET_SVE"
{
  rtx tmp = gen_reg_rtx (mode);
-emit_insn (gen_vcond_mask_ (tmp, operands[1],
-CONST1_RTX (mode),
-CONST0_RTX (mode)));
+emit_insn (gen_vcond_mask_ (tmp, CONST1_RTX (mode),
+CONST0_RTX (mode),
+operands[1]));
  emit_insn (gen_vec_extract (operands[0], tmp, operands[2]));
  DONE;
}




Re: [PATCH] [testsuite] [ppc] pr87600, pr89313: test for __PPC__ as well

2025-04-15 Thread Alexandre Oliva
On Apr 14, 2025, Peter Bergner  wrote:

> On 4/11/25 1:03 PM, Alexandre Oliva wrote:
>> gcc.dg/pr87600.h and gcc.dg/pr89313.c test for __powerpc__ and
>> __POWERPC__ to choose ppc register names, but ppc-elf defines neither;
>> it defines __PPC__, so test for that as well.

> Is there a reason why powerpc-*-elf doesn't define __powerpc__ or
> __POWERPC__ like we do for other powerpc* targets?

-ENOCLUE :-(

It doesn't seem to be uniform.

darwin.h defines __POWERPC__; 32-bit freebsd.h defines neither (though
freebsd64.h defines __powerpc__ with -m32); sysv4.h defines neither.
It seems to be a long and messy history.

> That said, I think this probably falls under the "obvious" rule too.

Thanks,

-- 
Alexandre Oliva, happy hacker   https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re: [PATCH] [testsuite] [ppc] compile [PR112822] with -mvsx

2025-04-15 Thread Alexandre Oliva
On Apr 14, 2025, Peter Bergner  wrote:

> diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
> -typedef __attribute__((altivec(vector__))) double co;
> +typedef double co __attribute__ ((vector_size (16)));

FWIW, I've tested this change on gcc-14 powerpc-elf and I confirm that
it solves the failures that motivated the initial patch in this thread.

-- 
Alexandre Oliva, happy hacker   https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re: [PATCH v2] riscv: Fix incorrect gnu property alignment on rv32

2025-04-15 Thread Kito Cheng
Thanks, committed to trunk :)

On Fri, Apr 11, 2025 at 12:27 PM Jesse Huang  wrote:
>
> Codegen is incorrectly emitting a ".p2align 3" that coerces the
> alignment of the .note.gnu.property section from 4 to 8 on rv32.
>
> 2025-04-11  Jesse Huang  
>
> gcc/ChangeLog
>
> * config/riscv/riscv.cc (riscv_file_end): Fix .p2align value.
>
> gcc/testsuite/ChangeLog
>
> * gcc.target/riscv/gnu-property-align-rv32.c: New file.
> * gcc.target/riscv/gnu-property-align-rv64.c: New file.
> ---
>  gcc/config/riscv/riscv.cc| 2 +-
>  gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c | 7 +++
>  gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c | 7 +++
>  3 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38f3ae7cd84..d3656a7a430 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10382,7 +10382,7 @@ riscv_file_end ()
>fprintf (asm_out_file, "1:\n");
>
>/* pr_type.  */
> -  fprintf (asm_out_file, "\t.p2align\t3\n");
> +  fprintf (asm_out_file, "\t.p2align\t%u\n", p2align);
>fprintf (asm_out_file, "2:\n");
>fprintf (asm_out_file, "\t.long\t0xc000\n");
>/* pr_datasz.  */
> diff --git a/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c 
> b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
> new file mode 100644
> index 000..4f48cff33da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32g_zicfiss -fcf-protection=return -mabi=ilp32d " 
> } */
> +
> +void foo() {}
> +
> +/* { dg-final { scan-assembler-times ".p2align\t2" 3 } } */
> +/* { dg-final { scan-assembler-not ".p2align\t3" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c 
> b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
> new file mode 100644
> index 000..1bfd1271826
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64g_zicfiss -fcf-protection=return -mabi=lp64d " } 
> */
> +
> +void foo() {}
> +
> +/* { dg-final { scan-assembler-times ".p2align\t3" 3 } } */
> +/* { dg-final { scan-assembler-not ".p2align\t2" } } */
> --
> 2.39.3
>


Re: [PATCH] RISC-V: Put jump table in text for large code model

2025-04-15 Thread Kito Cheng
committed :)

On Mon, Apr 14, 2025 at 6:01 PM Kito Cheng  wrote:
>
> This patch will be committed this week if CI passes and not strong
> objections since it's bug to large code model, also change is small
>
> On Mon, Apr 14, 2025 at 6:00 PM Kito Cheng  wrote:
> >
> > Large code model assume the data or rodata may put far away from
> > text section.  So we need to put jump table in text section for
> > large code model.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
> > large code model.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/jump-table-large-code-model.c: New test.
> > ---
> >  gcc/config/riscv/riscv.h  |  2 +-
> >  .../riscv/jump-table-large-code-model.c   | 24 +++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> >
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 2bcabd03517..2759a4cb1c9 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -888,7 +888,7 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
> >  #define ASM_OUTPUT_OPCODE(STREAM, PTR) \
> >(PTR) = riscv_asm_output_opcode(STREAM, PTR)
> >
> > -#define JUMP_TABLES_IN_TEXT_SECTION 0
> > +#define JUMP_TABLES_IN_TEXT_SECTION (riscv_cmodel == CM_LARGE)
> >  #define CASE_VECTOR_MODE SImode
> >  #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c 
> > b/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> > new file mode 100644
> > index 000..1ee7f6c07d3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -mcmodel=large" } */
> > +
> > +int foo(int x, int y)
> > +{
> > +  switch(x){
> > +  case 0:
> > +return 123 + y;
> > +  case 1:
> > +return 456 + y;
> > +  case 2:
> > +return 789 - y;
> > +  case 3:
> > +return 12 * y;
> > +  case 4:
> > +return 13 % y;
> > +  case 5:
> > +return 11 *y;
> > +  }
> > +  return 0;
> > +}
> > +
> > +
> > +/* { dg-final { scan-assembler-not "\.section  \.rodata" } } */
> > --
> > 2.34.1
> >


Re: [PATCH STAGE 4] aarch64: Disable sysreg feature gating

2025-04-15 Thread Richard Sandiford
Alice Carlotti  writes:
> This applies to the sysreg read/write intrinsics __arm_[wr]sr*.  It does
> not depend on changes to Binutils, because GCC converts recognised
> sysreg names to an encoding based form, which is already ungated in Binutils.
>
> We have, however, agreed to make an equivalent change in Binutils (which
> would then disable feature gating for sysreg accesses in inline
> assembly), but this has not yet been posted upstream.
>
> In the future we may introduce a new flag to renable some checking,
> but these checks could not be comprehensive because many system
> registers depend on architecture features that don't have corresponding
> GCC/GAS --march options.  This would also depend on addressing numerous
> inconsistencies in the existing list of sysreg feature dependencies.
>
> ---
>
> Ok for master now? And how about backporting to gcc 14? I do recognise that
> this is late in stage 4, sorry - it slipped through the gaps of being
> Binutils-adjacent work with a different deadline.

OK for trunk and backports.  I'm disappointed that I didn't notice
the lack of dg-error tests for the code being removed.

Thanks,
Richard

>
> Thanks,
> Alice
>
>
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc
>   (aarch64_valid_sysreg_name_p): Remove feature check.
>   (aarch64_retrieve_sysreg): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/rwsr-ungated.c: New test.
>
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 4e801146c60a52c7ef6f8c0f92b1b922e729c234..433ec975d7e4e9d7130fe49eac37f4ebfb880416
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -31073,8 +31073,6 @@ aarch64_valid_sysreg_name_p (const char *regname)
>const sysreg_t *sysreg = aarch64_lookup_sysreg_map (regname);
>if (sysreg == NULL)
>  return aarch64_is_implem_def_reg (regname);
> -  if (sysreg->arch_reqs)
> -return bool (aarch64_isa_flags & sysreg->arch_reqs);
>return true;
>  }
>  
> @@ -31098,8 +31096,6 @@ aarch64_retrieve_sysreg (const char *regname, bool 
> write_p, bool is128op)
>if ((write_p && (sysreg->properties & F_REG_READ))
>|| (!write_p && (sysreg->properties & F_REG_WRITE)))
>  return NULL;
> -  if ((~aarch64_isa_flags & sysreg->arch_reqs) != 0)
> -return NULL;
>return sysreg->encoding;
>  }
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
> new file mode 100644
> index 
> ..d67a42673733cdb128fd62d465fa122037ae531d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
> @@ -0,0 +1,13 @@
> +/* Test that __arm_[r,w]sr intrinsics aren't gated (by default).  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8-a" } */
> +
> +#include 
> +
> +uint64_t
> +foo (uint64_t a)
> +{
> +  __arm_wsr64 ("zcr_el1", a);
> +  return __arm_rsr64 ("smcr_el1");
> +}


[PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao
This is the 2nd version of the patch, the change is to replace "FALSE" with
"false" per Marek's comments.

C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
In c_fully_fold, it assumes that operands of function calls have already
been folded.  However, when we build the call to .ACCESS_WITH_SIZE, its
operands are not fully folded; therefore the C FE-specific operator is
passed to the middle end.

In order to fix this issue, fully fold the parameters before building the
call to .ACCESS_WITH_SIZE.

Bootstrapped and regression tested on both x86 and aarch64.
Okay for trunk?

Thanks.

Qing

=

PR c/119717

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
parameters for call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/pr119717.c: New test.
---
 gcc/c/c-typeck.cc   |  8 ++--
 gcc/testsuite/gcc.dg/pr119717.c | 24 
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr119717.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 3870e8a1558..55d896e02df 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
   /* The result type of the call is a pointer to the flexible array type.  */
   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
+  tree first_param
+= c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
+  tree second_param
+= c_fully_fold (counted_by_ref, false, NULL);
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
result_type, 6,
-   array_to_pointer_conversion (loc, ref),
-   counted_by_ref,
+   first_param,
+   second_param,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
build_int_cst (integer_type_node, -1),
diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
new file mode 100644
index 000..e5eedc567b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr119717.c
@@ -0,0 +1,24 @@
+/* PR c/119717  */
+/* { dg-additional-options "-std=c23" } */
+/* { dg-do compile } */
+
+struct annotated {
+  unsigned count;
+  [[gnu::counted_by(count)]] char array[];
+};
+
+[[gnu::noinline,gnu::noipa]]
+static unsigned
+size_of (bool x, struct annotated *a)
+{
+  char *p = (x ? a : 0)->array;
+  return __builtin_dynamic_object_size (p, 1);
+}
+
+int main()
+{
+  struct annotated *p = __builtin_malloc(sizeof *p);
+  p->count = 0;
+  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
+  return 0;
+}
-- 
2.31.1



Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-15 Thread Martin Uecker
Am Dienstag, dem 15.04.2025 um 14:50 +0200 schrieb Michael Matz:
> Hello,
...

> > struct A {
> >   int *buf __counted_by(len); // 'len' *must* be in the struct.
> >   int len;
> > };
> 
> ... means that we would have to implement general delayed parsing for 
> expressions in C parsers. 

I have to agree with Michael.  This was the main reason
we rejected the original approach.  

I also think consistency with general syntax for arrays in structs
is far more important for C than consistency for the special case of
having only one identifier in counted_by.

Martin


Re: [PATCH] [PR119765] testsuite: adjust amd64-abi-9.c to check both ms and sysv ABIs

2025-04-15 Thread NightStrike
On Tue, Apr 15, 2025 at 5:02 AM LIU Hao  wrote:
>
> On 2025-4-14 04:10, Peter Damianov wrote:
> > diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c 
> > b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > index 9b2cd7e7b49..827215be3e2 100644
> > --- a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > +++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > @@ -1,18 +1,46 @@
> >   /* { dg-do compile { target { ! ia32 } } } */
> >   /* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
> > +
> > +// For sysv abi, eax holds the number of XMM registers used in the call.
> > +// Since sse is disabled, check that it is zeroed
> >   /* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 
> > 2 } } */
> >
> > -void foo (const char *, ...);
> > +// For ms abi, the argument should go in edx
> > +/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%edx" 
> > 2 } } */
>
> is this a superfluous `\[` ? --^^
>
> > +
> > +// For sysv abi, the argument should go in esi
> > +/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%esi" 
> > 2 } } */
> > +
> > +
>
> ditto.

Both should be \] instead of \[]
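
For concreteness, the corrected directives would presumably read like this
(untested, just with the bracket fixed):

/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \]*%edx" 2 } } */
/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \]*%esi" 2 } } */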


[PATCH][GCC14] Extend check-function-bodies to allow label and directives

2025-04-15 Thread H.J. Lu
Hi,

I'd like to backport this testsuite enhancement to GCC 14 so that

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680896.html

can be backported to GCC 14 with testcases unchanged.


H.J.
---
As PR target/116174 shows, we may need to verify labels and the directive
order.  Extend check-function-bodies to support matched output lines to
allow labels and directives.

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu 
(cherry picked from commit d6bb1e257fc414d21bc31faa7ddecbc93a197e3c)
---
 gcc/doc/sourcebuild.texi |  9 ++---
 gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
 gcc/testsuite/lib/scanasm.exp| 15 +--
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 23dedef4161..c8130dc1ba9 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3440,7 +3440,7 @@ assembly output.
 Passes if @var{symbol} is not defined as a hidden symbol in the test's
 assembly output.
 
-@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @}]]
+@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @} [@var{matched}]]]
 Looks through the source file for comments that give the expected assembly
 output for selected functions.  Each line of expected output starts with the
 prefix string @var{prefix} and the expected output for a function as a whole
@@ -3467,8 +3467,11 @@ Depending on the configuration (see
 @code{configure_check-function-bodies} in
 @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
 compiler's assembly output directives such as @code{.cfi_startproc},
-local label definitions such as @code{.LFB0}, and more.
-It then matches the result against the expected
+local label definitions such as @code{.LFB0}, and more.  This behavior
+can be overridden using the optional @var{matched} argument, which
+specifies a regexp for lines that should not be discarded in this way.
+
+The test then matches the result against the expected
 output for a function as a single regular expression.  This means that
 later lines can use backslashes to refer back to @samp{(@dots{})}
 captures on earlier lines.  For example:
diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c 
b/gcc/testsuite/gcc.target/i386/pr116174.c
index 8877d0b51af..686aeb9ff31 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,20 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fcf-protection=branch" } */
+/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
+/* Keep labels and directives ('.p2align', '.cfi_startproc').
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  
} } */
 
+/*
+**foo:
+**.LFB0:
+** .cfi_startproc
+** (
+** endbr64
+** .p2align 5
+** |
+** endbr32
+** )
+**...
+*/
 char *
 foo (char *dest, const char *src)
 {
@@ -8,5 +22,3 @@ foo (char *dest, const char *src)
 /* nothing */;
   return --dest;
 }
-
-/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 6cf9997240d..d1c8e3b5079 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -952,6 +952,9 @@ proc parse_function_bodies { config filename result } {
verbose "parse_function_bodies: $function_name:\n$function_body"
set up_result($function_name) $function_body
set in_function 0
+   } elseif { $up_config(matched) ne "" \
+  && [regexp $up_config(matched) $line] } {
+   append function_body $line "\n"
} elseif { [regexp $up_config(fluff) $line] } {
verbose "parse_function_bodies: $function_name: ignoring fluff 
line: $line"
} else {
@@ -982,7 +985,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR]] } }
+# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[MATCHED]]] } }
 #
 # See sourcebuild.texi for details.
 
@@ -990,7 +993,7 @@ proc check-function-bodies { args } {
 if { [llength $args] < 2 } {
error "too few arguments to check-function-bodies"

Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
> Hi,
> > gcc/ChangeLog:
> > 
> > PR tree-optimization/117790
> > * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > (flow_scale_loop_freqs): New.
> > (scale_loop_freqs_with_exit_counts): New.
> > (scale_loop_freqs_hold_exit_counts): New.
> > (scale_loop_profile): Refactor to use the newly-added
> > scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > correctly handle reducing the expected niters for loops with multiple
> > exits.
> > (scale_loop_freqs_with_new_exit_count): New.
> > (scale_loop_profile_1): New.
> > (scale_loop_profile_hold_exit_counts): New.
> > * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > (scale_loop_freqs_with_new_exit_count): New.
> +template
> +static bool
> +can_flow_scale_loop_freqs_p (class loop *loop,
> +  ExitCountFn get_exit_count)
> +{
> +  basic_block bb = loop->header;
> +
> +  const profile_count count_in = loop_count_in (loop);
> +  profile_count exit_count = profile_count::zero ();
> +
> +  while (bb != loop->latch)
> +{
> +  /* Punt if any of the BB counts are uninitialized.  */
> +  if (!bb->count.initialized_p ())
> + return false;
> +
> +  bool found_exit = false;
> +  edge internal_edge = nullptr;
> +  for (auto e : bb->succs)
> + if (flow_bb_inside_loop_p (loop, e->dest))
> +   {
> + if (internal_edge)
> +   return false;
> + internal_edge = e;
> 
> This assumes that there are at most 2 edges out which is not always the
> case (i.e. for EH and switch).  I suppose vectorizer never calls it
> there but probably you want to test that there are precisely two edges
> in can_flow_scale_loop_freqs and if not drop message to dump file, so in
> case we encounter such loops we notice.

Also forgot to write: another interesting case is when the loop has an inner
loop.  In this case there will be 2 edges but both of them internal.
It is also possible that the exit of the loop sits inside the inner loop.

Honza


Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
Hi,
> gcc/ChangeLog:
> 
>   PR tree-optimization/117790
>   * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
>   (flow_scale_loop_freqs): New.
>   (scale_loop_freqs_with_exit_counts): New.
>   (scale_loop_freqs_hold_exit_counts): New.
>   (scale_loop_profile): Refactor to use the newly-added
>   scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
>   correctly handle reducing the expected niters for loops with multiple
>   exits.
>   (scale_loop_freqs_with_new_exit_count): New.
>   (scale_loop_profile_1): New.
>   (scale_loop_profile_hold_exit_counts): New.
>   * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
>   (scale_loop_freqs_with_new_exit_count): New.
+template
+static bool
+can_flow_scale_loop_freqs_p (class loop *loop,
+ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+
+  const profile_count count_in = loop_count_in (loop);
+  profile_count exit_count = profile_count::zero ();
+
+  while (bb != loop->latch)
+{
+  /* Punt if any of the BB counts are uninitialized.  */
+  if (!bb->count.initialized_p ())
+   return false;
+
+  bool found_exit = false;
+  edge internal_edge = nullptr;
+  for (auto e : bb->succs)
+   if (flow_bb_inside_loop_p (loop, e->dest))
+ {
+   if (internal_edge)
+ return false;
+   internal_edge = e;

This assumes that there are at most 2 edges out which is not always the
case (i.e. for EH and switch).  I suppose vectorizer never calls it
there but probably you want to test that there are precisely two edges
in can_flow_scale_loop_freqs and if not drop message to dump file, so in
case we encounter such loops we notice.
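
For instance, something along these lines at the top of the per-BB walk
(only a sketch of the suggested guard, not part of the posted patch):

      /* Punt, with a note in the dump file, on blocks with more than two
         successors (e.g. EH edges or switches), which the logic below does
         not handle.  */
      if (EDGE_COUNT (bb->succs) > 2)
        {
          if (dump_file)
            fprintf (dump_file,
                     ";; can_flow_scale_loop_freqs_p: bb %i has more than"
                     " two successors, giving up\n", bb->index);
          return false;
        }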

+ }
+   else
+ {
+   if (found_exit)
+ return false;
+   found_exit = true;
+   exit_count += get_exit_count (e);
+ }
+
+  bb = internal_edge->dest;
+}
+
+  /* Punt if any exit edge had an uninitialized count.  */
+  if (!exit_count.initialized_p ())
+return false;
You already return early once you hit a bb with an uninitialized count, so
perhaps you can move this check just after the call to get_exit_count?
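I.e. roughly (sketch only, reusing the lines quoted above):

          found_exit = true;
          exit_count += get_exit_count (e);
          /* Punt right away on an uninitialized exit count instead of
             checking only after the walk.  */
          if (!exit_count.initialized_p ())
            return false;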

+  const profile_count new_exit_count = get_exit_count (exit_edge);
+  profile_probability new_exit_prob;
+  if (new_block_count.nonzero_p ())
+   new_exit_prob = new_exit_count.probability_in (new_block_count);

If new_exit_count > new_block_count, probability_in will return 1.  I
guess there is not much to do, but perhaps logging the inconsistency into
the dump is not a bad idea here.

+  else
+   {
+ /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile
+consistent is if NEW_EXIT_COUNT is zero too.  */
+ if (dump_file && new_exit_count.nonzero_p ())
+   fprintf (dump_file,
+";; flow_scale_loop_freqs wants non-zero exit count "
+"but bb count is zero/uninit: profile is inconsistent\n");
+
+ /* Arbitrarily set the exit probability to 0.  */
+ new_exit_prob = profile_probability::never ();

never is kind of a strong hint to optimize for the other path (it has
RELIABLE reliability).  Since we have no info I would just keep the
probability which was there before.
+   }
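
I.e. something like (sketch only, using the variables from the hunk above):

          /* With no usable count information, keep the previously computed
             probability rather than forcing profile_probability::never ().  */
          new_exit_prob = exit_edge->probability;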

Patch is OK with these changes.

Honza


Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
Hi,
> gcc/ChangeLog:
> 
>   PR tree-optimization/117790
>   * tree-vect-loop.cc (scale_profile_for_vect_loop): Use
>   scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
>   the exit edge parameter, since the code now handles multiple exits.
>   Adjust the caller ...
>   (vect_transform_loop): ... here.
>
>gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/117790
>   * gcc.dg/vect/vect-early-break-profile-2.c: New test.
>
>
>-  if (entry_count.nonzero_p ())
>-set_edge_probability_and_rescale_others
>-  (exit_e,
>-   entry_count.probability_in (loop->header->count / vf));
>-  /* Avoid producing very large exit probability when we do not have
>- sensible profile.  */
>-  else if (exit_e->probability < profile_probability::always () / (vf * 2))

This is handling the relatively common case where we decide to vectorize a
loop with, say, a factor of 32 and have no profile feedback.
In this case, if the loop trip count is unknown early on, we will
estimate it to iterate a few times (approx. 3-5, as that is the average
iteration count of a random loop based on some measurements).

The fact that we want to vectorize by a factor of 32 implies that the
vectorizer does not take this info seriously and its heuristics think they
know better.  In this case we do not want to drop the loop to 0 iterations,
as that would result in poor code layout and regalloc.

I don't think you kept this logic in the new code?

Honza
>-set_edge_probability_and_rescale_others (exit_e, exit_e->probability * 
>vf);
>-  loop->latch->count = single_pred_edge (loop->latch)->count ();
>-
>-  scale_loop_profile (loop, profile_probability::always () / vf,
>-get_likely_max_loop_iterations_int (loop));
>+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
>+  scale_loop_profile_hold_exit_counts (loop,
>+ profile_probability::always () / vf,
>+ likely_max_niters);



[PATCH] ipa-prop: Extend the tailc IPA-VRP hack to LTO [PR119614]

2025-04-15 Thread Jakub Jelinek
Hi!

Here is my attempt at the PR119614 LTO fix.
Of course, if Martin can come up with something cleaner, let's go with that
instead.

This patch just remembers when ipa_record_return_value_range was set
to a singleton range with CONSTANT_CLASS_P value and propagates that value
through LTO to ltrans where ipa_return_value_range used by tailc pass can
consume it.
Initially I wanted to store it in cgraph_node as a tree, but haven't figured
out how to stream that tree out/in, so this patch stores it as an attribute
instead, which is streamed automatically.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2025-04-14  Jakub Jelinek  

PR tree-optimization/119614
* ipa-prop.cc: Include attribs.h.
(ipa_record_return_value_range): Set "singleton retval" attribute if
the recorded range is singleton with CONSTANT_CLASS_P value, or
remove it otherwise.
(ipa_return_value_range): Use "singleton retval" attribute and create
singleton range from it as fallback.

* g++.dg/lto/pr119614_0.C: New test.

--- gcc/ipa-prop.cc.jj  2025-04-10 17:14:31.689344793 +0200
+++ gcc/ipa-prop.cc 2025-04-14 08:02:15.083339571 +0200
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.
 #include "gimple-range.h"
 #include "value-range-storage.h"
 #include "vr-values.h"
+#include "attribs.h"
 
 /* Function summary where the parameter infos are actually stored. */
 ipa_node_params_t *ipa_node_params_sum = NULL;
@@ -6158,6 +6159,21 @@ ipa_record_return_value_range (value_ran
   ipa_return_value_sum->disable_insertion_hook ();
 }
   ipa_return_value_sum->get_create (n)->vr = ipa_get_value_range (val);
+  tree valr;
+  if (flag_lto || flag_wpa)
+{
+  if (val.singleton_p (&valr)
+ && CONSTANT_CLASS_P (valr)
+ && !tree_expr_nan_p (valr))
+   DECL_ATTRIBUTES (current_function_decl)
+ = tree_cons (get_identifier ("singleton retval"), valr,
+  DECL_ATTRIBUTES (current_function_decl));
+  else
+   DECL_ATTRIBUTES (current_function_decl)
+ = remove_attribute ("singleton retval",
+ DECL_ATTRIBUTES (current_function_decl));
+}
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Recording return range ");
@@ -6172,7 +6188,7 @@ bool
 ipa_return_value_range (value_range &range, tree decl)
 {
   cgraph_node *n = cgraph_node::get (decl);
-  if (!n || !ipa_return_value_sum)
+  if (!n || (!ipa_return_value_sum && !flag_ltrans))
 return false;
   enum availability avail;
   n = n->ultimate_alias_target (&avail);
@@ -6180,11 +6196,21 @@ ipa_return_value_range (value_range &ran
 return false;
   if (n->decl != decl && !useless_type_conversion_p (TREE_TYPE (decl), 
TREE_TYPE (n->decl)))
 return false;
-  ipa_return_value_summary *v = ipa_return_value_sum->get (n);
-  if (!v)
-return false;
-  v->vr->get_vrange (range);
-  return true;
+  if (ipa_return_value_sum)
+if (ipa_return_value_summary *v = ipa_return_value_sum->get (n))
+  {
+   v->vr->get_vrange (range);
+   return true;
+  }
+  if (tree attr = lookup_attribute ("singleton retval", DECL_ATTRIBUTES 
(n->decl)))
+{
+  value_range vr (TREE_VALUE (attr), TREE_VALUE (attr));
+  if (is_a  (vr))
+   (as_a  (vr)).clear_nan ();
+  range = vr;
+  return true;
+}
+  return false;
 }
 
 /* Reset all state within ipa-prop.cc so that we can rerun the compiler
--- gcc/testsuite/g++.dg/lto/pr119614_0.C.jj2025-04-14 08:06:08.774121960 
+0200
+++ gcc/testsuite/g++.dg/lto/pr119614_0.C   2025-04-07 08:42:35.629686614 
+0200
@@ -0,0 +1,34 @@
+// PR tree-optimization/119614
+// { dg-lto-do link }
+// { dg-lto-options { { -O2 -fPIC -flto -flto-partition=max } } }
+// { dg-require-effective-target shared }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target musttail }
+// { dg-extra-ld-options "-shared" }
+
+struct S {} b;
+char *foo ();
+int e, g;
+void bar ();
+void corge (S);
+
+[[gnu::noinline]] static char *
+baz ()
+{
+  bar ();
+  return 0;
+}
+
+const char *
+qux ()
+{
+  if (e)
+{
+  S a = b;
+  corge (a);
+  if (g)
+return 0;
+  [[gnu::musttail]] return baz ();
+}
+  return foo ();
+}

Jakub



Re: COBOL: Is anything stalled because of me?

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 10:47:13AM -0500, Robert Dubner wrote:
> Speaking purely casually:  I thought that that COBOL would be released with 
> documented limited capability.  "Yeah, it works on x86_64-linux and 
> aarch64-linux.  More to come.".  We knew that we didn't know how to 
> cross-compile, and we knew that other platforms would have to come, in time.

What is definitely known not to work is big-endian targets, cross
compilation from big-endian hosts to little-endian targets, 32-bit targets,
and cross compilation from 32-bit hosts; I'm afraid we can live with that
for the 15 release.

What is still missing are web page updates, the repository in that case
is ssh://gcc.gnu.org/git/gcc-wwwdocs.git and e.g. https://gcc.gnu.org/
lists in News (left column)
"Modula-2 front end added [2022-12-14]
The Modula-2 programming language front end has been added to GCC.
This front end was contributed by Gaius Mulley."
so we want something like that for COBOL too, then in
https://gcc.gnu.org/gcc-15/changes.html something that COBOL FE has been
added and perhaps the limitations for this release.
See e.g. https://gcc.gnu.org/gcc-13/changes.html which mentioned the
addition of Modula-2.

Jakub



[pushed] c++: constexpr, trivial, and non-alias target [PR111075]

2025-04-15 Thread Jason Merrill
Tested the testcase fix with a Darwin cross-compiler.
Regression tested x86_64-pc-linux-gnu.
Applying to trunk.

-- 8< --

On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_ctor, which builds a thunk function that calls the general
constructor.  And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.

No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.

PR c++/111075

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.
---
 gcc/cp/constexpr.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index dc59f59aa3f..4346b29abc6 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3103,6 +3103,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  we can only get a trivial function here with -fno-elide-constructors.  */
   gcc_checking_assert (!trivial_fn_p (fun)
   || !flag_elide_constructors
+  /* Or it's a call from maybe_thunk_body (111075).  */
+  || (TREE_CODE (t) == CALL_EXPR ? CALL_FROM_THUNK_P (t)
+  : AGGR_INIT_FROM_THUNK_P (t))
   /* We don't elide constructors when processing
  a noexcept-expression.  */
   || cp_noexcept_operand);

base-commit: 7f56a8e8ad1c33d358e9e09fcbaf263c2caba1b9
-- 
2.49.0



Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 2:23 PM H.J. Lu  wrote:
>
> On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak  wrote:
> >
> > On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu  wrote:
> > >
> > > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
> > > pushed in red-zone.  Since
> > >
> > > commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
> > > Author: H.J. Lu 
> > > Date:   Sun Apr 13 12:20:42 2025 -0700
> > >
> > > APX: Don't use red-zone with 32 GPRs and no caller-saved registers
> > >
> > > disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
> > > 31 .cfi_restore directives.
> >
> > Hm, did you also account for RED_ZONE_RESERVE? The last 8-byte slot is
> > reserved for internal use by the compiler.
>
> There is no red-zone in this case.
>
> > Uros.
> >
> > >
> > > PR target/119784
> > > * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
> > > directives.

OK.

Thanks,
Uros.

> > >
> > > Signed-off-by: H.J. Lu 
> > > ---
> > >  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> > > b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > index fefe2e6d6fc..fa1acc7a142 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > @@ -66,7 +66,7 @@ void foo (void *frame)
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
> > > -/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
> > > +/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } 
> > > } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } 
> > > } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } 
> > > } */
> > > --
> > > 2.49.0
> > >
>
>
>
> --
> H.J.


[COMMITTED] Docs: Address -fivopts, -O0, and -Q confusion [PR71094]

2025-04-15 Thread Sandra Loosemore
There's a blurb at the top of the "Optimize Options" node telling
people that most optimization options are completely disabled at -O0
and a similar blurb in the entry for -Og, but nothing at the entry for
-O0.  Since this is a continuing point of confusion it seems wise to
duplicate the information in all the places users are likely to look
for it.
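
As a concrete illustration of the confusion (output abbreviated and
approximate):

  $ gcc -O0 -Q --help=optimizers | grep ivopts
    -fivopts                              [enabled]

i.e. the option is reported as enabled, yet the ivopts pass never runs
at -O0.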

gcc/ChangeLog
PR tree-optimization/71094
* doc/invoke.texi (Optimize Options): Document that -fivopts is
enabled at -O1 and higher.  Add blurb about -O0 causing GCC to
completely ignore most optimization options.
---
 gcc/doc/invoke.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b99da94dca1..0b6644b0315 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12746,6 +12746,7 @@ complexity than at @option{-O}.
 -fipa-pure-const
 -fipa-reference
 -fipa-reference-addressable
+-fivopts
 -fmerge-constants
 -fmove-loop-invariants
 -fmove-loop-stores
@@ -12854,6 +12855,13 @@ by @option{-O2} and also turns on the following 
optimization flags:
 Reduce compilation time and make debugging produce the expected
 results.  This is the default.
 
+At @option{-O0}, GCC completely disables most optimization passes;
+they are not run even if you explicitly enable them on the command
+line, or are listed by @option{-Q --help=optimizers} as being enabled by
+default.  Many optimizations performed by GCC depend on code analysis
+or canonicalization passes that are enabled by @option{-O}, and it would
+not be useful to run individual optimization passes in isolation.
+
 @opindex Os
 @item -Os
 Optimize for size.  @option{-Os} enables all @option{-O2} optimizations
@@ -14306,6 +14314,7 @@ Enabled by default at @option{-O1} and higher.
 @item -fivopts
 Perform induction variable optimizations (strength reduction, induction
 variable merging and induction variable elimination) on trees.
+Enabled by default at @option{-O1} and higher.
 
 @opindex ftree-parallelize-loops
 @item -ftree-parallelize-loops=n
-- 
2.34.1



[PATCH STAGE 4] aarch64: Disable sysreg feature gating

2025-04-15 Thread Alice Carlotti
This applies to the sysreg read/write intrinsics __arm_[wr]sr*.  It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding based form, which is already ungated in Binutils.

We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.

In the future we may introduce a new flag to re-enable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options.  This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.

---

Ok for master now? And how about backporting to gcc 14? I do recognise that
this is late in stage 4, sorry - it slipped through the gaps of being
Binutils-adjacent work with a different deadline.

Thanks,
Alice



gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Remove feature check.
(aarch64_retrieve_sysreg): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-ungated.c: New test.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
4e801146c60a52c7ef6f8c0f92b1b922e729c234..433ec975d7e4e9d7130fe49eac37f4ebfb880416
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -31073,8 +31073,6 @@ aarch64_valid_sysreg_name_p (const char *regname)
   const sysreg_t *sysreg = aarch64_lookup_sysreg_map (regname);
   if (sysreg == NULL)
 return aarch64_is_implem_def_reg (regname);
-  if (sysreg->arch_reqs)
-return bool (aarch64_isa_flags & sysreg->arch_reqs);
   return true;
 }
 
@@ -31098,8 +31096,6 @@ aarch64_retrieve_sysreg (const char *regname, bool 
write_p, bool is128op)
   if ((write_p && (sysreg->properties & F_REG_READ))
   || (!write_p && (sysreg->properties & F_REG_WRITE)))
 return NULL;
-  if ((~aarch64_isa_flags & sysreg->arch_reqs) != 0)
-return NULL;
   return sysreg->encoding;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
new file mode 100644
index 
..d67a42673733cdb128fd62d465fa122037ae531d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
@@ -0,0 +1,13 @@
+/* Test that __arm_[r,w]sr intrinsics aren't gated (by default).  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a" } */
+
+#include 
+
+uint64_t
+foo (uint64_t a)
+{
+  __arm_wsr64 ("zcr_el1", a);
+  return __arm_rsr64 ("smcr_el1");
+}



[pushed] configure, Darwin: Recognise new naming for Xcode ld.

2025-04-15 Thread Iain Sandoe
Tested on i686, x86_64 and aarch64 Darwin, plus x86_64, aarch64 and
powerpc64le Linux, pushed to trunk, thanks
Iain

--- 8< ---

The latest editions of Xcode have altered the identity reported by 'ld -v'
(again).  This means that GCC configure no longer detects the version.

Fixed by adding the new name to the set checked.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Recognise PROJECT:ld-.nn.aa as an identifier
for Darwin's static linker.

Signed-off-by: Iain Sandoe 
---
 gcc/configure| 7 ---
 gcc/configure.ac | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 821f8b44bc6..16965953f05 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -3948,7 +3948,7 @@ if test x"${DEFAULT_LINKER+set}" = x"set"; then
 as_fn_error $? "cannot execute: $DEFAULT_LINKER: check --with-ld or env. 
var. DEFAULT_LINKER" "$LINENO" 5
   elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep GNU > /dev/null; then
 gnu_ld_flag=yes
-  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep ld64- > /dev/null; then
+  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep 'PROJECT:ld\(64\)*-' > 
/dev/null; then
 ld64_flag=yes
   fi
 
@@ -32730,8 +32730,9 @@ $as_echo "$gcc_cv_ld64_major" >&6; }
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker version" >&5
 $as_echo_n "checking linker version... " >&6; }
 if test x"${gcc_cv_ld64_version}" = x; then
-  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld' \
-  | sed -e 's/.*ld64-//' -e 's/.*dyld-//'| awk '{print $1}'`
+  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld|PROJECT:ld' \
+  | sed -e 's/.*ld64-//' -e 's/.*dyld-//' -e 's/.*PROJECT:ld-//' \
+  | awk '{print $1}'`
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_version" >&5
 $as_echo "$gcc_cv_ld64_version" >&6; }
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 3d0a4e6f8f5..9f67e62950a 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -358,7 +358,7 @@ if test x"${DEFAULT_LINKER+set}" = x"set"; then
 AC_MSG_ERROR([cannot execute: $DEFAULT_LINKER: check --with-ld or env. 
var. DEFAULT_LINKER])
   elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep GNU > /dev/null; then
 gnu_ld_flag=yes
-  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep ld64- > /dev/null; then
+  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep 'PROJECT:ld\(64\)*-' > 
/dev/null; then
 ld64_flag=yes
   fi
   AC_DEFINE_UNQUOTED(DEFAULT_LINKER,"$DEFAULT_LINKER",
@@ -6418,8 +6418,9 @@ if test x"$ld64_flag" = x"yes"; then
 # If the version was not specified, try to find it.
 AC_MSG_CHECKING(linker version)
 if test x"${gcc_cv_ld64_version}" = x; then
-  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld' \
-  | sed -e 's/.*ld64-//' -e 's/.*dyld-//'| awk '{print $1}'`
+  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld|PROJECT:ld' \
+  | sed -e 's/.*ld64-//' -e 's/.*dyld-//' -e 's/.*PROJECT:ld-//' \
+  | awk '{print $1}'`
 fi
 AC_MSG_RESULT($gcc_cv_ld64_version)
 
-- 
2.39.2 (Apple Git-143)



[PATCH] Fortran: pure subroutine with pure procedure as dummy [PR106948]

2025-04-15 Thread Harald Anlauf

Dear all,

the testcase in the PR shows a case where the pureness of a function
is not properly determined, even though the function is resolved, and
its attributes clearly show that it is pure, because gfc_pure_function
relies on isym or esym being set.  This does not happen here, probably
because the function is used as a dummy here.

The least invasive fix seems to be to look at the symbol's attributes
when isym or esym is not set.

Regression testing led to additional redundant error messages for two
testcases, so I opted to restrict the change to the case of functions
as dummy arguments, making this patch very safe.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 5ebb5bb438e8ccf6ea30559604a9f27a75dea0ef Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 15 Apr 2025 20:43:05 +0200
Subject: [PATCH] Fortran: pure subroutine with pure procedure as dummy
 [PR106948]

	PR fortran/106948

gcc/fortran/ChangeLog:

	* resolve.cc (gfc_pure_function): If a function has been resolved,
	but esym is not yet set, look at its attributes to see whether it
	is pure or elemental.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pure_formal_proc_4.f90: New test.
---
 gcc/fortran/resolve.cc|  7 +++
 .../gfortran.dg/pure_formal_proc_4.f90| 49 +++
 2 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index cdf043b6411..410ff685906 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -3190,6 +3190,13 @@ gfc_pure_function (gfc_expr *e, const char **name)
 	 || e->value.function.isym->elemental;
   *name = e->value.function.isym->name;
 }
+  else if (e->symtree && e->symtree->n.sym && e->symtree->n.sym->attr.dummy)
+{
+  /* The function has been resolved, but esym is not yet set.
+	 This can happen with functions as dummy argument.  */
+  pure = e->symtree->n.sym->attr.pure || e->symtree->n.sym->attr.elemental;
+  *name = e->symtree->n.sym->name;
+}
   else
 {
   /* Implicit functions are not pure.  */
diff --git a/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90 b/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90
new file mode 100644
index 000..92640e2d2f4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90
@@ -0,0 +1,49 @@
+! { dg-do compile }
+! PR fortran/106948 - check that passing of PURE procedures works
+!
+! Contributed by Jim Feng
+
+module a
+  implicit none
+
+  interface new
+pure module subroutine b(x, f)
+  integer, intent(inout) :: x
+  interface
+pure function f(x) result(r)
+  real, intent(in) :: x
+  real :: r
+end function f
+  end interface
+end subroutine b
+  end interface new
+end module a
+
+submodule(a) a_b
+  implicit none
+
+contains
+  module procedure b
+x = int(f(real(x)) * 0.15)
+  end procedure b
+end submodule a_b
+
+program test
+  use a
+  implicit none
+
+  integer :: x
+
+  x = 100
+  call new(x, g)
+  print *, x
+
+contains
+
+  pure function g(y) result(r)
+real, intent(in) :: y
+real :: r
+
+r = sqrt(y)
+  end function g
+end program test
-- 
2.43.0



Re: [GCC16,RFC,V2 05/14] aarch64: add new definition for post-index stg

2025-04-15 Thread Richard Sandiford
Indu Bhagat  writes:
> Using post-index stg is a faster way of memory tagging/untagging.
>
> TBD:
>   - Currently generated by in the aarch64 backend.  Not sure if this
> is the right way to do it.
>   - Also not clear how to weave in the generation of stzg.

Similarly to patch 4, I think we should rewrite the existing stg pattern
to use the same kind of approach that I mentioned in response to patch 2,
then extend the predicate and constraint to support PRE_MODIFY and
POST_MODIFY addresses.

Thanks,
Richard

>
> ChangeLog:
>   * gcc/config/aarch64/aarch64.md
>
> ---
>
> [New in RFC V2]
> ---
>  gcc/config/aarch64/aarch64.md | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 175aed3146ac..3cb773a77ad8 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8475,6 +8475,21 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +;; STG with post-index writeback.
> +(define_insn "*stg_post"
> +  [(set (mem:QI (unspec:DI
> +  [(plus:DI (match_operand:DI 1 "register_operand" "=rk")
> +(const_int 0))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> +  (const_int 56)) (const_int 15)))
> +(set (match_dup 1)
> + (plus:DI (match_dup 1) (match_operand:DI 2 
> "aarch64_granule16_simm9" "i")))]
> +  "TARGET_MEMTAG"
> +  "stg\\t%0, [%1], #%2"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  ;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
>  ;; once, without zero initialization.
>  (define_insn "st2g"


RE: COBOL: Is anything stalled because of me?

2025-04-15 Thread Robert Dubner



> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, April 15, 2025 13:54
> To: Robert Dubner 
> Cc: 'Jeff Law' ; gcc-patches@gcc.gnu.org; 'James
K.
> Lowden' 
> Subject: Re: COBOL: Is anything stalled because of me?
> 
> On Tue, Apr 15, 2025 at 10:47:13AM -0500, Robert Dubner wrote:
> > Speaking purely casually:  I thought that that COBOL would be released
> with
> > documented limited capability.  "Yeah, it works on x86_64-linux and
> > aarch64-linux.  More to come.".  We knew that we didn't know how to
> > cross-compile, and we knew that other platforms would have to come, in
> time.
> 
> What is definitely known not to work is big endian targets, cross
> compilation from big endian hosts to little endian targets, 32-bit
> targets,
> cross compilation from 32-bit hosts, I'm afraid we can live with it for
> the
> 15 release.

I am afraid we're going to have to.

> 
> What is still missing are web page updates, the repository in that case
> is ssh://gcc.gnu.org/git/gcc-wwwdocs.git and e.g https://gcc.gnu.org/
> lists in News (left column)
> "Modula-2 front end added [2022-12-14]
> The Modula-2 programming language front end has been added to GCC.
> This front end was contributed by Gaius Mulley."
> so we want something like that for COBOL too, then in
> https://gcc.gnu.org/gcc-15/changes.html something that COBOL FE has been
> added and perhaps the limitations for this release.
> See e.g. https://gcc.gnu.org/gcc-13/changes.html which mentioned the
> addition of Modula-2.

Jim has been taking the lead on documentation.  He's eager to get to it.
He's been attending to some pressing family matters that require his
attention.

Thank you very much for the summary.

> 
>   Jakub



Re: [GCC16,RFC,V2 04/14] aarch64: add new definition for post-index st2g

2025-04-15 Thread Richard Sandiford
Indu Bhagat  writes:
> Using post-index st2g is a faster way of memory tagging/untagging.
> Because a post-index 'st2g tag, [addr], #32' is equivalent to:
>stg tag, addr, #0
>stg tag, addr, #16
>add addr, addr, #32
>
> TBD:
>   - Currently generated by in the aarch64 backend.  Not sure if this is
> the right way to do it.

If we do go for the "aarch64_granule_memory_operand" approach that
I described for patch 3, then that predicate (and the associated constrant)
could handle PRE_MODIFY and POST_MODIFY addresseses, which would remove
the need for separate patterns.

>   - Also not clear how to weave in the generation of stz2g.

I think stz2g could be:

(set (match_operand:OI 0 "aarch64_granule_memory_operand" "+")
 (unspec_volatile:OI
   [(const_int 0)
(match_operand:DI 1 "register_operand" "rk")]
   UNSPECV...))

I think in practice stz2g will need a separate pattern from st2g,
rather than being an alternative of the same pattern.  (That's because
the suggested pattern for st2g uses a (match_dup 0), which isn't subject
to constraint matching.)

Thanks,
Richard

>
> ChangeLog:
>   * gcc/config/aarch64/aarch64.md
>
> ---
> [New in RFC V2]
> ---
>  gcc/config/aarch64/aarch64.md | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index d3223e275c51..175aed3146ac 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8495,6 +8495,26 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +;; ST2G with post-index writeback.
> +(define_insn "*st2g_post"
> +  [(set (mem:QI (unspec:DI
> +  [(plus:DI (match_operand:DI 1 "register_operand" "=&rk")
> +(const_int 0))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> +  (const_int 56)) (const_int 15)))
> +   (set (mem:QI (unspec:DI
> +  [(plus:DI (match_dup 1) (const_int -16))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_dup 0)
> +  (const_int 56)) (const_int 15)))
> +(set (match_dup 1)
> + (plus:DI (match_dup 1) (match_operand:DI 2 
> "aarch64_granule16_simm9" "i")))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%0, [%1], #%2"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  ;; Load/Store 64-bit (LS64) instructions.
>  (define_insn "ld64b"
>[(set (match_operand:V8DI 0 "register_operand" "=r")


Re: [PATCH] c++: Properly mangle CONST_DECL without a INTEGER_CST value [PR116511]

2025-04-15 Thread Simon Martin
Hi Jason,

On Thu Apr 10, 2025 at 10:42 PM CEST, Jason Merrill wrote:
> On 9/6/24 7:15 AM, Simon Martin wrote:
>> We ICE upon the following *valid* code when mangling the requires
>> clause
>> 
>> === cut here ===
>> template  struct s1 {
>>enum { e1 = 1 };
>> };
>> template  struct s2 {
>>enum { e1 = s1::e1 };
>>s2() requires(0 != e1) {}
>> };
>> s2<8> a;
>> === cut here ===
>> 
>> The problem is that the mangler wrongly assumes that the DECL_INITIAL of
>> a CONST_DECL is always an INTEGER_CST, and blindly passes it to
>> write_integer_cst.
>> 
>> I assume we should be able to actually compute the value of e1 and use
>> it when mangling, however from my investigation, it seems to be a pretty
>> involved change.
>> 
>> What's clear however is that we should not try to write a non-literal as
>> a literal. This patch adds a utility function to determine whether a
>> tree is a literal as per the definition in the ABI, and uses it to only
>> call write_template_arg_literal when we actually have a literal in hand.
>> 
>> Note that I had to change the expectation of an existing test, that was
>> expecting "[...]void (AF::*)(){}[...]" and now gets an equivalent
>> "[...](void (AF::*)())0[...]" (and FWIW is what clang and icx give; see
>> https://godbolt.org/z/hnjdeKEhW).
>
> Unfortunately we need to provide backward bug compatibility for 
> -fabi-version=14, so this change needs to check abi_version_at_least (15).
Good point, ack.

>> +/* Determine whether T is a literal per section 5.1.6.1 of the CXX ABI.  */
>> +
>> +static bool
>> +literal_p (const tree t)
>> +{
>> +  if ((TREE_TYPE (t) && NULLPTR_TYPE_P (TREE_TYPE (t)))
>
> This looks wrong; a random expression with type nullptr_t is not a 
> literal, and can be instantiation-dependent.  And I don't see any test 
> of mangling such a thing.
TBH I think there might be more than just this wrong with this patch :-)

I have been flip-flopping between "it's wrong to just mangle the
expression" and "but I don't think we can do much better" and never
settled on one; that's why I never pinged this 6+ month old patch.

Is the approach this took actually valid? I think that in an ideal
world, the enum value would have been tsubst'd (or we'd have all we need
to tsubst it) when we mangle, and I tried to hook things up so that it
happens, but I never succeeded.

Simon



Re: [PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Marek Polacek
On Tue, Apr 15, 2025 at 06:46:26PM +, Qing Zhao wrote:
> This is the 2nd version of the patch, the change is to replace "FALSE" with
> "false" per Marek's comments.
> 
> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
> In c_fully_fold, it assumes that operands of function calls have already
> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
> operands are not fully folded. therefore the C FE specific operator is
> passed to middle-end.
> 
> In order to fix this issue, fully fold the parameters before building the
> call to .ACCESS_WITH_SIZE.
> 
> Bootstrapped and regression tested on both x86 and aarch64.
> Okay for trunk?

LGTM now, thanks.
 
> Thanks.
> 
> Qing
> 
> =
> 
>   PR c/119717
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>   parameters for call to .ACCESS_WITH_SIZE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr119717.c: New test.
> ---
>  gcc/c/c-typeck.cc   |  8 ++--
>  gcc/testsuite/gcc.dg/pr119717.c | 24 
>  2 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 3870e8a1558..55d896e02df 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
> loc, tree ref,
>gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>/* The result type of the call is a pointer to the flexible array type.  */
>tree result_type = c_build_pointer_type (TREE_TYPE (ref));
> +  tree first_param
> += c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
> +  tree second_param
> += c_fully_fold (counted_by_ref, false, NULL);
>  
>tree call
>  = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>   result_type, 6,
> - array_to_pointer_conversion (loc, ref),
> - counted_by_ref,
> + first_param,
> + second_param,
>   build_int_cst (integer_type_node, 1),
>   build_int_cst (counted_by_type, 0),
>   build_int_cst (integer_type_node, -1),
> diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
> new file mode 100644
> index 000..e5eedc567b3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr119717.c
> @@ -0,0 +1,24 @@
> +/* PR c/119717  */
> +/* { dg-additional-options "-std=c23" } */
> +/* { dg-do compile } */
> +
> +struct annotated {
> +  unsigned count;
> +  [[gnu::counted_by(count)]] char array[];
> +};
> +
> +[[gnu::noinline,gnu::noipa]]
> +static unsigned
> +size_of (bool x, struct annotated *a)
> +{
> +  char *p = (x ? a : 0)->array;
> +  return __builtin_dynamic_object_size (p, 1);
> +}
> +
> +int main()
> +{
> +  struct annotated *p = __builtin_malloc(sizeof *p);
> +  p->count = 0;
> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
> +  return 0;
> +}
> -- 
> 2.31.1
> 

Marek



Re: [PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

2025-04-15 Thread Tobias Burnus

Alex wrote:

Tested on x86_64-pc-linux-gnu, this is only a library addition (and a
few tests) so it shouldn't cause any major impacts.  I also tested
libgomp C to ensure the conditional compile was working.


Namely, the change is only to omp.h(.in) - effective for
__cplusplus >= 201103L.

Note that the following is an OpenMP 5.0 feature that for some reason
was missed when implementing omp_alloc / omp_free support.

   omp::allocator:: ... 

where ... is the name of a predefined allocator (with omp_ and _alloc 
stripped).

[Support for omp::allocator::null_allocator is a (semi-accidental)
OpenMP 6.0 feature, where omp_null_allocator implies that the allocator
of default-allocator-var ICV is used.]

The main use case of this feature is to make it easy to use those
allocators with containers from the STL like:

  std::vector<int, omp::allocator::cgroup_mem<int>> var;

where cgroup_mem uses low latency memory on AMD and Nvidia GPU devices,
which is faster than the normal allocator.
(→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html
for cgroup_mem )
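
As a rough usage sketch (not from the patch; it assumes -fopenmp, C++11 or
later, and that the predefined allocator class templates follow the
omp_/_alloc stripping rule described above):

  #include <omp.h>
  #include <vector>

  int main ()
  {
    // Plain host allocation through the OpenMP default allocator.
    std::vector<int, omp::allocator::default_mem<int>> v (1024, 0);
    // On an offload target, cgroup_mem would select the low-latency pool:
    //   std::vector<int, omp::allocator::cgroup_mem<int>> team_local;
    return v[0];
  }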

* * *

LGTM. Thanks for the patch!

Tobias


Re: [PATCH] discriminators: Fix assigning discriminators on edge [PR113546]

2025-04-15 Thread Andrew Pinski
On Sun, Mar 16, 2025 at 11:43 AM Jeff Law  wrote:
>
>
>
> On 3/15/25 9:01 PM, Andrew Pinski wrote:
> > The problem here is there was a compare debug since the discriminators
> > would still take into account debug statements. For the edge we would look
> > at the first statement after the labels and that might have been a debug 
> > statement.
> > So we need to skip over debug statements otherwise we could get different
> > discriminators # with and without -g.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> >   PR middle-end/113546
> >
> > gcc/ChangeLog:
> >
> >   * tree-cfg.cc (first_non_label_stmt): Rename to ...
> >   (first_non_label_nondebug_stmt): This and use 
> > gsi_start_nondebug_after_labels_bb.
> >   (assign_discriminators): Update call to first_non_label_nondebug_stmt.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * c-c++-common/torture/pr113546-1.c: New test.
> OK.

Now backported/pushed to GCC 14.

Thanks,
Andrew Pinski

>
> Jeff


Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-15 Thread Kees Cook
On Tue, Apr 15, 2025 at 09:07:44PM +0200, Martin Uecker wrote:
> Am Dienstag, dem 15.04.2025 um 14:50 +0200 schrieb Michael Matz:
> > Hello,
> ...
> 
> > > struct A {
> > >   int *buf __counted_by(len); // 'len' *must* be in the struct.
> > >   int len;
> > > };
> > 
> > ... means that we would have to implement general delayed parsing for 
> > expressions in C parsers. 
> 
> I have to agree with Michael.  This was the main reason
> we rejected the original approach.  
> 
> I also think consistency with general syntax for arrays in structs
> is far more important for C than consistency for the special case of
> having only one identifier in counted_by.

Okay, so I think the generally recognized way forward is with two
attributes:

counted_by(struct_member)

and

counted_by_expr(type struct_member; ...; expression)

This leaves flexible array members with counted_by unchanged from
current behavior.

Questions I am left with:

1) When applying counted_by to pointer members, are out-of-order member
declarations expected to be handled? As in, is this expected to be valid?

struct foo {
struct bar *p __attribute__((counted_by(count)));
int count;
};

1.A) If it is _not_ valid, is it valid to use it when the member has
been declared earlier? Such as:

struct foo {
int count;
struct bar *p __attribute__((counted_by(count)));
};

1.B) If "1" isn't valid, but "1.A" is valid, I would expect that way to
allow the member ordering in "1" is through counted_by_expr? For example:

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; count)));
int count;
};

1.C) If "1" isn't valid, and "1.A" isn't valid, then counted_by of
pointer members must always use counted_by_expr. Is that expected?
(I ask because it seems like a potentially weird case where member order
forces choosing between two differently named attributes. It'd be really
nice if "1" could be valid.)


2) For all counted_by of pointer members, I understand this to only be
about the parsing step, not further analysis where the full sizes of
all objects will need to be known. Which means that this is valid:

struct bar; // empty declaration

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; count)));
int count;
};
...
// defined after being referenced by counted_by_expr above
struct bar {
int a, b, c;
struct foo *p;
};

Is that correct?


3) It seems it will be possible to provide a "singleton" alias to
indicate that a given pointer member is not an array of objects, but
rather a pointer to a single object instance:

struct bar {
int a, b, c;
struct foo *p __attribute__((counted_by_expr(1)));
};

Is that correct? (This will be useful once we can apply counted_by to
function arguments...)


4) If there are type mismatches between the counted_by_expr struct
member declaration and the later actual struct member declaration, I
assume that will be a hard error. For example, this would fail to compile:

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; count)));
unsigned long count;
};

Is that correct? It feels like if we're already able to do this analysis,
then "1" should be possible also. Perhaps I'm misunderstanding something
about the parser.


Thanks!

-Kees

-- 
Kees Cook


Re: [PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao
Thanks.

Pushed to trunk.

Qing

> On Apr 15, 2025, at 14:56, Marek Polacek  wrote:
> 
> On Tue, Apr 15, 2025 at 06:46:26PM +, Qing Zhao wrote:
>> This is the 2nd version of the patch, the change is to replace "FALSE" with
>> "false" per Marek's comments.
>> 
>> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
>> In c_fully_fold, it assumes that operands of function calls have already
>> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
>> operands are not fully folded. therefore the C FE specific operator is
>> passed to middle-end.
>> 
>> In order to fix this issue, fully fold the parameters before building the
>> call to .ACCESS_WITH_SIZE.
>> 
>> Bootstrapped and regression tested on both x86 and aarch64.
>> Okay for trunk?
> 
> LGTM now, thanks.
> 
>> Thanks.
>> 
>> Qing
>> 
>> =
>> 
>> PR c/119717
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>> parameters for call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/pr119717.c: New test.
>> ---
>> gcc/c/c-typeck.cc   |  8 ++--
>> gcc/testsuite/gcc.dg/pr119717.c | 24 
>> 2 files changed, 30 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index 3870e8a1558..55d896e02df 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
>> loc, tree ref,
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>>   /* The result type of the call is a pointer to the flexible array type.  */
>>   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
>> +  tree first_param
>> += c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
>> +  tree second_param
>> += c_fully_fold (counted_by_ref, false, NULL);
>> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> result_type, 6,
>> - array_to_pointer_conversion (loc, ref),
>> - counted_by_ref,
>> + first_param,
>> + second_param,
>> build_int_cst (integer_type_node, 1),
>> build_int_cst (counted_by_type, 0),
>> build_int_cst (integer_type_node, -1),
>> diff --git a/gcc/testsuite/gcc.dg/pr119717.c 
>> b/gcc/testsuite/gcc.dg/pr119717.c
>> new file mode 100644
>> index 000..e5eedc567b3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr119717.c
>> @@ -0,0 +1,24 @@
>> +/* PR c/119717  */
>> +/* { dg-additional-options "-std=c23" } */
>> +/* { dg-do compile } */
>> +
>> +struct annotated {
>> +  unsigned count;
>> +  [[gnu::counted_by(count)]] char array[];
>> +};
>> +
>> +[[gnu::noinline,gnu::noipa]]
>> +static unsigned
>> +size_of (bool x, struct annotated *a)
>> +{
>> +  char *p = (x ? a : 0)->array;
>> +  return __builtin_dynamic_object_size (p, 1);
>> +}
>> +
>> +int main()
>> +{
>> +  struct annotated *p = __builtin_malloc(sizeof *p);
>> +  p->count = 0;
>> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
>> +  return 0;
>> +}
>> -- 
>> 2.31.1
>> 
> 
> Marek




[PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Luc Grosheintz
This implements std::extents from  according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This commit
uses direct initialization to perform the conversion, without any
additional checks.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For rank-zero extents (extents<IndexType>) this precondition is always
violated and results in calling __builtin_trap.  For all other
specializations it's checked via
__glibcxx_assert.
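
As a rough illustration of the resulting interface (not part of the patch;
assumes -std=c++23 and the new <mdspan> header from this series):

  #include <mdspan>

  int main ()
  {
    constexpr std::extents<int, 2, std::dynamic_extent> e{3};
    static_assert (e.rank () == 2 && e.rank_dynamic () == 1);
    static_assert (e.static_extent (0) == 2);
    static_assert (e.extent (1) == 3);  // the dynamic extent
    // e.extent (2) would violate the __r < rank () precondition.
    return 0;
  }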

PR libstdc++/107761

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): New class.
* src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 304 +++
 libstdc++-v3/src/c++23/std.cc.in |   6 +-
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 4094a416d1e..72ca3445d15 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,10 @@
 #pragma GCC system_header
 #endif
 
+#include 
+#include 
+#include 
+
 #define __glibcxx_want_mdspan
 #include 
 
@@ -41,6 +45,306 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+template
+  class __array
+  {
+  public:
+   constexpr _Tp&
+ operator[](size_t __n) noexcept
+ {
+   return _M_elems[__n];
+ }
+
+   constexpr const _Tp&
+ operator[](size_t __n) const noexcept
+ {
+   return _M_elems[__n];
+ }
+
+  private:
+   array<_Tp, _Nm> _M_elems;
+  };
+
+template
+  class __array<_Tp, 0>
+  {
+  public:
+   constexpr _Tp&
+ operator[](size_t __n) noexcept
+ {
+   __builtin_trap();
+ }
+
+   constexpr const _Tp&
+ operator[](size_t __n) const noexcept
+ {
+   __builtin_trap();
+ }
+  };
+
+template
+  class _ExtentsStorage
+  {
+  public:
+   static constexpr bool
+   _M_is_dyn(size_t __ext) noexcept
+   { return __ext == dynamic_extent; }
+
+   template
+ static constexpr _IndexType
+ _M_int_cast(const _OIndexType& __other) noexcept
+ { return _IndexType(__other); }
+
+   static constexpr size_t _S_rank = sizeof...(_Extents);
+   static constexpr array _S_exts{_Extents...};
+
+   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
+   // of dynamic extents up to (and not including) __r.
+   //
+   // If __r is the index of a dynamic extent, then
+   // _S_dynamic_index[__r] is the index of that extent in
+   // _M_dynamic_extents.
+   static constexpr auto _S_dynamic_index = [] consteval
+   {
+ array __ret;
+ size_t __dyn = 0;
+ for(size_t __i = 0; __i < _S_rank; ++__i)
+   {
+ __ret[__i] = __dyn;
+ __dyn += _M_is_dyn(_S_exts[__i]);
+   }
+ __ret[_S_rank] = __dyn;
+ return __ret;
+   }();
+
+   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
+
+   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
+   // index of the __r-th dynamic extent in _S_exts.
+   static constexpr auto _S_dynamic_index_inv = [] consteval
+   {
+ array __ret;
+ for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
+   if (_M_is_dyn(_S_exts[__i]))
+ __ret[__r++] = __i;
+ return __ret;
+   }();
+
+   static constexpr size_t
+   _M_static_extent(size_t __r) noexcept
+   { return _S_exts[__r]; }
+
+   constexpr _IndexType
+   _M_extent(size_t __r) const noexcept
+   {
+ auto __se = _S_exts[__r];
+ if (__se == dynamic_extent)
+   return _M_dynamic_extents[_S_dynamic_index[__r]];
+ else
+   return __se;
+   }
+
+  private:
+   template
+ constexpr void
+ _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+ {
+   size_t __di = __i;
+   if constexpr (_OtherRank != _S_rank_dynamic)
+ __di = _S_dynamic_index_inv[__i];
+   _M_dynamic_extents[__i] = _M_int_cast(__get_extent(__di));
+ }
+ }
+
+  public:
+   constexpr
+   _ExtentsStorage() noexcept = default;
+
+   template
+ constexpr
+ _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents...>&
+ __other) noexcept
+ {
+   _M_init_dynamic_extents<_S_rank>([&__other](auto __i)
+ { return __other._M_extent(__i); });
+ }
+
+   template
+ constexpr
+ _ExtentsStorage

Re: [PATCH] [PR119765] testsuite: adjust amd64-abi-9.c to check both ms and sysv ABIs

2025-04-15 Thread LIU Hao

On 2025-4-14 04:10, Peter Damianov wrote:

diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c 
b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
index 9b2cd7e7b49..827215be3e2 100644
--- a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
@@ -1,18 +1,46 @@
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
+
+// For sysv abi, eax holds the number of XMM registers used in the call.
+// Since sse is disabled, check that it is zeroed
  /* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 2 } 
} */
  
-void foo (const char *, ...);

+// For ms abi, the argument should go in edx
+/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%edx" 2 } 
} */


is this a superfluous `\[` ? --^^


+
+// For sysv abi, the argument should go in esi
+/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%esi" 2 } 
} */
+
+


ditto.


--
Best regards,
LIU Hao




Re: [PATCH] c++: Prune lambda captures from more places [PR119755]

2025-04-15 Thread Jason Merrill

On 4/15/25 2:56 AM, Nathaniel Shead wrote:

On Mon, Apr 14, 2025 at 05:33:05PM -0400, Jason Merrill wrote:

On 4/13/25 6:32 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/lambda.cc   | 22 ++
   gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
   gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
   3 files changed, 46 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index f0a54b60275..d01bb04cd32 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1858,6 +1858,14 @@ prune_lambda_captures (tree body)
 cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
+  gcc_assert (bind_expr
+ && (TREE_CODE (bind_expr) == BIND_EXPR
+ /* FIXME: In a noexcept lambda we never prune captures
+(PR119764); when we do we need to handle this case
+for modules streaming.  */


The attached patch seems to fix that, with the result that your patch
crashes.



Thanks.  And yup, crashing was deliberate here as I wasn't 100% sure
what the tree would look like for this case after an appropriate fix.

One quick question about your patch: since it could in theory affect the ABI
(the size of the lambdas changes), should the pruning of such lambdas be
dependent on an ABI version check?


Indeed, perhaps this is too late in the 15 cycle for such a change.


Otherwise here's an updated patch that relies on your patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk along
with yours?  (Or if the potential ABI concerns mean that your change
isn't appropriate for GCC15, would the old version of my patch still be
OK for GCC15 to get 'import std' working again for C++26?)


For 15 please adjust this patch to be more fault-tolerant:


-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
---
  gcc/cp/lambda.cc   | 19 +++
  gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
  gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
  3 files changed, 43 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index c6308b941d3..7bb88a900d5 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1862,6 +1862,11 @@ prune_lambda_captures (tree body)
  
cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
  
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));

+  if (bind_expr && TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR)
+bind_expr = expr_single (TREE_OPERAND (bind_expr, 0));
+  gcc_assert (bind_expr && TREE_CODE (bind_expr) == BIND_EXPR);


i.e. here clear bind_expr if it isn't a BIND_EXPR...


tree *fieldp = &TYPE_FIELDS (LAMBDA_EXPR_CLOSURE (lam));
for (tree *capp = &LAMBDA_EXPR_CAPTURE_LIST (lam); *capp; )
  {
@@ -1883,6 +1888,20 @@ prune_lambda_captures (tree body)
fieldp = &DECL_CHAIN (*fieldp);
  *fieldp = DECL_CHAIN (*fieldp);
  
+	  /* And out of the bindings for the function.  */

+ tree *blockp = &BLOCK_VARS (current_binding_level->blocks);
+ while (*blockp != DECL_EXPR_DECL (**use))
+   blockp = &DECL_CHAIN (*blockp);
+ *blockp = DECL_CHAIN (*blockp);
+
+ /* And maybe out of the vars declared in the containing
+BIND_EXPR, if it's listed there.  */
+ tree *bindp = &BIND_EX

[PATCH v4 18/20] Fix FMV return type ambiguation

2025-04-15 Thread Alfie Richards
Add logic for the case of two FMV-annotated functions with identical
signatures other than the return type.

Previously this was ignored; this change emits a diagnostic for the mismatch.
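
As a small illustrative sketch of the newly diagnosed situation (not from the
patch; assumes an AArch64 target with target_version support):

  // Two versions whose signatures are identical except for the return type.
  __attribute__ ((target_version ("default"))) int f ();
  __attribute__ ((target_version ("sve"))) long f ();
  // Previously the second declaration was silently accepted as another
  // version; with this change the mismatch is diagnosed.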

gcc/cp/ChangeLog:
PR c++/119498
* decl.cc (duplicate_decls): Change logic to not always exclude FMV
annotated functions in cases of return type non-ambiguation.
---
 gcc/cp/decl.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4a374fa29e3..6494944e3ba 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2022,8 +2022,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
}
  /* For function versions, params and types match, but they
 are not ambiguous.  */
- else if ((!DECL_FUNCTION_VERSIONED (newdecl)
-   && !DECL_FUNCTION_VERSIONED (olddecl))
+ else if (((!DECL_FUNCTION_VERSIONED (newdecl)
+&& !DECL_FUNCTION_VERSIONED (olddecl))
+   || !comptypes (TREE_TYPE (TREE_TYPE (newdecl)),
+  TREE_TYPE (TREE_TYPE (olddecl)),
+  COMPARE_STRICT))
   /* Let constrained hidden friends coexist for now, we'll
  check satisfaction later.  */
   && !member_like_constrained_friend_p (newdecl)
-- 
2.34.1



[committed] libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>

2025-04-15 Thread Jonathan Wakely
In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_range_iota, but __cpp_lib_ranges_iota should not be
defined in <ranges> at all.

libstdc++-v3/ChangeLog:

* include/std/ranges (__glibcxx_want_ranges_iota): Do not
define.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/ranges | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 7a339c51368..9300c364a16 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -64,7 +64,6 @@
 #define __glibcxx_want_ranges_chunk
 #define __glibcxx_want_ranges_chunk_by
 #define __glibcxx_want_ranges_enumerate
-#define __glibcxx_want_ranges_iota
 #define __glibcxx_want_ranges_join_with
 #define __glibcxx_want_ranges_repeat
 #define __glibcxx_want_ranges_slide
-- 
2.49.0



[committed] libstdc++: Do not declare namespace ranges in <numeric> unconditionally

2025-04-15 Thread Jonathan Wakely
Move namespace ranges inside the feature test macro guard, because
'ranges' is not a reserved name before C++20.
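
For illustration (not part of the patch): before C++20 a program may define
'ranges' as an object-like macro and still include the library headers, which
is what the names.cc change below verifies.

  // A minimal sketch, valid in C++17 mode: 'ranges' is not a name declared
  // by <numeric> before C++20, so a user may define it as a macro.
  #define ranges (
  #include <numeric>
  int main () { }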

libstdc++-v3/ChangeLog:

* include/std/numeric (ranges): Only declare namespace for C++23
and later.
(ranges::iota_result): Fix indentation.
* testsuite/17_intro/names.cc: Check ranges is not used as an
identifier before C++20.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/numeric | 8 +++-
 libstdc++-v3/testsuite/17_intro/names.cc | 4 
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 4d36fcd36d9..490963ee46d 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -732,12 +732,11 @@ namespace __detail
   /// @} group numeric_ops
 #endif // C++17
 
+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
 namespace ranges
 {
-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
-
   template
-  using iota_result = out_value_result<_Out, _Tp>;
+using iota_result = out_value_result<_Out, _Tp>;
 
   struct __iota_fn
   {
@@ -762,9 +761,8 @@ namespace ranges
   };
 
   inline constexpr __iota_fn iota{};
-
-#endif // __glibcxx_ranges_iota
 } // namespace ranges
+#endif // __glibcxx_ranges_iota
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 4458325e52b..f67818db425 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -142,6 +142,10 @@
 #define try_emplace (
 #endif
 
+#if __cplusplus < 202002L
+#define ranges (
+#endif
+
 // These clash with newlib so don't use them.
 # define __lockablecannot be used as an identifier
 # define __null_sentinel   cannot be used as an identifier
-- 
2.49.0



[PUSHED/14 6/6] testcase: Add testcase for already fixed PR [PR118476]

2025-04-15 Thread Andrew Pinski
This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is
a testcase that failed in a different fashion and a much older
failure than the one added with r15-3052.

Pushed as obvious after a quick test.

PR tree-optimization/118476

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118476-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit d45a6502d1ec87d43f1a39f87cca58f1e28369c8)
---
 gcc/testsuite/gcc.dg/torture/pr118476-1.c | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118476-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr118476-1.c 
b/gcc/testsuite/gcc.dg/torture/pr118476-1.c
new file mode 100644
index 000..33509403b61
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118476-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+/* PR tree-optimization/118476 */
+
+typedef unsigned long long poly64x1 __attribute__((__vector_size__(1*sizeof(long long))));
+
+poly64x1 vext_p64(poly64x1 a, poly64x1 b, const int n)
+{
+  poly64x1 r = a;
+  unsigned src = (unsigned)n;
+  long long t = b[0];
+  r[0] = (src < 1) ? a[src] : t;
+  return r;
+}
-- 
2.43.0



Re: [PATCH v2] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Jonathan Wakely

A few spelling and grammar fixes, and whitespace tweaks, but the only
significant thing is to qualify some calls to prevent ADL ...


On 14/04/25 16:13 +0200, Tomasz Kamiński wrote:

This patch implements formatter specialization for input_ranges and
range_formatter class form P2286R8, as adjusted by P2585R1. The formatter


"form" should be "from"


for pair/tuple is not yet provided, making maps not formattable.

To indicate partial support we define __glibcxx_format_ranges macro
value 1, without defining __cpp_lib_format_ranges.


That was already pushed in an earlier commit, but this sounds like
it's done here.


This introduces an new _M_format_range member to internal __formatter_str,
that formats range as _CharT as string, according the to the format spec.


"the to the" should be "to the"


This function transform any contiguous range into basic_string_view direclty,


"directly"


by computing size if necessary. Otherwise, for ranges for which size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format it content.


Should be "its content"



In case when padding is specified, this is handled by firstly formatting
the content of the range to the temporary string object. However, this can be
only implemented if the iterator of the basic_format_context is internal
type-erased iterator used by implementation. Otherwise a new 
basic_format_context
would need to be created, which would require rebinding of handles stored in
the arguments: note that format spec for element type could retrive any format


"retrive" should be "retrieve"


argument from format context, visit and and user handle to format it.


"and and"


As basic_format_context provide no user-facing constructor, the user are not 
able


"the user are not" should be "users are not"


to cosntructor object of that type with arbitrally iterators.


"cosntructor" should be "construct"
"arbitrally" should be "arbitrary"



The signatures of the user-facing parse and format method of the provided


"method" should be "methods"


formatters deviate from the standard by constraining types of params:
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies last three of above as unconstrained types. This types


"This types" should be "These types"


are later passed to possibly user-provided formatter specializations, that are
required via formattable concept to only accept above types.

Finally, the formatter specialization is implemented
without using specialization of range-default-formatter exposition only
template as base class, while providing same functionality.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, 
_Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink) Use __Stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.
---
Adjusted the commit message and added test for result of formattable
check for ranges of types that are not formattable.

libstdc++-v3/include/std/format   | 511 --
libstdc++-v3/src/c++23/std.cc.in  |   6 +
.../23_containers/vector/bool/format.cc   |   6 +
.../testsuite/std/format/formatter/lwg3944.cc |   4 +-
.../std/format/formatter/requirements.cc  |  14 +-
.../std/format/ranges/format_kind.cc  |  94 
.../testsuite/std/format/ranges/formatter.cc  | 145 +
.../testsuite/std/format/ranges/sequence.cc   | 190 +++
.../testsuite/std/format/ranges/string.cc | 226 
9 files changed, 1131 insertions(+), 65 deletions(-)
create mode 100644 libstdc++-v3/testsuite/std/format/ranges/format_kind.cc
create mode 100644 libstdc++-v3/

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-15 Thread Kyrylo Tkachov


> On 15 Apr 2025, at 15:42, Richard Biener  wrote:
> 
> On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov  wrote:
>> 
>> Hi Honza,
>> 
>>> On 13 Apr 2025, at 23:19, Jan Hubicka  wrote:
>>> 
 +@opindex fipa-reorder-for-locality
 +@item -fipa-reorder-for-locality
 +Group call chains close together in the binary layout to improve code code
 +locality.  This option is incompatible with an explicit
 +@option{-flto-partition=} option since it enforces a custom partitioning
 +scheme.
>>> 
>>> Please also cross-link this with -fprofile-reorder-functions and
>>> -freorder-functions, which does similar thing.
>>> If you see how to clean-up the description of the other two so user is
>>> not confused.
>>> 
>>> Perhaps say that -freorder-functions only partitions functions into
>>> never-executed/cold/normal/hot and -fprofile-reroder-functions is aiming
>>> for program startup optimization (it reorders by measured first time the
>>> function is executed.  By accident it seems to kind of work for
>>> locality.
>> 
>> Yeah, the option names are quite similar aren't they?
>> I’ve attempted to disambiguate them a bit in their description.
>> I’m attaching a diff from the previous version (as the full updated patch) 
>> to make it easier to see what’s adjusted.
>> 
>> 
>>> 
 +
 +/* Helper function of to accumulate call counts.  */
 +static bool
 +accumulate_profile_counts_after_cloning (cgraph_node *node, void *data)
 +{
 +  struct profile_stats *stats = (struct profile_stats *) data;
 +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
 +{
 +  if (e->caller == stats->target)
 + {
 +  if (stats->rec_count.compatible_p (e->count.ipa ()))
 +stats->rec_count += e->count.ipa ();
 + }
 +  else
 + {
 +  if (stats->nonrec_count.compatible_p (e->count.ipa ()))
 +stats->nonrec_count += e->count.ipa ();
 + }
>>> In case part of profile is missing (which may happen if one unit has -O0
>>> or so) , we may have counts to be uninitialized. Uninitialized counts are
>>> compatible with everything, but any arithmetics with it will produce
>>> uninitialized result which will likely confuse code later.  So I would
>>> skip edges with uninitialized counts.
>>> 
>>> On the other hand ipa counts are always compatible, so compatible_p
>>> should be redundat. Main reaosn for existence of compatible_p is that we
>>> can have local profiles that are 0 or unknown at IPA level.  The ipa ()
>>> conversion turns all counts into IPA counts and those are compatible
>>> with each other.
>>> 
>>> I suppose compatibe_p test is there since the code ICEd in past,but I
>>> think it was because of missing ipa() conversion.
>>> 
>>> 
 +}
 +  return false;
 +}
 +
 +/* NEW_NODE is a previously created clone of ORIG_NODE already present in
 +   current partition.  EDGES contains newly redirected edges to NEW_NODE.
 +   Adjust profile information for both nodes and the edge.  */
 +
 +static void
 +adjust_profile_info_for_non_self_rec_edges (auto_vec 
 &edges,
 +cgraph_node *new_node,
 +cgraph_node *orig_node)
 +{
 +  profile_count orig_node_count = orig_node->count.ipa ();
 +  profile_count edge_count = profile_count::zero ();
 +  profile_count final_new_count = profile_count::zero ();
 +  profile_count final_orig_count = profile_count::zero ();
 +
 +  for (unsigned i = 0; i < edges.length (); ++i)
 +edge_count += edges[i]->count.ipa ();
>>> Here I would again skip uninitialized.  It is probably legal for -O0
>>> function to end up in partition.
 +
 +  final_orig_count = orig_node_count - edge_count;
 +
 +  /* NEW_NODE->count was adjusted for other callers when the clone was
 + first created.  Just add the new edge count.  */
 +  if (new_node->count.compatible_p (edge_count))
 +final_new_count = new_node->count + edge_count;
>>> And here compatible_p should be unnecesary.
 +/* Accumulate frequency of all edges from EDGE->caller to EDGE->callee.  
 */
 +
 +static sreal
 +accumulate_incoming_edge_frequency (cgraph_edge *edge)
 +{
 +  sreal count = 0;
 +  struct cgraph_edge *e;
 +  for (e = edge->callee->callers; e; e = e->next_caller)
 +{
 +  /* Make a local decision about all edges for EDGE->caller but not 
 the
 + other nodes already in the partition.  Their edges will be visited
 + later or may have been visited before and not fit the
 + cut-off criteria.  */
 +  if (e->caller == edge->caller)
 + {
 +  profile_count caller_count = e->caller->inlined_to
 + ? e->caller->inlined_to->count
 + : e->caller->count;
 +  if (e->count.compatible_p (caller_count))
>>> Here again compatiblity check should not be necessary, since the counts
>>> belong to one function body (after inlining) and should be compatible.
>>> inliner 

COBOL: Is anything stalled because of me?

2025-04-15 Thread Robert Dubner
Speaking purely casually:  I thought that COBOL would be released with 
documented limited capability.  "Yeah, it works on x86_64-linux and 
aarch64-linux.  More to come.".  We knew that we didn't know how to 
cross-compile, and we knew that other platforms would have to come, in time.

It never occurred to me that significant efforts would be made to fix all 
that in a month.

More formally:  I am very aware that I have not been as responsive here as 
maybe I should have been.  I plead incapacitation due to inundation.

If I have missed anything, or if anybody is waiting for me, please remind me. 
And if I have missed pings, I apologize; they've just been hidden in the 
deluge.

Thanks.

> -Original Message-
> From: Jeff Law 
> Sent: Tuesday, April 15, 2025 10:32
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] libgcobol: mark riscv64-*-linux* as supported target
>
>
>
> On 4/15/25 7:57 AM, Andreas Schwab wrote:
> > * configure.tgt: Set LIBGCOBOL_SUPPORTED for riscv64-*-linux* with
> > 64-bit multilib.
> Can't say I'm happy with the amount of Cobol related churn at this phase
> in our cycle.  But this should be exceedingly safe.  So OK.
>
> jeff



Re: Mark const parameters passed by invisible reference as readonly in the function body

2025-04-15 Thread Jason Merrill

On 3/30/25 6:12 PM, Jan Hubicka wrote:

Hi,
I noticed that this patch got forgotten and I think it may be useful to
solve this in the next stage 1.


cp_apply_type_quals_to_decl drops 'const' if the type has mutable members.
Unfortunately TREE_READONLY on the PARM_DECL isn't helpful in the case of an
invisiref parameter.


But maybe classes with mutable
members are never POD and thus always runtime initialized?


No.


C++ frontend has

/* Nonzero means that this type contains a mutable member.  */
#define CLASSTYPE_HAS_MUTABLE(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->has_mutable)
#define TYPE_HAS_MUTABLE_P(NODE) (cp_has_mutable_p (NODE))

but it is not exported to middle-end.

However, this is still quite a special situation since the object is passed
using an invisible reference, so I wonder whether in this situation a copy is
constructed so that the callee can possibly overwrite the mutable fields?


The object bound to the invisible reference is usually a copy; mutable
doesn't make a difference.

If I understand the situation right, in the following testcase:

struct foo
{
   mutable int a;
   void bar() const;
   ~foo()
   {
 if (a != 42)
   __builtin_abort ();
   }
};
__attribute__ ((noinline))
void test(const struct foo a)
{
 int b = a.a;
 a.bar();
 if (a.a != b)
   __builtin_printf ("optimize me away");
}

We cannot assume that the value of a.a was not changed by bar because a.a is
mutable, but otherwise it is safe to optimize out the final check.
If that is so, I think we want to let the middle-end know that a type has
a mutable field and use that here, right?


Ah, yes, that makes sense.

Jason



Re: [PATCH] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Marek Polacek
On Mon, Apr 14, 2025 at 08:28:55PM +, Qing Zhao wrote:
> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
> In c_fully_fold, it assumes that operands of function calls have already
> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
> operands are not fully folded. therefore the C FE specific operator is
> passed to middle-end.
> 
> In order to fix this issue, fully fold the parameters before building the
> call to .ACCESS_WITH_SIZE.
> 
> I am doing the bootstrap and regression testing on both X86 and aarch64 now.
> Okay for trunk if testing going well?
> 
> thanks.
> 
> Qing
> 
>   PR c/119717
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>   parameters for call to .ACCESS_WITH_SIZE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr119717.c: New test.
> ---
>  gcc/c/c-typeck.cc   |  8 ++--
>  gcc/testsuite/gcc.dg/pr119717.c | 24 
>  2 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 3870e8a1558..dd176d96a41 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
> loc, tree ref,
>gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>/* The result type of the call is a pointer to the flexible array type.  */
>tree result_type = c_build_pointer_type (TREE_TYPE (ref));
> +  tree first_param
> += c_fully_fold (array_to_pointer_conversion (loc, ref), FALSE, NULL);
> +  tree second_param
> += c_fully_fold (counted_by_ref, FALSE, NULL);

Why FALSE?  Just use false.  You can also use nullptr rather than NULL now.
  
>tree call
>  = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>   result_type, 6,
> - array_to_pointer_conversion (loc, ref),
> - counted_by_ref,
> + first_param,
> + second_param,
>   build_int_cst (integer_type_node, 1),
>   build_int_cst (counted_by_type, 0),
>   build_int_cst (integer_type_node, -1),
> diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
> new file mode 100644
> index 000..e5eedc567b3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr119717.c
> @@ -0,0 +1,24 @@
> +/* PR c/119717  */
> +/* { dg-additional-options "-std=c23" } */
> +/* { dg-do compile } */
> +
> +struct annotated {
> +  unsigned count;
> +  [[gnu::counted_by(count)]] char array[];
> +};
> +
> +[[gnu::noinline,gnu::noipa]]
> +static unsigned
> +size_of (bool x, struct annotated *a)
> +{
> +  char *p = (x ? a : 0)->array;
> +  return __builtin_dynamic_object_size (p, 1);
> +}
> +
> +int main()
> +{
> +  struct annotated *p = __builtin_malloc(sizeof *p);
> +  p->count = 0;
> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
> +  return 0;
> +}
> -- 
> 2.31.1
> 

Marek



[PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread H.J. Lu
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone.  Since

commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
Author: H.J. Lu 
Date:   Sun Apr 13 12:20:42 2025 -0700

APX: Don't use red-zone with 32 GPRs and no caller-saved registers

disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.

PR target/119784
* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
directives.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
index fefe2e6d6fc..fa1acc7a142 100644
--- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
@@ -66,7 +66,7 @@ void foo (void *frame)
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
-/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
+/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
-- 
2.49.0



[PATCH v4] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Tomasz Kamiński
This patch implements formatter specialization for input_ranges and
range_formatter class from P2286R8, as adjusted by P2585R1. The formatter
for pair/tuple is not yet provided, making maps not formattable.

This introduces a new _M_format_range member to the internal __formatter_str,
which formats a range of _CharT as a string, according to the format spec.
This function transforms any contiguous range into a basic_string_view directly,
by computing the size if necessary. Otherwise, for ranges whose size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format its content.

When padding is specified, it is handled by first formatting the content of
the range to a temporary string object. However, this can only be implemented
if the iterator of the basic_format_context is the internal type-erased
iterator used by the implementation. Otherwise a new basic_format_context
would need to be created, which would require rebinding of the handles stored
in the arguments: note that the format spec for the element type could
retrieve any format argument from the format context, visit it, and use a
handle to format it. As basic_format_context provides no user-facing
constructor, users are not able to construct objects of that type with
arbitrary iterators.

The signatures of the user-facing parse and format methods of the provided
formatters deviate from the standard by constraining the types of the parameters:
* _CharT is constrained to satisfy __formatter::__char
* basic_format_parse_context<_CharT> for the parse argument
* basic_format_context<_Out, _CharT> for the second format argument
The standard specifies the last three of the above as unconstrained types.
These types are later passed to possibly user-provided formatter
specializations, which are required via the formattable concept to only
accept the above types.

Finally, the formatter specialization is implemented without using a
specialization of the range-default-formatter exposition-only template as a
base class, while providing the same functionality.
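
A small usage sketch (not part of the patch; assumes -std=c++23 with this
series applied):

  #include <cassert>
  #include <format>
  #include <vector>

  int main ()
  {
    std::vector<int> v{1, 2, 3};
    assert (std::format ("{}", v) == "[1, 2, 3]");
    // A spec after the second colon applies to the elements:
    assert (std::format ("{::#x}", v) == "[0x1, 0x2, 0x3]");
  }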

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, 
_Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink) Use __Stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.

Reviewed-by: Jonathan Wakely 
Signed-off-by: Tomasz Kamiński 
---
Fixed another double spacing error.

 libstdc++-v3/include/std/format   | 505 --
 libstdc++-v3/src/c++23/std.cc.in  |   6 +
 .../23_containers/vector/bool/format.cc   |   6 +
 .../testsuite/std/format/formatter/lwg3944.cc |   4 +-
 .../std/format/formatter/requirements.cc  |  14 +-
 .../std/format/ranges/format_kind.cc  |  94 
 .../testsuite/std/format/ranges/formatter.cc  | 145 +
 .../testsuite/std/format/ranges/sequence.cc   | 190 +++
 .../testsuite/std/format/ranges/string.cc | 226 
 9 files changed, 1125 insertions(+), 65 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/format_kind.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/formatter.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/sequence.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/string.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23f00970840..096dda4f989 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -97,6 +97,10 @@ namespace __format
 #define _GLIBCXX_WIDEN_(C, S) ::std::__format::_Widen(S, L##S)
 #define _GLIBCXX_WIDEN(S) _GLIBCXX_WIDEN_(_CharT, S)
 
+  // Size for stack located buffer
+  template
+  constexpr size_t __stackbuf_size = 32 * sizeof(void*) / sizeof(_CharT);
+
   // Type-erased character sinks.
   template class _Sink;
   template class _Fixedbuf_sink;
@@ -47

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 15, 2025 12:50 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org;
> nd 
> Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> [PR119351]
> 
> On Tue, 15 Apr 2025, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Sandiford 
> > > Sent: Tuesday, April 15, 2025 10:52 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> > > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > > [PR119351]
> > >
> > > Tamar Christina  writes:
> > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > index
> > >
> 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1c
> > > e591f7c38b76d07f 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> > > loop_vinfo,
> > > >return false;
> > > >  }
> > > >
> > > > +  /* With early break vectorization we don't know whether the accesses 
> > > > will
> stay
> > > > + inside the loop or not.  TODO: The early break adjustment code 
> > > > can be
> > > > + implemented the same way for vectorizable_linear_induction.  
> > > > However
> we
> > > > + can't test this today so reject it.  */
> > > > +  if (niters_skip != NULL_TREE
> > > > +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > > +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > > +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +{
> > > > +  if (dump_enabled_p ())
> > > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +"Peeling for alignement using masking is not 
> > > > supported"
> > > > +" for nonlinear induction when using early 
> > > > breaks.\n");
> > > > +  return false;
> > > > +}
> > > > +
> > > >return true;
> > > >  }
> > >
> > > FTR, I was wondering here whether we should predict this in advance and
> > > instead drop down to peeling for alignment without masks.  It probably
> > > isn't worth the effort though.
> >
> > We could move the check into vect_use_loop_mask_for_alignment_p where
> > rejecting it there would get it to fall back to scalar peeling.  That seems 
> > simple
> enough
> > if that's preferrable.
> 
> The above is perferable IMO (short of fixing up that case, but with
> a testcase).
> 

I wasn't able to make a testcase before, as any non-linear induction feeding a
load becomes a gather load, which we block outright well before getting here.
I couldn't think of an example where it wouldn't be; even a gapped load,
e.g. +=2, became one.

Thanks,
Tamar

> Richard.
> 
> > Cheers,
> > Tamar
> > >
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0
> > > 057d2fa46245d04d9 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info
> > > loop_vinfo,
> > > >LOOP_VINFO_MASK_SKIP_NITERS
> > > (loop_vinfo));
> > > >   peel_mul = gimple_build_vector_from_val (&init_stmts,
> > > >step_vectype, 
> > > > peel_mul);
> > > > +
> > > > + /* If early break then we have to create a new PHI which we 
> > > > can use as
> > > > +   an offset to adjust the induction reduction in early exits. 
> > > >  */
> > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +   {
> > > > + auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS 
> > > > (loop_vinfo);
> > > > + tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > > + tree break_lhs_phi = NULL_TREE;
> > > > + break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > > +vect_scalar_var,
> > > > +"pfa_iv_offset");
> > > > + gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > > + add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > > + add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > > > +  loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > > > +
> > > > + /* Rewrite all the early exit usages.  */
> > > > + tree phi_lhs = PHI_RESULT (phi);
> > > > + imm_use_iterator iter;
> > > > + use_operand_p use_p;
> > > > + gimple *use_stmt;
> > > > +
> > > > + FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > > > +   {
> > > > + use_stmt = USE_STMT (use_p);
> > > > + if (!flow_bb_inside_loop_p (iv_loop, gimple_bb 
> > > > 

[PATCH v4 08/20] Add get_clone_versions and get_version functions.

2025-04-15 Thread Alfie Richards
This is a reimplementation of get_target_clone_attr_len,
get_attr_str, and separate_attrs using string_slice and auto_vec to make
memory management and use simpler.

Adds a get_target_version helper function to get the target_version string
from a decl.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Change to use
get_clone_versions.

gcc/ChangeLog:

* tree.cc (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_version): New function.
* tree.h (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_target_version): New function.
---
 gcc/c-family/c-attribs.cc |  4 ++-
 gcc/tree.cc   | 59 +++
 gcc/tree.h| 11 
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..5dff489fcca 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6132,7 +6132,9 @@ handle_target_clones_attribute (tree *node, tree name, 
tree ARG_UNUSED (args),
}
}
 
-  if (get_target_clone_attr_len (args) == -1)
+  auto_vec versions= get_clone_attr_versions (args, NULL);
+
+  if (versions.length () == 1)
{
  warning (OPT_Wattributes,
   "single % attribute is ignored");
diff --git a/gcc/tree.cc b/gcc/tree.cc
index eccfcc89da4..fdcdfb336bc 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -15372,6 +15372,65 @@ get_target_clone_attr_len (tree arglist)
   return str_len_sum;
 }
 
+/* Returns an auto_vec of string_slices containing the version strings from
+   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
+
+auto_vec<string_slice>
+get_clone_attr_versions (const tree arglist, int *default_count)
+{
+  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
+  auto_vec<string_slice> versions;
+
+  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
+  string_slice separators = string_slice (separator_str);
+
+  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
+{
+  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg)));
+  while (str.is_valid ())
+   {
+ string_slice attr = string_slice::tokenize (&str, separators);
+ attr = attr.strip ();
+
+ if (attr == "default" && default_count)
+   (*default_count)++;
+ versions.safe_push (attr);
+   }
+}
+  return versions;
+}
+
+/* Returns an auto_vec of string_slices containing the version strings from
+   the target_clone attribute from DECL.  DEFAULT_COUNT is incremented for each
+   default version found.  */
+auto_vec<string_slice>
+get_clone_versions (const tree decl, int *default_count)
+{
+  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
+  if (!attr)
+    return auto_vec<string_slice> ();
+  tree arglist = TREE_VALUE (attr);
+  return get_clone_attr_versions (arglist, default_count);
+}
+
+/* If DECL has a target_version attribute, returns a string_slice containing 
the
+   attribute value.  Otherwise, returns string_slice::invalid.
+   Only works for target_version due to target attributes allowing multiple
+   string arguments to specify one target.  */
+string_slice
+get_target_version (const tree decl)
+{
+  gcc_assert (!TARGET_HAS_FMV_TARGET_ATTRIBUTE);
+
+  tree attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
+
+  if (!attr)
+return string_slice::invalid ();
+
+  return string_slice (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr
+  .strip ();
+}
+
 void
 tree_cc_finalize (void)
 {
diff --git a/gcc/tree.h b/gcc/tree.h
index 99f26177628..a89f3cf7189 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "tree-core.h"
 #include "options.h"
+#include "vec.h"
 
 /* Convert a target-independent built-in function code to a combined_fn.  */
 
@@ -7052,4 +7053,14 @@ extern tree get_attr_nonstring_decl (tree, tree * = 
NULL);
 
 extern int get_target_clone_attr_len (tree);
 
+/* Returns the version string for a decl with target_version attribute.
+   Returns an invalid string_slice if no attribute is present.  */
+extern string_slice get_target_version (const tree);
+/* Returns a vector of the version strings from a target_clones attribute on
+   a decl.  Can also record the number of default versions found.  */
+extern auto_vec<string_slice> get_clone_versions (const tree, int * = NULL);
+/* Returns a vector of the version strings from a target_clones attribute
+   directly.  */
+extern auto_vec<string_slice> get_clone_attr_versions (const tree, int *);
+
 #endif  /* GCC_TREE_H  */
-- 
2.34.1



Re: [PATCH v2] RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].

2025-04-15 Thread Jeff Law




On 4/9/25 6:08 AM, Robin Dapp wrote:

Hi,

when lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass.  This patch does not perform the lift when transparency is not
guaranteed.

This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only ever used
in vsetvls and no regular insns, but given how late we are in the GCC 15
cycle it seems better to defer this.  Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.

This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.

V2 now uses the transparency property rather than the manual approach 
before, both because it is cleaner and also because it helps with the go 
ICE

in PR119533.

Regtested on rv64gcv_zvl512b.

Regards
Robin

Reported-By: 曾治金 

 PR target/119547

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc 
(pre_vsetvl::earliest_fuse_vsetvl_info):

 Do not perform lift if block is not transparent.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail.
 * g++.target/riscv/rvv/autovec/pr119547.C: New test.
 * g++.target/riscv/rvv/autovec/pr119547-2.C: New test.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.

OK for the trunk.

jeff



Re: [PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]

2025-04-15 Thread Jason Merrill

On 4/13/25 1:56 PM, Patrick Palka wrote:

Alternatively, rather than passing the most general template + args to
push_tinst_level, we can pass the partially instantiated template +
innermost args via just:

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Pass the
original T and ARGS to push_tinst_level.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 2f1678ce4ff9..52768972da43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2704,6 +2704,8 @@ satisfy_declaration_constraints (tree t, sat_info info)
  static tree
  satisfy_declaration_constraints (tree t, tree args, sat_info info)
  {
+  tree orig_t = t, orig_args = args;
+
/* Update the declaration for diagnostics.  */
info.in_decl = t;
  
@@ -2732,7 +2734,7 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info)

tree result = boolean_true_node;
if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ()))
  {
-  if (!push_tinst_level (t, args))
+  if (!push_tinst_level (orig_t, orig_args))
return result;
tree pattern = DECL_TEMPLATE_RESULT (t);
push_to_top_level ();

So that for diagnostic20.C in question we emit:

   In substitution of '... void A::f() [with U = char]'.

compared to (with the previous approach)

   In substitution of '... void A::f() [with U = char; T = int]'.

or (wrongly, with the status quo)

   In substitution of '... void A::f() [with U = int]'

Would this be preferable?  I'd be good with either.


This approach certainly seems tidier; OK.

Jason


On Wed, 9 Apr 2025, Patrick Palka wrote:


On Wed, 9 Apr 2025, Patrick Palka wrote:


On Wed, 5 Mar 2025, Jason Merrill wrote:


On 3/5/25 10:13 AM, Patrick Palka wrote:

On Tue, 4 Mar 2025, Jason Merrill wrote:


On 3/4/25 2:49 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a hard error (or a
constraint recursion error) that occurred during satisfaction, e.g. for
the below testcase without this patch we emit:
 In substitution of '... void A::f() [with U = int]'
and with this patch we emit:
 In substitution of '... void A::f() [with U = char; T = int]'.

This patch fixes this by always passing the most general template to
push_tinst_level.


That soungs good, but getting it by passing it back from
get_normalized_constraints_from_decl seems confusing; I'd think we should
calculate it in parallel to changing args to correspond to that template.


Hmm, won't that mean duplicating the template adjustment logic in
get_normalized_constraints_from_decl, which seems undesirable?  The
function has many callers, some of which are for satisfaction where
targs are involved, and the rest are for subsumption where no targs are
involved, so I don't see a clean way of refactoring the code to avoid
duplication of the template adjustment logic.  Right now the targ
adjustment logic is unfortunately duplicated across both overloads
of satisfy_declaration_constraints and it seems undesirable to add
more duplication.


Fair enough.  Incidentally, I wonder why the two-parm overload doesn't call
the three-parm overload?


Maybe one way to reduce the duplication would be to go the other way and
move the targ adjustment logic to get_normalized_constraints_from_decl
as well (so that it has two out-parameters, 'gen_d' and 'gen_args').
The proposed patch then would be an incremental step towards that.


That makes sense, passing back something suitable for
add_outermost_template_args.


I tried combining the two overloads, and/or moving the targ adjustment
logic to get_normalized_constraints_from_decl, but I couldn't arrive at
a formulation that worked and I was happy with (i.e. didn't lead to more
code duplication than the original approach).

In the meantime I noticed that this bug is more pervasive than I
thought, and leads to wrong diagnostic context lines printed even in the
case of ordinary satisfaction failure -- however the wrong diagnostic
lines are more annoying/noticeable during a hard error or constraint
recursion where there's likely no other useful diagnostic lines that
might have the correct args printed.

So I adjusted the testcase in the original patch accordingly.  Could the
following go in for now?

I also attached a diff of the output of all our concepts testcases
currently, before/after this patch.  Each change seems like a clear
improvement/correction to me.


Oops, that was not a complete diff of all the concepts

Re: [PATCH] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao


> On Apr 14, 2025, at 16:35, Marek Polacek  wrote:
> 
> On Mon, Apr 14, 2025 at 08:28:55PM +, Qing Zhao wrote:
>> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
>> c_fully_fold assumes that the operands of function calls have already
>> been folded.  However, when we build the call to .ACCESS_WITH_SIZE, its
>> operands are not fully folded, therefore the C FE specific operator is
>> passed to the middle-end.
>> 
>> In order to fix this issue, fully fold the parameters before building the
>> call to .ACCESS_WITH_SIZE.
>> 
>> I am doing the bootstrap and regression testing on both X86 and aarch64 now.
>> Okay for trunk if testing goes well?
>> 
>> thanks.
>> 
>> Qing
>> 
>> PR c/119717
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>> parameters for call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/pr119717.c: New test.
>> ---
>> gcc/c/c-typeck.cc   |  8 ++--
>> gcc/testsuite/gcc.dg/pr119717.c | 24 
>> 2 files changed, 30 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index 3870e8a1558..dd176d96a41 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
>> loc, tree ref,
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>>   /* The result type of the call is a pointer to the flexible array type.  */
>>   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
>> +  tree first_param
>> += c_fully_fold (array_to_pointer_conversion (loc, ref), FALSE, NULL);
>> +  tree second_param
>> += c_fully_fold (counted_by_ref, FALSE, NULL);
> 
> Why FALSE?  Just use false.  You can also use nullptr rather than NULL now.

Just replaced FALSE with false.
I am keeping NULL to be consistent with other calls to c_fully_fold in the same 
file.

And  testing the new version now.

Thanks a lot.

(With FALSE, the compilation went fine…)

Qing
> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> result_type, 6,
>> - array_to_pointer_conversion (loc, ref),
>> - counted_by_ref,
>> + first_param,
>> + second_param,
>> build_int_cst (integer_type_node, 1),
>> build_int_cst (counted_by_type, 0),
>> build_int_cst (integer_type_node, -1),
>> diff --git a/gcc/testsuite/gcc.dg/pr119717.c 
>> b/gcc/testsuite/gcc.dg/pr119717.c
>> new file mode 100644
>> index 000..e5eedc567b3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr119717.c
>> @@ -0,0 +1,24 @@
>> +/* PR c/119717  */
>> +/* { dg-additional-options "-std=c23" } */
>> +/* { dg-do compile } */
>> +
>> +struct annotated {
>> +  unsigned count;
>> +  [[gnu::counted_by(count)]] char array[];
>> +};
>> +
>> +[[gnu::noinline,gnu::noipa]]
>> +static unsigned
>> +size_of (bool x, struct annotated *a)
>> +{
>> +  char *p = (x ? a : 0)->array;
>> +  return __builtin_dynamic_object_size (p, 1);
>> +}
>> +
>> +int main()
>> +{
>> +  struct annotated *p = __builtin_malloc(sizeof *p);
>> +  p->count = 0;
>> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
>> +  return 0;
>> +}
>> -- 
>> 2.31.1
>> 
> 
> Marek




Re: [PATCH] MATCH: Fix patterns of type (a != b) and (a == b) [PR117760]

2025-04-15 Thread Jeff Law




On 4/15/25 12:24 AM, Eikansh Gupta wrote:

The patterns can be simplified as shown below:

(a != b) & ((a|b) != 0)  -> (a != b)
(a != b) | ((a|b) != 0)  -> ((a|b) != 0)

A similar simplification applies to (a == b).  This patch adds the
simplification for the above patterns.  The forwprop pass was rewriting the
patterns into another form in which they were not getting simplified; the
patch also adds simplification for those forms.
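
An illustrative C example of the redundancy being removed (mine, not from
the submission):

  int
  f (int a, int b)
  {
    /* If a != b then at least one of a and b is non-zero, so (a | b) != 0
       is implied and the whole expression should fold to just a != b.  */
    return (a != b) & ((a | b) != 0);
  }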

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR 117760

gcc/ChangeLog:

* match.pd ((a != b) and/or ((a | b) != 0)): New pattern.
   ((a == b) and/or (a | b) == 0): New pattern.
   ((a == b) & (a | b) == 0): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr117760-1.c: New test.
* gcc.dg/tree-ssa/pr117760-2.c: New test.
* gcc.dg/tree-ssa/pr117760.c: New test.

Deferring to gcc-16 stage1.

jeff



Re: [GCC16,RFC,V2 02/14] aarch64: add new define_insn for subg

2025-04-15 Thread Richard Sandiford
Hi,

Indu Bhagat  writes:
> subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE)
> instruction.  It can be used to subtract an immediate value scaled by
> the tag granule from the address in the source register.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (subg): New definition.

In my previous comment about this patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668669.html

I hadn't realised that the pattern follows the existing "addg" pattern.
But...

> ---
>  gcc/config/aarch64/aarch64.md | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 031e621c98a1..0c7aebb838cd 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8416,6 +8416,23 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +(define_insn "subg"
> +  [(set (match_operand:DI 0 "register_operand" "=rk")
> + (ior:DI
> +  (and:DI (minus:DI (match_operand:DI 1 "register_operand" "rk")
> +   (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
> +  (const_int -1080863910568919041)) ;; 0xf0ff...
> +  (ashift:DI
> +   (unspec:QI
> +[(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
> + (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
> +UNSPEC_GEN_TAG)
> +   (const_int 56]
> +  "TARGET_MEMTAG"
> +  "subg\\t%0, %1, #%2, #%3"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  (define_insn "subp"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (minus:DI

...subtractions of constants are canonically expressed using (plus ...)
of a negative number, rather than (minus ...) of a positive number.
So I think we should instead add subg to the existing addg pattern.
That is, in:

(define_insn "addg"
  [(set (match_operand:DI 0 "register_operand" "=rk")
(ior:DI
 (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
  (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
 (const_int -1080863910568919041)) ;; 0xf0ff...
 (ashift:DI
  (unspec:QI
   [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
(match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
   UNSPEC_GEN_TAG)
  (const_int 56]
  "TARGET_MEMTAG"
  "addg\\t%0, %1, #%2, #%3"
  [(set_attr "type" "memtag")]
)

the aarch64_granule16_uimm6 would be replaced with a predicate that
accepts all multiples of 16 in the range [-1008, 1008].  Then the
output pattern would generate an addg or subg instruction based on
whether operand 2 is negative.
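
Presumably the output would then pick the mnemonic at run time, something
like this C fragment in the output template (untested sketch; the widened
predicate itself is not spelled out here):

  if (INTVAL (operands[2]) < 0)
    {
      operands[2] = GEN_INT (-INTVAL (operands[2]));
      return "subg\t%0, %1, #%2, #%3";
    }
  return "addg\t%0, %1, #%2, #%3";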

Thanks,
Richard


[PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

2025-04-15 Thread Alex
Tested on x86_64-pc-linux-gnu; this is only a library addition (and a
few tests), so it shouldn't cause any major impact.  I also tested
libgomp C to ensure the conditional compilation was working.

Okay for trunk?
From 1ef3fe0a1f026689e64963ec9ab0b04b7e6b1bde Mon Sep 17 00:00:00 2001
From: waffl3x 
Date: Tue, 15 Apr 2025 04:12:55 -0600
Subject: [PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

The implementation of each allocator is simplified by inheriting from
__detail::__allocator_templ.  At the moment, none of the implementations
diverge in any way, simply passing in the allocator handle to be used when
an allocation is made.  In the future, const_mem will need special handling
added to it to support constant memory space.
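
An illustrative use of the new interface (my example, not part of the
patch):

  #include <omp.h>
  #include <vector>

  int
  main ()
  {
    /* Standard containers can allocate through the requested OpenMP
       memory space via the new allocator types.  */
    std::vector<int, omp::allocator::default_mem<int>> v;
    v.assign (100, 42);
    return 0;
  }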

libgomp/ChangeLog:

	* omp.h.in: Add omp::allocator::* and ompx::allocator::* allocators.
	(__detail::__allocator_templ):
	New struct template.
	(null_allocator): New struct template.
	(default_mem): Likewise.
	(large_cap_mem): Likewise.
	(const_mem): Likewise.
	(high_bw_mem): Likewise.
	(low_lat_mem): Likewise.
	(cgroup_mem): Likewise.
	(pteam_mem): Likewise.
	(thread_mem): Likewise.
	(ompx::allocator::gnu_pinned_mem): Likewise.
	* testsuite/libgomp.c++/allocator-1.C: New test.
	* testsuite/libgomp.c++/allocator-2.C: New test.

Signed-off-by: waffl3x 
---
 libgomp/omp.h.in| 132 
 libgomp/testsuite/libgomp.c++/allocator-1.C | 158 
 libgomp/testsuite/libgomp.c++/allocator-2.C | 132 
 3 files changed, 422 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c++/allocator-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/allocator-2.C

diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index d5e8be46e94..8d17db1da9a 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -432,4 +432,136 @@ extern const char *omp_get_uid_from_device (int) __GOMP_NOTHROW;
 }
 #endif
 
+#if __cplusplus >= 201103L
+
+/* std::__throw_bad_alloc and std::__throw_bad_array_new_length.  */
+#include <bits/functexcept.h>
+
+namespace omp
+{
+namespace allocator
+{
+
+namespace __detail
+{
+
+template<typename __T, omp_allocator_handle_t __Handle>
+struct __allocator_templ
+{
+  using value_type = __T;
+  using pointer = __T*;
+  using const_pointer = const __T*;
+  using size_type = __SIZE_TYPE__;
+  using difference_type = __PTRDIFF_TYPE__;
+
+  __T*
+  allocate (size_type __n)
+  {
+if (__SIZE_MAX__ / sizeof(__T) < __n)
+  std::__throw_bad_array_new_length ();
+void *__p = omp_aligned_alloc (alignof(__T), __n * sizeof(__T), __Handle);
+if (!__p)
+  std::__throw_bad_alloc ();
+return static_cast<__T*>(__p);
+  }
+
+  void
+  deallocate (__T *__p, size_type) __GOMP_NOTHROW
+  {
+    omp_free (static_cast<void*>(__p), __Handle);
+  }
+};
+
+template<typename __T, typename __U, omp_allocator_handle_t __Handle>
+constexpr bool
+operator== (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __Handle>&) __GOMP_NOTHROW
+{
+  return true;
+}
+
+template<typename __T, omp_allocator_handle_t __Handle, typename __U, omp_allocator_handle_t __UHandle>
+constexpr bool
+operator== (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __UHandle>&) __GOMP_NOTHROW
+{
+  return false;
+}
+
+template<typename __T, typename __U, omp_allocator_handle_t __Handle>
+constexpr bool
+operator!= (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __Handle>&) __GOMP_NOTHROW
+{
+  return false;
+}
+
+template<typename __T, omp_allocator_handle_t __Handle, typename __U, omp_allocator_handle_t __UHandle>
+constexpr bool
+operator!= (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __UHandle>&) __GOMP_NOTHROW
+{
+  return true;
+}
+
+} /* namespace __detail */
+
+template<typename __T>
+struct null_allocator
+  : __detail::__allocator_templ<__T, omp_null_allocator> {};
+
+template<typename __T>
+struct default_mem
+  : __detail::__allocator_templ<__T, omp_default_mem_alloc> {};
+
+template<typename __T>
+struct large_cap_mem
+  : __detail::__allocator_templ<__T, omp_large_cap_mem_alloc> {};
+
+template<typename __T>
+struct const_mem
+  : __detail::__allocator_templ<__T, omp_const_mem_alloc> {};
+
+template<typename __T>
+struct high_bw_mem
+  : __detail::__allocator_templ<__T, omp_high_bw_mem_alloc> {};
+
+template<typename __T>
+struct low_lat_mem
+  : __detail::__allocator_templ<__T, omp_low_lat_mem_alloc> {};
+
+template<typename __T>
+struct cgroup_mem
+  : __detail::__allocator_templ<__T, omp_cgroup_mem_alloc> {};
+
+template<typename __T>
+struct pteam_mem
+  : __detail::__allocator_templ<__T, omp_pteam_mem_alloc> {};
+
+template<typename __T>
+struct thread_mem
+  : __detail::__allocator_templ<__T, omp_thread_mem_alloc> {};
+
+} /* namespace allocator */
+
+} /* namespace omp */
+
+namespace ompx
+{
+
+namespace allocator
+{
+
+template<typename __T>
+struct gnu_pinned_mem
+  : omp::allocator::__detail::__allocator_templ<__T, ompx_gnu_pinned_mem_alloc> {};
+
+} /* namespace allocator */
+
+} /* namespace ompx */
+
+#endif /* __cplusplus */
+
 #endif /* _OMP_H */
diff --git a/libgomp/testsuite/libgomp.c++/allocator-1.C b/libgomp/testsuite/libgomp.c++/allocator-1.C
new file mode 100644
index 000..725beade0c8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/allocator-1.C
@@ -0,0 +1,158 @@
+// { dg-do run }
+
+#include 
+#include 
+#include 
+
+template<typename T, template<typename> class Alloc>
+void test (T const initial_value = T())
+{
+  using Allocator 

Regenerate common.opt.urls

2025-04-15 Thread Kyrylo Tkachov
Pushing as obvious.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

* common.opt.urls: Regenerate.





Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu  wrote:
>
> ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
> pushed in red-zone.  Since
>
> commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
> Author: H.J. Lu 
> Date:   Sun Apr 13 12:20:42 2025 -0700
>
> APX: Don't use red-zone with 32 GPRs and no caller-saved registers
>
> disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
> 31 .cfi_restore directives.

Hm, did you also account for RED_ZONE_RESERVE? The last 8-byte slot is
reserved for internal use by the compiler.

Uros.

>
> PR target/119784
> * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
> directives.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> index fefe2e6d6fc..fa1acc7a142 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> @@ -66,7 +66,7 @@ void foo (void *frame)
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
> -/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
> +/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
> --
> 2.49.0
>


Re: [PATCH v5 2/2] i386: Enable -mnop-mcount for -fpic with PLTs

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:26 PM Ard Biesheuvel  wrote:
>
> From: Ard Biesheuvel 
>
> -mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
> are being used, given that the instruction encodings are identical; only
> the target may resolve differently depending on how the linker decides
> to incorporate the object file.
>
> So relax the option check, and add a test to ensure that 5-byte NOPs are
> emitted when -mnop-mcount is being used.
>
> Signed-off-by: Ard Biesheuvel 
>
> gcc/ChangeLog:
>
> PR target/119386
> * config/i386/i386-options.cc: Permit -mnop-mcount when using
>   -fpic with PLTs.
>
> gcc/testsuite/ChangeLog:
>
> PR target/119386
> * gcc.target/i386/pr119386-3.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.cc|  4 ++--
>  gcc/testsuite/gcc.target/i386/pr119386-3.c | 10 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-3.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index a9fac011f3d..964449fa8cd 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -2828,8 +2828,8 @@ ix86_option_override_internal (bool main_args_p,
>if (flag_nop_mcount)
>  error ("%<-mnop-mcount%> is not compatible with this target");
>  #endif
> -  if (flag_nop_mcount && flag_pic)
> -error ("%<-mnop-mcount%> is not implemented for %<-fPIC%>");
> +  if (flag_nop_mcount && flag_pic && !flag_plt)
> +error ("%<-mnop-mcount%> is not implemented for %<-fno-plt%>");
>
>/* Accept -msseregparm only if at least SSE support is enabled.  */
>if (TARGET_SSEREGPARM_P (opts->x_target_flags)
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-3.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-3.c
> new file mode 100644
> index 000..287410b951a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-3.c
> @@ -0,0 +1,10 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -pg -mnop-mcount" } */
> +/* { dg-final { scan-assembler ".byte\[ \t\]+0x0f, 0x1f, 0x44, 0x00, 0x00" } 
> } */
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> --
> 2.49.0.504.g3bcea36a83-goog
>


Re: [PATCH] combine: Correct comment about combine_validate_cost

2025-04-15 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
> Noticed while investigating a regression for cris-elf with
> r15-9239-g4d7a634f6d4102 "combine: Allow 2->2 combinations,
> but with a tweak [PR116398]" (to-be-reported).
>
> The comment was introduced when breaking out the
> combine_validate_cost function, in r0-59417-g64b8935d4809f3.
>
> I thought about wordsmithing to keep the "polarity" of the
> statement, but "are equal to or cheaper than" didn't read
> well.
>
> Ok to commit?

OK, thanks.

Richard

> -- >8 --
> The *code* has been the same since forever, but this
> comment, at a critical path, is misleading: if the new cost
> is the same (like, when doing an identity replacement), then
> combine_validate_cost returns true.
>
>   * combine.cc (try_combine): Correct comment about
>   combine_validate_cost.
> ---
>  gcc/combine.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 5f085187cfef..c2c1d50ca49f 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -4129,8 +4129,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>   }
>  }
>  
> -  /* Only allow this combination if insn_cost reports that the
> - replacement instructions are cheaper than the originals.  */
> +  /* Reject this combination if insn_cost reports that the replacement
> + instructions are more expensive than the originals.  */
>if (!combine_validate_cost (i0, i1, i2, i3, newpat, newi2pat, other_pat))
>  {
>undo_all ();


Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel  wrote:
>
> From: Ard Biesheuvel 
>
> Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
> __fentry__") updated the logic that emits mcount() / __fentry__() calls
> into function prologues when profiling is enabled, to avoid GOT-based
> indirect calls when a direct call would suffice.
>
> There are two problems with that change:
> - it relies on -mdirect-extern-access rather than -fno-plt to decide
>   whether or not a direct [PLT based] call is appropriate;
> - for the PLT case, it falls through to x86_print_call_or_nop(), which
>   does not emit the @PLT suffix, resulting in the wrong relocation to be
>   used (R_X86_64_PC32 instead of R_X86_64_PLT32)
>
> Fix this by testing flag_plt instead of ix86_direct_extern_access, and
> updating x86_print_call_or_nop() to take flag_pic and flag_plt into
> account. This also ensures that -mnop-mcount works as expected when
> emitting the PLT based profiling calls.
>
> While at it, fix the 32-bit logic as well, and issue a PLT call unless
> PLTs are explicitly disabled.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386
>
> Signed-off-by: Ard Biesheuvel 
>
> gcc/ChangeLog:
>
> PR target/119386
> * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
> where appropriate.
> (x86_function_profiler): Fall through to x86_print_call_or_nop()
> for PIC codegen when flag_plt is set.
>
> gcc/testsuite/ChangeLog:
>
> PR target/119386
> * gcc.target/i386/pr119386-1.c: New test.
> * gcc.target/i386/pr119386-2.c: New test.

OK if there are no further comments in the next day or two.

BTW: Do you have commit rights?

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc| 12 ++--
>  gcc/testsuite/gcc.target/i386/pr119386-1.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr119386-2.c | 12 
>  3 files changed, 32 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 4f8380c4a58..20059b775b9 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23158,6 +23158,12 @@ x86_print_call_or_nop (FILE *file, const char 
> *target)
>if (flag_nop_mcount || !strcmp (target, "nop"))
>  /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */
>  fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n");
> +  else if (!TARGET_PECOFF && flag_pic)
> +{
> +  gcc_assert (flag_plt);
> +
> +  fprintf (file, "1:\tcall\t%s@PLT\n", target);
> +}
>else
>  fprintf (file, "1:\tcall\t%s\n", target);
>  }
> @@ -23321,7 +23327,7 @@ x86_function_profiler (FILE *file, int labelno 
> ATTRIBUTE_UNUSED)
>   break;
> case CM_SMALL_PIC:
> case CM_MEDIUM_PIC:
> - if (!ix86_direct_extern_access)
> + if (!flag_plt)
> {
>   if (ASSEMBLER_DIALECT == ASM_INTEL)
> fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n",
> @@ -23352,7 +23358,9 @@ x86_function_profiler (FILE *file, int labelno 
> ATTRIBUTE_UNUSED)
>  "\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER 
> "\n",
>  LPREFIX, labelno);
>  #endif
> -  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +  if (flag_plt)
> +   x86_print_call_or_nop (file, mcount_name);
> +  else if (ASSEMBLER_DIALECT == ASM_INTEL)
> fprintf (file, "1:\tcall\t[DWORD PTR %s@GOT[ebx]]\n", mcount_name);
>else
> fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-1.c
> new file mode 100644
> index 000..9a0dc64b5b9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c
> @@ -0,0 +1,10 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -pg" } */
> +/* { dg-final { scan-assembler "call\[ \t\]+mcount@PLT" } } */
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-2.c
> new file mode 100644
> index 000..3ea978ecfdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c
> @@ -0,0 +1,12 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -fno-plt -pg" } */
> +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOTPCREL\\(" { target 
> { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOT\\(" { target ia32 
> } } } */
> +
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> --
> 2.49.0.504.g3bcea36a83-goog
>


[committed v3] libstdc++: Fix std::string construction from volatile char* [PR119748]

2025-04-15 Thread Jonathan Wakely
My recent r15-9381-g648d5c26e25497 change assumes that a contiguous
iterator with the correct value_type can be converted to a const charT*
but that's not true for volatile charT*. The optimization should only be
done if it can be converted to the right pointer type.

Additionally, some generic loops for non-contiguous iterators need an
explicit cast to deal with iterator reference types that do not bind to
the const charT& parameter of traits_type::assign.
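
For reference, a reduced example of the kind of code PR119748 is about
(illustrative, not the exact testcase):

  #include <string>

  void
  f (volatile char *first, volatile char *last)
  {
    // volatile char* is a contiguous iterator with value_type char, but it
    // does not convert to const char*, so the bulk _S_copy path must not be
    // taken; each element is assigned individually instead.
    std::string s (first, last);
  }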

libstdc++-v3/ChangeLog:

PR libstdc++/119748
* include/bits/basic_string.h (_S_copy_chars): Only optimize for
contiguous iterators that are convertible to const charT*. Use
explicit conversion to charT after dereferencing iterator.
(_S_copy_range): Likewise for contiguous ranges.
* include/bits/basic_string.tcc (_M_construct): Use explicit
conversion to charT after dereferencing iterator.
* include/bits/cow_string.h (_S_copy_chars): Likewise.
(basic_string(from_range_t, R&&, const Allocator&)): Likewise.
Only optimize for contiguous iterators that are convertible to
const charT*.
* testsuite/21_strings/basic_string/cons/char/119748.cc: New
test.
* testsuite/21_strings/basic_string/cons/wchar_t/119748.cc:
New test.

Reviewed-by: Tomasz Kaminski 
---

Changes in v3:
- Fixed commit message to not talk about iterator references that aren't
  implicitly convertible to value_type.
- Used testsuite_iterators.h for new tests (after enabling the
  test_container(T(&)[N]) constructor for C++98).

Tested x86_64-linux. Pushed to trunk.

The static_cast parts would be OK to backport too.

 libstdc++-v3/include/bits/basic_string.h  | 24 +
 libstdc++-v3/include/bits/basic_string.tcc|  3 +-
 libstdc++-v3/include/bits/cow_string.h| 17 ++---
 .../basic_string/cons/char/119748.cc  | 35 +++
 .../basic_string/cons/wchar_t/119748.cc   |  7 
 5 files changed, 73 insertions(+), 13 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/cons/char/119748.cc
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/119748.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 9c431c765ab..c90bd099b63 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -488,8 +488,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  is_same<_IterBase, const _CharT*>>::value)
_S_copy(__p, std::__niter_base(__k1), __k2 - __k1);
 #if __cpp_lib_concepts
- else if constexpr (contiguous_iterator<_Iterator>
-  && is_same_v<iter_value_t<_Iterator>, _CharT>)
+ else if constexpr (requires {
+  requires contiguous_iterator<_Iterator>;
+  { std::to_address(__k1) }
+-> convertible_to<const _CharT*>;
+})
{
  const auto __d = __k2 - __k1;
  (void) (__k1 + __d); // See P3349R1
@@ -499,7 +502,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  else
 #endif
  for (; __k1 != __k2; ++__k1, (void)++__p)
-   traits_type::assign(*__p, *__k1); // These types are off.
+   traits_type::assign(*__p, static_cast<_CharT>(*__k1));
}
 #pragma GCC diagnostic pop
 
@@ -527,12 +530,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
static constexpr void
_S_copy_range(pointer __p, _Rg&& __rg, size_type __n)
{
- if constexpr (ranges::contiguous_range<_Rg>
- && is_same_v<ranges::range_value_t<_Rg>, _CharT>)
+ if constexpr (requires {
+ requires ranges::contiguous_range<_Rg>;
+ { ranges::data(std::forward<_Rg>(__rg)) }
+   -> convertible_to<const _CharT*>;
+   })
_S_copy(__p, ranges::data(std::forward<_Rg>(__rg)), __n);
  else
-   for (auto&& __e : __rg)
- traits_type::assign(*__p++, std::forward(__e));
+   {
+ auto __first = ranges::begin(__rg);
+ const auto __last = ranges::end(__rg);
+ for (; __first != __last; ++__first)
+   traits_type::assign(*__p++, static_cast<_CharT>(*__first));
+   }
}
 #endif
 
diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
b/libstdc++-v3/include/bits/basic_string.tcc
index 02230aca5d2..bca55bc5658 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -210,7 +210,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_data(__another);
_M_capacity(__capacity);
  }
-   traits_type::assign(_M_data()[__len++], *__beg);
+   traits_type::assign(_M_data()[__len++],
+   static_cast<_CharT>(*__beg));
++__beg;
  }
 
di

[PATCH v2 4/4] libstdc++: Add tests for std::extents.

2025-04-15 Thread Luc Grosheintz
A prior commit added std::extents; this commit adds the tests. The bulk
is focussed on testing the constructors. These are split into three
groups:

1. the ctor from other extents and the copy ctor,
2. the ctor from a pack of integer-like objects,
3. the ctor from shapes, i.e. span and array.

For each group check that the ctor:
* produces an object with the expected values for extent,
* is implicit if and only if required,
* is constexpr,
* doesn't change the rank of the extent.
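
As an illustration of the implicit/explicit checks (a sketch, not
necessarily how the tests phrase it):

  #include <mdspan>
  #include <span>
  #include <type_traits>

  // The span/array ctor is explicit(N != rank_dynamic()), so for a fully
  // static extents type it is constructible but not implicitly convertible.
  static_assert(std::is_constructible_v<std::extents<int, 2>,
                                        std::span<int, 1>>);
  static_assert(!std::is_convertible_v<std::span<int, 1>,
                                       std::extents<int, 2>>);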

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/extents/assign.cc: New test.
* testsuite/23_containers/mdspan/extents/class_properties.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc: 
New test.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: New test.
* testsuite/23_containers/mdspan/extents/deduction_guide.cc: New test.
* testsuite/23_containers/mdspan/extents/dextents.cc: New test.
* testsuite/23_containers/mdspan/extents/extent.cc: New test.
* testsuite/23_containers/mdspan/extents/ops_eq.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../23_containers/mdspan/extents/assign.cc| 29 ++
 .../mdspan/extents/class_properties.cc| 62 +
 .../23_containers/mdspan/extents/ctor_copy.cc | 75 +++
 .../mdspan/extents/ctor_copy_constexpr.cc | 20 
 .../23_containers/mdspan/extents/ctor_ints.cc | 58 
 .../mdspan/extents/ctor_ints_constexpr.cc | 12 +++
 .../mdspan/extents/ctor_shape_all_extents.cc  | 61 +
 .../mdspan/extents/ctor_shape_constexpr.cc| 23 +
 .../extents/ctor_shape_dynamic_extents.cc | 91 +++
 .../mdspan/extents/custom_integer.cc  | 87 ++
 .../mdspan/extents/deduction_guide.cc | 34 +++
 .../23_containers/mdspan/extents/dextents.cc  | 11 +++
 .../23_containers/mdspan/extents/extent.cc| 24 +
 .../23_containers/mdspan/extents/ops_eq.cc| 58 
 14 files changed, 645 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/deduction_guide.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/dextents.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/extent.cc
 create mode 100644 
libstdc++-v3/testsuite/23_containers/mdspan/extents/ops_eq.cc

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
new file mode 100644
index 000..3bc32361a7b
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
@@ -0,0 +1,29 @@
+// { dg-do run { target c++23 } }
+#include <mdspan>
+
+#include <testsuite_hooks.h>
+
+constexpr auto dyn = std::dynamic_extent;
+
+static_assert(std::is_nothrow_assignable_v,
+  std::extents>);
+
+int
+main()
+{
+  auto e1 = std::extents();
+  auto e2 = std::extents();
+
+  e2 = e1;
+  VERIFY(e2 == e1);
+
+  auto e5 = std::extents();
+  e5 = e1;
+  VERIFY(e5 == e1);
+
+  auto e3 = std::extents(1, 2);
+  auto e4 = std::extents(3, 4);
+  e3 = e4;
+  VERIFY(e3 == e4);
+  return 0;
+}
diff --git 
a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
new file mode 100644
index 000..548900a7f44
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
@@ -0,0 +1,62 @@
+// { dg-do compile { target c++23 } }
+#include <mdspan>
+
+

[PATCH v2 1/4] libstdc++: Setup internal FTM for mdspan.

2025-04-15 Thread Luc Grosheintz
Uses the FTM infrastructure to create an internal feature testing macro
for partial availability of mdspan, which is then used to hide the
contents of the header mdspan when compiling against a standard prior to
C++23.
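
The internal macro is then consumed like the public ones, roughly (sketch,
not part of this patch):

  // In <mdspan>:
  #define __glibcxx_want_mdspan
  #include <bits/version.h>

  #ifdef __glibcxx_mdspan
  // ... extents and friends, only visible for C++23 and later ...
  #endif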

libstdc++-v3/ChangeLog:

* include/bits/version.def: Add internal feature testing macro
__glibcxx_mdspan.
* include/bits/version.h: Regenerate.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/bits/version.def | 9 +
 libstdc++-v3/include/bits/version.h   | 9 +
 2 files changed, 18 insertions(+)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 0afaf0dec24..b2aaabff6d2 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -998,6 +998,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = mdspan;
+  no_stdname = true; // FIXME: remove
+  values = {
+v = 1; // FIXME: 202207
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = ssize;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 980fee641e9..9ee1e0e980d 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1115,6 +1115,15 @@
 #endif /* !defined(__cpp_lib_span) && defined(__glibcxx_want_span) */
 #undef __glibcxx_want_span
 
+#if !defined(__cpp_lib_mdspan)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_mdspan 1L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_mdspan)
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_mdspan) && defined(__glibcxx_want_mdspan) */
+#undef __glibcxx_want_mdspan
+
 #if !defined(__cpp_lib_ssize)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_ssize 201902L
-- 
2.48.1



[PATCH] tailc: Fix up musttail calls vs. -fsanitize=thread [PR119801]

2025-04-15 Thread Jakub Jelinek
Hi!

Calls with musttail attribute don't really work with -fsanitize=thread in
GCC.  The problem is that TSan instrumentation adds
  __tsan_func_entry (__builtin_return_address (0));
calls at the start of each instrumented function and
  __tsan_func_exit ();
call at the end of those and the latter stands in a way of normal tail calls
as well as musttail tail calls.

Looking at what LLVM does, for normal calls -fsanitize=thread also prevents
tail calls like in GCC (well, the __tsan_func_exit () call itself can be
tail called in GCC (and from what I see not in clang)).
But for [[clang::musttail]] calls it arranges to move the
__tsan_func_exit () before the musttail call instead of after it.

The following patch handles it similarly.  If we detect a
__builtin_tsan_func_exit () call in a -fsanitize=thread instrumented
function, we process it normally (so that the call can be tail called in a
function returning void)
but set a flag that the builtin has been seen (only for cfun->has_musttail
in the diag_musttail phase).  And then let tree_optimize_tail_calls_1
call find_tail_calls again in a new mode where the __tsan_func_exit ()
call is ignored and so we are able to find calls before it, but only
accept that if the call before it is actually a musttail.  For C++ it needs
to verify that the EH cleanup, if any, also has the __tsan_func_exit () call,
and if all goes well, the musttail call is registered for tailcalling with
a flag noting that it has __tsan_func_exit () after it; when optimizing that,
we emit the __tsan_func_exit () call before the musttail tail call (or
musttail tail recursion).
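
In source terms the intended effect is roughly this (my illustration, not
from the patch):

  int bar (int *);

  int
  foo (int *p)
  {
    /* With -fsanitize=thread the body is bracketed by __tsan_func_entry
       and __tsan_func_exit.  For the musttail call below the patch emits
       __tsan_func_exit () just before the call, so it can still be a real
       tail call.  */
    [[gnu::musttail]] return bar (p);
  }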

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-15  Jakub Jelinek  

PR sanitizer/119801
* sanitizer.def (BUILT_IN_TSAN_FUNC_EXIT): Use BT_FN_VOID rather
than BT_FN_VOID_PTR.
* tree-tailcall.cc: Include attribs.h and asan.h.
(struct tailcall): Add has_tsan_func_exit member.
(empty_eh_cleanup): Add eh_has_tsan_func_exit argument, set what
it points to to 1 if there is exactly one __tsan_func_exit call
and ignore that call otherwise.  Adjust recursive call.
(find_tail_calls): Add RETRY_TSAN_FUNC_EXIT argument, pass it
to recursive calls.  When seeing __tsan_func_exit call with
RETRY_TSAN_FUNC_EXIT 0, set it to -1.  If RETRY_TSAN_FUNC_EXIT
is 1, initially ignore __tsan_func_exit calls.  Adjust
empty_eh_cleanup caller.  When looking through stmts after the call,
ignore exactly one __tsan_func_exit call but remember it in
t->has_tsan_func_exit.  Diagnose if EH cleanups didn't have
__tsan_func_exit and normal path did or vice versa.
(optimize_tail_call): Emit __tsan_func_exit before the tail call
or tail recursion.
(tree_optimize_tail_calls_1): Adjust find_tail_calls callers.  If
find_tail_calls changes retry_tsan_func_exit to -1, set it to 1
and call it again with otherwise the same arguments.

* c-c++-common/tsan/pr119801.c: New test.

--- gcc/sanitizer.def.jj2025-04-14 19:30:31.804837079 +0200
+++ gcc/sanitizer.def   2025-04-15 09:48:23.752349037 +0200
@@ -247,7 +247,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_ENTRY, "__tsan_func_entry",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_EXIT, "__tsan_func_exit",
- BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+ BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_VPTR_UPDATE, "__tsan_vptr_update",
  BT_FN_VOID_PTR_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_READ1, "__tsan_read1",
--- gcc/tree-tailcall.cc.jj 2025-04-14 19:30:31.976834786 +0200
+++ gcc/tree-tailcall.cc2025-04-15 10:12:48.879501238 +0200
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.
 #include "symbol-summary.h"
 #include "ipa-cp.h"
 #include "ipa-prop.h"
+#include "attribs.h"
+#include "asan.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -122,6 +124,9 @@ struct tailcall
   /* True if it is a call to the current function.  */
   bool tail_recursion;
 
+  /* True if there is __tsan_func_exit call after the call.  */
+  bool has_tsan_func_exit;
+
   /* The return value of the caller is mult * f + add, where f is the return
  value of the call.  */
   tree mult, add;
@@ -504,7 +509,7 @@ maybe_error_musttail (gcall *call, const
Search at most CNT basic blocks (so that we don't need to do trivial
loop discovery).  */
 static bool
-empty_eh_cleanup (basic_block bb, int cnt)
+empty_eh_cleanup (basic_block bb, int *eh_has_tsan_func_exit, int cnt)
 {
   if (EDGE_COUNT (bb->succs) > 1)
 return false;
@@ -515,6 +520,14 @@ empty_eh_cleanup (basic_block bb, int cn
   gimple *g = gsi_stmt (gs
