Re: [RFC] Doc: Clean up musttail attribute docs some more

2025-04-03 Thread Richard Biener
On Wed, 2 Apr 2025, Sandra Loosemore wrote:

> Looking over the recently-committed change to the musttail attribute
> documentation, it appears the comment in the last example was a paste-o,
> as it does not agree with either what the similar example in the
> -Wmaybe-musttail-local-addr documentation says, or the actual behavior
> observed when running the code.
> 
> In addition, the entire section on musttail was in need of copy-editing
> to put it in the present tense, avoid reference to "the user", etc.  I've
> attempted to clean it up here.

LGTM

> gcc/ChangeLog
>   * doc/extend.texi (Statement Attributes): Copy-edit the musttail
>   attribute documentation and correct the comment in the last
>   example.
> ---
>  gcc/doc/extend.texi | 46 ++---
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 4302df76c7f..16ad83fc510 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9314,7 +9314,7 @@ The @code{gnu::musttail} or @code{clang::musttail} 
> standard attribute
>  or @code{musttail} GNU attribute can be applied to a @code{return} statement
>  with a return-value expression that is a function call.  It asserts that the
>  call must be a tail call that does not allocate extra stack space, so it is
> -safe to use tail recursion to implement long running loops.
> +safe to use tail recursion to implement long-running loops.
>  
>  @smallexample
>  [[gnu::musttail]] return foo();
> @@ -9324,11 +9324,11 @@ safe to use tail recursion to implement long running 
> loops.
>  __attribute__((musttail)) return bar();
>  @end smallexample
>  
> -If the compiler cannot generate a @code{musttail} tail call it will report
> -an error.  On some targets tail calls may never be supported.
> -The user asserts for @code{musttail} tail calls that lifetime of automatic
> +If the compiler cannot generate a @code{musttail} tail call it reports
> +an error.  On some targets, tail calls may not be supported at all.
> +The @code{musttail} attribute asserts that the lifetime of automatic
>  variables, function parameters and temporaries (unless they have non-trivial
> -destruction) can end before the actual call instruction and that any access
> +destruction) can end before the actual call instruction, and that any access
>  to those from inside of the called function results is considered undefined
>  behavior.  Enabling @option{-O1} or @option{-O2} can improve the success of
>  tail calls.
> @@ -9344,9 +9344,9 @@ baz (int *x)
>if (*x == 1)
>  @{
>int a = 42;
> -  /* The call will be tail called (would not be without the
> - attribute), dereferencing the pointer in the callee is
> - undefined behavior and there will be a warning emitted
> +  /* The call is a tail call (would not be without the
> + attribute).  Dereferencing the pointer in the callee is
> + undefined behavior, and there is a warning emitted
>   for this by default (@option{-Wmusttail-local-addr}).  */
>[[gnu::musttail]] return foo (&a);
>  @}
> @@ -9354,9 +9354,9 @@ baz (int *x)
>  @{
>int a = 42;
>bar (&a);
> -  /* The call will be tail called (would not be without the
> - attribute), if bar stores the pointer anywhere, dereferencing
> - it in foo will be undefined behavior and there will be a warning
> +  /* The call is a tail call (would not be without the
> + attribute).  If bar stores the pointer anywhere, dereferencing
> + it in foo is undefined behavior.  There is a warning
>   emitted for this with @option{-Wextra}, which implies
>   @option{-Wmaybe-musttail-local-addr}.  */
>[[gnu::musttail]] return foo (nullptr);
> @@ -9365,8 +9365,8 @@ baz (int *x)
>  @{
>S s;
>/* The s variable requires non-trivial destruction which ought
> - to be performed after the foo call returns, so this will
> - be rejected.  */
> + to be performed after the foo call returns, so this is
> + rejected.  */
>[[gnu::musttail]] return foo (&s.s);
>  @}
>  @}
> @@ -9374,9 +9374,9 @@ baz (int *x)
>  
>  To avoid the @option{-Wmaybe-musttail-local-addr} warning in the
>  above @code{*x == 2} case and similar code, consider defining the
> -maybe escaped variables in a separate scope which will end before the
> -return statement if possible to make it clear that the variable is not
> -live during the call.  So
> +maybe-escaped variables in a separate scope that ends before the
> +return statement, if that is possible, to make it clear that the
> +variable is not live during the call.  So:
>  
>  @smallexample
>else if (*x == 2)
> @@ -9385,17 +9385,17 @@ live during the call.  So
>  int a = 42;
>  bar (&a);
>@}
> -  /* The call will be tail called (would not be without the
> - attribute), if bar s

Re: [PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-03 Thread Richard Biener
On Wed, 2 Apr 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, 1 Apr 2025, Richard Sandiford wrote:
> >
> >> The problem in PR101523 was that, after each successful 2->2
> >> combination attempt, distribute_links would search further and
> >> further for the next combinable use of the i2 destination.
> >> The original patch for the PR dealt with that by disallowing such
> >> combinations.  However, that led to various optimisation regressions,
> >> so there was interest in allowing the combinations again, at least
> >> until an alternative way of getting the same results is in place.
> >> 
> >> This patch does that, but limits distribute_links to a certain number
> >> of instructions when i2 is unchanged.  As Segher said in the PR trail,
> >> it would make more conceptual sense to apply the limit unconditionally,
> >> but I thought it would be better to change as little as possible at
> >> this development stage.  Logically, in stage 1, the --param should
> >> be applied directly by distribute_links with no input from callers.
> >> 
> >> As I mentioned in:
> >> 
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
> >> 
> >> I think it's safe to drop log links even if a use exists.  All
> >> processing of log links seems to handle the absence of a link
> >> for a particular register in a conservative way.
> >> 
> >> The initial set-up errs on the side of dropping links, since for example
> >> create_log_links has:
> >> 
> >>  /* flow.c claimed:
> >> 
> >>  We don't build a LOG_LINK for hard registers contained
> >>  in ASM_OPERANDs.  If these registers get replaced,
> >>  we might wind up changing the semantics of the insn,
> >>  even if reload can make what appear to be valid
> >>  assignments later.  */
> >>   if (regno < FIRST_PSEUDO_REGISTER
> >>   && asm_noperands (PATTERN (use_insn)) >= 0)
> >> continue;
> >> 
> >> which excludes combinations by dropping log links, rather than during
> >> try_combine.  And:
> >> 
> >>   /* If this register is being initialized using itself, and the
> >>  register is uninitialized in this basic block, and there are
> >>  no LOG_LINKS which set the register, then part of the
> >>  register is uninitialized.  In that case we can't assume
> >>  anything about the number of nonzero bits.
> >> 
> >>  ??? We could do better if we checked this in
> >>  reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
> >>  could avoid making assumptions about the insn which initially
> >>  sets the register, while still using the information in other
> >>  insns.  We would have to be careful to check every insn
> >>  involved in the combination.  */
> >> 
> >>   if (insn
> >>   && reg_referenced_p (x, PATTERN (insn))
> >>   && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
> >>REGNO (x)))
> >> {
> >>   struct insn_link *link;
> >> 
> >>   FOR_EACH_LOG_LINK (link, insn)
> >> if (dead_or_set_p (link->insn, x))
> >>   break;
> >>   if (!link)
> >> {
> >>   rsp->nonzero_bits = GET_MODE_MASK (mode);
> >>   rsp->sign_bit_copies = 1;
> >>   return;
> >> }
> >> }
> >> 
> >> treats the lack of a log link is a possible sign of uninitialised data,
> >> but that would be a missed optimisation rather than a correctness issue.
> >> 
> >> One question is what the default --param value should be.  I went with
> >> Jakub's suggestion of 3000 from:
> >> 
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
> >> 
> >> Also, to answer Jakub's question in that comment, I tried bisecting:
> >> 
> >>   int limit = atoi (getenv ("BISECT"));
> >> 
> >> (so applying the limit for all calls from try_combine) with an
> >> abort in distribute_links if the limit caused a link to be skipped.
> >> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
> >> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
> >> was 142, so much lower than the parameter value.  I realised too late
> >> that --enable-checking=release would probably have been a more
> >> interesting test.
> >> 
> >> Some of the try_combine changes come from Richi's patch in:
> >> 
> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
> >> 
> >> Bootstrapped & regression-tested on aarch64-linux-gnu,
> >> powerpc64le-linux-gnu and x86_64-linux-gnu.  OK to install?
> >> 
> >> Richard
> >> 
> >> 
> >> gcc/
> >>PR testsuite/116398
> >>* params.opt (-param=max-combine-search-insns=): New param.
> >>* doc/invoke.texi: Document it.
> >>* combine.cc (distribute_links): Add a limit parameter.
> >>(try_combine): Use the new param to limit distribute_links
> >>   

Re: [PATCH] fold-const, cobol, v2: Add native_encode_wide_int and use it in COBOL FE [PR119242]

2025-04-03 Thread Richard Biener
On Wed, 2 Apr 2025, Jakub Jelinek wrote:

> On Wed, Apr 02, 2025 at 09:22:54PM +0100, Richard Sandiford wrote:
> > I'm not sure the _1 template is worth it.  wi::to_widest ->
> > const wide_int_ref & should be very cheap (it doesn't for example
> > construct a widest_int), so I think native_encode_int could call
> > native_encode_wide_int.
> 
> You're right, I was afraid it would construct something expensive.
> Looking at --enable-checking=release build, I see native_encode_int
> is inlined into native_encode_expr and does
>   MEM[(struct wide_int_ref_storage *)&D.261480] ={v} {CLOBBER};
>   _30 = *expr_10.base.u.int_length.extended;
>   _31 = (unsigned int) _30;
>   _32 = &expr_10->int_cst.val[0];
>   MEM  [(struct wide_int_ref_storage *)&D.261480] = _32;
>   MEM  [(struct wide_int_ref_storage *)&D.261480 + 8B] = _31;
>   MEM  [(struct wide_int_ref_storage *)&D.261480 + 12B] = 
> 131072;
>   _33 = expr_10->typed.type;
>   _34 = native_encode_wide_int (_33, &D.261480, ptr_11, len_12, off_8);
> there in optimized dump, which is
>   temp.val = &TREE_INT_CST_ELT (expr, 0);
>   temp.len = TREE_INT_CST_EXT_NUNITS (expr);
>   temp.precision = WIDEST_INT_MAX_PRECISION;
> so that seems fairly cheap.
> 
> So I'll retest:

OK if it worked out.

Richard.

> 2025-04-02  Jakub Jelinek  
> 
>   PR cobol/119242
> gcc/
>   * fold-const.h (native_encode_wide_int): Declare.
>   * fold-const.cc (native_encode_wide_int): New function.
>   (native_encode_int): Use it.
> gcc/cobol/
>   * genapi.cc (binary_initial_from_float128): Use
>   native_encode_wide_int.
> 
> --- gcc/fold-const.h.jj   2025-04-02 19:26:55.300272195 +0200
> +++ gcc/fold-const.h  2025-04-02 22:36:19.066641205 +0200
> @@ -35,6 +35,8 @@ extern bool folding_cxx_constexpr;
>  extern int native_encode_expr (const_tree, unsigned char *, int, int off = 
> -1);
>  extern int native_encode_initializer (tree, unsigned char *, int,
> int off = -1, unsigned char * = nullptr);
> +extern int native_encode_wide_int (tree, const wide_int_ref &,
> +unsigned char *, int, int off = -1);
>  extern int native_encode_real (scalar_float_mode, const REAL_VALUE_TYPE *,
>  unsigned char *, int, int off = -1);
>  extern tree native_interpret_expr (tree, const unsigned char *, int);
> --- gcc/fold-const.cc.jj  2025-04-02 19:26:55.300272195 +0200
> +++ gcc/fold-const.cc 2025-04-02 22:39:04.932113465 +0200
> @@ -7465,15 +7465,16 @@ fold_plusminus_mult_expr (location_t loc
>return NULL_TREE;
>  }
>  
> -/* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> -   specified by EXPR into the buffer PTR of length LEN bytes.
> +
> +/* Subroutine of native_encode_int.  Encode the integer VAL with type TYPE
> +   into the buffer PTR of length LEN bytes.
> Return the number of bytes placed in the buffer, or zero
> upon failure.  */
>  
> -static int
> -native_encode_int (const_tree expr, unsigned char *ptr, int len, int off)
> +int
> +native_encode_wide_int (tree type, const wide_int_ref &val,
> + unsigned char *ptr, int len, int off)
>  {
> -  tree type = TREE_TYPE (expr);
>int total_bytes;
>if (TREE_CODE (type) == BITINT_TYPE)
>  {
> @@ -7516,7 +7517,7 @@ native_encode_int (const_tree expr, unsi
>int bitpos = byte * BITS_PER_UNIT;
>/* Extend EXPR according to TYPE_SIGN if the precision isn't a whole
>number of bytes.  */
> -  value = wi::extract_uhwi (wi::to_widest (expr), bitpos, BITS_PER_UNIT);
> +  value = wi::extract_uhwi (val, bitpos, BITS_PER_UNIT);
>  
>if (total_bytes > UNITS_PER_WORD)
>   {
> @@ -7537,6 +7538,18 @@ native_encode_int (const_tree expr, unsi
>return MIN (len, total_bytes - off);
>  }
>  
> +/* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> +   specified by EXPR into the buffer PTR of length LEN bytes.
> +   Return the number of bytes placed in the buffer, or zero
> +   upon failure.  */
> +
> +static int
> +native_encode_int (const_tree expr, unsigned char *ptr, int len, int off)
> +{
> +  return native_encode_wide_int (TREE_TYPE (expr), wi::to_widest (expr),
> +  ptr, len, off);
> +}
> +
>  
>  /* Subroutine of native_encode_expr.  Encode the FIXED_CST
> specified by EXPR into the buffer PTR of length LEN bytes.
> --- gcc/cobol/genapi.cc.jj2025-04-02 19:26:55.265272660 +0200
> +++ gcc/cobol/genapi.cc   2025-04-02 22:36:19.071641137 +0200
> @@ -15216,25 +15216,19 @@ binary_initial_from_float128(cbl_field_t
>FIXED_WIDE_INT(128) i
>  = FIXED_WIDE_INT(128)::from (real_to_integer (&value, &fail, 128), 
> SIGNED);
>  
> -  /* ???  Use native_encode_* below.  */
>retval = (char *)xmalloc(field->data.capacity);
>switch(field->data.capacity)
>  {
> +  tree type;
>  case 1:
> -  *(signed char *)retval = (signed char)i.slow ();
> -  break;
>  case 2:
> -  *(signe

[committed] c-family: Regenerate c.opt.urls

2025-04-03 Thread Jakub Jelinek
Hi!

On Sun, Mar 30, 2025 at 02:48:43PM +0200, Martin Uecker wrote:
> The warning -Wzero-as-null-pointer-constant is now not only supported
> in C++ but also in C.  Change the documentation accordingly.

This change didn't include make regenerate-opt-urls changes, because
moving option documentation to different section can affect the *.urls
files.

Fixed thusly, committed to trunk as obvious.

2025-04-03  Jakub Jelinek  

* c.opt.urls: Regenerate.

--- gcc/c-family/c.opt.urls.jj  2025-03-31 07:39:03.426720264 +0200
+++ gcc/c-family/c.opt.urls 2025-04-03 08:44:26.594223610 +0200
@@ -976,7 +976,7 @@ Wxor-used-as-pow
 UrlSuffix(gcc/Warning-Options.html#index-Wno-xor-used-as-pow)
 
 Wzero-as-null-pointer-constant
-UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Wno-zero-as-null-pointer-constant)
+UrlSuffix(gcc/Warning-Options.html#index-Wno-zero-as-null-pointer-constant)
 
 Wzero-length-bounds
 UrlSuffix(gcc/Warning-Options.html#index-Wzero-length-bounds)


Jakub



[COMMITTED] Doc: Clarify use of access attribute [PR101440]

2025-04-03 Thread Sandra Loosemore
This issue was specifically about a confusing mention of the "second
and third arguments to the memcpy function" when only the second one
is a pointer affected by the attribute, but reading through the entire
discussion I found other things confusing as well; e.g. in some cases
it wasn't clear whether the "arguments" were the arguments to the
attribute or the function, or exactly what a "positional argument"
was.  I've tried to rewrite that part to straighten it out, as well as
some light copy-editing throughout.

gcc/ChangeLog
PR c/101440
* doc/extend.texi (Common Function Attributes): Clean up some
confusing language in the description of the "access" attribute.
---
 gcc/doc/extend.texi | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d76d333576f..4302df76c7f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1728,23 +1728,24 @@ write-only accesses to objects that are never read 
from.  Such accesses
 may be diagnosed by warnings such as @option{-Wstringop-overflow},
 @option{-Wuninitialized}, @option{-Wunused}, and others.
 
-The @code{access} attribute specifies that a function to whose by-reference
-arguments the attribute applies accesses the referenced object according to
-@var{access-mode}.  The @var{access-mode} argument is required and must be
-one of four names: @code{read_only}, @code{read_write}, @code{write_only},
-or @code{none}.  The remaining two are positional arguments.
+The @code{access} attribute specifies that a pointer or reference argument
+to the function is accessed according to @var{access-mode}, which must be
+one of @code{read_only}, @code{read_write}, @code{write_only},
+or @code{none}.  The semantics of these modes are described below.
 
-The required @var{ref-index} positional argument  denotes a function
-argument of pointer (or in C++, reference) type that is subject to
-the access.  The same pointer argument can be referenced by at most one
-distinct @code{access} attribute.
+The argument the attribute applies to is identifed by @var{ref-index}, which
+is an integer constant representing its position in the argument list.
+Argument numbering starts from 1.  You can specify multiple @code{access}
+attributes to describe the access modes of different arguments, but multiple
+@code{access} attributes applying to the same argument are not permitted.
 
-The optional @var{size-index} positional argument denotes a function
+The optional @var{size-index} denotes the position of a function
 argument of integer type that specifies the maximum size of the access.
 The size is the number of elements of the type referenced by @var{ref-index},
 or the number of bytes when the pointer type is @code{void*}.  When no
 @var{size-index} argument is specified, the pointer argument must be either
-null or point to a space that is suitably aligned and large for at least one
+null or point to a space that is suitably aligned and large enough
+for at least one
 object of the referenced type (this implies that a past-the-end pointer is
 not a valid argument).  The actual size of the access may be less but it
 must not be more.
@@ -1756,7 +1757,7 @@ is zero, the referenced object must be initialized.  The 
mode implies
 a stronger guarantee than the @code{const} qualifier which, when cast away
 from a pointer, does not prevent the pointed-to object from being modified.
 Examples of the use of the @code{read_only} access mode is the argument to
-the @code{puts} function, or the second and third arguments to
+the @code{puts} function, or the second argument to
 the @code{memcpy} function.
 
 @smallexample
@@ -1796,12 +1797,13 @@ __attribute__ ((access (write_only, 1, 2), access 
(read_write, 3)))
 int fgets (char*, int, FILE*);
 @end smallexample
 
-The access mode @code{none} specifies that the pointer to which it applies
+The access mode @code{none} specifies that the pointer argument
+to which it applies
 is not used to access the referenced object at all.  Unless the pointer is
-null the pointed-to object must exist and have at least the size as denoted
+null, the pointed-to object must exist and have at least the size as denoted
 by the @var{size-index} argument.  When the optional @var{size-index}
-argument is omitted for an argument of @code{void*} type the actual pointer
-agument is ignored.  The referenced object need not be initialized.
+argument is omitted for an argument of @code{void*} type, the actual pointer
+argument is ignored.  The referenced object need not be initialized.
 The mode is intended to be used as a means to help validate the expected
 object size, for example in functions that call @code{__builtin_object_size}.
 @xref{Object Size Checking}.
@@ -1812,7 +1814,8 @@ an access @strong{will} happen.  Also, the @code{access} 
attribute does not
 imply the attribute @code{nonnull} nor the attribute @code{nonnull_if_nonzero};
 it may be 

[PATCH] cse: Fix up delete_trivially_dead_insns [PR119594]

2025-04-03 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled by delete_trivially_dead_insns,
latently since r0-6313, actually since r15-1575.

The problem is in that r0-6313 change, which made count_reg_usage not
count uses of the pseudo which the containing SET sets.  That is needed
so we can delete those instructions as trivially dead if they are really
dead, but has the following problem.  After fwprop proper we have:
(insn 7 2 8 2 (set (reg/v:DI 101 [ g ])
(const_int -1 [0x])) "pr119594.c":8:10 95 
{*movdi_internal}
 (nil))
...
(insn 26 24 27 7 (set (reg:DI 104 [ g ])
(zero_extend:DI (subreg:SI (reg/v:DI 101 [ g ]) 0))) "pr119594.c":11:8 
175 {*zero_extendsidi2}
 (expr_list:REG_EQUAL (const_int 4294967295 [0x])
(expr_list:REG_DEAD (reg/v:DI 101 [ g ])
(nil
(insn 27 26 28 7 (set (reg/v:DI 101 [ g ])
(zero_extend:DI (subreg:SI (reg/v:DI 101 [ g ]) 0))) "pr119594.c":11:8 
175 {*zero_extendsidi2}
 (expr_list:REG_EQUAL (const_int 4294967295 [0x])
(expr_list:REG_UNUSED (reg/v:DI 101 [ g ])
(nil
and nothing else uses or sets the 101 and 104 pseudos.  The subpass doesn't
look at REG_UNUSED or REG_DEAD notes (correctly, as they aren't guaranteed
to be accurate).  The last change in the IL was forward propagation of
(reg:DI 104 [ g ]) value into the following insn.
Now, count_reg_usage doesn't count anything on insn 7, the SET_DEST is a
reg, so we don't count that and SET_SRC doesn't contain any regs.
On insn 26 it counts one usage of pseudo 101 (so counts[101] = 1) and
on insn 27 since r0-6313 doesn't count anything as that insn sets
pseudo 101 to something that uses it, it isn't a side-effect instruction
and can't throw.

Now, after counting reg usages the subpass walks the IL from end to start,
sees insn 27, counts[101] is non-zero, so insn_live_p is true, nothing is
deleted.  Then sees insn 26, counts[104] is zero, insn_live_p is false,
we delete the insn and decrease associated counts, in this case counts[101]
becomes zero.  And finally later we process insn 7, counts[101] is now zero,
insn_live_p is false, we delete the insn (and decrease associated counts,
which aren't any).
Except that this resulted in insn 27 staying in the IL but using a REG
which is no longer set (and worse, having a REG_EQUAL note of something we
need later in the same bb, so we then assume pseudo 101 contains 0x,
which it no longer does.

Now, if insn 26 was after insn 27, this would work just fine, we'd first
delete that and then insn 27 and then insn 7, which is why most of the time
it happens to work fine.

The following patch attempts to detect these cases by counting the uses
of pseudos in instructions which set it in a separate new slice of the
counts array and then we can check if we've decreased usage count to zero
on something that still has such self-uses non-zero (so some such
instructions not removed) and do another pass of the IL and delete those
(and just those, without updating further reg usage counts so that we don't
have to repeat; also, I think we really don't need to update debug insns at
that point, in such cases it were originally multiple sets of the pseudo
(unless the code was undefined) and there is following loop on the IL to
adjust the further debug insns.

Previously, we forced (through passing dest = pc_rtx) counting the uses
even in sets of such pseudo on some instructions which would be never
insn_live_p regardless of counts array (insns throwing with non-call
exceptions and side_effects_p instructions).  The patch extends it
to other cases where it is never insn_live_p regardless of counts array
content (e.g. insns which set hard registers in parallel with some
pseudo, etc.) and also add to that the case of setting 2 or more pseudos
in one instruction (something insn_live_p could return false if both
pseudos are unused but if we aren't lucky and one of them is, we would
need to keep it but could remove the setter of the self-reference use).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Note, we never care about usage counts of hard registers, so I wonder
(perhaps for GCC 16) if we shouldn't decrease the size of slices from
2 * nreg (or for -g 4 * nreg) to 2 * (nreg - FIRST_PSEUDO_REGISTER)
or 4 times that and simply only track pseudo registers with REGNO (x) -
FIRST_PSEUDO_REGISTER indexes.

2025-04-04  Jakub Jelinek  

PR rtl-optimization/119594
* cse.cc (count_reg_usage): Add NREG argument, pass it to
recursive call.  For REG increment counts[REGNO (x) + nreg] if
x == dest and nreg is non-zero.  For INSNs, set dest = pc_rtx
if PATTERN is a PARALLEL and insn_live_p will be true regardless
of count array content, or if it sets 2 or more pseudos.
(delete_trivially_dead_insns): For MAY_HAVE_DEBUG_BIND_INSNS allocate
4 nreg array slices instead of 3 and adjust count + nreg * 2 to
count + nreg * 3 and coun

Re: [PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-03 Thread Richard Biener
On Thu, 3 Apr 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 2 Apr 2025, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> > On Tue, 1 Apr 2025, Richard Sandiford wrote:
> >> >
> >> >> The problem in PR101523 was that, after each successful 2->2
> >> >> combination attempt, distribute_links would search further and
> >> >> further for the next combinable use of the i2 destination.
> >> >> The original patch for the PR dealt with that by disallowing such
> >> >> combinations.  However, that led to various optimisation regressions,
> >> >> so there was interest in allowing the combinations again, at least
> >> >> until an alternative way of getting the same results is in place.
> >> >> 
> >> >> This patch does that, but limits distribute_links to a certain number
> >> >> of instructions when i2 is unchanged.  As Segher said in the PR trail,
> >> >> it would make more conceptual sense to apply the limit unconditionally,
> >> >> but I thought it would be better to change as little as possible at
> >> >> this development stage.  Logically, in stage 1, the --param should
> >> >> be applied directly by distribute_links with no input from callers.
> >> >> 
> >> >> As I mentioned in:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
> >> >> 
> >> >> I think it's safe to drop log links even if a use exists.  All
> >> >> processing of log links seems to handle the absence of a link
> >> >> for a particular register in a conservative way.
> >> >> 
> >> >> The initial set-up errs on the side of dropping links, since for example
> >> >> create_log_links has:
> >> >> 
> >> >>  /* flow.c claimed:
> >> >> 
> >> >>  We don't build a LOG_LINK for hard registers contained
> >> >>  in ASM_OPERANDs.  If these registers get replaced,
> >> >>  we might wind up changing the semantics of the insn,
> >> >>  even if reload can make what appear to be valid
> >> >>  assignments later.  */
> >> >>   if (regno < FIRST_PSEUDO_REGISTER
> >> >>   && asm_noperands (PATTERN (use_insn)) >= 0)
> >> >> continue;
> >> >> 
> >> >> which excludes combinations by dropping log links, rather than during
> >> >> try_combine.  And:
> >> >> 
> >> >>   /* If this register is being initialized using itself, and the
> >> >>  register is uninitialized in this basic block, and there are
> >> >>  no LOG_LINKS which set the register, then part of the
> >> >>  register is uninitialized.  In that case we can't assume
> >> >>  anything about the number of nonzero bits.
> >> >> 
> >> >>  ??? We could do better if we checked this in
> >> >>  reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
> >> >>  could avoid making assumptions about the insn which initially
> >> >>  sets the register, while still using the information in other
> >> >>  insns.  We would have to be careful to check every insn
> >> >>  involved in the combination.  */
> >> >> 
> >> >>   if (insn
> >> >>   && reg_referenced_p (x, PATTERN (insn))
> >> >>   && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
> >> >>REGNO (x)))
> >> >> {
> >> >>   struct insn_link *link;
> >> >> 
> >> >>   FOR_EACH_LOG_LINK (link, insn)
> >> >> if (dead_or_set_p (link->insn, x))
> >> >>   break;
> >> >>   if (!link)
> >> >> {
> >> >>   rsp->nonzero_bits = GET_MODE_MASK (mode);
> >> >>   rsp->sign_bit_copies = 1;
> >> >>   return;
> >> >> }
> >> >> }
> >> >> 
> >> >> treats the lack of a log link is a possible sign of uninitialised data,
> >> >> but that would be a missed optimisation rather than a correctness issue.
> >> >> 
> >> >> One question is what the default --param value should be.  I went with
> >> >> Jakub's suggestion of 3000 from:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
> >> >> 
> >> >> Also, to answer Jakub's question in that comment, I tried bisecting:
> >> >> 
> >> >>   int limit = atoi (getenv ("BISECT"));
> >> >> 
> >> >> (so applying the limit for all calls from try_combine) with an
> >> >> abort in distribute_links if the limit caused a link to be skipped.
> >> >> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
> >> >> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
> >> >> was 142, so much lower than the parameter value.  I realised too late
> >> >> that --enable-checking=release would probably have been a more
> >> >> interesting test.
> >> >> 
> >> >> Some of the try_combine changes come from Richi's patch in:
> >> >> 
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
> >> >> 
> >> >> Bootstrapped & regression-tested on aarch64-linux-gnu,
> >>

Re: [PATCH] libgcobol: Check if the target needs libm.

2025-04-03 Thread Richard Biener
On Thu, Apr 3, 2025 at 9:19 PM Iain Sandoe  wrote:
>
> We should not assume that the target needs (or even has) libm.
>
> Tested on the usual suspects (x86_64, aarch64 linux, x86_64 darwin).
> OK for trunk?

OK.

> thanks
> Iain.
>
> --- 8< ---
>
> Use the libtool config check and $(LIBM).
>
> libgcobol/ChangeLog:
>
> * Makefile.am: Use $(LIBM) to add the math lib when
> it is needed.
> * Makefile.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Check if the target wants libm.
>
> Signed-off-by: Iain Sandoe 
> ---
>  libgcobol/Makefile.am  |   2 +-
>  libgcobol/Makefile.in  |   3 +-
>  libgcobol/configure| 146 -
>  libgcobol/configure.ac |   1 +
>  4 files changed, 148 insertions(+), 4 deletions(-)
>
> diff --git a/libgcobol/Makefile.am b/libgcobol/Makefile.am
> index cafb733dde1..45217421b1c 100644
> --- a/libgcobol/Makefile.am
> +++ b/libgcobol/Makefile.am
> @@ -48,7 +48,7 @@ libgcobol_la_LINK = $(LIBTOOL) --mode=link --tag=CXX $(CXX) 
>   \
> -Wc,-shared-libgcc  \
> -version-info $(LIBGCOBOL_VERSION)  \
> -lstdc++\
> -   $(LTLDFLAGS) $(LTLIBICONV)
> +   $(LTLDFLAGS) $(LTLIBICONV) $(LIBM)
>
>  WARN_CFLAGS = -W -Wall -Wwrite-strings
>
> diff --git a/libgcobol/Makefile.in b/libgcobol/Makefile.in
> index c4a562a8058..0f0d9b687f1 100644
> --- a/libgcobol/Makefile.in
> +++ b/libgcobol/Makefile.in
> @@ -305,6 +305,7 @@ LD = @LD@
>  LDFLAGS = @LDFLAGS@
>  LIBGCOBOL_VERSION = @LIBGCOBOL_VERSION@
>  LIBICONV = @LIBICONV@
> +LIBM = @LIBM@
>  LIBOBJS = @LIBOBJS@
>  LIBS = @LIBS@
>  LIBTOOL = @LIBTOOL@
> @@ -428,7 +429,7 @@ libgcobol_la_LINK = $(LIBTOOL) --mode=link --tag=CXX 
> $(CXX) \
> -Wc,-shared-libgcc  \
> -version-info $(LIBGCOBOL_VERSION)  \
> -lstdc++\
> -   $(LTLDFLAGS) $(LTLIBICONV)
> +   $(LTLDFLAGS) $(LTLIBICONV) $(LIBM)
>
>  WARN_CFLAGS = -W -Wall -Wwrite-strings
>  AM_CXXFLAGS = $(CXXFLAGS_FOR_TARGET)
> diff --git a/libgcobol/configure b/libgcobol/configure
> index e12b72e0817..e7b3b830890 100755
> --- a/libgcobol/configure
> +++ b/libgcobol/configure
> @@ -646,6 +646,7 @@ enable_static
>  enable_shared
>  ENABLE_DARWIN_AT_RPATH_FALSE
>  ENABLE_DARWIN_AT_RPATH_TRUE
> +LIBM
>  CXXCPP
>  OTOOL64
>  OTOOL
> @@ -12908,7 +12909,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 12911 "configure"
> +#line 12912 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> @@ -13014,7 +13015,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 13017 "configure"
> +#line 13018 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> @@ -16348,6 +16349,147 @@ enable_dlopen=yes
>
>
>
> +LIBM=
> +case $host in
> +*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-pw32* | *-*-darwin*)
> +  # These system don't have libm, or don't need it
> +  ;;
> +*-ncr-sysv4.3*)
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mwvalidcheckl in 
> -lmw" >&5
> +$as_echo_n "checking for _mwvalidcheckl in -lmw... " >&6; }
> +if ${ac_cv_lib_mw__mwvalidcheckl+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_check_lib_save_LIBS=$LIBS
> +LIBS="-lmw  $LIBS"
> +if test x$gcc_no_link = xyes; then
> +  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
> "$LINENO" 5
> +fi
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +/* Override any GCC internal prototype to avoid an error.
> +   Use char because int might match the return type of a GCC
> +   builtin and then its argument prototype would still apply.  */
> +#ifdef __cplusplus
> +extern "C"
> +#endif
> +char _mwvalidcheckl ();
> +int
> +main ()
> +{
> +return _mwvalidcheckl ();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_link "$LINENO"; then :
> +  ac_cv_lib_mw__mwvalidcheckl=yes
> +else
> +  ac_cv_lib_mw__mwvalidcheckl=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +conftest$ac_exeext conftest.$ac_ext
> +LIBS=$ac_check_lib_save_LIBS
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $ac_cv_lib_mw__mwvalidcheckl" >&5
> +$as_echo "$ac_cv_lib_mw__mwvalidcheckl" >&6; }
> +if test "x$ac_cv_lib_mw__mwvalidcheckl" = xyes; then :
> +  LIBM="-lmw"
> +fi
> +
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cos in -lm" >&5
> +$as_echo_n "checking for cos in -lm... " >&6; }
> +if ${ac_cv_lib_m_cos+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  ac_check_lib_save_LIBS=$LIBS
> +LIBS="-lm  $LIBS"
> +if test x$gcc_no_link = xyes; then
> +  as_fn_error $? "Link tests are not allowed a

Re: [PATCH] always turn return into __builtin_unreachable for noreturn fuctions [PR119599]

2025-04-03 Thread Richard Biener
On Thu, Apr 3, 2025 at 11:56 PM Andrew Pinski  wrote:
>
> r8-3988-g356fcc67fba52b added code to turn return statements into 
> __builtin_unreachable
> calls inside noreturn functions but only while optimizing. Since 
> -funreachable-traps
> was added (r13-1204-gd68d3664253696), it is a good idea to move over to using
> __builtin_unreachable (and the trap version with this option which defaults 
> at -O0 and -0g)
> instead of just a follow through even at -O0.
>
> This also fixes a regression when inlining a noreturn function that returns 
> at -O0 (due to always_inline)
> as we would get an empty bb which has no successor edge instead of one with a 
> call to __builtin_unreachable.
>
> I also noticed there was no testcase testing the warning about 
> __builtin_return inside a noreturn function
> so I added a testcase there.
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> PR ipa/119599
>
> gcc/ChangeLog:
>
> * tree-cfg.cc (pass_warn_function_return::execute): Turn return 
> statements always
> into __builtin_unreachable calls.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr119599-1.c: New test.
> * gcc.dg/builtin-apply5.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/builtin-apply5.c | 23 +++
>  gcc/testsuite/gcc.dg/torture/pr119599-1.c | 27 +++
>  gcc/tree-cfg.cc   | 20 +
>  3 files changed, 61 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/builtin-apply5.c
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr119599-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/builtin-apply5.c 
> b/gcc/testsuite/gcc.dg/builtin-apply5.c
> new file mode 100644
> index 000..16892f76a8a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/builtin-apply5.c
> @@ -0,0 +1,23 @@
> +/* { dg-options "-O2 -Wmissing-noreturn -fgnu89-inline" } */
> +/* { dg-additional-options "-mno-mmx" { target { { i?86-*-* x86_64-*-* } && 
> ia32 } } } */
> +/* { dg-do compile } */
> +
> +extern void abort (void);
> +
> +double
> +foo (int arg)
> +{
> +  if (arg != 116)
> +abort();
> +  return arg + 1;
> +}
> +
> +__attribute__((noreturn))
> +double
> +bar (int arg)
> +{
> +  foo (arg);
> +  __builtin_return (__builtin_apply ((void (*) ()) foo, /* { dg-warning 
> "'noreturn' function does return" } */
> +__builtin_apply_args (), 16));
> +}
> +
> diff --git a/gcc/testsuite/gcc.dg/torture/pr119599-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr119599-1.c
> new file mode 100644
> index 000..4fbd228a359
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr119599-1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fdump-tree-einline" } */
> +
> +/* PR ipa/119599 */
> +/* inlining a noreturn function which returns
> +   can cause an ICE when dealing finding an unreachable block.
> +   We should get a __builtin_unreachable after the inliing.  */
> +
> +
> +void baz (void);
> +
> +static inline __attribute__((always_inline, noreturn)) void
> +bar (void)
> +{
> +  static volatile int t = 0;
> +  if (t == 0)
> +baz ();
> +} /* { dg-warning "function does return" } */
> +
> +void
> +foo (void)
> +{
> +  bar ();
> +}
> +
> +/* After inlining, we should have call to __builtin_unreachable now. */
> +/* { dg-final { scan-tree-dump "__builtin_unreachable " "einline" } } */
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index f25480cf6fc..6a29a56ca9a 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -9808,18 +9808,20 @@ pass_warn_function_return::execute (function *fun)
>(e = ei_safe_edge (ei)); )
> {
>   last = *gsi_last_bb (e->src);
> - if ((gimple_code (last) == GIMPLE_RETURN
> -  || gimple_call_builtin_p (last, BUILT_IN_RETURN))
> - && location == UNKNOWN_LOCATION
> - && ((location = LOCATION_LOCUS (gimple_location (last)))
> - != UNKNOWN_LOCATION)
> - && !optimize)
> -   break;
> - /* When optimizing, replace return stmts in noreturn functions
> + /* Warn about __builtin_return .*/
> + if (gimple_call_builtin_p (last, BUILT_IN_RETURN)
> + && location == UNKNOWN_LOCATION)
> +   {
> + location = LOCATION_LOCUS (gimple_location (last));
> + ei_next (&ei);
> +   }
> + /* Replace return stmts in noreturn functions
>  with __builtin_unreachable () call.  */
> - if (optimize && gimple_code (last) == GIMPLE_RETURN)
> + else if (gimple_code (last) == GIMPLE_RETURN)
> {
>   location_t loc = gimple_location (last);
> + if (location == UNKNOWN_LOCATION)
> +   location = LOCATION_LOCUS (loc);
>   gimple *new_stmt = gimple_build_builtin_unreachable (loc);
>   gimple_stmt_iterator gsi = gsi_for_stmt (last);
>

[PATCH] libgcobol: Check if the target needs libm.

2025-04-03 Thread Iain Sandoe
We should not assume that the target needs (or even has) libm.

Tested on the usual suspects (x86_64, aarch64 linux, x86_64 darwin).
OK for trunk?
thanks
Iain.

--- 8< ---

Use the libtool config check and $(LIBM).

libgcobol/ChangeLog:

* Makefile.am: Use $(LIBM) to add the math lib when
it is needed.
* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Check if the target wants libm.

Signed-off-by: Iain Sandoe 
---
 libgcobol/Makefile.am  |   2 +-
 libgcobol/Makefile.in  |   3 +-
 libgcobol/configure| 146 -
 libgcobol/configure.ac |   1 +
 4 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/libgcobol/Makefile.am b/libgcobol/Makefile.am
index cafb733dde1..45217421b1c 100644
--- a/libgcobol/Makefile.am
+++ b/libgcobol/Makefile.am
@@ -48,7 +48,7 @@ libgcobol_la_LINK = $(LIBTOOL) --mode=link --tag=CXX $(CXX)   
\
-Wc,-shared-libgcc  \
-version-info $(LIBGCOBOL_VERSION)  \
-lstdc++\
-   $(LTLDFLAGS) $(LTLIBICONV)
+   $(LTLDFLAGS) $(LTLIBICONV) $(LIBM)
 
 WARN_CFLAGS = -W -Wall -Wwrite-strings
 
diff --git a/libgcobol/Makefile.in b/libgcobol/Makefile.in
index c4a562a8058..0f0d9b687f1 100644
--- a/libgcobol/Makefile.in
+++ b/libgcobol/Makefile.in
@@ -305,6 +305,7 @@ LD = @LD@
 LDFLAGS = @LDFLAGS@
 LIBGCOBOL_VERSION = @LIBGCOBOL_VERSION@
 LIBICONV = @LIBICONV@
+LIBM = @LIBM@
 LIBOBJS = @LIBOBJS@
 LIBS = @LIBS@
 LIBTOOL = @LIBTOOL@
@@ -428,7 +429,7 @@ libgcobol_la_LINK = $(LIBTOOL) --mode=link --tag=CXX $(CXX) 
\
-Wc,-shared-libgcc  \
-version-info $(LIBGCOBOL_VERSION)  \
-lstdc++\
-   $(LTLDFLAGS) $(LTLIBICONV)
+   $(LTLDFLAGS) $(LTLIBICONV) $(LIBM)
 
 WARN_CFLAGS = -W -Wall -Wwrite-strings
 AM_CXXFLAGS = $(CXXFLAGS_FOR_TARGET)
diff --git a/libgcobol/configure b/libgcobol/configure
index e12b72e0817..e7b3b830890 100755
--- a/libgcobol/configure
+++ b/libgcobol/configure
@@ -646,6 +646,7 @@ enable_static
 enable_shared
 ENABLE_DARWIN_AT_RPATH_FALSE
 ENABLE_DARWIN_AT_RPATH_TRUE
+LIBM
 CXXCPP
 OTOOL64
 OTOOL
@@ -12908,7 +12909,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12911 "configure"
+#line 12912 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -13014,7 +13015,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 13017 "configure"
+#line 13018 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16348,6 +16349,147 @@ enable_dlopen=yes
 
 
 
+LIBM=
+case $host in
+*-*-beos* | *-*-cegcc* | *-*-cygwin* | *-*-haiku* | *-*-pw32* | *-*-darwin*)
+  # These system don't have libm, or don't need it
+  ;;
+*-ncr-sysv4.3*)
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for _mwvalidcheckl in 
-lmw" >&5
+$as_echo_n "checking for _mwvalidcheckl in -lmw... " >&6; }
+if ${ac_cv_lib_mw__mwvalidcheckl+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lmw  $LIBS"
+if test x$gcc_no_link = xyes; then
+  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
+fi
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char _mwvalidcheckl ();
+int
+main ()
+{
+return _mwvalidcheckl ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_mw__mwvalidcheckl=yes
+else
+  ac_cv_lib_mw__mwvalidcheckl=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_mw__mwvalidcheckl" 
>&5
+$as_echo "$ac_cv_lib_mw__mwvalidcheckl" >&6; }
+if test "x$ac_cv_lib_mw__mwvalidcheckl" = xyes; then :
+  LIBM="-lmw"
+fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cos in -lm" >&5
+$as_echo_n "checking for cos in -lm... " >&6; }
+if ${ac_cv_lib_m_cos+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lm  $LIBS"
+if test x$gcc_no_link = xyes; then
+  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
+fi
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char cos 

[COMMITTED 31/35] gccrs: FIX ICE when working with HIR::BareFunctionType

2025-04-03 Thread arthur . cohen
From: Philip Herron 

Fixes Rust-GCC#3615

gcc/rust/ChangeLog:

* hir/rust-hir-dump.cc (Dump::visit): check has type
* hir/tree/rust-hir-type.cc (BareFunctionType::BareFunctionType): 
likewise

gcc/testsuite/ChangeLog:

* rust/compile/issue-3615.rs: New test.

Signed-off-by: Philip Herron 
---
 gcc/rust/hir/rust-hir-dump.cc| 4 +++-
 gcc/rust/hir/tree/rust-hir-type.cc   | 3 ++-
 gcc/testsuite/rust/compile/issue-3615.rs | 7 +++
 3 files changed, 12 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/rust/compile/issue-3615.rs

diff --git a/gcc/rust/hir/rust-hir-dump.cc b/gcc/rust/hir/rust-hir-dump.cc
index 0a9d617a919..89fcc3dd1f9 100644
--- a/gcc/rust/hir/rust-hir-dump.cc
+++ b/gcc/rust/hir/rust-hir-dump.cc
@@ -2440,7 +2440,9 @@ Dump::visit (BareFunctionType &e)
   end_field ("params");
 }
 
-  visit_field ("return_type", e.get_return_type ());
+  if (e.has_return_type ())
+visit_field ("return_type", e.get_return_type ());
+
   put_field ("is_variadic", std::to_string (e.get_is_variadic ()));
   end ("BareFunctionType");
 }
diff --git a/gcc/rust/hir/tree/rust-hir-type.cc 
b/gcc/rust/hir/tree/rust-hir-type.cc
index 689d86b7372..6a6c319fc98 100644
--- a/gcc/rust/hir/tree/rust-hir-type.cc
+++ b/gcc/rust/hir/tree/rust-hir-type.cc
@@ -268,7 +268,8 @@ BareFunctionType::BareFunctionType (BareFunctionType const 
&other)
 for_lifetimes (other.for_lifetimes),
 function_qualifiers (other.function_qualifiers), params (other.params),
 is_variadic (other.is_variadic),
-return_type (other.return_type->clone_type ())
+return_type (other.has_return_type () ? other.return_type->clone_type ()
+ : nullptr)
 {}
 
 BareFunctionType &
diff --git a/gcc/testsuite/rust/compile/issue-3615.rs 
b/gcc/testsuite/rust/compile/issue-3615.rs
new file mode 100644
index 000..e5c50725ea2
--- /dev/null
+++ b/gcc/testsuite/rust/compile/issue-3615.rs
@@ -0,0 +1,7 @@
+pub trait Trait {
+pub fn nrvo(init: fn()) -> [u8; 4096] {
+let mut buf = [0; 4096];
+
+buf
+}
+}
-- 
2.49.0



[PATCH v1 1/2] c: Add target_version attribute support.

2025-04-03 Thread Alfie Richards

This commit introduces support for the target_version attribute in the c
frontend, following the behavior defined in the Arm C Language Extension.

Key changes include:

- During pushdecl, the compiler now checks whether the current symbol is
  part of a multiversioned set.
  - New versions are added to the function multiversioning (FMV) set, and the
symbol binding is updated to include the default version (if present).
This means the binding for a multiversioned symbol will always reference
the default version (if present), as it defines the scope and signature
for the entire set.
  - Pre-existing versions are merged with their previous version (or diagnosed).
- Lookup logic is adjusted to prevent resolving non-default versions.
- start_decl and start_function are updated to handle marking and mangling of
  versioned functions.
- c_parse_final_cleanups now includes a call to process_same_body_aliases.
  This has no functional impact other than setting cpp_implicit_aliases_done
  on all nodes, which is necessary for certain shared FMV logic.

gcc/c/ChangeLog:

* c-decl.cc (maybe_mark_function_versioned): New function.
(merge_decls): Preserve DECL_FUNCTION_VERSIONED in merging.
(duplicate_decls): Add check and diagnostic for unmergable version 
decls.
(pushdecl): Add FMV target_version logic.
(lookup_name): Dont resolve non-default versions.
(start_decl): Mark and mangle versioned functions.
(start_function): Mark and mangle versioned functions.
(c_parse_final_cleanups): Add call to process_same_body_aliases.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mv-1.c: New test.
* gcc.target/aarch64/mv-and-mvc1.c: New test.
* gcc.target/aarch64/mv-and-mvc2.c: New test.
* gcc.target/aarch64/mv-and-mvc3.c: New test.
* gcc.target/aarch64/mv-and-mvc4.c: New test.
* gcc.target/aarch64/mv-symbols1.c: New test.
* gcc.target/aarch64/mv-symbols10.c: New test.
* gcc.target/aarch64/mv-symbols11.c: New test.
* gcc.target/aarch64/mv-symbols12.c: New test.
* gcc.target/aarch64/mv-symbols13.c: New test.
* gcc.target/aarch64/mv-symbols2.c: New test.
* gcc.target/aarch64/mv-symbols3.c: New test.
* gcc.target/aarch64/mv-symbols4.c: New test.
* gcc.target/aarch64/mv-symbols5.c: New test.
* gcc.target/aarch64/mv-symbols6.c: New test.
* gcc.target/aarch64/mv-symbols7.c: New test.
* gcc.target/aarch64/mv-symbols8.c: New test.
* gcc.target/aarch64/mv-symbols9.c: New test.
* gcc.target/aarch64/mvc-symbols1.c: New test.
* gcc.target/aarch64/mvc-symbols2.c: New test.
* gcc.target/aarch64/mvc-symbols3.c: New test.
* gcc.target/aarch64/mvc-symbols4.c: New test.
---
 gcc/c/c-decl.cc   | 117 ++
 gcc/testsuite/gcc.target/aarch64/mv-1.c   |  43 +++
 .../gcc.target/aarch64/mv-and-mvc1.c  |  37 ++
 .../gcc.target/aarch64/mv-and-mvc2.c  |  28 +
 .../gcc.target/aarch64/mv-and-mvc3.c  |  40 ++
 .../gcc.target/aarch64/mv-and-mvc4.c  |  37 ++
 .../gcc.target/aarch64/mv-symbols1.c  |  38 ++
 .../gcc.target/aarch64/mv-symbols10.c |  42 +++
 .../gcc.target/aarch64/mv-symbols11.c |  16 +++
 .../gcc.target/aarch64/mv-symbols12.c |  27 
 .../gcc.target/aarch64/mv-symbols13.c |  28 +
 .../gcc.target/aarch64/mv-symbols2.c  |  28 +
 .../gcc.target/aarch64/mv-symbols3.c  |  27 
 .../gcc.target/aarch64/mv-symbols4.c  |  31 +
 .../gcc.target/aarch64/mv-symbols5.c  |  36 ++
 .../gcc.target/aarch64/mv-symbols6.c  |  20 +++
 .../gcc.target/aarch64/mv-symbols7.c  |  47 +++
 .../gcc.target/aarch64/mv-symbols8.c  |  47 +++
 .../gcc.target/aarch64/mv-symbols9.c  |  44 +++
 .../gcc.target/aarch64/mvc-symbols1.c |  25 
 .../gcc.target/aarch64/mvc-symbols2.c |  15 +++
 .../gcc.target/aarch64/mvc-symbols3.c |  19 +++
 .../gcc.target/aarch64/mvc-symbols4.c |  12 ++
 23 files changed, 804 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols12.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols13.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-symbols2.c
 create mode 100644 gc

Re: [PATCH 2/2] c++: maybe_dependent_member_ref and typenames [PR118626]

2025-04-03 Thread Jason Merrill

On 3/31/25 7:51 PM, Patrick Palka wrote:

Here during maybe_dependent_member_ref for accepted_type<_Up>, we
correctly don't strip the typedef because it's a complex one (its
defaulted template parameter isn't used in its definition) and so
we recurse to consider its corresponding TYPE_DECL.

We then incorrectly decide to not rewrite this use because of the
TYPENAME_TYPE shortcut.  But I don't think this shortcut should apply to
a typedef TYPE_DECL.

PR c++/118626

gcc/cp/ChangeLog:

* pt.cc (maybe_dependent_member_ref): Restrict TYPENAME_TYPE
shortcut to non-typedef TYPE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias25a.C: New test.
---
  gcc/cp/pt.cc  |  3 ++-
  .../g++.dg/cpp2a/class-deduction-alias25a.C   | 19 +++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 6509089efdc..a1b0df428dc 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17786,7 +17786,8 @@ maybe_dependent_member_ref (tree t, tree args, 
tsubst_flags_t complain,
  
if (TREE_CODE (t) == TYPE_DECL)

  {
-  if (TREE_CODE (TREE_TYPE (t)) == TYPENAME_TYPE
+  if (!DECL_ORIGINAL_TYPE (t)


I think let's check !is_typedef_decl instead.  OK with that tweak.


+ && TREE_CODE (TREE_TYPE (t)) == TYPENAME_TYPE
  && TYPE_NAME (TREE_TYPE (t)) == t)
/* The TYPE_DECL for a typename has DECL_CONTEXT of the typename
   scope, but it doesn't need to be rewritten again.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25a.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25a.C
new file mode 100644
index 000..74ef1e4d890
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25a.C
@@ -0,0 +1,19 @@
+// PR c++/118626
+// { dg-do compile { target c++20 } }
+
+template struct _Nth_type { using type = _Nth_type; };
+
+template
+struct variant {
+  template static constexpr long __accepted_index = 0;
+  template using __to_type = typename _Nth_type<_Np>::type;
+  template using __accepted_type = 
__to_type<__accepted_index<_Tp>>;
+  template> variant(_Up);
+};
+
+template
+struct Node { Node(_Tp); };
+
+template using Tree = variant>;
+using type = decltype(Tree{Node{42}});
+using type = Tree;




Re: [PATCH 1/2] c++: maybe_dependent_member_ref and stripped alias [PR118626]

2025-04-03 Thread Jason Merrill

On 3/31/25 7:51 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk?  This isn't a regression and potentially impacts
C++17 code, but it also makes alias CTAD unusable for our std::variant
currently.


OK.


-- >8 --

Here during maybe_dependent_member_ref (as part of CTAD rewriting
for the variant constructor) for __accepted_type<_Up> we strip this
alias all the way to type _Nth_type<__accepted_index<_Up>>, for which
we return NULL since _Nth_type is at namespace scope and so no
longer needs rewriting.

Note that however the template argument __accepted_index<_Up> of this
stripped type _does_ need rewriting (since it specializes a variable
template from the current instantiation).  We end up not rewriting this
variable template reference at any point however because upon returning
NULL, the caller (tsubst) proceeds to substitute the original form of
the type __accepted_type<_Up>, which doesn't directly refer to
__accepted_index.  This later leads to an ICE during subsequent alias
CTAD rewriting of this guide that contains a non-rewritten reference
to __accepted_index.

So when maybe_dependent_member_ref decides to not rewrite a class-scope
alias that's been stripped, the caller needs to commit to substituting
the stripped type rather than the original type.  This patch essentially
implements that by making maybe_dependent_member_ref call tsubst itself
in that case.

PR c++/118626

gcc/cp/ChangeLog:

* pt.cc (maybe_dependent_member_ref): Substitute and return the
stripped type if we decided to not rewrite it directly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias25.C: New test.
---
  gcc/cp/pt.cc  | 27 ---
  .../g++.dg/cpp2a/class-deduction-alias25.C| 19 +
  2 files changed, 43 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f7c56a10794..6509089efdc 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17741,13 +17741,34 @@ maybe_dependent_member_ref (tree t, tree args, 
tsubst_flags_t complain,
  
if (TYPE_P (t))

  {
+  bool stripped = false;
if (typedef_variant_p (t))
-   t = strip_typedefs (t);
-  tree decl = TYPE_NAME (t);
+   {
+ /* Since this transformation may undesirably turn a deduced context
+into a non-deduced one, we'd rather strip typedefs than perform
+the transformation.  */
+ tree u = strip_typedefs (t);
+ stripped = u != t;
+ t = u;
+   }
+  decl = TYPE_NAME (t);
if (decl)
decl = maybe_dependent_member_ref (decl, args, complain, in_decl);
if (!decl)
-   return NULL_TREE;
+   {
+ if (stripped)
+   /* The original type was an alias from the current instantiation
+  which we stripped to something outside it.  At this point we
+  need to commit to using the stripped type rather than deferring
+  to the caller (which would use the original type), to ensure
+  eligible bits of the stripped type get transformed.  */
+   return tsubst (t, args, complain, in_decl);
+ else
+   /* The original type wasn't a typedef, and we decided it doesn't
+  need rewriting, so just let the caller (tsubst) substitute it
+  normally.  */
+   return NULL_TREE;
+   }
return cp_build_qualified_type (TREE_TYPE (decl), cp_type_quals (t),
  complain);
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25.C
new file mode 100644
index 000..37ab932c2cc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias25.C
@@ -0,0 +1,19 @@
+// PR c++/118626
+// { dg-do compile { target c++20 } }
+
+template struct _Nth_type;
+
+template
+struct variant {
+  template static constexpr long __accepted_index = 0;
+  template using __to_type = _Nth_type<_Np>;
+  template using __accepted_type = __to_type<__accepted_index<_Tp>>;
+  template> variant(_Up);
+};
+
+template
+struct Node { Node(_Tp); };
+
+template using Tree = variant>;
+using type = decltype(Tree{Node{42}});
+using type = Tree;




Re: [PATCH] c++: Fix typo in RAW_DATA_CST build_list_conv subsubconv hanling [PR119563]

2025-04-03 Thread Jason Merrill

On 4/1/25 10:31 AM, Jakub Jelinek wrote:

Hi!

The following testcase ICEs (the embed one actually doesn't but
dereferences random uninitialized pointer far after allocated memory)
because of a typo.  In the RAW_DATA_CST handling of list conversion
where there are conversions to something other than
initializer_list<{{,un}signed ,}char>, the code now calls
implicit_conversion for all the RAW_DATA_CST elements and stores them
into subsubconvs array.
The next loop (done in a separate loop because subsubconvs[0] is
handled differently) attempts to do the
   for (i = 0; i < len; ++i)
 {
   conversion *sub = subconvs[i];
   if (sub->rank > t->rank)
 t->rank = sub->rank;
   if (sub->user_conv_p)
 t->user_conv_p = true;
   if (sub->bad_p)
 t->bad_p = true;
 }
rank/user_conv_p/bad_p merging, but I mistyped the index, the loop
iterates with j iterator and i is subconvs index, so the loop effectively
doesn't do anything interesting except for merging from one of the
subsubconvs element, if lucky within the subsubconvs array, if unlucky
not even from inside of the array.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK, seems obvious.


2025-04-01  Andrew Pinski  
Jakub Jelinek  

PR c++/119563
* call.cc (build_list_conv): Fix a typo in loop gathering
summary information from subsubconvs.

* g++.dg/cpp0x/pr119563.C: New test.
* g++.dg/cpp/embed-26.C: New test.

--- gcc/cp/call.cc.jj   2025-03-06 11:08:20.657231238 +0100
+++ gcc/cp/call.cc  2025-04-01 11:25:14.030321320 +0200
@@ -917,7 +917,7 @@ build_list_conv (tree type, tree ctor, i
  t->rank = cr_exact;
  for (j = 0; j < (unsigned) RAW_DATA_LENGTH (val); ++j)
{
- sub = subsubconvs[i];
+ sub = subsubconvs[j];
  if (sub->rank > t->rank)
t->rank = sub->rank;
  if (sub->user_conv_p)
--- gcc/testsuite/g++.dg/cpp0x/pr119563.C.jj2025-04-01 11:14:24.702394749 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/pr119563.C   2025-04-01 11:13:50.962866358 
+0200
@@ -0,0 +1,79 @@
+// PR c++/119563
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+namespace std {
+template 
+struct initializer_list {
+private:
+  T *_M_array;
+  decltype (sizeof 0) _M_len;
+public:
+  constexpr decltype (sizeof 0)
+  size () const noexcept { return _M_len; }
+  constexpr const T *
+  begin () const noexcept { return _M_array; }
+  constexpr const T *
+  end () const noexcept { return begin () + size (); }
+};
+}
+
+struct A {} a;
+
+struct B {
+  constexpr B (int x) : B (a, x) {}
+  template 
+  constexpr B (A, T... x) : b(x...) {}
+  int b;
+};
+
+struct C {
+  C (std::initializer_list x)
+  {
+if (x.size () != 130 + 1024 + 130)
+  __builtin_abort ();
+unsigned int i = 1, j = 0;
+for (auto a = x.begin (); a < x.end (); ++a)
+  if (a->b != i)
+   __builtin_abort ();
+  else
+   {
+ if (j == 129 || j == 129 + 1024)
+   i = 0;
+ i = (i & 15) + 1;
+ ++j;
+   }
+c = true;
+  }
+  bool c;
+};
+
+#define D 1 + 0, 2 + 0, 3 + 0, 4 + 0, 5 + 0, 6 + 0, 7 + 0, 8 + 0, \
+ 9 + 0, 10 + 0, 11 + 0, 12 + 0, 13 + 0, 14 + 0, 15 + 0, 16 + 0
+#define E D, D, D, D, D, D, D, D
+
+C c { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, E, E, E, E, E, E, E, E,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+  1, 2 };
+
+int
+main ()
+{
+  if (!c.c)
+__builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp/embed-26.C.jj  2025-04-01 11:18:56.563595381 
+0200
+++ gcc/testsuite/g++.dg/cpp/embed-26.C 2025-04-01 11:18:35.444890450 +0200
@@ -0,0 +1,63 @@
+// PR c++/119563
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+namespace std {
+template 
+struct initializer_list {
+private:
+  T *_M_array;
+  decltype (sizeof 0) _M_len;
+public:
+  constexpr decltype (sizeof 0)
+  size () const noexcept { return _M_len; }
+  constexpr const T *
+  begin () const noexcept { return _M_array; }
+

nvptx: Don't use PTX '.const', constant state space [PR119573] (was: 'TREE_READONLY' for 'const' array in C vs. C++)

2025-04-03 Thread Thomas Schwinge
Hi!

I have, by the way, filed 
"nvptx: PTX '.const', constant state space" for this topic.

On 2025-04-01T09:32:46+0200, Jakub Jelinek  wrote:
> On Tue, Apr 01, 2025 at 09:19:08AM +0200, Richard Biener via Gcc wrote:
>> On Tue, Apr 1, 2025 at 12:04 AM Thomas Schwinge  
>> wrote:
>> > In Nvidia PTX, "A state space is a storage area with particular
>> > characteristics.  All variables reside in some state space.  [...]".
>> > These include:
>> >
>> > .const  Shared, read-only memory.
>> > .global  Global memory, shared by all threads.
>> >
>> > Implemented via 'TARGET_ENCODE_SECTION_INFO', GCC/nvptx then uses
>> > special-cased instructions for accessing the respective memory regions.
>> >
>> > Now, given a 'const' array (with whatever element type; not interesting
>> > here), like:
>> >
>> > extern int * const arr[];
>> >
>> > ..., for GCC/C compilation, we access this as '.const' memory: GCC/nvptx
>> > 'DATA_AREA_CONST', but for GCC/C++ compilation, we access it as
>> > 'DATA_AREA_GLOBAL', and then fault at run time due to mismatch with the
>> > definition, which actually is '.const' for both C and C++ compilation.
>> >
>> > The difference is, when we get to 'TARGET_ENCODE_SECTION_INFO', that for
>> > C we've got 'TREE_READONLY(decl)', but for C++ we don't.
>> 
>> In C++ there could be runtime initializers for a const qualified
>> object.

Of course, C++...  ;-)

>> I think all
>> you need to do is make sure the logic that places the object in .const
>> vs. .global
>> is consistent with the logic deciding how to access it.

Right, that was the goal, essentially.  However it's indeed not possible,
because of...

> If it is extern, then for C++ you really don't know if it will be
> TREE_READONLY or not, that depends on whether it has runtime initialization
> or not and only the TU which defines it knows that.

... this.

> And deriving something from TYPE_READONLY of the ARRAY_TYPE or even
> from strip_array_types of the type does not work, that reflects whether it
> is const or not, I think it will still be set even if it has mutable members
> and doesn't say anything about runtime initialization.

A 'make check' clearly agreed that this is not the right thing to do.
(Not sure how/if the GCC/AVR back end gets away with doing it this way?

Pushed to trunk branch commit 5deeae29dab2af64e3342daf7a3e424c64ea
"nvptx: Don't use PTX '.const', constant state space [PR119573]", see
attached.  This is also in line with the remark by Alexander, that using
PTX '.const' for arbitrary read-only data is questionable.

In addition to the individual test case progressions listed in the Git
commit log, in the following the complete lists of progressed libstdc++
test cases, as indicated in the Git commit log, where I didn't want to
include these in detail.

| We progress:
| 
| error   : Memory space doesn't match for 
'_ZNSt3tr18__detail12__prime_listE' in 'input file 3 at offset 53085', first 
specified in 'input file 1 at offset 1924'
| nvptx-run: cuLinkAddData failed: device kernel image is invalid 
(CUDA_ERROR_INVALID_SOURCE, 300)
| 
| ... into execution test PASS for a few dozens of libstdc++ test cases.

These are:

PASS: tr1/6_containers/unordered_map/24064.cc  -std=gnu++17 (test for 
excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/24064.cc  -std=gnu++17 
execution test

PASS: tr1/6_containers/unordered_map/capacity/29134-map.cc  -std=gnu++17 
(test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/capacity/29134-map.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/erase/1.cc  -std=gnu++17 (test for 
excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/erase/1.cc  -std=gnu++17 
execution test

PASS: tr1/6_containers/unordered_map/erase/24061-map.cc  -std=gnu++17 (test 
for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/erase/24061-map.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/find/map1.cc  -std=gnu++17 (test for 
excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/find/map1.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/24061-map.cc  -std=gnu++17 
(test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/24061-map.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/array_syntax.cc  -std=gnu++17 
(test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/array_syntax.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/map_range.cc  -std=gnu++17 
(test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/map_range.cc  
-std=gnu++17 execution test

PASS: tr1/6_containers/unordered_map/insert/map_single.cc  -std=gnu++17 
(test for excess errors)
[-FAIL:-]{+PASS:+} tr1/6_containers/unordered_map/insert/map_single.cc  
-st

Re: [PATCH] c++: constraint variable used in evaluated context [PR117849]

2025-04-03 Thread Jason Merrill

On 4/2/25 2:28 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk/14?


OK for 14.

For 15, what happens if we remove that error entirely?  Do we still give 
other errors in that case?  That seems to be the expectation of 
https://eel.is/c++draft/basic.scope.param#note-1 and I'd expect that in 
a constant-evaluated (and therefore evaluated) context we would complain 
about it not being constant.


I think the current rule that corresponds to that error is
https://eel.is/c++draft/basic.def.odr#10 -- odr-using a variable that 
isn't odr-usable.  process_outer_var_ref/mark_use make that distinction 
for variables from an enclosing function, but I think for references 
outside of any function constant-evaluation should handle it.



-- >8 --

Here we wrongly reject the requires-expression requirement at parse time
due to the use of the constraint variable 't' within a template argument
(an evaluated context).  Fix this simply by refining the "use of parameter
outside function body" code path to exclude constraint variables.

PR c++/104255 tracks the same issue for function parameters, but fixing
that would be more involved, requiring changes to the PARM_DECL case of
tsubst_expr.

PR c++/117849

gcc/cp/ChangeLog:

* semantics.cc (finish_id_expression_1): Allow use of constraint
variable outside in an evaluated context.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires41.C: New test.
---
  gcc/cp/semantics.cc   |  1 +
  .../g++.dg/cpp2a/concepts-requires41.C| 23 +++
  2 files changed, 24 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires41.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 7d8beb8833d..a10ef34383c 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4755,6 +4755,7 @@ finish_id_expression_1 (tree id_expression,
 body, except inside an unevaluated context (i.e. decltype).  */
if (TREE_CODE (decl) == PARM_DECL
  && DECL_CONTEXT (decl) == NULL_TREE
+ && !CONSTRAINT_VAR_P (decl)
  && !cp_unevaluated_operand
  && !processing_contract_condition
  && !processing_omp_trait_property_expr)
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires41.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-requires41.C
new file mode 100644
index 000..840ed86c4ac
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires41.C
@@ -0,0 +1,23 @@
+// PR c++/117849
+// { dg-do compile { target c++20 } }
+
+template struct array {
+  constexpr int size() const { return N; }
+};
+
+struct vector {
+  int _size = 3;
+  constexpr int size() const { return _size; }
+};
+
+template struct integral_constant {
+  constexpr operator int() const { return N; }
+};
+
+template
+concept StaticSize = requires (T& t) {
+  typename integral_constant;
+};
+
+static_assert(StaticSize>);
+static_assert(!StaticSize);




[PATCH] always turn return into __builtin_unreachable for noreturn fuctions [PR119599]

2025-04-03 Thread Andrew Pinski
r8-3988-g356fcc67fba52b added code to turn return statements into 
__builtin_unreachable
calls inside noreturn functions but only while optimizing. Since 
-funreachable-traps
was added (r13-1204-gd68d3664253696), it is a good idea to move over to using
__builtin_unreachable (and the trap version with this option which defaults at 
-O0 and -0g)
instead of just a follow through even at -O0.

This also fixes a regression when inlining a noreturn function that returns at 
-O0 (due to always_inline)
as we would get an empty bb which has no successor edge instead of one with a 
call to __builtin_unreachable.

I also noticed there was no testcase testing the warning about __builtin_return 
inside a noreturn function
so I added a testcase there.

Bootstrapped and tested on x86_64-linux-gnu.

PR ipa/119599

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Turn return 
statements always
into __builtin_unreachable calls.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr119599-1.c: New test.
* gcc.dg/builtin-apply5.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/builtin-apply5.c | 23 +++
 gcc/testsuite/gcc.dg/torture/pr119599-1.c | 27 +++
 gcc/tree-cfg.cc   | 20 +
 3 files changed, 61 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-apply5.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr119599-1.c

diff --git a/gcc/testsuite/gcc.dg/builtin-apply5.c 
b/gcc/testsuite/gcc.dg/builtin-apply5.c
new file mode 100644
index 000..16892f76a8a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-apply5.c
@@ -0,0 +1,23 @@
+/* { dg-options "-O2 -Wmissing-noreturn -fgnu89-inline" } */
+/* { dg-additional-options "-mno-mmx" { target { { i?86-*-* x86_64-*-* } && 
ia32 } } } */
+/* { dg-do compile } */
+
+extern void abort (void);
+
+double
+foo (int arg)
+{
+  if (arg != 116)
+abort();
+  return arg + 1;
+}
+
+__attribute__((noreturn))
+double
+bar (int arg)
+{
+  foo (arg);
+  __builtin_return (__builtin_apply ((void (*) ()) foo, /* { dg-warning 
"'noreturn' function does return" } */
+__builtin_apply_args (), 16));
+}
+
diff --git a/gcc/testsuite/gcc.dg/torture/pr119599-1.c 
b/gcc/testsuite/gcc.dg/torture/pr119599-1.c
new file mode 100644
index 000..4fbd228a359
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr119599-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-einline" } */
+
+/* PR ipa/119599 */
+/* inlining a noreturn function which returns
+   can cause an ICE when dealing finding an unreachable block.
+   We should get a __builtin_unreachable after the inliing.  */
+
+
+void baz (void);
+
+static inline __attribute__((always_inline, noreturn)) void
+bar (void)
+{
+  static volatile int t = 0;
+  if (t == 0)
+baz ();
+} /* { dg-warning "function does return" } */
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* After inlining, we should have call to __builtin_unreachable now. */
+/* { dg-final { scan-tree-dump "__builtin_unreachable " "einline" } } */
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index f25480cf6fc..6a29a56ca9a 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -9808,18 +9808,20 @@ pass_warn_function_return::execute (function *fun)
   (e = ei_safe_edge (ei)); )
{
  last = *gsi_last_bb (e->src);
- if ((gimple_code (last) == GIMPLE_RETURN
-  || gimple_call_builtin_p (last, BUILT_IN_RETURN))
- && location == UNKNOWN_LOCATION
- && ((location = LOCATION_LOCUS (gimple_location (last)))
- != UNKNOWN_LOCATION)
- && !optimize)
-   break;
- /* When optimizing, replace return stmts in noreturn functions
+ /* Warn about __builtin_return .*/
+ if (gimple_call_builtin_p (last, BUILT_IN_RETURN)
+ && location == UNKNOWN_LOCATION)
+   {
+ location = LOCATION_LOCUS (gimple_location (last));
+ ei_next (&ei);
+   }
+ /* Replace return stmts in noreturn functions
 with __builtin_unreachable () call.  */
- if (optimize && gimple_code (last) == GIMPLE_RETURN)
+ else if (gimple_code (last) == GIMPLE_RETURN)
{
  location_t loc = gimple_location (last);
+ if (location == UNKNOWN_LOCATION)
+   location = LOCATION_LOCUS (loc);
  gimple *new_stmt = gimple_build_builtin_unreachable (loc);
  gimple_stmt_iterator gsi = gsi_for_stmt (last);
  gsi_replace (&gsi, new_stmt, true);
-- 
2.43.0



libstdc++, nvptx: Remove machinery to inject per-file flags (was: GCN, nvptx: Allow for "hosted" libstdc++ build)

2025-04-03 Thread Thomas Schwinge
Hi!

On 2025-03-14T11:39:20+0100, I wrote:
> Subject: [PATCH] GCN, nvptx: Allow for "hosted" libstdc++ build
>
> We need [...], and
> for nvptx twiddle some more knobs.

> --- /dev/null
> +++ b/libstdc++-v3/config/cpu/nvptx/t-nvptx
> @@ -0,0 +1,7 @@
> +# Per-file flags, see '../../../configure.host', "inject per-file flags".
> +
> +# 'ptxas'/CUDA Driver rejects objects with a lot of global constant data:
> +# ptxas error   : File uses too much global constant data ([...])
> +# Cut short the assembly-time check; defer to actual use of the object file.
> +AM_MAKEFLAGS += CXXFLAGS-src/c++17/floating_to_chars.lo=-Wa,--no-verify
> +AM_MAKEFLAGS += CXXFLAGS-src/c++20/tzdb.lo=-Wa,--no-verify

This use is gone as of commit 5deeae29dab2af64e3342daf7a3e424c64ea
"nvptx: Don't use PTX '.const', constant state space [PR119573]", so...

> --- a/libstdc++-v3/configure.host
> +++ b/libstdc++-v3/configure.host

> +  nvptx-*-none)
> +# For 'make all-target-libstdc++-v3', we need to inject per-file flags:
> +OPTIMIZE_CXXFLAGS="${OPTIMIZE_CXXFLAGS} \$(CXXFLAGS-\$(subdir)/\$@)"
> +# ..., see:
> +tmake_file="$tmake_file cpu/nvptx/t-nvptx"

... this is not necessary anymore, and I've pushed to trunk branch
commit 287f360b3e75a19c48ee14c71f51b6e7968474ef
"libstdc++, nvptx: Remove machinery to inject per-file flags", see
attached.


Grüße
 Thomas


>From 287f360b3e75a19c48ee14c71f51b6e7968474ef Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 2 Apr 2025 11:05:08 +0200
Subject: [PATCH] libstdc++, nvptx: Remove machinery to inject per-file flags

Not used anymore.

	libstdc++-v3/
	* config/cpu/nvptx/t-nvptx: Remove.
	* configure.host [nvptx]: Adjust.
---
 libstdc++-v3/config/cpu/nvptx/t-nvptx | 1 -
 libstdc++-v3/configure.host   | 5 -
 2 files changed, 6 deletions(-)
 delete mode 100644 libstdc++-v3/config/cpu/nvptx/t-nvptx

diff --git a/libstdc++-v3/config/cpu/nvptx/t-nvptx b/libstdc++-v3/config/cpu/nvptx/t-nvptx
deleted file mode 100644
index eacc5468d627..
--- a/libstdc++-v3/config/cpu/nvptx/t-nvptx
+++ /dev/null
@@ -1 +0,0 @@
-# Per-file flags, see '../../../configure.host', "inject per-file flags".
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
index 0bed9dfff957..8375764bf4dc 100644
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -374,11 +374,6 @@ case "${host}" in
  port_specific_symbol_files="\$(srcdir)/../config/os/gnu-linux/arm-eabi-extra.ver"
  ;;
   nvptx-*-none)
-# For 'make all-target-libstdc++-v3', we need to inject per-file flags:
-OPTIMIZE_CXXFLAGS="${OPTIMIZE_CXXFLAGS} \$(CXXFLAGS-\$(subdir)/\$@)"
-# ..., see:
-tmake_file="$tmake_file cpu/nvptx/t-nvptx"
-
 # For 'make all-target-libstdc++-v3', re 'alloca'/VLA usage:
 EXTRA_CFLAGS="${EXTRA_CFLAGS} -mfake-ptx-alloca"
 OPTIMIZE_CXXFLAGS="${OPTIMIZE_CXXFLAGS} -mfake-ptx-alloca"
-- 
2.34.1



[pushed] c++/modules: inline loaded at eof

2025-04-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

std/format/string.cc and a few other libstdc++ tests were failing with
module std with undefined references to __failed_to_parse_format_spec.  This
turned out to be because since r15-8012 we don't end up calling
note_vague_linkage_fn for functions loaded after at_eof is set.

But once import_export_decl decides on COMDAT linkage, we should be able to
just clear DECL_EXTERNAL and let cgraph take it from there.

I initially made this change in import_export_decl, but decided that for GCC
15 it would be safer to limit the change to modules.  For GCC 16 I'd like to
do away with DECL_NOT_REALLY_EXTERN entirely, it's been obsolete since
cgraphunit in 2003.

gcc/cp/ChangeLog:

* module.cc (module_state::read_cluster)
(post_load_processing): Clear DECL_EXTERNAL if DECL_COMDAT.
---
 gcc/cp/module.cc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ce22b2ece3f..89deabbfee3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16679,6 +16679,15 @@ module_state::read_cluster (unsigned snum)
 #endif
  cfun->returns_struct = aggr;
  expand_or_defer_fn (decl);
+
+ /* If we first see this function after at_eof, it doesn't get
+note_vague_linkage_fn from tentative_decl_linkage, so the loop in
+c_parse_final_cleanups won't consider it.  But with DECL_COMDAT we
+can just clear DECL_EXTERNAL and let cgraph decide.
+FIXME handle this outside module.cc after GCC 15.  */
+ if (at_eof && DECL_COMDAT (decl) && DECL_EXTERNAL (decl)
+ && DECL_NOT_REALLY_EXTERN (decl))
+   DECL_EXTERNAL (decl) = false;
}
 
 }
@@ -19159,6 +19168,10 @@ post_load_processing ()
 
   gcc_checking_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl));
   expand_or_defer_fn (decl);
+  /* As in module_state::read_cluster.  */
+  if (at_eof && DECL_COMDAT (decl) && DECL_EXTERNAL (decl)
+ && DECL_NOT_REALLY_EXTERN (decl))
+   DECL_EXTERNAL (decl) = false;
 }
 
   cfun = old_cfun;

base-commit: 634215cdc3c569f9a9a247dcd4d9a4d6ce68ad57
-- 
2.49.0



Re: [PATCH] c++: P2280R4 and speculative constexpr folding [PR119387]

2025-04-03 Thread Jason Merrill

On 4/2/25 3:07 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux, does this look
OK  for trunk/14?


OK.


-- >8 --

The testcase in this PR uses 2.5x more memory and 6x more compile
time ever since r14-5979 which implements P2280R4.  This is because
our speculative constexpr folding now does a lot more work trying to
fold ultimately non-constant calls to constexpr functions, and in turn
produces a lot of garbage.  Sometimes, we do successfully fold more
thanks to P2280R4, but it seems to be trivial stuff like calls to
std::array::size or std::addressof.  The benefit of P2280 therefore
doesn't seem worth the cost during speculative constexpr folding, so
this patch restricts the paper to only manifestly-constant evaluation.

PR c++/119387

gcc/cp/ChangeLog:

* constexpr.cc (p2280_active_p): Define.
(cxx_eval_constant_expression) : Use it to
restrict P2280 relaxations.
: Likewise.
---
  gcc/cp/constexpr.cc | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 4820bcc84aa..ee968bc4630 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1294,6 +1294,22 @@ struct constexpr_ctx {
mce_value manifestly_const_eval;
  };
  
+/* True if the constexpr relaxations afforded by P2280R4 for unknown

+   references and objects are in effect.  */
+
+static bool
+p2280_active_p (const constexpr_ctx *ctx)
+{
+  if (ctx->manifestly_const_eval != mce_true)
+/* Disable these relaxations during speculative constexpr folding,
+   as it can significantly increase compile time/memory use
+   (PR119387).  */
+return false;
+
+  /* P2280R4 is accepted as a DR back to C++11.  */
+  return cxx_dialect >= cxx11;
+}
+
  /* Remove T from the global values map, checking for attempts to destroy
 a value that has already finished its lifetime.  */
  
@@ -7792,7 +7808,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,

r = TARGET_EXPR_INITIAL (r);
if (DECL_P (r)
  /* P2280 allows references to unknown.  */
- && !(VAR_P (t) && TYPE_REF_P (TREE_TYPE (t
+ && !(p2280_active_p (ctx) && VAR_P (t) && TYPE_REF_P (TREE_TYPE (t
{
  if (!ctx->quiet)
non_const_var_error (loc, r, /*fundef_p*/false);
@@ -7844,9 +7860,9 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  r = build_constructor (TREE_TYPE (t), NULL);
  TREE_CONSTANT (r) = true;
}
-  else if (TYPE_REF_P (TREE_TYPE (t)))
+  else if (p2280_active_p (ctx) && TYPE_REF_P (TREE_TYPE (t)))
/* P2280 allows references to unknown...  */;
-  else if (is_this_parameter (t))
+  else if (p2280_active_p (ctx) && is_this_parameter (t))
/* ...as well as the this pointer.  */;
else
{




Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-03 Thread Kees Cook
On Wed, Apr 02, 2025 at 09:16:47PM +, Qing Zhao wrote:
> Hi, Bill,
> 
> Thanks for the summary. 
> 
> I think that this is good.
> 
> Two more questions:
> 
> 1. Shall we keep the same name of the attribute counted_by for the 2nd new 
> syntax? 
> Or use a new name, such as, “counted_by_exp"?

FWIW, the Linux kernel will likely need to create its own macros for all
of this stuff (just to do the compiler version testing), so there should
be no problem with different names (or same names) in the compiler API.

> I don’t have strong preference here. As mentioned by Jacub in a 
> previous email, these two syntaxes can be distinguished by the number 
> of arguments of the attribute. 
>   
> So for GCC, there should be no issue with either old name of a new name.
> However, I am not sure about CLANG. (Considering the complication with 
> APPLE’s implementation)
> 
> 2. The 2nd new syntax should resolve all the following new requests from 
> Linux Kernel:
>   2.1 Refer to a field in the nested structure
>   2.2 Refer to globals or locals
>   2.3 Represent simple expression
>   2.4 Forward referencing
> 
> Except not very sure on 2.1: refer to a field in the nested structure
> 
> struct Y {
>  int n;
>  int other;
> }
>  
> struct Z {   
>  struct Y y;
>  int array[]  __attribute__ ((counted_by(?y.n)));
> };
> 
> in the above, what should be put instead of "?" to refer to the field "n" of 
> the field "y" of the current object of this struct Z?
> 
> Based on my understanding, with the new syntax, 
> 
> struct Z {   
>  struct Y y;
>  int array[]  __attribute__ ((counted_by(struct Y y, y.n)));
> };
> 
> i.e, the compiler will lookup “y” inside the current structure, then 
> resolving the member access operator “.” as an expression. 
> 
> Is this correct understanding?

I had the same question, e.g. how is this supposed to be declared:

struct Y {
  int n;
  int other;
}

struct Z {
  int *ints __attribute__ ((counted_by(y.n)));
  struct Y y;
};



-- 
Kees Cook


[PATCH] c++: Fix GC with TU_LOCAL_ENTITY [PR119564]

2025-04-03 Thread Nathaniel Shead
Tested on x86_64-pc-linux-gnu (so far just modules.exp), OK for trunk if
full bootstrap+regtest succeeds?

-- >8 --

When adding TU_LOCAL_ENTITY in r15-6379 I neglected to add it to
cp_tree_node_structure, so garbage collection was crashing on it.

gcc/cp/ChangeLog:

* decl.cc (cp_tree_node_structure): Add TU_LOCAL_ENTITY; fix
formatting.

gcc/testsuite/ChangeLog:

* g++.dg/modules/gc-3_a.C: New test.
* g++.dg/modules/gc-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc| 7 ---
 gcc/testsuite/g++.dg/modules/gc-3_a.C | 7 +++
 gcc/testsuite/g++.dg/modules/gc-3_b.C | 4 
 3 files changed, 15 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/gc-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/gc-3_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 2ed94fd786c..4e97093b134 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -19834,14 +19834,14 @@ cp_tree_node_structure (union lang_tree_node * t)
 {
   switch (TREE_CODE (&t->generic))
 {
-case ARGUMENT_PACK_SELECT:  return TS_CP_ARGUMENT_PACK_SELECT;
+case ARGUMENT_PACK_SELECT: return TS_CP_ARGUMENT_PACK_SELECT;
 case BASELINK: return TS_CP_BASELINK;
-case CONSTRAINT_INFO:   return TS_CP_CONSTRAINT_INFO;
+case CONSTRAINT_INFO:  return TS_CP_CONSTRAINT_INFO;
 case DEFERRED_NOEXCEPT:return TS_CP_DEFERRED_NOEXCEPT;
 case DEFERRED_PARSE:   return TS_CP_DEFERRED_PARSE;
 case IDENTIFIER_NODE:  return TS_CP_IDENTIFIER;
 case LAMBDA_EXPR:  return TS_CP_LAMBDA_EXPR;
-case BINDING_VECTOR:   return TS_CP_BINDING_VECTOR;
+case BINDING_VECTOR:   return TS_CP_BINDING_VECTOR;
 case OVERLOAD: return TS_CP_OVERLOAD;
 case PTRMEM_CST:   return TS_CP_PTRMEM;
 case STATIC_ASSERT:return TS_CP_STATIC_ASSERT;
@@ -19849,6 +19849,7 @@ cp_tree_node_structure (union lang_tree_node * t)
 case TEMPLATE_INFO:return TS_CP_TEMPLATE_INFO;
 case TEMPLATE_PARM_INDEX:  return TS_CP_TPI;
 case TRAIT_EXPR:   return TS_CP_TRAIT_EXPR;
+case TU_LOCAL_ENTITY:  return TS_CP_TU_LOCAL_ENTITY;
 case USERDEF_LITERAL:  return TS_CP_USERDEF_LITERAL;
 default:   return TS_CP_GENERIC;
 }
diff --git a/gcc/testsuite/g++.dg/modules/gc-3_a.C 
b/gcc/testsuite/g++.dg/modules/gc-3_a.C
new file mode 100644
index 000..b4adb2aa5a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/gc-3_a.C
@@ -0,0 +1,7 @@
+// PR c++/119564
+// { dg-additional-options "-fmodules -Wtemplate-names-tu-local" }
+// { dg-module-cmi M }
+
+export module M;
+static void foo() {};  // { dg-message "declared" }
+template  void bar() { foo(); }  // { dg-warning "TU-local" }
diff --git a/gcc/testsuite/g++.dg/modules/gc-3_b.C 
b/gcc/testsuite/g++.dg/modules/gc-3_b.C
new file mode 100644
index 000..1d1dc872eaf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/gc-3_b.C
@@ -0,0 +1,4 @@
+// PR c++/119564
+// { dg-additional-options "-fmodules -fno-module-lazy 
--param=ggc-min-expand=0 --param=ggc-min-heapsize=0" }
+
+import M;
-- 
2.47.0



[PATCH] c++/modules: Fix divergence in streaming/non-streaming tree walks [PR119608]

2025-04-03 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Modules streaming walks decls multiple times, first as a non-streaming
walk to find dependencies, and then later to actually emit the decls.
The first walk needs to be done to note locations that will be emitted.

In the PR we are getting a checking ICE because we are streaming a decl
that we didn't initially walk when collecting dependencies, so the
location isn't in the noted locations map.  This is because in decl_node
we have a branch where a PARM_DECL that hasn't previously been
referenced gets walked by value only if 'streaming_p ()' is true.

The true root cause here is that the decltype(v) in the testcase refers
to a different PARM_DECL from the one in the declaration that we're
streaming; it's the PARM_DECL from the initial forward-declaration, that
we're not streaming.  A proper fix would be to ensure that it gets
remapped to the decl in the definition we're actually emitting, but for
now this workaround fixes the bug (and any other bugs that might
manifest similarly).

PR c++/119608

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_node): Maybe require by-value
walking not just when streaming.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr119608_a.C: New test.
* g++.dg/modules/pr119608_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  | 39 ---
 gcc/testsuite/g++.dg/modules/pr119608_a.C | 16 ++
 gcc/testsuite/g++.dg/modules/pr119608_b.C |  8 +
 3 files changed, 45 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr119608_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr119608_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ce22b2ece3f..f60d7cb8e12 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -9004,25 +9004,28 @@ trees_out::decl_node (tree decl, walk_kind ref)
if (streaming_p ())
  i (tt_parm);
tree_node (DECL_CONTEXT (decl));
-   if (streaming_p ())
- {
-   /* That must have put this in the map.  */
-   walk_kind ref = ref_node (decl);
-   if (ref != WK_none)
- // FIXME:OPTIMIZATION We can wander into bits of the
- // template this was instantiated from.  For instance
- // deferred noexcept and default parms.  Currently we'll
- // end up cloning those bits of tree.  It would be nice
- // to reference those specific nodes.  I think putting
- // those things in the map when we reference their
- // template by name.  See the note in add_indirects.
- return true;
 
-   dump (dumper::TREE)
- && dump ("Wrote %s reference %N",
-  TREE_CODE (decl) == PARM_DECL ? "parameter" : "result",
-  decl);
- }
+   /* That must have put this in the map.  */
+   walk_kind ref = ref_node (decl);
+   if (ref != WK_none)
+ // FIXME:OPTIMIZATION We can wander into bits of the
+ // template this was instantiated from, for instance
+ // deferred noexcept and default parms, or references
+ // to parms from earlier forward-decls (PR c++/119608).
+ //
+ // Currently we'll end up cloning those bits of tree. 
+ // It would be nice to reference those specific nodes.
+ // I think putting those things in the map when we
+ // reference their template by name.
+ //
+ // See the note in add_indirects.
+ return true;
+
+   if (streaming_p ())
+ dump (dumper::TREE)
+   && dump ("Wrote %s reference %N",
+TREE_CODE (decl) == PARM_DECL ? "parameter" : "result",
+decl);
   }
   return false;
 
diff --git a/gcc/testsuite/g++.dg/modules/pr119608_a.C 
b/gcc/testsuite/g++.dg/modules/pr119608_a.C
new file mode 100644
index 000..4e7b359ac58
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr119608_a.C
@@ -0,0 +1,16 @@
+// PR c++/119608
+// { dg-additional-options "-fmodules" }
+// { dg-module-cmi repro }
+
+export module repro;
+
+export template
+auto
+visit(
+  Visitor v
+) -> decltype(
+  v);
+
+export template auto visit(Visitor v) -> decltype(v) {
+  return {};
+}
diff --git a/gcc/testsuite/g++.dg/modules/pr119608_b.C 
b/gcc/testsuite/g++.dg/modules/pr119608_b.C
new file mode 100644
index 000..023d20adf4c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr119608_b.C
@@ -0,0 +1,8 @@
+// PR c++/119608
+// { dg-additional-options "-fmodules" }
+
+import repro;
+
+int main() {
+  visit(123);
+}
-- 
2.47.0



[PATCH] tailc: Use the IPA-VRP tail call hack even for pointers [PR119614]

2025-04-03 Thread Jakub Jelinek
Hi!

As the first two testcases show, even with pointers IPA-VRP can optimize
return values from functions if they have singleton ranges into just the
exact value, so we need to virtually undo that for tail calls similarly
to integers and floats.  The third test just adds check that it works
even with floats (which it does).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-04  Jakub Jelinek  

PR tree-optimization/119614
* tree-tailcall.cc (find_tail_calls): Handle also pointer types in the
IPA-VRP workaround.

* c-c++-common/pr119614-1.c: New test.
* c-c++-common/pr119614-2.c: New test.
* c-c++-common/pr119614-3.c: New test.

--- gcc/tree-tailcall.cc.jj 2025-04-02 20:02:24.303939759 +0200
+++ gcc/tree-tailcall.cc2025-04-03 22:10:57.018048523 +0200
@@ -1036,7 +1036,9 @@ find_tail_calls (basic_block bb, struct
  && TREE_CONSTANT (ret_var))
if (tree type = gimple_range_type (call))
  if (tree callee = gimple_call_fndecl (call))
-   if ((INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type))
+   if ((INTEGRAL_TYPE_P (type)
+|| SCALAR_FLOAT_TYPE_P (type)
+|| POINTER_TYPE_P (type))
&& useless_type_conversion_p (TREE_TYPE (TREE_TYPE (callee)),
  type)
&& useless_type_conversion_p (TREE_TYPE (ret_var), type)
--- gcc/testsuite/c-c++-common/pr119614-1.c.jj  2025-04-03 22:22:09.060828085 
+0200
+++ gcc/testsuite/c-c++-common/pr119614-1.c 2025-04-03 22:21:26.201416120 
+0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/119614 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2" } */
+
+volatile int v;
+
+[[gnu::noinline]] const char *
+foo (int x)
+{
+  v += x;
+  return 0;
+}
+
+const char *
+bar (int x)
+{
+  if (x == 42)
+[[gnu::musttail]] return foo (42);
+  [[gnu::musttail]] return foo (32);
+}
+
+const char *
+baz (int x)
+{
+  if (x == 5)
+return foo (42);
+  return foo (32);
+}
--- gcc/testsuite/c-c++-common/pr119614-2.c.jj  2025-04-03 22:22:12.584779742 
+0200
+++ gcc/testsuite/c-c++-common/pr119614-2.c 2025-04-03 22:21:42.035198879 
+0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/119614 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2" } */
+
+volatile int v;
+
+[[gnu::noinline]] const char *
+foo (int x)
+{
+  v += x;
+  return (const char *) -42;
+}
+
+const char *
+bar (int x)
+{
+  if (x == 42)
+[[gnu::musttail]] return foo (42);
+  [[gnu::musttail]] return foo (32);
+}
+
+const char *
+baz (int x)
+{
+  if (x == 5)
+return foo (42);
+  return foo (32);
+}
--- gcc/testsuite/c-c++-common/pr119614-3.c.jj  2025-04-03 22:22:17.302715007 
+0200
+++ gcc/testsuite/c-c++-common/pr119614-3.c 2025-04-03 22:21:51.423070078 
+0200
@@ -0,0 +1,28 @@
+/* PR tree-optimization/119614 */
+/* { dg-do compile { target musttail } } */
+/* { dg-options "-O2" } */
+
+volatile int v;
+
+[[gnu::noinline]] double
+foo (int x)
+{
+  v += x;
+  return 0.5;
+}
+
+double
+bar (int x)
+{
+  if (x == 42)
+[[gnu::musttail]] return foo (42);
+  [[gnu::musttail]] return foo (32);
+}
+
+double
+baz (int x)
+{
+  if (x == 5)
+return foo (42);
+  return foo (32);
+}

Jakub



Re: [Fortran, Patch, PR119380, v1] Fix freeing procedure pointers in components

2025-04-03 Thread Andre Vehreschild
Hi all,

the backport to gcc-14 has been committed as: gcc-14.2.0-996-gf955c5b409a

Regards,
Andre

On Fri, 21 Mar 2025 13:33:37 +0100
Andre Vehreschild  wrote:

> Hi Paul,
>
> well, I had those might complicated patches bit my mightily. So let's hope for
> the best :-)
>
> Thanks for the review. Committed with your proposed change in the testcase as
> gcc-15-8642-ga5c69abf138
>
> Thanks again,
>   Andre
>
> On Fri, 21 Mar 2025 10:40:11 +
> Paul Richard Thomas  wrote:
>
> > Hi Andre,
> >
> > Gosh, that's a mighty complicated patch :-) I suggest changing the comment
> > in the test case:
> >
> > s/Check that components of procedure pointer aren't freeed./Do not free
> > procedure pointer components/ or some such.
> >
> > OK for mainline and, I propose, 14-branch.
> >
> > Regards and thanks
> >
> > Paul
> >
> >
> > On Fri, 21 Mar 2025 at 09:38, Andre Vehreschild  wrote:
> >
> > > Hi all,
> > >
> > > attached patch fixes freeing of procedure pointers that are stored in a
> > > derived
> > > type's component. GFortran did that already for polymorphic types but
> > > missed
> > > out on the others.
> > >
> > > Regtested ok on x86_64-pc-linux-gnu / F41. Ok for mainline?
> > >
> > > Regards,
> > > Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> > >
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de


[RFC PATCH] Implement -fdiag-prefix-map

2025-04-03 Thread Rasmus Villemoes
In many setups, especially when CI and/or some meta-build system like
Yocto or buildroot, is involved, gcc ends up being invoked using
absolute path names, which are often long and uninteresting.

That amounts to a lot of noise both when trying to decipher the
warning or error, when the warning text is copy-pasted to a commit
fixing the issue, or posted to a mailing list. In the latter case, the
path might also reveal details that should not be public; as a made-up
example, seeing the string

  /home/ravi/customers/acme-corp/yocto/tmp/work/bird-spa/

would make Road Runner wary of any "free plumage treatment"
sign.

Removing the prefixes manually is tedious and error-prone. So similar
to the other f*-prefix-map options, provide an option allowing one to
strip/remap certain prefixes when emitting diagnostics.

[Of course, one still has to be very careful whenever there might be
confidential information in the messages, but having the prefix likely
to contain customer or project names removed automatically is helpful.]
---
This is mostly just a POC, to ask if something like this could be
implemented. It obviously lacks documentation and tests.

I've tested this very lightly, and it works as expected for the simple
cases I've tried.  But I don't know if I managed to find all the
locations that would need to call remap_diag_filename().

To make linking work, I had to do a bit of juggling, moving
file-prefix-map.o to OBJS-libcommon in order to make that function
available to the diagnostic*.cc files, and then also move the
definition of flag_canon_prefix_map to file-prefix-map.cc. It builds
for me, but maybe it's broken in some way I don't know about.

 gcc/Makefile.in   |  2 +-
 gcc/common.opt|  4 
 gcc/diagnostic-format-text.cc |  3 +++
 gcc/diagnostic.cc |  2 ++
 gcc/file-prefix-map.cc| 20 
 gcc/file-prefix-map.h |  2 ++
 gcc/lto-opts.cc   |  1 +
 gcc/opts-global.cc|  4 
 gcc/opts.cc   |  5 ++---
 9 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ebfcd8a8a0d..5d3cd24a440 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1472,7 +1472,6 @@ OBJS = \
expr.o \
ext-dce.o \
fibonacci_heap.o \
-   file-prefix-map.o \
final.o \
fixed-value.o \
fold-const.o \
@@ -1857,6 +1856,7 @@ OBJS-libcommon = diagnostic-spec.o diagnostic.o 
diagnostic-color.o \
diagnostic-path.o \
diagnostic-show-locus.o \
edit-context.o \
+   file-prefix-map.o \
pretty-print.o intl.o \
json.o json-parsing.o \
sbitmap.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index b9e74cd3ac4..46e043878b4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1336,6 +1336,10 @@ fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fdebug-prefix-map== Map one directory name to another in debug 
information.
 
+fdiag-prefix-map=
+Common Joined RejectNegative Var(common_deferred_options) Defer
+-fdiag-prefix-map==  Map one directory name to another in 
diagnostics messages.
+
 ffile-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -ffile-prefix-map==  Map one directory name to another in 
compilation result.
diff --git a/gcc/diagnostic-format-text.cc b/gcc/diagnostic-format-text.cc
index 9273973baaf..ae7b3c445a6 100644
--- a/gcc/diagnostic-format-text.cc
+++ b/gcc/diagnostic-format-text.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-buffer.h"
 #include "text-art/theme.h"
 #include "make-unique.h"
+#include "file-prefix-map.h"
 
 /* Disable warnings about quoting issues in the pp_xxx calls below
that (intentionally) don't follow GCC diagnostic conventions.  */
@@ -337,6 +338,8 @@ diagnostic_text_output_format::file_name_as_prefix (const 
char *f) const
   const char *locus_cs
 = colorize_start (pp_show_color (pp), "locus");
   const char *locus_ce = colorize_stop (pp_show_color (pp));
+
+  f = remap_diag_filename (f);
   return build_message_string ("%s%s:%s ", locus_cs, f, locus_ce);
 }
 
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 07c76b6c652..1bc4319a9eb 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "logical-location.h"
 #include "diagnostic-buffer.h"
 #include "make-unique.h"
+#include "file-prefix-map.h"
 
 #ifdef HAVE_TERMIOS_H
 # include 
@@ -761,6 +762,7 @@ diagnostic_column_policy::get_location_text (const 
expanded_location &s,
   line = s.line;
   if (show_column)
col = converted_column (s);
+  file = remap_diag_filename (file);
 }
 
   const char *line_col = maybe_line_and_column (line, col);
diff --git a/gcc/file-prefix-map.cc b/gcc/file-prefix-map.cc
index 3a77b195ae3..7ae3e7f95d5 100644
--- a/gcc/file-prefix-map.cc
+++ b/gcc/fil

Re: [PATCH v2 04/18] Add microMIPS R6 support

2025-04-03 Thread Aleksandar Rakic
HTEC Public

Hi,

> Umm, this change can't go forward I'm afraid without microMIPSr6
> support landing in binutils first.  Otherwise we have no consumer
> for compiled code available.

We have sent a patch series to binutils:
https://sourceware.org/pipermail/binutils/2025-April/140356.html
It includes adding microMIPSR6 support.

>  Also is there a way to verify this target, e.g. QEMU?

There is a CPU supporting microMIPS32 Release 6 ISA:
https://gitlab.com/qemu-project/qemu/-/commit/4b3bcd0

Kind regards,
Aleksandar


Re: [PATCH] sra: Avoid creating TBAA hazards (PR118924)

2025-04-03 Thread Jan Hubicka
> > So in WPA we can not assume that TYPE_CANONICAL (A) == TYPE_CANONICAL
> > (B) is forever.  We also don't do any gimple transforms here, so this is
> > kind of safe, but ugly.
> 
> Hmm.  But we do
> 
>   /* alias_ptr_types_compatible_p relies on fact that during LTO
>  types do not get refined from WPA time to ltrans.  */
>   bp_pack_value (bp, flag_wpa && TYPE_CANONICAL (expr) 
>  ? TYPE_CXX_ODR_P (TYPE_CANONICAL (expr))
>  : TYPE_CXX_ODR_P (expr), 1);
> 
> which I thought would save us from that issue (unless TYPE_CANONICAL
> ends up as the ODR type?).

Interesting.  I sort of remember working on this, but do not remember it
got into mainline.

TYPE_CANONICAL should not be ODR in this case, since we first compute
canonical typs for non-ODR types only and then walk ODR types and those
conflicting with known non-ODR type get non-ODR canonical.
> 
> That said, ideally we'd not re-compute TYPE_CANONICAL during LTRANS
> (but we want to avoid pulling in the canonical itself, just choose
> one that gets put into the LTRANS unit from the set of necessary
> types having the same TYPE_CANONICAL).
> 
> > I think types_equal_for_same_type_for_tbaa_p can simply use flag_wpa and
> > we can drop that parameter?  I guess that would be less confusing, since
> > the name is kind of misleading anyway.
> >
> > If the fact that WPA may see worse TBAA then ltrans will end up hurting
> > in pratcice we may add extra flag to ODR types when canonical type
> > computation should ignore it and stream it WPA->ltrans to disable the
> > extra precision.
> 
> See above ...?
Looking into history the streaming code above got into mainline
(shortly) after I added the ao-ref compare logic to fix ICF issues.
The patch was fixing profiledbootstrap.
https://gcc.gnu.org/pipermail/gcc-cvs/2020-November/337265.html
So probably the check for lto_streaming_safe in
https://gcc.gnu.org/pipermail/gcc-cvs/2020-November/337265.html can be
dropped next stage1. 

Also remember that with Maritn Liska we had checker that canonical types
do not get refined, which never passed fully (the frontend guided
TYPE_CANONICALs differs in interesting side cases to those computed by
lto canonical type hash), so that is something to get back to as well :(

Honza
> 
> Richard.


RE: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant cost [PR115910]

2025-04-03 Thread Roger Sayle


Hi Jakub,
I've been bootstrapping and regression testing a variant of Jakub's
previously proposed change:

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f38e3db41fa..123948a2dea 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21883,7 +21883,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
outer_cod
e_i, int opno,
 case SYMBOL_REF:
   if (x86_64_immediate_operand (x, VOIDmode))
*total = 0;
- else
+  else if (x86_64_zext_immediate_operand (x, VOIDmode))
+   /* movl is as expensive as a simple instruction.  */
+   *total = COSTS_N_INSNS (1);
+  else
/* movabsq is slightly more expensive than a simple instruction. */
*total = COSTS_N_INSNS (1) + 1;
   return true;

Which more accurately models the costs (matching the three types of operands
he describes).
The plan was to then investigate the RTL expansion, to improve shift
generation, so that
expand sees the "actual" instruction count, and make a more informed
decision on which 
division implementation to use.


p.s. It's odd this is a P1.  It's not wrong code, but a single-cycle
instruction timing issue
(where the current trunk implementation is typically more correct than GCC
14's).

Thoughts?


> -Original Message-
> From: Jakub Jelinek 
> Sent: 02 April 2025 12:30
> To: Richard Biener ; Jan Hubicka ; Uros
Bizjak
> ; Roger Sayle ; Richard
> Sandiford 
> Cc: gcc-patches@gcc.gnu.org
> Subject: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant cost
> [PR115910]
> 
> Hi!
> 
> Below is an attempt to fix up RTX costing P1 caused by r15-775
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/thread.html#652446
> @@ -21562,7 +21562,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
> outer_code_i, int opno,
>if (x86_64_immediate_operand (x, VOIDmode))
>   *total = 0;
>   else
> - *total = 1;
> + /* movabsq is slightly more expensive than a simple instruction. */
> + *total = COSTS_N_INSNS (1) + 1;
>return true;
> 
>  case CONST_DOUBLE:
> change.  In my understanding this was partially trying to workaround weird
code
> in pattern_cost, which uses
>   return cost > 0 ? cost : COSTS_N_INSNS (1); That doesn't make sense to
me.  All
> costs smaller than COSTS_N_INSNS (1) mean we need to have at least one
> instruction there which has the COSTS_N_INSNS (1) minimal cost.  So
special
> casing just cost 0 for the really cheap immediates which can be used
pretty much
> everywhere but not ones which have just tiny bit larger cost than that (1,
2 or 3) is
> just weird.
> 
> So, the following patch changes that to MAX (COSTS_N_INSNS (1), cost)
which
> doesn't have this weird behavior where set_src_cost 0 is considered more
> expensive than set_src_cost 1.
> 
> Note, pattern_cost isn't the only spot where costs are computed and
normally we
> often sum the subcosts of different parts of a pattern or just query rtx
costs of
> different parts of subexpressions, so the jump from
> 1 to 5 is quite significant.
> 
> Additionally, x86_64 doesn't have just 2 kinds of constants with different
costs, it
> has 3, signed 32-bit ones are the ones which can appear in almost all
instructions
> and so using cost of 0 for those looks best, then unsigned 32-bit ones
which can be
> done with still cheap movl instruction (and I think some others too) and
finally full
> 64-bit ones which can be done only with a single movabsq instruction and
are
> quite costly both in instruction size and even more expensive to execute.
> 
> The following patch attempts to restore the behavior of GCC 14 with the
> pattern_cost hunk fixed for the unsigned 32-bit ones and only keeps the
bigger
> cost for the 64-bit ones.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, bootstraps/regtests
in
> progress on aarch64-linux, powerpc64le-linux and s390x-linux.
> 
> I don't have a setup to test this on SPEC though.
> 
> Your thoughts on this?
> 
> 2025-04-02  Jakub Jelinek  
> 
>   PR target/115910
>   * rtlanal.cc (pattern_cost): Return at least COSTS_N_INSNS (1)
>   rather than just COSTS_N_INTNS (1) for cost <= 0.
>   * config/i386/i386.cc (ix86_rtx_costs): Set *total to 1 for
>   TARGET_64BIT x86_64_zext_immediate_operand constants.
> 
>   * gcc.target/i386/pr115910.c: New test.
> 
> --- gcc/rtlanal.cc.jj 2025-03-06 11:08:20.729230232 +0100
> +++ gcc/rtlanal.cc2025-04-02 12:08:33.327409772 +0200
> @@ -5772,7 +5772,7 @@ pattern_cost (rtx pat, bool speed)
>  return 0;
> 
>cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
> -  return cost > 0 ? cost : COSTS_N_INSNS (1);
> +  return MAX (COSTS_N_INSNS (1), cost);
>  }
> 
>  /* Calculate the cost of a single instruction.  A return value of zero
> --- gcc/config/i386/i386.cc.jj2025-03-27 23:35:17.798315113 +0100
> +++ gcc/config/i386/i386.cc   2025-04-02 12:14:25.522539997 +0200
> @@ -21883,7 +21883,11 @@ ix86_rtx_costs (rtx x, machine_mod

Re: [PATCH] sra: Avoid creating TBAA hazards (PR118924)

2025-04-03 Thread Jan Hubicka
> > > So in WPA we can not assume that TYPE_CANONICAL (A) == TYPE_CANONICAL
> > > (B) is forever.  We also don't do any gimple transforms here, so this is
> > > kind of safe, but ugly.
> > 
> > Hmm.  But we do
> > 
> >   /* alias_ptr_types_compatible_p relies on fact that during LTO
> >  types do not get refined from WPA time to ltrans.  */
> >   bp_pack_value (bp, flag_wpa && TYPE_CANONICAL (expr) 
> >  ? TYPE_CXX_ODR_P (TYPE_CANONICAL (expr))
> >  : TYPE_CXX_ODR_P (expr), 1);
> > 
> > which I thought would save us from that issue (unless TYPE_CANONICAL
> > ends up as the ODR type?).
> 
> Interesting.  I sort of remember working on this, but do not remember it
> got into mainline.
> 
> TYPE_CANONICAL should not be ODR in this case, since we first compute
> canonical typs for non-ODR types only and then walk ODR types and those
> conflicting with known non-ODR type get non-ODR canonical.
> > 
> > That said, ideally we'd not re-compute TYPE_CANONICAL during LTRANS
> > (but we want to avoid pulling in the canonical itself, just choose
> > one that gets put into the LTRANS unit from the set of necessary
> > types having the same TYPE_CANONICAL).
> > 
> > > I think types_equal_for_same_type_for_tbaa_p can simply use flag_wpa and
> > > we can drop that parameter?  I guess that would be less confusing, since
> > > the name is kind of misleading anyway.
> > >
> > > If the fact that WPA may see worse TBAA then ltrans will end up hurting
> > > in pratcice we may add extra flag to ODR types when canonical type
> > > computation should ignore it and stream it WPA->ltrans to disable the
> > > extra precision.
> > 
> > See above ...?
> Looking into history the streaming code above got into mainline
> (shortly) after I added the ao-ref compare logic to fix ICF issues.
> The patch was fixing profiledbootstrap.
> https://gcc.gnu.org/pipermail/gcc-cvs/2020-November/337265.html
> So probably the check for lto_streaming_safe in
> https://gcc.gnu.org/pipermail/gcc-cvs/2020-November/337265.html can be

^^^ wrong cut and paste, I meant types_equal_for_same_type_for_tbaa_p

Thanks for noticing this.
Honza
> dropped next stage1. 
> 
> Also remember that with Maritn Liska we had checker that canonical types
> do not get refined, which never passed fully (the frontend guided
> TYPE_CANONICALs differs in interesting side cases to those computed by
> lto canonical type hash), so that is something to get back to as well :(
> 
> Honza
> > 
> > Richard.


Re: [PATCH] sra: Avoid creating TBAA hazards (PR118924)

2025-04-03 Thread Jan Hubicka
> On Mon, 31 Mar 2025, Martin Jambor wrote:
> 
> > Hi,
> > 
> > the testcase in PR 118924, when compiled on Aarch64, contains an
> > gimple aggregate assignment statement in between different types which
> > are types_compatible_p but behave differently for the purposes of
> > alias analysis.
> > 
> > SRA replaces the statement with a series of scalar assignments which
> > however have LHSs access chains modeled on the RHS type and so do not
> > alias with a subsequent reads and so are DSEd.
> > 
> > SRA clearly gets its "same_access_path" logic subtly wrong.  One issue
> > is that the same_access_path_p function probably should be implemented
> > more along the lines of (parts of ao_compare::compare_ao_refs) instead
> > of internally relying on operand_equal_p.  That is however not the
> > problem in the PR and so I will deal with it only later.
> > 
> > The issue here is that even when the access path is the same, it must
> > not be bolted on an aggregate type that does not match.  This patch
> > does that, taking just one simple function from the
> > ao_compare::compare_ao_refs machinery and using it to detect the
> > situation.  The rest is just merging the information in between
> > accesses of the same access group.
> > 
> > I looked at how many times we come across such assignment during
> > "make stage2-bubble" of GCC (configured with only c and C++ and
> > without multilib and libsanitizers) and on an x86_64 there were 87924
> > such assignments (though now I realize not all of them had to be
> > aggregate), so they do happen.  The patch leads to about 5% increase
> > of cases where we don't use an "access path" but resort to a
> > MEM_REF (from 90209 to 95204).  On an Aarch64, there were 92268 such
> > assignments and the increase of falling back to MEM_REFs was by
> > 4% (but from a bigger base 132983 to 107991).
> > 
> > Bootstrapped on x86_64-linux and Aarch64-linux.  OK for master and then
> > for all active release branches?
> 
> I'm unsure about the lto_streaming_safe arg, in fact the implementation is
> 
>   if (lto_streaming_safe)
> return type1 == type2;
>   else
> return TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2);
> 
> but I think we guarantee (we _need_ to guarantee!) that when
> TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2), after LTO streaming
> it's still TYPE_CANONICAL (type1) == TYPE_CANONICAL (type2).  Otherwise
> assignments previously valid GIMPLE might no longer be valid and
> things that aliased previously might no longer alias.

The reason why we can not test TYPE_CANONICAL equivalence here is the
way ODR types work.  Suppose we have type A, B and C which are
structurally equiavalent and suppose that A, B has ODR name, while C is
from non-C++ source w/o ODR.

Type merging will notice ODR/non-ODR conflict and type canonical will be
the same.  Now the program will get partitioned and if partition uses A
and B, but not C, the canonical type computation in ltrans will have
different canonicals for A and B.

So in WPA we can not assume that TYPE_CANONICAL (A) == TYPE_CANONICAL
(B) is forever.  We also don't do any gimple transforms here, so this is
kind of safe, but ugly.

I think types_equal_for_same_type_for_tbaa_p can simply use flag_wpa and
we can drop that parameter?  I guess that would be less confusing, since
the name is kind of misleading anyway.

If the fact that WPA may see worse TBAA then ltrans will end up hurting
in pratcice we may add extra flag to ODR types when canonical type
computation should ignore it and stream it WPA->ltrans to disable the
extra precision.

Honza


Re: [PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-03 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, 2 Apr 2025, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Tue, 1 Apr 2025, Richard Sandiford wrote:
>> >
>> >> The problem in PR101523 was that, after each successful 2->2
>> >> combination attempt, distribute_links would search further and
>> >> further for the next combinable use of the i2 destination.
>> >> The original patch for the PR dealt with that by disallowing such
>> >> combinations.  However, that led to various optimisation regressions,
>> >> so there was interest in allowing the combinations again, at least
>> >> until an alternative way of getting the same results is in place.
>> >> 
>> >> This patch does that, but limits distribute_links to a certain number
>> >> of instructions when i2 is unchanged.  As Segher said in the PR trail,
>> >> it would make more conceptual sense to apply the limit unconditionally,
>> >> but I thought it would be better to change as little as possible at
>> >> this development stage.  Logically, in stage 1, the --param should
>> >> be applied directly by distribute_links with no input from callers.
>> >> 
>> >> As I mentioned in:
>> >> 
>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
>> >> 
>> >> I think it's safe to drop log links even if a use exists.  All
>> >> processing of log links seems to handle the absence of a link
>> >> for a particular register in a conservative way.
>> >> 
>> >> The initial set-up errs on the side of dropping links, since for example
>> >> create_log_links has:
>> >> 
>> >>  /* flow.c claimed:
>> >> 
>> >>  We don't build a LOG_LINK for hard registers contained
>> >>  in ASM_OPERANDs.  If these registers get replaced,
>> >>  we might wind up changing the semantics of the insn,
>> >>  even if reload can make what appear to be valid
>> >>  assignments later.  */
>> >>   if (regno < FIRST_PSEUDO_REGISTER
>> >>   && asm_noperands (PATTERN (use_insn)) >= 0)
>> >> continue;
>> >> 
>> >> which excludes combinations by dropping log links, rather than during
>> >> try_combine.  And:
>> >> 
>> >>   /* If this register is being initialized using itself, and the
>> >>  register is uninitialized in this basic block, and there are
>> >>  no LOG_LINKS which set the register, then part of the
>> >>  register is uninitialized.  In that case we can't assume
>> >>  anything about the number of nonzero bits.
>> >> 
>> >>  ??? We could do better if we checked this in
>> >>  reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
>> >>  could avoid making assumptions about the insn which initially
>> >>  sets the register, while still using the information in other
>> >>  insns.  We would have to be careful to check every insn
>> >>  involved in the combination.  */
>> >> 
>> >>   if (insn
>> >>   && reg_referenced_p (x, PATTERN (insn))
>> >>   && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
>> >>REGNO (x)))
>> >> {
>> >>   struct insn_link *link;
>> >> 
>> >>   FOR_EACH_LOG_LINK (link, insn)
>> >> if (dead_or_set_p (link->insn, x))
>> >>   break;
>> >>   if (!link)
>> >> {
>> >>   rsp->nonzero_bits = GET_MODE_MASK (mode);
>> >>   rsp->sign_bit_copies = 1;
>> >>   return;
>> >> }
>> >> }
>> >> 
>> >> treats the lack of a log link is a possible sign of uninitialised data,
>> >> but that would be a missed optimisation rather than a correctness issue.
>> >> 
>> >> One question is what the default --param value should be.  I went with
>> >> Jakub's suggestion of 3000 from:
>> >> 
>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
>> >> 
>> >> Also, to answer Jakub's question in that comment, I tried bisecting:
>> >> 
>> >>   int limit = atoi (getenv ("BISECT"));
>> >> 
>> >> (so applying the limit for all calls from try_combine) with an
>> >> abort in distribute_links if the limit caused a link to be skipped.
>> >> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
>> >> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
>> >> was 142, so much lower than the parameter value.  I realised too late
>> >> that --enable-checking=release would probably have been a more
>> >> interesting test.
>> >> 
>> >> Some of the try_combine changes come from Richi's patch in:
>> >> 
>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
>> >> 
>> >> Bootstrapped & regression-tested on aarch64-linux-gnu,
>> >> powerpc64le-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >> 
>> >> Richard
>> >> 
>> >> 
>> >> gcc/
>> >>   PR testsuite/116398
>> >>   * params.opt (-param=max-combine-search-insns=): New param.
>> >>   * doc/invoke.texi: Document it.
>> >> 

Make ipa-cp propagate over non-hot calls

2025-04-03 Thread Jan Hubicka
Hi,
Currently enabling profile feedback regresses x264 and exchange. In both cases 
the root of the
issue is that ipa-cp cost model thinks cloning is not relevant when feedback is 
available
while it clones without feedback.

Consider:

__attribute__ ((used))
int a[1000];

__attribute__ ((noinline))
void
test2(int sz)
{
  for (int i = 0; i < sz; i++)
  a[i]++;
  asm volatile (""::"m"(a));
}

__attribute__ ((noinline))
void
test1 (int sz)
{
  for (int i = 0; i < 1000; i++)
  test2(sz);
}
int main()
{
test1(1000);
return 0;
}

Here we want to clone call both test1 and test2 and specialize for 1000, but
ipa-cp will not do that, since it will skip call main->test1 as not hot since
it is called just once both with or without profile feedback.
In this simple testcase even without profile feedback we will track that main
is called once.

I think the testcase shows that hotness of call is not that relevant when
deciding whether we want to propagate constants across it.  ipa-cp with IPA
profile can compute overall estimate of time saved (which is existing time
benefit computing time saved per invociation of the function multiplied by
number of executions) and see if result is big enough. An easy check is to
simply call maybe_hot_p on the resulting count. 

So this patch makes ipa-cp to consider all calls sites except those known to be
unlikely executed (i.e. run 0 times in train run or known to lead to someting
bad) as interesting, which makes ipa-cp to propagate across them, find cloning
candidates and feed them into good_clonning_oppurtunity.  

For this I added cs_interesting_for_ipcp_p which also attempts to do right
thing with partial training.

Now good_clonning_oppurtunity will currently return false, since it will figure
out that the call edge is not very frequent.
It already kind of knows that frequency of call instruction istself is not too
important, but instead of computing overall time saved, it tries to compare it
with param_ipa_cp_profile_count_base percentage of counts of call edges.  I
think this is not very relevant since estimated time saved per call can be
large.  So I dropped this logic and replaced it with simple use of overall
saved time.

Since ipa-cp is not dealing well with the cases where it hits the allowed unit
growth limit, we probably want to be more careful, so I keep existing metric
with this change.

So now we get:

Evaluating opportunities for test1/3.
 - considering value 1000 for param #0 sz (caller_count: 1)
 good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), 
overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
 not cloning: time saved is not hot
 good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 
(precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, 
threshold: 500

First call to good_cloning_oppurtunity considers the case where only test1 is
clonned. In this case time saved is 1 (for passing the value around) and since
it is called just once (count_sum) overall time saved is 1 which is not
considered hot and we also get very low evaulation score.

In the second call we consider cloning chain test1->test2.  In this case time
saved is large (12901) since test2 is invoked many times and it is used to
controll the loop.  We still know that the count is 1 but overall time is
129001 which is already considered relevant and we clone.

I also try to do something sensible in case we have calls both with
and without IPA profile (which can happen for comdats where profile got missing
or with LTO if some units were not trained).
Instead of checking whether sum of calls with known profile is nonzero, I keep
track if there are other calls and if so, also try the local heuristics that
is used without profile feedback.

The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding
up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).

We still get better x264 runtime without profile (86.4s for x264 and 93.8 for 
exchange).

The main problem I see is that ipa-cp has the global limit for growth of 10%
but does not consider the oppurtunities in priority order.  Consequently if the
limit is hit, randomly some clone oppurtunities are dropped in favour of
others.

I dumped unit size changes with -flto -Ofast build of SPEC2017. Without patch I 
get:

orignew growth
588677  605385  102.838229  
43786037137.894016  
484650  494851  102.104818  
41114111100.00  
99953   103519  103.567677  
106181  114889  108.201091  
21389   21597   100.972462  
24925   26746   107.305918  
15308   23974   156.610922  
27354   27906   102.017986  
494 494 100.00  
46314631100.00  
863216  872729  101.102042  
126604  126604  100.00  
605138  627156  103.638509  
41124112100.00  
222006  231293  104.183220  
29523384114.634146

Re: [PATCH] libstdc++: Fix handling of field width for wide strings and characters [PR119593]

2025-04-03 Thread Jonathan Wakely
On Thu, 3 Apr 2025 at 13:55, Tomasz Kaminski  wrote:
>
>
>
> On Thu, Apr 3, 2025 at 2:49 PM Jonathan Wakely  wrote:
>>
>> On Thu, 3 Apr 2025 at 10:24, Jonathan Wakely  wrote:
>> >
>> > On Thu, 3 Apr 2025 at 09:55, Tomasz Kamiński  wrote:
>> > >
>> > > This patch corrects handling of UTF-32LE and UTF32-BE in
>> > > __unicode::__literal_encoding_is_unicode<_CharT>, so they are
>> > > recognized as unicode and functions produces correct result for wchar_t.
>> > >
>> > > Use `__unicode::__field_width` to compute the estimated witdh
>> >
>> > "width"
>> >
>> > > of the charcter for unicode wide encoding.
>> >
>> > "character"
>> >
>> > >
>> > > PR libstdc++-v3/119593
>> > >
>> > > libstdc++-v3/ChangeLog:
>> > >
>> > > * include/bits/unicode.h
>> > > (__unicode::__literal_encoding_is_unicode<_CharT>):
>> > > Corrected handing for UTF-16 and UTF-32 with "LE" or "BE" suffix.
>> > > * include/std/format (__formatter_str::_S_character_width):
>> > > Define.
>> > > (__formatter_str::_S_character_width): Updated passed char
>> > > length.
>> > > * testsuite/std/format/functions/format.cc: Test for wchar_t.
>> > > ---
>> > > Testing on x86_64-linux. OK for trunk?
>> > > I believe we should backport it, given that all wchar_t uses are
>> > > impacted.
>> > >
>> > >  libstdc++-v3/include/bits/unicode.h   |  2 ++
>> > >  libstdc++-v3/include/std/format   | 15 ++-
>> > >  .../testsuite/std/format/functions/format.cc  |  8 ++--
>> > >  3 files changed, 22 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/libstdc++-v3/include/bits/unicode.h 
>> > > b/libstdc++-v3/include/bits/unicode.h
>> > > index 24b1ac3d53d..99d972eccff 100644
>> > > --- a/libstdc++-v3/include/bits/unicode.h
>> > > +++ b/libstdc++-v3/include/bits/unicode.h
>> > > @@ -1039,6 +1039,8 @@ inline namespace __v16_0_0
>> > >   string_view __s(__enc);
>> > >   if (__s.ends_with("//"))
>> > > __s.remove_suffix(2);
>> > > + if (__s.ends_with("LE") || __s.ends_with("BE"))
>> > > +   __s.remove_suffix(2);
>> > >   return __s == "16" || __s == "32";
>> > > }
>> > > }
>> > > diff --git a/libstdc++-v3/include/std/format 
>> > > b/libstdc++-v3/include/std/format
>> > > index c3327e1d384..603facc51de 100644
>> > > --- a/libstdc++-v3/include/std/format
>> > > +++ b/libstdc++-v3/include/std/format
>> > > @@ -1277,12 +1277,25 @@ namespace __format
>> > >   _M_spec);
>> > > }
>> > >
>> >
>> > Please put [[__gnu__::__always_inline__]] on this function, so that it
>> > doesn't add any overhead for narrow chars:
>> >
>> > > +  static size_t
>> > > +  _S_character_width(_CharT __c)
>> > > +  {
>> > > +   using __unicode::__literal_encoding_is_unicode;
>> > > +   // N.B. single byte cannot encode charcter of width greater than 
>> > > 1
>> > > +   if (sizeof(_CharT) > 1u && 
>> > > __literal_encoding_is_unicode<_CharT>())
>> >
>> > I think this can be 'if constexpr'
>> >
>> > OK for trunk and gcc-14 with those changes, thanks.
>> > (No backport for gcc-13 because it doesn't have the Unicode-aware
>> > field width support.)
>> >
>> > > + return __unicode::__field_width(__c);
>> > > +   else
>> > > + return 1u;
>> > > +  }
>> > > +
>> > >template
>> > > typename basic_format_context<_Out, _CharT>::iterator
>> > > _M_format_character(_CharT __c,
>> > >   basic_format_context<_Out, _CharT>& __fc) const
>> > > {
>> > > - return __format::__write_padded_as_spec({&__c, 1u}, 1, __fc, 
>> > > _M_spec);
>> > > + return __format::__write_padded_as_spec({&__c, 1u},
>> > > + 
>> > > _S_character_width(__c),
>> > > + __fc, _M_spec);
>> > > }
>> > >
>> > >template
>> > > diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
>> > > b/libstdc++-v3/testsuite/std/format/functions/format.cc
>> > > index 7fc42017045..d8dbf463413 100644
>> > > --- a/libstdc++-v3/testsuite/std/format/functions/format.cc
>> > > +++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
>> > > @@ -501,9 +501,14 @@ test_unicode()
>> > >  {
>> > >// Similar to sC example in test_std_examples, but not from the 
>> > > standard.
>> > >// Verify that the character "🤡" has estimated field width 2,
>> > > -  // rather than estimated field width equal to strlen("🤡"), which 
>> > > would be 4.
>> > > +  // rather than estimated field width equal to strlen("🤡"), which 
>> > > would be 4,
>> > > +  // or just width 1 for single character.
>> > >std::string sC = std::format("{:*<3}", "🤡");
>> > >VERIFY( sC == "🤡*" );
>> > > +  std::wstring wsC = std::format(L"{:*<3}", L"🤡");
>> > > +  VERIFY( wsC == L"🤡*" 

Re: Pushed r15-9167: [PATCH] LoongArch: Make gen-evolution.awk compatible with FreeBSD awk

2025-04-03 Thread Jinyang He

On 2025-04-03 11:37, Lulu Cheng wrote:



在 2025/4/3 上午11:12, Xi Ruoyao 写道:

On Thu, 2025-04-03 at 10:13 +0800, Lulu Cheng wrote:

在 2025/4/2 上午11:19, Xi Ruoyao 写道:

Avoid using gensub that FreeBSD awk lacks, use gsub and split those
each
of gawk, mawk, and FreeBSD awk provides.

Reported-by: mp...@vip.163.com
Link: https://man.freebsd.org/cgi/man.cgi?query=awk

gcc/ChangeLog:

* config/loongarch/genopts/gen-evolution.awk: Avoid using
gensub
that FreeBSD awk lacks.
---

Manually tested the script with gawk and FreeBSD awk.  Ok for trunk?

OK.

Thanks!

Pushed now.


Could you backpoint it  to gcc 14?

Thanks.


gcc/config/loongarch/genopts/gen-evolution.awk | 8 +---
   1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/genopts/gen-evolution.awk
b/gcc/config/loongarch/genopts/gen-evolution.awk
index bf16b26760e..142b658fe7a 100644
--- a/gcc/config/loongarch/genopts/gen-evolution.awk
+++ b/gcc/config/loongarch/genopts/gen-evolution.awk
@@ -33,10 +33,12 @@ BEGIN {
   {
   cpucfg_word[NR] = $1
   cpucfg_bit_in_word[NR] = $2
-    name[NR] = gensub(/-/, "_", "g", $3)
+    name[NR] = $3
+    gsub("-", "_", name[NR])
   name_capitalized[NR] = toupper(name[NR])
-    isa_version_major[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/,
"\\1", 1, $4)
-    isa_version_minor[NR] = gensub(/^([1-9][0-9]*)\.([0-9]+)$/,
"\\2", 1, $4)
+    split($4, isa_ver, "\\.")
+    isa_version_major[NR] = isa_ver[1]
+    isa_version_minor[NR] = isa_ver[2]
      $1 = $2 = $3 = $4 = ""
   sub (/^\s*/, "")

Hi, Ruoyao and Lulu,

When I cross-build gcc on ubuntu I triggered `illegal reference to array 
idx_list` at asort by mawk.


$ cd gcc/config/loongarch/genopts
$ mawk -v header_p=1 -f gen-evolution.awk isa-evolution.in
mawk: gen-evolution.awk: line 104: illegal reference to array idx_list

$ mawk --version
mawk 1.3.4 20240123
Copyright 2008-2023,2024, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:   arc4random_stir/arc4random
regex-funcs:    internal

compiled limits:
sprintf buffer  8192
maximum-integer 9223372036854775808




Re: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant cost [PR115910]

2025-04-03 Thread Jakub Jelinek
On Thu, Apr 03, 2025 at 08:42:01AM -0400, Roger Sayle wrote:
> I've been bootstrapping and regression testing a variant of Jakub's
> previously proposed change:
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f38e3db41fa..123948a2dea 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21883,7 +21883,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
> outer_cod
> e_i, int opno,
>  case SYMBOL_REF:
>if (x86_64_immediate_operand (x, VOIDmode))
> *total = 0;
> - else
> +  else if (x86_64_zext_immediate_operand (x, VOIDmode))
> +   /* movl is as expensive as a simple instruction.  */
> +   *total = COSTS_N_INSNS (1);
> +  else
> /* movabsq is slightly more expensive than a simple instruction. */
> *total = COSTS_N_INSNS (1) + 1;
>return true;

This breaks the testcase again.
With my version of the patch it is
;; positive division: unsigned cost: 26; signed cost: 28
with your adjustment
;; positive division: unsigned cost: 29; signed cost: 28
and with vanilla trunk
;; positive division: unsigned cost: 30; signed cost: 28
So, we'd need to put more effort into tweaking this.
My preference would be in stage1, the patch restores the GCC 14 cost
for the unsigned 32-bit immediates for normal rtx_cost and fixes up the
weird behavior of pattern_cost everywhere where cost [1,3] is less
expensive than cost 0.

Also, compared to movl I'd then use something like COSTS_N_INSNS (1) + 3
for movabsq.

> p.s. It's odd this is a P1.  It's not wrong code, but a single-cycle
> instruction timing issue
> (where the current trunk implementation is typically more correct than GCC
> 14's).

It is not odd, it is a regression on primary platform and clearly it
doesn't affect just that one testcase but most likely many of the unsigned
vs. signed division decisions and clearly other code as well (see PR119594
for another testcase affected by that).

Jakub



[committed] cobol: Additional cobol.dg/group2 testcases

2025-04-03 Thread Robert Dubner
cobol: New testcases for INSPECT statement.

gcc/testsuite

* cobol.dg/group2/INSPECT_BACKWARD_REPLACING_LEADING.cob: New
testcase.
* cobol.dg/group2/INSPECT_BACKWARD_REPLACING_TRAILING.cob:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_CONVERTING.cob:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_REPLACING.cob:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_TALLYING.cob:
Likewise.
* cobol.dg/group2/INSPECT_CONVERTING_NULL.cob: Likewise.
*
cobol.dg/group2/INSPECT_CONVERTING_TO_figurative_constant.cob: Likewise.
*
cobol.dg/group2/INSPECT_CONVERTING_TO_figurative_constants.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_1.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_2.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_3.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_4.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5-f.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5-r.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_6.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_7.cob: Likewise.
* cobol.dg/group2/INSPECT_No_repeat_conversion_check.cob:
Likewise.
* cobol.dg/group2/INSPECT_REPLACING_figurative_constant.cob:
Likewise.
*
cobol.dg/group2/INSPECT_REPLACING_LEADING_ZEROS_BY_SPACES.cob: Likewise.
* cobol.dg/group2/INSPECT_TALLYING_AFTER.cob: Likewise.
* cobol.dg/group2/INSPECT_TALLYING_BEFORE.cob: Likewise.
* cobol.dg/group2/INSPECT_TALLYING_REPLACING_ISO_Example.cob:
Likewise.
* cobol.dg/group2/INSPECT_TRAILING.cob: Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_REPLACING_LEADING.out: New
known-good result.
* cobol.dg/group2/INSPECT_BACKWARD_REPLACING_TRAILING.out:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_CONVERTING.out:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_REPLACING.out:
Likewise.
* cobol.dg/group2/INSPECT_BACKWARD_simple_TALLYING.out:
Likewise.
*
cobol.dg/group2/INSPECT_CONVERTING_TO_figurative_constants.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_1.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_2.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_3.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_4.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5-f.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5-r.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_6.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_7.out: Likewise.
* cobol.dg/group2/INSPECT_TALLYING_REPLACING_ISO_Example.out:
Likewise.
* cobol.dg/group2/INSPECT_TRAILING.out: Likewise.


RE: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant cost [PR115910]

2025-04-03 Thread Roger Sayle
I agree returning to the GCC 14 behaviour is the best approach given the
current stage.

> -Original Message-
> From: Jakub Jelinek 
> Sent: 03 April 2025 09:16
> To: Roger Sayle 
> Cc: 'Richard Biener' ; 'Jan Hubicka' ;
'Uros
> Bizjak' ; 'Richard Sandiford'
;
> gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] rtlanal, i386: Adjust pattern_cost and x86 constant
cost
> [PR115910]
> 
> On Thu, Apr 03, 2025 at 08:42:01AM -0400, Roger Sayle wrote:
> > I've been bootstrapping and regression testing a variant of Jakub's
> > previously proposed change:
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index
> > f38e3db41fa..123948a2dea 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -21883,7 +21883,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int
> > outer_cod e_i, int opno,
> >  case SYMBOL_REF:
> >if (x86_64_immediate_operand (x, VOIDmode))
> > *total = 0;
> > - else
> > +  else if (x86_64_zext_immediate_operand (x, VOIDmode))
> > +   /* movl is as expensive as a simple instruction.  */
> > +   *total = COSTS_N_INSNS (1);
> > +  else
> > /* movabsq is slightly more expensive than a simple instruction.
*/
> > *total = COSTS_N_INSNS (1) + 1;
> >return true;
> 
> This breaks the testcase again.
> With my version of the patch it is
> ;; positive division: unsigned cost: 26; signed cost: 28 with your
adjustment ;;
> positive division: unsigned cost: 29; signed cost: 28 and with vanilla
trunk ;;
> positive division: unsigned cost: 30; signed cost: 28 So, we'd need to put
more
> effort into tweaking this.
> My preference would be in stage1, the patch restores the GCC 14 cost for
the
> unsigned 32-bit immediates for normal rtx_cost and fixes up the weird
behavior of
> pattern_cost everywhere where cost [1,3] is less expensive than cost 0.
> 
> Also, compared to movl I'd then use something like COSTS_N_INSNS (1) + 3
for
> movabsq.
> 
> > p.s. It's odd this is a P1.  It's not wrong code, but a single-cycle
> > instruction timing issue (where the current trunk implementation is
> > typically more correct than GCC 14's).
> 
> It is not odd, it is a regression on primary platform and clearly it
doesn't affect just
> that one testcase but most likely many of the unsigned vs. signed division
> decisions and clearly other code as well (see PR119594 for another
testcase
> affected by that).
> 
>   Jakub




Re: [PATCH] libstdc++: Fix handling of field width for wide strings and characters [PR119593]

2025-04-03 Thread Jonathan Wakely
On Thu, 3 Apr 2025 at 10:24, Jonathan Wakely  wrote:
>
> On Thu, 3 Apr 2025 at 09:55, Tomasz Kamiński  wrote:
> >
> > This patch corrects handling of UTF-32LE and UTF32-BE in
> > __unicode::__literal_encoding_is_unicode<_CharT>, so they are
> > recognized as unicode and functions produces correct result for wchar_t.
> >
> > Use `__unicode::__field_width` to compute the estimated witdh
>
> "width"
>
> > of the charcter for unicode wide encoding.
>
> "character"
>
> >
> > PR libstdc++-v3/119593
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/bits/unicode.h
> > (__unicode::__literal_encoding_is_unicode<_CharT>):
> > Corrected handing for UTF-16 and UTF-32 with "LE" or "BE" suffix.
> > * include/std/format (__formatter_str::_S_character_width):
> > Define.
> > (__formatter_str::_S_character_width): Updated passed char
> > length.
> > * testsuite/std/format/functions/format.cc: Test for wchar_t.
> > ---
> > Testing on x86_64-linux. OK for trunk?
> > I believe we should backport it, given that all wchar_t uses are
> > impacted.
> >
> >  libstdc++-v3/include/bits/unicode.h   |  2 ++
> >  libstdc++-v3/include/std/format   | 15 ++-
> >  .../testsuite/std/format/functions/format.cc  |  8 ++--
> >  3 files changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/unicode.h 
> > b/libstdc++-v3/include/bits/unicode.h
> > index 24b1ac3d53d..99d972eccff 100644
> > --- a/libstdc++-v3/include/bits/unicode.h
> > +++ b/libstdc++-v3/include/bits/unicode.h
> > @@ -1039,6 +1039,8 @@ inline namespace __v16_0_0
> >   string_view __s(__enc);
> >   if (__s.ends_with("//"))
> > __s.remove_suffix(2);
> > + if (__s.ends_with("LE") || __s.ends_with("BE"))
> > +   __s.remove_suffix(2);
> >   return __s == "16" || __s == "32";
> > }
> > }
> > diff --git a/libstdc++-v3/include/std/format 
> > b/libstdc++-v3/include/std/format
> > index c3327e1d384..603facc51de 100644
> > --- a/libstdc++-v3/include/std/format
> > +++ b/libstdc++-v3/include/std/format
> > @@ -1277,12 +1277,25 @@ namespace __format
> >   _M_spec);
> > }
> >
>
> Please put [[__gnu__::__always_inline__]] on this function, so that it
> doesn't add any overhead for narrow chars:
>
> > +  static size_t
> > +  _S_character_width(_CharT __c)
> > +  {
> > +   using __unicode::__literal_encoding_is_unicode;
> > +   // N.B. single byte cannot encode charcter of width greater than 1
> > +   if (sizeof(_CharT) > 1u && __literal_encoding_is_unicode<_CharT>())
>
> I think this can be 'if constexpr'
>
> OK for trunk and gcc-14 with those changes, thanks.
> (No backport for gcc-13 because it doesn't have the Unicode-aware
> field width support.)
>
> > + return __unicode::__field_width(__c);
> > +   else
> > + return 1u;
> > +  }
> > +
> >template
> > typename basic_format_context<_Out, _CharT>::iterator
> > _M_format_character(_CharT __c,
> >   basic_format_context<_Out, _CharT>& __fc) const
> > {
> > - return __format::__write_padded_as_spec({&__c, 1u}, 1, __fc, 
> > _M_spec);
> > + return __format::__write_padded_as_spec({&__c, 1u},
> > + _S_character_width(__c),
> > + __fc, _M_spec);
> > }
> >
> >template
> > diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
> > b/libstdc++-v3/testsuite/std/format/functions/format.cc
> > index 7fc42017045..d8dbf463413 100644
> > --- a/libstdc++-v3/testsuite/std/format/functions/format.cc
> > +++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
> > @@ -501,9 +501,14 @@ test_unicode()
> >  {
> >// Similar to sC example in test_std_examples, but not from the standard.
> >// Verify that the character "🤡" has estimated field width 2,
> > -  // rather than estimated field width equal to strlen("🤡"), which would 
> > be 4.
> > +  // rather than estimated field width equal to strlen("🤡"), which would 
> > be 4,
> > +  // or just width 1 for single character.
> >std::string sC = std::format("{:*<3}", "🤡");
> >VERIFY( sC == "🤡*" );
> > +  std::wstring wsC = std::format(L"{:*<3}", L"🤡");
> > +  VERIFY( wsC == L"🤡*" );
> > +  wsC = std::format(L"{:*<3}", L'🤡');
> > +  VERIFY( wsC == L"🤡*" );
> >
> >// Verify that "£" has estimated field width 1, not strlen("£") == 2.
> >std::string sL = std::format("{:*<3}", "£");
> > @@ -517,7 +522,6 @@ test_unicode()
> >std::string sP = std::format("{:1.1} {:*<1.1}", "£", "🤡");
> >VERIFY( sP == "£ *" );
> >sP = std::format("{:*<2.1} {:*<2.1}", "£", "🤡");
> > -  VERIFY( sP == "£* **" );

I didn't notice at first that this line has been removed, was 

[PATCH] libstdc++: Fix handling of field width for wide strings and characters [PR119593]

2025-04-03 Thread Tomasz Kamiński
This patch corrects handling of UTF-32LE and UTF32-BE in
__unicode::__literal_encoding_is_unicode<_CharT>, so they are
recognized as unicode and functions produces correct result for wchar_t.

Use `__unicode::__field_width` to compute the estimated witdh
of the charcter for unicode wide encoding.

PR libstdc++-v3/119593

libstdc++-v3/ChangeLog:

* include/bits/unicode.h
(__unicode::__literal_encoding_is_unicode<_CharT>):
Corrected handing for UTF-16 and UTF-32 with "LE" or "BE" suffix.
* include/std/format (__formatter_str::_S_character_width):
Define.
(__formatter_str::_S_character_width): Updated passed char
length.
* testsuite/std/format/functions/format.cc: Test for wchar_t.
---
Testing on x86_64-linux. OK for trunk?
I believe we should backport it, given that all wchar_t uses are
impacted.

 libstdc++-v3/include/bits/unicode.h   |  2 ++
 libstdc++-v3/include/std/format   | 15 ++-
 .../testsuite/std/format/functions/format.cc  |  8 ++--
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/unicode.h 
b/libstdc++-v3/include/bits/unicode.h
index 24b1ac3d53d..99d972eccff 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -1039,6 +1039,8 @@ inline namespace __v16_0_0
  string_view __s(__enc);
  if (__s.ends_with("//"))
__s.remove_suffix(2);
+ if (__s.ends_with("LE") || __s.ends_with("BE"))
+   __s.remove_suffix(2);
  return __s == "16" || __s == "32";
}
}
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index c3327e1d384..603facc51de 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1277,12 +1277,25 @@ namespace __format
  _M_spec);
}
 
+  static size_t
+  _S_character_width(_CharT __c)
+  {
+   using __unicode::__literal_encoding_is_unicode;
+   // N.B. single byte cannot encode charcter of width greater than 1
+   if (sizeof(_CharT) > 1u && __literal_encoding_is_unicode<_CharT>())
+ return __unicode::__field_width(__c);
+   else
+ return 1u;
+  }
+
   template
typename basic_format_context<_Out, _CharT>::iterator
_M_format_character(_CharT __c,
  basic_format_context<_Out, _CharT>& __fc) const
{
- return __format::__write_padded_as_spec({&__c, 1u}, 1, __fc, _M_spec);
+ return __format::__write_padded_as_spec({&__c, 1u},
+ _S_character_width(__c),
+ __fc, _M_spec);
}
 
   template
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 7fc42017045..d8dbf463413 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -501,9 +501,14 @@ test_unicode()
 {
   // Similar to sC example in test_std_examples, but not from the standard.
   // Verify that the character "🤡" has estimated field width 2,
-  // rather than estimated field width equal to strlen("🤡"), which would be 4.
+  // rather than estimated field width equal to strlen("🤡"), which would be 4,
+  // or just width 1 for single character.
   std::string sC = std::format("{:*<3}", "🤡");
   VERIFY( sC == "🤡*" );
+  std::wstring wsC = std::format(L"{:*<3}", L"🤡");
+  VERIFY( wsC == L"🤡*" );
+  wsC = std::format(L"{:*<3}", L'🤡');
+  VERIFY( wsC == L"🤡*" );
 
   // Verify that "£" has estimated field width 1, not strlen("£") == 2.
   std::string sL = std::format("{:*<3}", "£");
@@ -517,7 +522,6 @@ test_unicode()
   std::string sP = std::format("{:1.1} {:*<1.1}", "£", "🤡");
   VERIFY( sP == "£ *" );
   sP = std::format("{:*<2.1} {:*<2.1}", "£", "🤡");
-  VERIFY( sP == "£* **" );
 
   // Verify field width handling for extended grapheme clusters,
   // and that a cluster gets output as a single item, not truncated.
-- 
2.48.1



Re: [PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-03 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> On Wed, 2 Apr 2025, Richard Sandiford wrote:
>>
>>> Richard Biener  writes:
>>> > On Tue, 1 Apr 2025, Richard Sandiford wrote:
>>> >
>>> >> The problem in PR101523 was that, after each successful 2->2
>>> >> combination attempt, distribute_links would search further and
>>> >> further for the next combinable use of the i2 destination.
>>> >> The original patch for the PR dealt with that by disallowing such
>>> >> combinations.  However, that led to various optimisation regressions,
>>> >> so there was interest in allowing the combinations again, at least
>>> >> until an alternative way of getting the same results is in place.
>>> >> 
>>> >> This patch does that, but limits distribute_links to a certain number
>>> >> of instructions when i2 is unchanged.  As Segher said in the PR trail,
>>> >> it would make more conceptual sense to apply the limit unconditionally,
>>> >> but I thought it would be better to change as little as possible at
>>> >> this development stage.  Logically, in stage 1, the --param should
>>> >> be applied directly by distribute_links with no input from callers.
>>> >> 
>>> >> As I mentioned in:
>>> >> 
>>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c28
>>> >> 
>>> >> I think it's safe to drop log links even if a use exists.  All
>>> >> processing of log links seems to handle the absence of a link
>>> >> for a particular register in a conservative way.
>>> >> 
>>> >> The initial set-up errs on the side of dropping links, since for example
>>> >> create_log_links has:
>>> >> 
>>> >>  /* flow.c claimed:
>>> >> 
>>> >>  We don't build a LOG_LINK for hard registers contained
>>> >>  in ASM_OPERANDs.  If these registers get replaced,
>>> >>  we might wind up changing the semantics of the insn,
>>> >>  even if reload can make what appear to be valid
>>> >>  assignments later.  */
>>> >>   if (regno < FIRST_PSEUDO_REGISTER
>>> >>   && asm_noperands (PATTERN (use_insn)) >= 0)
>>> >> continue;
>>> >> 
>>> >> which excludes combinations by dropping log links, rather than during
>>> >> try_combine.  And:
>>> >> 
>>> >>   /* If this register is being initialized using itself, and the
>>> >>  register is uninitialized in this basic block, and there are
>>> >>  no LOG_LINKS which set the register, then part of the
>>> >>  register is uninitialized.  In that case we can't assume
>>> >>  anything about the number of nonzero bits.
>>> >> 
>>> >>  ??? We could do better if we checked this in
>>> >>  reg_{nonzero_bits,num_sign_bit_copies}_for_combine.  Then we
>>> >>  could avoid making assumptions about the insn which initially
>>> >>  sets the register, while still using the information in other
>>> >>  insns.  We would have to be careful to check every insn
>>> >>  involved in the combination.  */
>>> >> 
>>> >>   if (insn
>>> >>   && reg_referenced_p (x, PATTERN (insn))
>>> >>   && !REGNO_REG_SET_P (DF_LR_IN (BLOCK_FOR_INSN (insn)),
>>> >>REGNO (x)))
>>> >> {
>>> >>   struct insn_link *link;
>>> >> 
>>> >>   FOR_EACH_LOG_LINK (link, insn)
>>> >> if (dead_or_set_p (link->insn, x))
>>> >>   break;
>>> >>   if (!link)
>>> >> {
>>> >>   rsp->nonzero_bits = GET_MODE_MASK (mode);
>>> >>   rsp->sign_bit_copies = 1;
>>> >>   return;
>>> >> }
>>> >> }
>>> >> 
>>> >> treats the lack of a log link is a possible sign of uninitialised data,
>>> >> but that would be a missed optimisation rather than a correctness issue.
>>> >> 
>>> >> One question is what the default --param value should be.  I went with
>>> >> Jakub's suggestion of 3000 from:
>>> >> 
>>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116398#c25
>>> >> 
>>> >> Also, to answer Jakub's question in that comment, I tried bisecting:
>>> >> 
>>> >>   int limit = atoi (getenv ("BISECT"));
>>> >> 
>>> >> (so applying the limit for all calls from try_combine) with an
>>> >> abort in distribute_links if the limit caused a link to be skipped.
>>> >> The minimum BISECT value that allowed an aarch64-linux-gnu bootstrap
>>> >> to succeed with --enable-languages=all --enable-checking=yes,rtl,extra
>>> >> was 142, so much lower than the parameter value.  I realised too late
>>> >> that --enable-checking=release would probably have been a more
>>> >> interesting test.
>>> >> 
>>> >> Some of the try_combine changes come from Richi's patch in:
>>> >> 
>>> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523#c53
>>> >> 
>>> >> Bootstrapped & regression-tested on aarch64-linux-gnu,
>>> >> powerpc64le-linux-gnu and x86_64-linux-gnu.  OK to install?
>>> >> 
>>> >> Richard
>>> >> 
>>> >> 
>>> >> gcc/
>>> >>

Re: Forward declaration

2025-04-03 Thread Qing Zhao
CC’ing gcc alias as well. 

Hi, Yeoul,

I think the key for when to  use the forward declaration syntax is:

When there is only one single identifier as the argument of counted_by 
attribute, 
There will be no possibility of scope conflict between different identifiers 
inside
the counted_by attribute, for such cases, we always look up the identifier 
inside
the structure. 

No need for the forward declaration syntax for such cases. 

This is the current behavior of counted_by attribute with single identifier, 
this behavior
remains unchanged. 

However, when the argument of counted_by attribute is  an expression, there 
might be
multiple identifiers inside counted_by attribute, the scope conflict between 
different
identifiers inside the counted_by attribute becomes possible, for such cases, 
forward declarations 
are necessary to specify which identifiers inside the counted_by attribute 
should be looked
up inside the structure. The identifiers that are not listed in the forward 
declarations will be
looked up according to the default C scoping rules . 

Forward declarations inside the counted_by attributes serve two purposes:

1. Explicitly specify which identifiers should be looked up inside the 
structure. (Similar purpose as 
the new __self keyword we proposed previously).
2. Resolve the real forward declaration issue as we identified previously. 

In addition to the above 2, the forward declarations syntax is user-friendly 
and easy to be understood.
It keeps the default C scoping rules. And at the same time it’s also can be 
used for C++ as well. 

So, I think this might be a nice compromised new syntax. 

More in below to answer your questions:

> On Apr 2, 2025, at 19:52, Yeoul Na  wrote:
> 
> Hi Qing and Bill,
> 
> Thanks for trying to converge on the solution for this problem.
> 
> @Bill: So your proposal allows referring to a member for simple declaration 
> reference like this:
> 
> constexpr int len = 20;
> struct s {
>   int len;
>   int *__counted_by(len) buf; // this continues to be member ‘len’, not 
> global ‘len'  
> };
> 
> 
> Requires forward declarations for these cases
> - forward referencing:
> 
> constexpr int len = 20;
> struct s {
>   int *__counted_by(int len; len) buf; 
>   int len; 
> };

I think that in the above example, the forward declaration is not needed if the 
user’s intention is to use
the member “len” instead of the global “len”. 

However, if the user tries to use the global “len” instead of the member “len”, 
then he need to use the 
Following syntax:

constexpr int len = 20;
struct s {
  int *__counted_by(len+0) buf; 
  int len; 
};

In the above, the argument inside counted_by is an expression, however, there 
is NO any forward declaration,
Then the identifier inside this expression will be looked up per the default C 
scoping rule.
Jacub Jelinek from GCC community mentioned this in a very early email in the 
GCC discussion. 

https://gcc.gnu.org/pipermail/gcc-patches/2025-March/677123.html


> - arithmetic expressions, etc.
> 
> constexpr int len = 20;
> constexpr int scale = 4;
> struct s {
>   int scale; 
>   int len;
>   int *__counted_by(int scale; int len; len * scale) buf; 
> };

Yes. This is correct. As long as the counted_by argument is an expression, 
forward declarations are needed
 to specify which identifiers should be looked up inside the structure. 

> 
> Is that a correct summary?
> If so, I’m curious what is the expected behavior for this:
> 
> constexpr int len = 20;
> constexpr int scale = 4;
> struct s {
>   int scale; 
>   int len;
>   int *__counted_by(len * scale) buf; 
> };
> 

In the above, since the argument is an expression, but there is no forward 
declaration inside the counted_by,
The identifiers “len” and “scale” are all looked up through the default C 
scoping, i.e, outside of the structure.

> 
>> So, I will go with the compromised syntax as suggested by Bill and Martin.
> 
> 
> @Bill @Qing Before doing that, this also needs Clang community to be on 
> board. I see Bill’s proposal is only being discussed in the GCC list. 
> However, adding a forward declaration within attributes will also affect 
> Clang attributes such as ‘guarded_by’, so could you please start an RFC in 
> Clang community as well and get support from there?
> 
> I will still have some more questions and other suggestions on your proposal 
> that I’d like to discuss once you start some thread in Clang. 
> 
> Also, the attributes (‘counted_by’ and friends, and ‘guarded_by’) are also 
> used for function parameters. Are you suggesting to use the forward 
> declaration for function parameters as well? Or this is just for structs?

From my understanding, as I mentioned in the above,  the purpose of the forward 
declarations 
inside counted_by attribute are:

1. Explicitly specify which identifiers should be looked up inside the 
structure. (Similar purpose as 
the new __self keyword we proposed previously).
2. Resolve the real forward declaration issue as w

Re: [PATCH] combine: Allow 2->2 combos but limit insn searches [PR116398]

2025-04-03 Thread Jakub Jelinek
On Thu, Apr 03, 2025 at 07:50:13AM -0600, Jeff Law wrote:
> On 4/3/25 7:40 AM, Richard Sandiford wrote:
> > > 
> > > But log-links on i2 are for uses in i2 for defs before i2, given
> > > i0/i1/i2 changed/have gone away we adjust those to eventually end
> > > up on insns between i2 and i3 (or indeed after i3).  Combine then
> > > want's to try combine the insns with changed log-links.
> > 
> > Right.  But I meant in the case where i2 hasn't changed, which
> > I thought was the case in contention.  If i2 hasn't changed then
> > its uses are the same, and so there is no need to distribute its
> > log links.
> I'm not following closely (though I'm definitely interested as I continue to
> see cases were 2->2 for rewriting would be profitable).
> 
> Do we have to worry about the case where i2 doesn't change, but the other
> insn does change?  I know I saw one of those scenarios in a hot loop in
> coremark for rv64.

Both the PR116398 and PR114518 are P1s about combiner in GCC 15 punting the
2->2 combinations where only i3 changes and i2 doesn't.

Jakub



Ping^2: [PATCH v4] get source line for diagnostic from preprocessed file [PR preprocessor/79106]

2025-04-03 Thread Bader, Lucas
Gentle ping for https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676875.html


Re: [PATCH] libstdc++: Fix handling of field width for wide strings and characters [PR119593]

2025-04-03 Thread Tomasz Kaminski
On Thu, Apr 3, 2025 at 2:49 PM Jonathan Wakely  wrote:

> On Thu, 3 Apr 2025 at 10:24, Jonathan Wakely  wrote:
> >
> > On Thu, 3 Apr 2025 at 09:55, Tomasz Kamiński 
> wrote:
> > >
> > > This patch corrects handling of UTF-32LE and UTF32-BE in
> > > __unicode::__literal_encoding_is_unicode<_CharT>, so they are
> > > recognized as unicode and functions produces correct result for
> wchar_t.
> > >
> > > Use `__unicode::__field_width` to compute the estimated witdh
> >
> > "width"
> >
> > > of the charcter for unicode wide encoding.
> >
> > "character"
> >
> > >
> > > PR libstdc++-v3/119593
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/unicode.h
> > > (__unicode::__literal_encoding_is_unicode<_CharT>):
> > > Corrected handing for UTF-16 and UTF-32 with "LE" or "BE"
> suffix.
> > > * include/std/format (__formatter_str::_S_character_width):
> > > Define.
> > > (__formatter_str::_S_character_width): Updated passed char
> > > length.
> > > * testsuite/std/format/functions/format.cc: Test for wchar_t.
> > > ---
> > > Testing on x86_64-linux. OK for trunk?
> > > I believe we should backport it, given that all wchar_t uses are
> > > impacted.
> > >
> > >  libstdc++-v3/include/bits/unicode.h   |  2 ++
> > >  libstdc++-v3/include/std/format   | 15 ++-
> > >  .../testsuite/std/format/functions/format.cc  |  8 ++--
> > >  3 files changed, 22 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/bits/unicode.h
> b/libstdc++-v3/include/bits/unicode.h
> > > index 24b1ac3d53d..99d972eccff 100644
> > > --- a/libstdc++-v3/include/bits/unicode.h
> > > +++ b/libstdc++-v3/include/bits/unicode.h
> > > @@ -1039,6 +1039,8 @@ inline namespace __v16_0_0
> > >   string_view __s(__enc);
> > >   if (__s.ends_with("//"))
> > > __s.remove_suffix(2);
> > > + if (__s.ends_with("LE") || __s.ends_with("BE"))
> > > +   __s.remove_suffix(2);
> > >   return __s == "16" || __s == "32";
> > > }
> > > }
> > > diff --git a/libstdc++-v3/include/std/format
> b/libstdc++-v3/include/std/format
> > > index c3327e1d384..603facc51de 100644
> > > --- a/libstdc++-v3/include/std/format
> > > +++ b/libstdc++-v3/include/std/format
> > > @@ -1277,12 +1277,25 @@ namespace __format
> > >   _M_spec);
> > > }
> > >
> >
> > Please put [[__gnu__::__always_inline__]] on this function, so that it
> > doesn't add any overhead for narrow chars:
> >
> > > +  static size_t
> > > +  _S_character_width(_CharT __c)
> > > +  {
> > > +   using __unicode::__literal_encoding_is_unicode;
> > > +   // N.B. single byte cannot encode charcter of width greater
> than 1
> > > +   if (sizeof(_CharT) > 1u &&
> __literal_encoding_is_unicode<_CharT>())
> >
> > I think this can be 'if constexpr'
> >
> > OK for trunk and gcc-14 with those changes, thanks.
> > (No backport for gcc-13 because it doesn't have the Unicode-aware
> > field width support.)
> >
> > > + return __unicode::__field_width(__c);
> > > +   else
> > > + return 1u;
> > > +  }
> > > +
> > >template
> > > typename basic_format_context<_Out, _CharT>::iterator
> > > _M_format_character(_CharT __c,
> > >   basic_format_context<_Out, _CharT>& __fc) const
> > > {
> > > - return __format::__write_padded_as_spec({&__c, 1u}, 1, __fc,
> _M_spec);
> > > + return __format::__write_padded_as_spec({&__c, 1u},
> > > +
>  _S_character_width(__c),
> > > + __fc, _M_spec);
> > > }
> > >
> > >template
> > > diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc
> b/libstdc++-v3/testsuite/std/format/functions/format.cc
> > > index 7fc42017045..d8dbf463413 100644
> > > --- a/libstdc++-v3/testsuite/std/format/functions/format.cc
> > > +++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
> > > @@ -501,9 +501,14 @@ test_unicode()
> > >  {
> > >// Similar to sC example in test_std_examples, but not from the
> standard.
> > >// Verify that the character "🤡" has estimated field width 2,
> > > -  // rather than estimated field width equal to strlen("🤡"), which
> would be 4.
> > > +  // rather than estimated field width equal to strlen("🤡"), which
> would be 4,
> > > +  // or just width 1 for single character.
> > >std::string sC = std::format("{:*<3}", "🤡");
> > >VERIFY( sC == "🤡*" );
> > > +  std::wstring wsC = std::format(L"{:*<3}", L"🤡");
> > > +  VERIFY( wsC == L"🤡*" );
> > > +  wsC = std::format(L"{:*<3}", L'🤡');
> > > +  VERIFY( wsC == L"🤡*" );
> > >
> > >// Verify that "£" has estimated field width 1, not strlen("£") ==
> 2.
> > >std::string sL = std::format("{:*<3}", "£");
> > > @@ -517,7 +522,6 @@ test_unicode()
> > >s

[PATCH v1 2/2] c: Improve diagnostics for C FMV declaration conflicts.

2025-04-03 Thread Alfie Richards

Improves diagnostic messages for conflicting function multiversioning (FMV)
declarations using target_version and/or target_clones attributes.

Conflict errors now include the overlapping version string (if relevant),
making it easier to identify and resolve declaration mismatches.

gcc/c/ChangeLog:

* c-decl.cc (diagnose_mismatched_decls): Add conflicting_ver argument
and update diagnostics with it.
(duplicate_decls): Ditto.
(pushdecl): Add conflicting_version variable update logic to use it.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/mv-and-mvc-error1.c: New test.
* gcc.target/aarch64/mv-and-mvc-error2.c: New test.
* gcc.target/aarch64/mv-and-mvc-error3.c: New test.
* gcc.target/aarch64/mv-error1.c: New test.
* gcc.target/aarch64/mv-error2.c: New test.
* gcc.target/aarch64/mv-error3.c: New test.
* gcc.target/aarch64/mv-error4.c: New test.
* gcc.target/aarch64/mv-error5.c: New test.
* gcc.target/aarch64/mv-error6.c: New test.
* gcc.target/aarch64/mv-error7.c: New test.
* gcc.target/aarch64/mv-error8.c: New test.
* gcc.target/aarch64/mv-error9.c: New test.
* gcc.target/aarch64/mvc-error1.c: New test.
* gcc.target/aarch64/mvc-error2.c: New test.
* gcc.target/aarch64/mvc-warning1.c: New test.
---
 gcc/c/c-decl.cc   | 35 +--
 .../gcc.target/aarch64/mv-and-mvc-error1.c|  9 +
 .../gcc.target/aarch64/mv-and-mvc-error2.c|  9 +
 .../gcc.target/aarch64/mv-and-mvc-error3.c|  8 +
 gcc/testsuite/gcc.target/aarch64/mv-error1.c  | 18 ++
 gcc/testsuite/gcc.target/aarch64/mv-error2.c  |  9 +
 gcc/testsuite/gcc.target/aarch64/mv-error3.c  | 12 +++
 gcc/testsuite/gcc.target/aarch64/mv-error4.c  |  9 +
 gcc/testsuite/gcc.target/aarch64/mv-error5.c  |  8 +
 gcc/testsuite/gcc.target/aarch64/mv-error6.c  | 20 +++
 gcc/testsuite/gcc.target/aarch64/mv-error7.c  | 11 ++
 gcc/testsuite/gcc.target/aarch64/mv-error8.c  | 12 +++
 gcc/testsuite/gcc.target/aarch64/mv-error9.c  | 12 +++
 gcc/testsuite/gcc.target/aarch64/mvc-error1.c |  9 +
 gcc/testsuite/gcc.target/aarch64/mvc-error2.c |  9 +
 .../gcc.target/aarch64/mvc-warning1.c | 13 +++
 16 files changed, 193 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc-error1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc-error2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-and-mvc-error3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mv-error9.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mvc-error1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mvc-error2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/mvc-warning1.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e68f8e0cdeb..d5cd31c40a2 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2119,8 +2119,8 @@ maybe_mark_function_versioned (tree decl)
TREE_TYPE (NEWDECL, OLDDECL) respectively.  */
 
 static bool
-diagnose_mismatched_decls (tree newdecl, tree olddecl,
-			   tree *newtypep, tree *oldtypep)
+diagnose_mismatched_decls (tree newdecl, tree olddecl, tree *newtypep,
+			   tree *oldtypep, string_slice *conflicting_ver = NULL)
 {
   tree newtype, oldtype;
   bool retval = true;
@@ -2448,7 +2448,12 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
 		DECL_ATTRIBUTES (newdecl)
 		{
 		  auto_diagnostic_group d;
-		  error ("redefinition of %q+D", newdecl);
+		  if (conflicting_ver && conflicting_ver->is_valid ())
+		error ("redefinition of %qB version for %q+D",
+			   conflicting_ver,
+			   newdecl);
+		  else
+		error ("redefinition of %q+D", newdecl);
 		  locate_old_decl (olddecl);
 		  return false;
 		}
@@ -3185,7 +3190,8 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, tree oldtype)
true.  Otherwise, return false.  */
 
 static bool
-duplicate_decls (tree newdecl, tree olddecl)
+duplicate_decls (tree newdecl, tree olddecl,
+		 string_slice *conflicting_ver = NULL)
 {
   tree newtype = NULL, oldtype = NULL;
 
@@ -3193,12 +3199,18 @@ duplicate_decls (tree newdecl, tree olddecl)
   && !mergeable_version_decls (olddecl, newdecl))
 {
   auto_diagnostic_group d;
-  error ("conflicting versioned declarations of %q+D", newdecl);
+  if (conflicting_ver && conflicting_ver-

Re: [PATCH] vect: Relax scan-tree-dump strict pattern matching [PR118597]

2025-04-03 Thread Victor Do Nascimento

Cheers,

And given how the test also started failing for GCC14, are we also okay
to go ahead with backporting the patch?

Victor

On 4/2/25 17:39, Jeff Law wrote:



On 4/2/25 8:53 AM, Victor Do Nascimento wrote:

Using specific SSA names in pattern matching in `dg-final' makes tests
"unstable", in that changes in passes prior to the pass whose dump is
analyzed in the particular test may change the numbering of the SSA
variables, causing the test to start failing spuriously.

We thus switch from specific SSA names to the use of a multi-line
regular expression making use of capture groups for matching particular
variables across different statements, ensuring the test will pass
more consistently across different versions of GCC.

PR testsuite/118597

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask.c: Update test directives.
Yea, we definitely should not be using specific SSA_NAME versions in a 
testcase like that.


OK for the trunk.  Please go ahead and install.

Thanks!

jeff





Re: [PATCH] vect: Relax scan-tree-dump strict pattern matching [PR118597]

2025-04-03 Thread Jeff Law




On 4/3/25 8:40 AM, Victor Do Nascimento wrote:

Cheers,

And given how the test also started failing for GCC14, are we also okay
to go ahead with backporting the patch?

Of course.

Thanks,
jeff



[PING^2] [PATCH v2] rs6000: Adding missed ISA 3.0 atomic memory operation instructions.

2025-04-03 Thread jeevitha


Ping!

please review.

Thanks & Regards
Jeevitha

On 20/02/25 7:41 pm, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> Changes to amo.h include the addition of the following load atomic operations:
> Compare and Swap Not Equal, Fetch and Increment Bounded, Fetch and Increment
> Equal, and Fetch and Decrement Bounded. Additionally, Store Twin is added for
> store atomic operations.
> 
> 2025-02-20 Peter Bergner 
> 
> gcc/:
>   * config/rs6000/amo.h: Add missing atomic memory operations.
>   * doc/extend.texi (PowerPC Atomic Memory Operation Functions):
> Document new functions.
> 
> gcc/testsuite/:
>   * gcc.target/powerpc/amo3.c: New test.
>   * gcc.target/powerpc/amo4.c: Likewise.
>   * gcc.target/powerpc/amo5.c: Likewise.
>   * gcc.target/powerpc/amo6.c: Likewise.
>   * gcc.target/powerpc/amo7.c: Likewise.
> 
> Co-authored-by: Jeevitha Palanisamy  
> 
> 
> diff --git a/gcc/config/rs6000/amo.h b/gcc/config/rs6000/amo.h
> index 25ab1c7b4c4..10960208d31 100644
> --- a/gcc/config/rs6000/amo.h
> +++ b/gcc/config/rs6000/amo.h
> @@ -71,6 +71,64 @@ NAME (TYPE *_PTR, TYPE _VALUE) 
> \
>return _RET;   
> \
>  }
>  
> +/* Implementation of the LWAT/LDAT operations that take two input registers
> +   and modify one word or double-word of memory and return the value that was
> +   previously in the memory location.  The destination and two source
> +   registers are encoded with only one register number, so we need three
> +   consecutive GPR registers and there is no C/C++ type that will give
> +   us that, so we have to use register asm variables to achieve that.
> +
> +   The LWAT/LDAT opcode requires the address to be a single register,
> +   and that points to a suitably aligned memory location.  Asm volatile
> +   is used to prevent the optimizer from moving the operation.  */
> +
> +#define _AMO_LD_CMPSWP(NAME, TYPE, OPCODE, FC)   
> \
> +static __inline__ TYPE   
> \
> +NAME (TYPE *_PTR, TYPE _COND, TYPE _VALUE)   \
> +{\
> +  register TYPE _ret asm ("r8"); \
> +  register TYPE _cond asm ("r9") = _COND;\
> +  register TYPE _value asm ("r10") = _VALUE; \
> +  __asm__ __volatile__ (OPCODE " %[ret],%P[addr],%[code]"\
> + : [addr] "+Q" (_PTR[0]), [ret] "=r" (_ret)  \
> + : "r" (_cond), "r" (_value), [code] "n" (FC));  \
> +  return _ret;   
> \
> +}
> +
> +/* Implementation of the LWAT/LDAT fetch and increment operations.
> +
> +   The LWAT/LDAT opcode requires the address to be a single register that
> +   points to a suitably aligned memory location.  Asm volatile is used to
> +   prevent the optimizer from moving the operation.  */
> +
> +#define _AMO_LD_INCREMENT(NAME, TYPE, OPCODE, FC)\
> +static __inline__ TYPE   
> \
> +NAME (TYPE *_PTR)\
> +{\
> +  TYPE _RET; \
> +  __asm__ volatile (OPCODE " %[ret],%P[addr],%[code]\n"  
> \
> + : [addr] "+Q" (_PTR[0]), [ret] "=r" (_RET)  \
> + : "Q" (*(TYPE (*)[2]) _PTR), [code] "n" (FC));  \
> +  return _RET;   
> \
> +}
> +
> +/* Implementation of the LWAT/LDAT fetch and decrement operations.
> +
> +   The LWAT/LDAT opcode requires the address to be a single register that
> +   points to a suitably aligned memory location.  Asm volatile is used to
> +   prevent the optimizer from moving the operation.  */
> +
> +#define _AMO_LD_DECREMENT(NAME, TYPE, OPCODE, FC)\
> +static __inline__ TYPE   
> \
> +NAME (TYPE *_PTR)\
> +{\
> +  TYPE _RET; \
> +  __asm__ volatile (OPCODE " %[ret],%P[addr],%[code]\n"  
> \
> + : [addr] "+Q" (_PTR[1]), [ret] "=r" (_RET)  \
> + : "Q" (*(TYPE (*)[2]) (_PTR)), [code] "n" (FC));\
> +  return _RET;   
> \
> +}
> +
>  _AMO_LD_SIMPLE (amo_lwat_add,   uint32_t, "lwat", _AMO_LD_ADD)
>  _AMO_LD_SIMPLE (amo_lwat_xor,   uint32_t, "lwat", _AMO_LD

[PATCH] libstdc++: Provide formatter for vector::reference [PR109162]

2025-04-03 Thread Tomasz Kamiński
This patch implement formatter for vector::reference which
is part of P2286R8.

To indicate partial support we define __glibcxx_format_ranges macro
value 1, without defining __cpp_lib_format_ranges.

To avoid including the whole content of the  header, we
introduce new bits/formatfwd.h forward declares classes required
for newly introduce formatter.

The signatures of the user-facing parse and format method of the provided
formatters deviate from the standard by constraining types of params:
* _Bit_reference instead T satisfying is-vector-bool-reference
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies last three of above as unconstrained types, which leads
to formattable::reference, char32_t> (and any other type as char)
being true.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add bits/formatfwd.h.
* include/Makefile.in: Add bits/formatfwd.h.
* include/bits/version.def:
Define __glibcxx_format_ranges without corresponding std name.
* include/bits/version.h: Regenerate.
* include/std/format (basic_format_context, __format::__char):
Move declartions to bits/formatfwd.h.
(formatter<_Tp, _CharT>): Remove default argument for _CharT
parameter, now specified in forward declaration in bits/formatfwd.h.
* include/std/vector (formatter<_Bit_reference, _CharT>: Define.
* include/bits/formatfwd.h: New file with forward declartions
for bits of std/format.
* testsuite/23_containers/vector/bool/format.cc: New test.
---
Testing for x86_64-linux, format tests passed.
OK for trunk if all test passes?
I think I could land this first to bring __glibcxx_format_ranges macro.

 libstdc++-v3/include/Makefile.am  |  1 +
 libstdc++-v3/include/Makefile.in  |  1 +
 libstdc++-v3/include/bits/formatfwd.h | 68 +++
 libstdc++-v3/include/bits/version.def | 18 ++---
 libstdc++-v3/include/bits/version.h   | 10 +++
 libstdc++-v3/include/std/format   | 14 +---
 libstdc++-v3/include/std/vector   | 30 
 .../23_containers/vector/bool/format.cc   | 67 ++
 8 files changed, 188 insertions(+), 21 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/formatfwd.h
 create mode 100644 libstdc++-v3/testsuite/23_containers/vector/bool/format.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 4dc771a540c..537774c2668 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -195,6 +195,7 @@ bits_headers = \
${bits_srcdir}/cow_string.h \
${bits_srcdir}/deque.tcc \
${bits_srcdir}/erase_if.h \
+   ${bits_srcdir}/formatfwd.h \
${bits_srcdir}/forward_list.h \
${bits_srcdir}/forward_list.tcc \
${bits_srcdir}/fs_dir.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 0e3d09b3a75..7b96b2207f8 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -548,6 +548,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/cow_string.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/deque.tcc \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/erase_if.h \
+@GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/formatfwd.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/forward_list.h \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/forward_list.tcc \
 @GLIBCXX_HOSTED_TRUE@  ${bits_srcdir}/fs_dir.h \
diff --git a/libstdc++-v3/include/bits/formatfwd.h 
b/libstdc++-v3/include/bits/formatfwd.h
new file mode 100644
index 000..5450ad1297f
--- /dev/null
+++ b/libstdc++-v3/include/bits/formatfwd.h
@@ -0,0 +1,68 @@
+//  Formatting -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file include/format
+ *  This is a Standard C++ Library head

Re: [PATCH, OBVIOUS] rs6000: Add Cobol support to traceback table [PR119308]

2025-04-03 Thread Peter Bergner
On 4/2/25 2:57 PM, Peter Bergner wrote:
> The AIX traceback table documentation states the tbtab "lang" field for
> Cobol should be set to 7.
> 
> Tested on powerpc64le-linux.  There are "new" FAILs with the patch (see below)
> on the Cobol test cases, but that is a good thing, because without the
> patch, the compiler ICEs and none of the tests are even run making them
> essentially unsupported.
> 
> The FAILs below are due to PR119597 and are only hit when compiling with -O0.
> The -O[123s] Cobol tests all PASS.  That is an improvement to me over the
> current status, so I'm treating this patch as "obvious" and will push it
> tomorrow unless someone has an objection.

Pushed to trunk now.

We still have PR119597 to deal with though before Cobol is working-working
on powerpc64le-linux.

Peter