[PATCH] bitintlower: Fix interaction of gimple_assign_copy_p stmts vs. has_single_use [PR119808]

2025-04-15 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because we emit a CLOBBER in a place
where it shouldn't be emitted.
Before lowering we have:
  b_5 = 0;
  b.0_6 = b_5;
  b.1_1 = (unsigned _BitInt(129)) b.0_6;
...
   = b_5;
The bitint coalescing assigns the same partition/underlying variable
for both b_5 and b.0_6 (possible because there is a copy assignment)
and of course a different one for b.1_1 (and other SSA_NAMEs in between).
This is -O0 so stmts aren't DCEd and aren't propagated that much etc.
It is -O0 so we also don't try to optimize and omit some names from m_names
and handle multiple stmts at once, so the expansion emits essentially
  bitint.4 = {};
  bitint.4 = bitint.4;
  bitint.2 = cast of bitint.4;
  bitint.4 = CLOBBER;
...
   = bitint.4;
and the CLOBBER is the problem because bitint.4 is still live afterwards.
We emit the clobbers to improve code generation, but do it only for
(initially) has_single_use SSA_NAMEs (remembered in m_single_use_names)
being used, if they don't have the same partition on the lhs and a few
other conditions.
The problem above is that b.0_6 which is used in the cast has_single_use
and so was in m_single_use_names bitmask and the lhs in that case is
bitint.2, so a different partition.  But there is gimple_assign_copy_p
with SSA_NAME rhs1 and the partitioning special cases those and while
b.0_6 is single use, b_5 has multiple uses.  I believe this ought to be
a problem solely in the case of such copy stmts and its special case
by the partitioning, if instead of b.0_6 = b_5; there would be
b.0_6 = b_5 + 1; or whatever other stmt that performs or may perform
changes on the value, partitioning couldn't assign the same partition
to b.0_6 and b_5 if b_5 is used later, it couldn't have two different
(or potentially different) values in the same bitint.N var.  With
copy that is possible though.

So the following patch fixes it by being more careful when setting
m_single_use_names: the bit is not set for a has_single_use SSA_NAME
whose SSA_NAME_DEF_STMT is a copy stmt with SSA_NAME rhs1 when that
rhs1 doesn't have a single use, or when rhs1 has a single use but its
SSA_NAME_DEF_STMT is in turn a copy from an SSA_NAME without a single
use, and so on along the copy chain.
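
To make the rule above concrete, here is a minimal sketch of the copy-chain
walk in plain C.  It is only a toy model: struct toy_name, its fields and
toy_mark_single_use_p are hypothetical names made up for illustration and
are not part of GCC; the real code works on SSA_NAMEs via has_single_use
and gimple_assign_copy_p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SSA name; NOT GCC's tree type.  */
struct toy_name
{
  unsigned num_uses;               /* number of uses of this name */
  const struct toy_name *copy_of;  /* non-NULL if the name is defined as a
                                      plain copy of another name */
};

/* Return true if NAME may be recorded as a single-use name: NAME itself
   and every name it is (transitively) copied from must have exactly one
   use.  A chain that ends in a non-copy definition is fine.  */
static bool
toy_mark_single_use_p (const struct toy_name *name)
{
  if (name->num_uses != 1)
    return false;
  for (const struct toy_name *s = name->copy_of; s != NULL; s = s->copy_of)
    if (s->num_uses != 1)
      return false;
  return true;
}
```

With the testcase's shape (b_5 used twice, b.0_6 a single-use copy of b_5),
the function returns false for b.0_6, so in this model no clobber would be
considered for the shared partition.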

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Just to make sure it doesn't change code generation too much, I've gathered
statistics on how many times
  if (m_first
  && m_single_use_names
  && m_vars[p] != m_lhs
  && m_after_stmt
  && bitmap_bit_p (m_single_use_names, SSA_NAME_VERSION (op)))
{
  tree clobber = build_clobber (TREE_TYPE (m_vars[p]),
CLOBBER_STORAGE_END);
  g = gimple_build_assign (m_vars[p], clobber);
  gimple_stmt_iterator gsi = gsi_for_stmt (m_after_stmt);
  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
}
emits a clobber on
make check-gcc GCC_TEST_RUN_EXPENSIVE=1 
RUNTESTFLAGS="--target_board=unix\{-m64,-m32\} GCC_TEST_RUN_EXPENSIVE=1 
dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c 
pr116588.c pr116003.c pr113693.c pr113602.c flex-array-counted-by-7.c' 
dg-torture.exp='*bitint* pr116480-2.c pr114312.c pr114121.c' dfp.exp=*bitint* 
i386.exp='pr118017.c pr117946.c apx-ndd-x32-2a.c' 
vect.exp='vect-early-break_99-pr113287.c' tree-ssa.exp=pr113735.c"
and before this patch it was 41010 clobbers and after it is 40968,
so difference is 42 clobbers, 0.1% fewer.

2025-04-16  Jakub Jelinek  

PR middle-end/119808
* gimple-lower-bitint.cc (gimple_lower_bitint): Don't set
m_single_use_names bits for SSA_NAMEs which have single use but
their SSA_NAME_DEF_STMT is a copy from another SSA_NAME which doesn't
have a single use, or single use which is such a copy etc.

* gcc.dg/bitint-121.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2025-04-12 13:13:47.543814860 +0200
+++ gcc/gimple-lower-bitint.cc  2025-04-15 21:00:32.779348865 +0200
@@ -6647,10 +6647,28 @@ gimple_lower_bitint (void)
  bitmap_set_bit (large_huge.m_names, SSA_NAME_VERSION (s));
  if (has_single_use (s))
{
- if (!large_huge.m_single_use_names)
-   large_huge.m_single_use_names = BITMAP_ALLOC (NULL);
- bitmap_set_bit (large_huge.m_single_use_names,
- SSA_NAME_VERSION (s));
+ tree s2 = s;
+ /* The coalescing hook special cases SSA_NAME copies.
+Make sure not to mark in m_single_use_names single
+use SSA_NAMEs copied from non-single use SSA_NAMEs.  */
+ while (gimple_assign_copy_p (SSA_NAME_DEF_STMT (s2)))
+   {
+ s2 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s2));
+ if (TREE_CODE (s2) != SSA_NAME)
+   break;
+ if (!has_single_use (s2))
+   {
+ s2 = NULL_TREE;
+ break;
+   }
+   }
+ if

Re: [PATCH] [testsuite] [ppc] compile [PR112822] with -mvsx

2025-04-15 Thread Alexandre Oliva
On Apr 15, 2025, Peter Bergner  wrote:

> On 4/14/25 11:30 PM, Alexandre Oliva wrote:
>> On Apr 14, 2025, Peter Bergner  wrote:
>> 
>>> This is an architecture independent test case, so I'm surprised this
>>> doesn't FAIL on non-powerpc targets since they don't know anything
>>> about altivec.
>> 
>> AFAICT we ignore attributes we don't know about.
>> 
>> I'd think the following fix should help them too.
>> 
>> I considered doing something like that, but I don't know whether the
>> modified test would trigger the original ICE.  It seemed fragile, and
>> the change could sort of invalidate the regression test.

> I have verified the modified test case ICEs with the exact same
> error as the original test case using the commit immediately
> before the commit that fixed the ICE.

Awesome, thanks!  I hereby withdraw the proposed patch, in favor of yours.

-- 
Alexandre Oliva, happy hacker            https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


[committed] testsuite: Add testcase for already fixed PR [PR116093]

2025-04-15 Thread Jakub Jelinek
Hi!

This testcase got fixed with r15-9397 PR119722 fix.

Tested on x86_64-linux -m32/-m64 with vanilla trunk as well
as with r15-9397 fix reverted (where it FAILs), committed
to trunk as obvious.

2025-04-16  Jakub Jelinek  

PR tree-optimization/116093
* gcc.dg/bitint-122.c: New test.

--- gcc/testsuite/gcc.dg/bitint-122.c.jj
+++ gcc/testsuite/gcc.dg/bitint-122.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/116093 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-Og -ftree-vrp -fno-tree-dce" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+char
+foo (int a, _BitInt (129) b, char c)
+{
+  return c << (5 / b % (0xdb75dbf5 | a));
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 129
+  if (foo (0, 6, 1) != 1)
+__builtin_abort ();
+#endif
+}

Jakub



Re: [PATCH] [testsuite] [ppc] disable -mpowerpc64 for various ilp32 asm-out checks

2025-04-15 Thread Alexandre Oliva
On Apr 15, 2025, Peter Bergner  wrote:

> On 4/15/25 9:36 AM, Peter Bergner wrote:
>> So what ABI does powerpc-elf use and what does it mandate?

That's not for me to decide, but to me the patch that introduced
OS_MISSING_POWERPC64 and the PR106680 conversation suggest that enabling
-mpowerpc64 with -m32 -mcpu=<64bitcapable> had long been intended,
*except* on systems that are not 64-bit-compatible when running 32-bit
mode programs.

I acknowledge it is a different ABI, but since call-saved registers are
handled in a way that makes the difference irrelevant, a system without
preemption (including interrupts and traps) would hit no trouble AFAICT.

Since powerpc-elf doesn't assume an underlying operating system, it
doesn't strike me as an unreasonable assumption that there won't be
preemption, or that, if the initialization code enables execution of
powerpc64 instructions (I'd expect that to be necessary to enable them,
but that's just an unchecked guess), then one could also count on
64-bit register saves at traps, interrupts, and context switches; as
long as there isn't something like async signals, that should suffice to
make -mpowerpc64 safe, useful and desirable in this target.

But I acknowledge that it's a bit of a risky proposition; I suppose it
would be more conservative to disable it uniformly on all targets, and
only enable it with -m32 when explicitly requested.  I.e., make
OS_MISSING_POWERPC64 the rule rather than the exception, and define it
to zero on targets where it is deemed safe.

I'm not sure what the setting should be for powerpc-elf.  Since
!OS_MISSING_POWERPC64 has been in effect for so long on it, my
inclination would be to leave it as is.

> It seems the behavior to add OPTION_MASK_POWERPC64 happened with
> Kewen's patch

No, it had been there *long* before, Kewen's patch only enabled systems
that would misbehave due to ABI concerns to declare the effective
incompatibility of enabling -mpowerpc64 in -m32.  IMHO it should be a
lot noisier than it is, possibly an error, when the incompatibility is
there, since any async interaction could be trouble.

> maybe you need the same fix rtems added?

I don't see reason to consider that a fix for powerpc-elf; it would be a
feature regression for those who take legitimate advantage of 64-bit
registers and instructions on deployed hardware.  But my considerations
about its not being a conservatively safe feature apply: I wouldn't
stand in the way of flipping the global default, and *not* enabling
-mpowerpc64 implicitly any more on powerpc-elf, provided that it *could*
still be enabled when knowledge about the target environment makes it
safe.

>   https://gcc.gnu.org/PR106680

-- 
Alexandre Oliva, happy hacker            https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re:[committed] [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-15 Thread Tejas Belagod

On 4/15/25 1:56 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

The operand order to gen_vcond_mask call in the vec_extract pattern is wrong.
Fix the order where predicate is operand 3.

Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?

gcc/ChangeLog

* config/aarch64/aarch64-sve.md (vec_extract): Fix operand
order to gen_vcond_mask_*.


Thanks, LGTM too.



Thanks, now applied to trunk as 31e16c8b75b.

Tejas.


Richard


---
  gcc/config/aarch64/aarch64-sve.md | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 3dbd65986ec..d4af3706294 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3133,9 +3133,9 @@
"TARGET_SVE"
{
  rtx tmp = gen_reg_rtx (mode);
-emit_insn (gen_vcond_mask_ (tmp, operands[1],
-CONST1_RTX (mode),
-CONST0_RTX (mode)));
+emit_insn (gen_vcond_mask_ (tmp, CONST1_RTX (mode),
+CONST0_RTX (mode),
+operands[1]));
  emit_insn (gen_vec_extract (operands[0], tmp, operands[2]));
  DONE;
}




Re: [PATCH] [testsuite] [ppc] pr87600, pr89313: test for __PPC__ as well

2025-04-15 Thread Alexandre Oliva
On Apr 14, 2025, Peter Bergner  wrote:

> On 4/11/25 1:03 PM, Alexandre Oliva wrote:
>> gcc.dg/pr87600.h and gcc.dg/pr89313.c test for __powerpc__ and
>> __POWERPC__ to choose ppc register names, but ppc-elf defines neither;
>> it defines __PPC__, so test for that as well.

> Is there a reason why powerpc-*-elf doesn't define __powerpc__ or
> __POWERPC__ like we do for other powerpc* targets?

-ENOCLUE :-(

It doesn't seem to be uniform.

darwin.h defines __POWERPC__; 32-bit freebsd.h defines neither (though
freebsd64.h defines __powerpc__ with -m32); sysv4.h defines neither.
It seems to be a long and messy history.

> That said, I think this probably falls under the "obvious" rule too.

Thanks,

-- 
Alexandre Oliva, happy hacker            https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re: [PATCH] [testsuite] [ppc] compile [PR112822] with -mvsx

2025-04-15 Thread Alexandre Oliva
On Apr 14, 2025, Peter Bergner  wrote:

> diff --git a/gcc/testsuite/g++.dg/pr112822.C b/gcc/testsuite/g++.dg/pr112822.C
> -typedef __attribute__((altivec(vector__))) double co;
> +typedef double co __attribute__ ((vector_size (16)));

FWIW, I've tested this change on gcc-14 powerpc-elf and I confirm that
it solves the failures that motivated the initial patch in this thread.

-- 
Alexandre Oliva, happy hacker            https://blog.lx.oliva.nom.br/
Free Software Activist FSFLA co-founder GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity.
Excluding neuro-others for not behaving ""normal"" is *not* inclusive!


Re: [PATCH v2] riscv: Fix incorrect gnu property alignment on rv32

2025-04-15 Thread Kito Cheng
Thanks, committed to trunk :)

On Fri, Apr 11, 2025 at 12:27 PM Jesse Huang  wrote:
>
> Codegen is incorrectly emitting a ".p2align 3" that coerces the
> alignment of the .note.gnu.property section from 4 to 8 on rv32.
>
> 2025-04-11  Jesse Huang  
>
> gcc/ChangeLog
>
> * config/riscv/riscv.cc (riscv_file_end): Fix .p2align value.
>
> gcc/testsuite/ChangeLog
>
> * gcc.target/riscv/gnu-property-align-rv32.c: New file.
> * gcc.target/riscv/gnu-property-align-rv64.c: New file.
> ---
>  gcc/config/riscv/riscv.cc| 2 +-
>  gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c | 7 +++
>  gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c | 7 +++
>  3 files changed, 15 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38f3ae7cd84..d3656a7a430 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10382,7 +10382,7 @@ riscv_file_end ()
>fprintf (asm_out_file, "1:\n");
>
>/* pr_type.  */
> -  fprintf (asm_out_file, "\t.p2align\t3\n");
> +  fprintf (asm_out_file, "\t.p2align\t%u\n", p2align);
>fprintf (asm_out_file, "2:\n");
>fprintf (asm_out_file, "\t.long\t0xc000\n");
>/* pr_datasz.  */
> diff --git a/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c 
> b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
> new file mode 100644
> index 000..4f48cff33da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv32.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32g_zicfiss -fcf-protection=return -mabi=ilp32d " 
> } */
> +
> +void foo() {}
> +
> +/* { dg-final { scan-assembler-times ".p2align\t2" 3 } } */
> +/* { dg-final { scan-assembler-not ".p2align\t3" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c 
> b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
> new file mode 100644
> index 000..1bfd1271826
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/gnu-property-align-rv64.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64g_zicfiss -fcf-protection=return -mabi=lp64d " } 
> */
> +
> +void foo() {}
> +
> +/* { dg-final { scan-assembler-times ".p2align\t3" 3 } } */
> +/* { dg-final { scan-assembler-not ".p2align\t2" } } */
> --
> 2.39.3
>
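
As a small illustration of the quoted fix: the operand of .p2align is the
base-2 logarithm of the alignment in bytes, so a 4-byte alignment needs 2
and an 8-byte alignment needs 3, matching what the two new tests scan for.
The sketch below assumes the required alignment is 4 bytes on rv32 and
8 bytes on rv64; p2align_for_bytes is a hypothetical helper and not a
function from riscv.cc.

```c
/* Return the .p2align operand (log2 of the alignment) for a power-of-two
   alignment given in bytes.  Hypothetical helper for illustration only.  */
static unsigned
p2align_for_bytes (unsigned bytes)
{
  unsigned p2 = 0;
  while ((1u << p2) < bytes)
    p2++;
  return p2;
}
```

Hard-coding 3 corresponds to p2align_for_bytes (8), which is why the
.note.gnu.property section ended up 8-byte aligned on rv32 as well.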


Re: [PATCH] RISC-V: Put jump table in text for large code model

2025-04-15 Thread Kito Cheng
committed :)

On Mon, Apr 14, 2025 at 6:01 PM Kito Cheng  wrote:
>
> This patch will be committed this week if CI passes and there are no strong
> objections, since it fixes a bug in the large code model and the change is small
>
> On Mon, Apr 14, 2025 at 6:00 PM Kito Cheng  wrote:
> >
> > The large code model assumes that data or rodata may be placed far away
> > from the text section.  So we need to put the jump table in the text
> > section for the large code model.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.h (JUMP_TABLES_IN_TEXT_SECTION): Check if
> > large code model.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/jump-table-large-code-model.c: New test.
> > ---
> >  gcc/config/riscv/riscv.h  |  2 +-
> >  .../riscv/jump-table-large-code-model.c   | 24 +++
> >  2 files changed, 25 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> >
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 2bcabd03517..2759a4cb1c9 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -888,7 +888,7 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
> >  #define ASM_OUTPUT_OPCODE(STREAM, PTR) \
> >(PTR) = riscv_asm_output_opcode(STREAM, PTR)
> >
> > -#define JUMP_TABLES_IN_TEXT_SECTION 0
> > +#define JUMP_TABLES_IN_TEXT_SECTION (riscv_cmodel == CM_LARGE)
> >  #define CASE_VECTOR_MODE SImode
> >  #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
> >
> > diff --git a/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c 
> > b/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> > new file mode 100644
> > index 000..1ee7f6c07d3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/jump-table-large-code-model.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -mcmodel=large" } */
> > +
> > +int foo(int x, int y)
> > +{
> > +  switch(x){
> > +  case 0:
> > +return 123 + y;
> > +  case 1:
> > +return 456 + y;
> > +  case 2:
> > +return 789 - y;
> > +  case 3:
> > +return 12 * y;
> > +  case 4:
> > +return 13 % y;
> > +  case 5:
> > +return 11 *y;
> > +  }
> > +  return 0;
> > +}
> > +
> > +
> > +/* { dg-final { scan-assembler-not "\.section  \.rodata" } } */
> > --
> > 2.34.1
> >


Re: [PATCH STAGE 4] aarch64: Disable sysreg feature gating

2025-04-15 Thread Richard Sandiford
Alice Carlotti  writes:
> This applies to the sysreg read/write intrinsics __arm_[wr]sr*.  It does
> not depend on changes to Binutils, because GCC converts recognised
> sysreg names to an encoding based form, which is already ungated in Binutils.
>
> We have, however, agreed to make an equivalent change in Binutils (which
> would then disable feature gating for sysreg accesses in inline
> assembly), but this has not yet been posted upstream.
>
> In the future we may introduce a new flag to re-enable some checking,
> but these checks could not be comprehensive because many system
> registers depend on architecture features that don't have corresponding
> GCC/GAS --march options.  This would also depend on addressing numerous
> inconsistencies in the existing list of sysreg feature dependencies.
>
> ---
>
> Ok for master now? And how about backporting to gcc 14? I do recognise that
> this is late in stage 4, sorry - it slipped through the gaps of being
> Binutils-adjacent work with a different deadline.

OK for trunk and backports.  I'm disappointed that I didn't notice
the lack of dg-error tests for the code being removed.

Thanks,
Richard

>
> Thanks,
> Alice
>
>
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc
>   (aarch64_valid_sysreg_name_p): Remove feature check.
>   (aarch64_retrieve_sysreg): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/rwsr-ungated.c: New test.
>
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 4e801146c60a52c7ef6f8c0f92b1b922e729c234..433ec975d7e4e9d7130fe49eac37f4ebfb880416
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -31073,8 +31073,6 @@ aarch64_valid_sysreg_name_p (const char *regname)
>const sysreg_t *sysreg = aarch64_lookup_sysreg_map (regname);
>if (sysreg == NULL)
>  return aarch64_is_implem_def_reg (regname);
> -  if (sysreg->arch_reqs)
> -return bool (aarch64_isa_flags & sysreg->arch_reqs);
>return true;
>  }
>  
> @@ -31098,8 +31096,6 @@ aarch64_retrieve_sysreg (const char *regname, bool 
> write_p, bool is128op)
>if ((write_p && (sysreg->properties & F_REG_READ))
>|| (!write_p && (sysreg->properties & F_REG_WRITE)))
>  return NULL;
> -  if ((~aarch64_isa_flags & sysreg->arch_reqs) != 0)
> -return NULL;
>return sysreg->encoding;
>  }
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
> new file mode 100644
> index 
> ..d67a42673733cdb128fd62d465fa122037ae531d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
> @@ -0,0 +1,13 @@
> +/* Test that __arm_[r,w]sr intrinsics aren't gated (by default).  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8-a" } */
> +
> +#include 
> +
> +uint64_t
> +foo (uint64_t a)
> +{
> +  __arm_wsr64 ("zcr_el1", a);
> +  return __arm_rsr64 ("smcr_el1");
> +}
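
For readers comparing the two hunks removed in the quoted patch: the two
checks were not the same predicate.  One accepted a register when any
required feature bit was enabled, the other rejected it unless all required
bits were enabled.  A minimal sketch of the difference, with the flag sets
modelled as plain uint64_t masks and with hypothetical feature bits (the
names below are made up and are not GCC's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
#define TOY_FEAT_A (UINT64_C (1) << 0)
#define TOY_FEAT_B (UINT64_C (1) << 1)

/* Shape of the check removed from the name-validation hunk:
   true if at least one required bit is enabled.  */
static bool
toy_any_required_enabled (uint64_t enabled, uint64_t required)
{
  return (enabled & required) != 0;
}

/* Shape of the check removed from the retrieval hunk:
   true only if no required bit is missing.  */
static bool
toy_all_required_enabled (uint64_t enabled, uint64_t required)
{
  return (~enabled & required) == 0;
}
```

With only TOY_FEAT_A enabled and both bits required, the first predicate
accepts and the second rejects.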


[PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao
This is the 2nd version of the patch, the change is to replace "FALSE" with
"false" per Marek's comments.

C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
In c_fully_fold, it assumes that operands of function calls have already
been folded.  However, when we build the call to .ACCESS_WITH_SIZE, its
operands are not fully folded, so the C FE specific operator is
passed to the middle-end.

In order to fix this issue, fully fold the parameters before building the
call to .ACCESS_WITH_SIZE.

Bootstrapped and regression tested on both x86 and aarch64.
Okay for trunk?

Thanks.

Qing

=

PR c/119717

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
parameters for call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/pr119717.c: New test.
---
 gcc/c/c-typeck.cc   |  8 ++--
 gcc/testsuite/gcc.dg/pr119717.c | 24 
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr119717.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 3870e8a1558..55d896e02df 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
   /* The result type of the call is a pointer to the flexible array type.  */
   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
+  tree first_param
+= c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
+  tree second_param
+= c_fully_fold (counted_by_ref, false, NULL);
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
result_type, 6,
-   array_to_pointer_conversion (loc, ref),
-   counted_by_ref,
+   first_param,
+   second_param,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
build_int_cst (integer_type_node, -1),
diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
new file mode 100644
index 000..e5eedc567b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr119717.c
@@ -0,0 +1,24 @@
+/* PR c/119717  */
+/* { dg-additional-options "-std=c23" } */
+/* { dg-do compile } */
+
+struct annotated {
+  unsigned count;
+  [[gnu::counted_by(count)]] char array[];
+};
+
+[[gnu::noinline,gnu::noipa]]
+static unsigned
+size_of (bool x, struct annotated *a)
+{
+  char *p = (x ? a : 0)->array;
+  return __builtin_dynamic_object_size (p, 1);
+}
+
+int main()
+{
+  struct annotated *p = __builtin_malloc(sizeof *p);
+  p->count = 0;
+  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
+  return 0;
+}
-- 
2.31.1



Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-15 Thread Martin Uecker
On Tuesday, 15.04.2025 at 14:50 +0200, Michael Matz wrote:
> Hello,
...

> > struct A {
> >   int *buf __counted_by(len); // 'len' *must* be in the struct.
> >   int len;
> > };
> 
> ... means that we would have to implement general delayed parsing for 
> expressions in C parsers. 

I have to agree with Michael.  This was the main reason
we rejected the original approach.  

I also think consistency with general syntax for arrays in structs
is far more important for C than consistency for the special case of
having only one identifier in counted_by.

Martin


Re: [PATCH] [PR119765] testsuite: adjust amd64-abi-9.c to check both ms and sysv ABIs

2025-04-15 Thread NightStrike
On Tue, Apr 15, 2025 at 5:02 AM LIU Hao  wrote:
>
> On 2025-4-14 04:10, Peter Damianov wrote:
> > diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c 
> > b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > index 9b2cd7e7b49..827215be3e2 100644
> > --- a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > +++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
> > @@ -1,18 +1,46 @@
> >   /* { dg-do compile { target { ! ia32 } } } */
> >   /* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
> > +
> > +// For sysv abi, eax holds the number of XMM registers used in the call.
> > +// Since sse is disabled, check that it is zeroed
> >   /* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 
> > 2 } } */
> >
> > -void foo (const char *, ...);
> > +// For ms abi, the argument should go in edx
> > +/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%edx" 
> > 2 } } */
>
> is this a superfluous `\[` ? --^^
>
> > +
> > +// For sysv abi, the argument should go in esi
> > +/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%esi" 
> > 2 } } */
> > +
> > +
>
> ditto.

Both should be \] instead of \[]
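
To show why the stray bracket matters, here is a rough sketch using POSIX
regcomp/regexec rather than Tcl, on the assumption that both dialects treat
a '[' inside a bracket expression as an ordinary member.  The two pattern
strings are what the regex engine would see after Tcl unescaping; the
helper and variable names are made up for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex PATTERN matches anywhere in
   TEXT.  A pattern that fails to compile counts as no match.  */
static bool
toy_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}

/* "\t" is a literal tab inserted by the C compiler; "\\$" reaches the
   regex engine as an escaped dollar sign.  */
static const char toy_with_stray_bracket[] = "movl[\t ]*\\$20,[\t []*%edx";
static const char toy_corrected[] = "movl[\t ]*\\$20,[\t ]*%edx";
```

Both patterns match the expected "movl $20, %edx" line, so the test would
pass either way; the version with the stray bracket merely also tolerates
'[' characters before %edx, which is not intended.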


[PATCH][GCC14] Extend check-function-bodies to allow label and directives

2025-04-15 Thread H.J. Lu
Hi,

I'd like to backport this testsuite enhancement to GCC 14 so that

https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680896.html

can be backported to GCC 14 with testcases unchanged.


H.J.
---
As PR target/116174 shows, we may need to verify labels and the directive
order.  Extend check-function-bodies with an optional regexp for matched
output lines, so that labels and directives can be kept.
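
For illustration, the matched regexp used in the adjusted pr116174.c test is
{^\t?\.}, i.e. lines that start with a dot, optionally preceded by one tab.
The sketch below checks a few assembly lines against the same pattern with
POSIX regcomp/regexec; it only approximates the Tcl regexp used by
scanasm.exp, and toy_kept_by_matched is a hypothetical helper name.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE would be kept by a matched regexp of the form
   ^\t?\. (a dot at the start, optionally after one tab).  */
static bool
toy_kept_by_matched (const char *line)
{
  regex_t re;
  /* "\t" is a literal tab; "\\." reaches the engine as an escaped dot.  */
  if (regcomp (&re, "^\t?\\.", REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

Under this pattern ".LFB0:", "\t.cfi_startproc" and "\t.p2align 5" are
kept, while an instruction line such as "\tendbr64" is not selected by the
matched regexp and goes through the usual handling.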

gcc/

* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.

gcc/testsuite/

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched.  Set
up_config(matched) to $matched.  Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".

Signed-off-by: H.J. Lu 
(cherry picked from commit d6bb1e257fc414d21bc31faa7ddecbc93a197e3c)
---
 gcc/doc/sourcebuild.texi |  9 ++---
 gcc/testsuite/gcc.target/i386/pr116174.c | 18 +++---
 gcc/testsuite/lib/scanasm.exp| 15 +--
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 23dedef4161..c8130dc1ba9 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3440,7 +3440,7 @@ assembly output.
 Passes if @var{symbol} is not defined as a hidden symbol in the test's
 assembly output.
 
-@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @}]]
+@item check-function-bodies @var{prefix} @var{terminator} [@var{options} [@{ 
target/xfail @var{selector} @} [@var{matched}]]]
 Looks through the source file for comments that give the expected assembly
 output for selected functions.  Each line of expected output starts with the
 prefix string @var{prefix} and the expected output for a function as a whole
@@ -3467,8 +3467,11 @@ Depending on the configuration (see
 @code{configure_check-function-bodies} in
 @file{gcc/testsuite/lib/scanasm.exp}), the test may discard from the
 compiler's assembly output directives such as @code{.cfi_startproc},
-local label definitions such as @code{.LFB0}, and more.
-It then matches the result against the expected
+local label definitions such as @code{.LFB0}, and more.  This behavior
+can be overridden using the optional @var{matched} argument, which
+specifies a regexp for lines that should not be discarded in this way.
+
+The test then matches the result against the expected
 output for a function as a single regular expression.  This means that
 later lines can use backslashes to refer back to @samp{(@dots{})}
 captures on earlier lines.  For example:
diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c 
b/gcc/testsuite/gcc.target/i386/pr116174.c
index 8877d0b51af..686aeb9ff31 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,20 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fcf-protection=branch" } */
+/* { dg-options "-O2 -g0 -fcf-protection=branch" } */
+/* Keep labels and directives ('.p2align', '.cfi_startproc').
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } {^\t?\.}  
} } */
 
+/*
+**foo:
+**.LFB0:
+** .cfi_startproc
+** (
+** endbr64
+** .p2align 5
+** |
+** endbr32
+** )
+**...
+*/
 char *
 foo (char *dest, const char *src)
 {
@@ -8,5 +22,3 @@ foo (char *dest, const char *src)
 /* nothing */;
   return --dest;
 }
-
-/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 6cf9997240d..d1c8e3b5079 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -952,6 +952,9 @@ proc parse_function_bodies { config filename result } {
verbose "parse_function_bodies: $function_name:\n$function_body"
set up_result($function_name) $function_body
set in_function 0
+   } elseif { $up_config(matched) ne "" \
+  && [regexp $up_config(matched) $line] } {
+   append function_body $line "\n"
} elseif { [regexp $up_config(fluff) $line] } {
verbose "parse_function_bodies: $function_name: ignoring fluff 
line: $line"
} else {
@@ -982,7 +985,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR]] } }
+# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[MATCHED]]] } }
 #
 # See sourcebuild.texi for details.
 
@@ -990,7 +993,7 @@ proc check-function-bodies { args } {
 if { [llength $args] < 2 } {
error "too few arguments to check-function-bodies"

Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
> Hi,
> > gcc/ChangeLog:
> > 
> > PR tree-optimization/117790
> > * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
> > (flow_scale_loop_freqs): New.
> > (scale_loop_freqs_with_exit_counts): New.
> > (scale_loop_freqs_hold_exit_counts): New.
> > (scale_loop_profile): Refactor to use the newly-added
> > scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
> > correctly handle reducing the expected niters for loops with multiple
> > exits.
> > (scale_loop_freqs_with_new_exit_count): New.
> > (scale_loop_profile_1): New.
> > (scale_loop_profile_hold_exit_counts): New.
> > * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
> > (scale_loop_freqs_with_new_exit_count): New.
> +template
> +static bool
> +can_flow_scale_loop_freqs_p (class loop *loop,
> +  ExitCountFn get_exit_count)
> +{
> +  basic_block bb = loop->header;
> +
> +  const profile_count count_in = loop_count_in (loop);
> +  profile_count exit_count = profile_count::zero ();
> +
> +  while (bb != loop->latch)
> +{
> +  /* Punt if any of the BB counts are uninitialized.  */
> +  if (!bb->count.initialized_p ())
> + return false;
> +
> +  bool found_exit = false;
> +  edge internal_edge = nullptr;
> +  for (auto e : bb->succs)
> + if (flow_bb_inside_loop_p (loop, e->dest))
> +   {
> + if (internal_edge)
> +   return false;
> + internal_edge = e;
> 
> This assumes that there are at most 2 edges out, which is not always the
> case (e.g. for EH and switch).  I suppose the vectorizer never calls it
> there, but you probably want to test that there are precisely two edges
> in can_flow_scale_loop_freqs_p and, if not, drop a message to the dump
> file, so that we notice in case we encounter such loops.

I also forgot to write: another interesting case is when the loop has an
inner loop.  In this case there will be 2 edges, but both of them internal.
It is also possible that the exit of the loop sits inside an inner loop.
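
To make the shapes discussed here concrete, the following is a minimal sketch on a toy CFG, not GCC code.  Block, Succ and simple_exit_chain_p are hypothetical names, and accepting a lone fallthrough edge (rather than insisting on precisely two edges) is an assumption of the sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a basic block: each successor records its destination
// index and whether that destination is inside the loop being walked.
struct Succ { int dest; bool inside_loop; };
struct Block { std::vector<Succ> succs; };

// Hypothetical helper (not GCC API): walk from HEADER to LATCH and punt
// on any block that has more than two successors (EH, switch) or that
// does not have exactly one successor staying inside the loop.
bool
simple_exit_chain_p (const std::vector<Block> &blocks, int header, int latch)
{
  assert (header >= 0 && latch >= 0);
  int bb = header;
  // Bound the walk so a malformed graph cannot loop forever.
  for (std::size_t steps = 0; bb != latch; ++steps)
    {
      if (steps > blocks.size () || bb < 0
          || bb >= static_cast<int> (blocks.size ()))
        return false;
      const Block &b = blocks[bb];
      if (b.succs.size () > 2)
        return false;
      const Succ *internal = nullptr;
      for (const Succ &s : b.succs)
        if (s.inside_loop)
          {
            // Two internal edges, e.g. the entry to an inner loop.
            if (internal)
              return false;
            internal = &s;
          }
      if (!internal)
        return false;
      bb = internal->dest;
    }
  return true;
}
```

In the real function the rejection points would presumably also print to the dump file; the sketch only returns false.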

Honza


Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
Hi,
> gcc/ChangeLog:
> 
>   PR tree-optimization/117790
>   * cfgloopmanip.cc (can_flow_scale_loop_freqs_p): New.
>   (flow_scale_loop_freqs): New.
>   (scale_loop_freqs_with_exit_counts): New.
>   (scale_loop_freqs_hold_exit_counts): New.
>   (scale_loop_profile): Refactor to use the newly-added
>   scale_loop_profile_1, and use scale_loop_freqs_hold_exit_counts to
>   correctly handle reducing the expected niters for loops with multiple
>   exits.
>   (scale_loop_freqs_with_new_exit_count): New.
>   (scale_loop_profile_1): New.
>   (scale_loop_profile_hold_exit_counts): New.
>   * cfgloopmanip.h (scale_loop_profile_hold_exit_counts): New.
>   (scale_loop_freqs_with_new_exit_count): New.
+template <typename ExitCountFn>
+static bool
+can_flow_scale_loop_freqs_p (class loop *loop,
+ExitCountFn get_exit_count)
+{
+  basic_block bb = loop->header;
+
+  const profile_count count_in = loop_count_in (loop);
+  profile_count exit_count = profile_count::zero ();
+
+  while (bb != loop->latch)
+{
+  /* Punt if any of the BB counts are uninitialized.  */
+  if (!bb->count.initialized_p ())
+   return false;
+
+  bool found_exit = false;
+  edge internal_edge = nullptr;
+  for (auto e : bb->succs)
+   if (flow_bb_inside_loop_p (loop, e->dest))
+ {
+   if (internal_edge)
+ return false;
+   internal_edge = e;

This assumes that there are at most 2 edges out, which is not always the
case (e.g. for EH and switch).  I suppose the vectorizer never calls it
there, but you probably want to test that there are precisely two edges
in can_flow_scale_loop_freqs_p and, if not, write a message to the dump
file, so that we notice if we encounter such loops.

+ }
+   else
+ {
+   if (found_exit)
+ return false;
+   found_exit = true;
+   exit_count += get_exit_count (e);
+ }
+
+  bb = internal_edge->dest;
+}
+
+  /* Punt if any exit edge had an uninitialized count.  */
+  if (!exit_count.initialized_p ())
+return false;
You already return early once you hit a bb with an uninitialized count, so
perhaps you can move this check to just after the call to get_exit_count?

+  const profile_count new_exit_count = get_exit_count (exit_edge);
+  profile_probability new_exit_prob;
+  if (new_block_count.nonzero_p ())
+   new_exit_prob = new_exit_count.probability_in (new_block_count);

If new_exit_count > new_block_count, probability_in will return 1.  I
guess there is not much to do, but perhaps logging the inconsistency to the
dump file is not a bad idea here.

+  else
+   {
+ /* NEW_BLOCK_COUNT is zero, so the only way we can make the profile
+consistent is if NEW_EXIT_COUNT is zero too.  */
+ if (dump_file && new_exit_count.nonzero_p ())
+   fprintf (dump_file,
+";; flow_scale_loop_freqs wants non-zero exit count "
+"but bb count is zero/uninit: profile is inconsistent\n");
+
+ /* Arbitrarily set the exit probability to 0.  */
+ new_exit_prob = profile_probability::never ();

never is a fairly strong hint to optimize the other path (it has
RELIABLE reliability).  Since we have no info, I would just keep the
probability that was there before.
+   }

Patch is OK with these changes.
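
As a rough illustration of the two probability comments above, here is a sketch that uses plain doubles in place of profile_count and profile_probability (so it ignores profile quality entirely).  ExitProb and new_exit_probability are hypothetical names, not part of the patch:

```cpp
#include <algorithm>
#include <cassert>

// Result of the toy computation: the probability to install on the exit
// edge, plus a flag the caller could use to log to the dump file.
struct ExitProb { double prob; bool inconsistent; };

// Hypothetical helper (not GCC API).
ExitProb
new_exit_probability (double new_exit_count, double new_block_count,
                      double old_prob)
{
  assert (new_exit_count >= 0 && new_block_count >= 0);
  if (new_block_count > 0)
    {
      // An exit count above the block count cannot be represented, so
      // the ratio is capped at 1 and the mismatch is reported.
      bool capped = new_exit_count > new_block_count;
      return { std::min (new_exit_count / new_block_count, 1.0), capped };
    }
  // Block count is zero: there is no information, so keep the previous
  // probability rather than forcing it to "never".
  return { old_prob, new_exit_count > 0 };
}
```

Returning the flag instead of printing keeps the sketch free of I/O; in the patch itself the message would go to dump_file.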

Honza


Re: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-04-15 Thread Jan Hubicka
Hi,
> gcc/ChangeLog:
> 
>   PR tree-optimization/117790
>   * tree-vect-loop.cc (scale_profile_for_vect_loop): Use
>   scale_loop_profile_hold_exit_counts instead of scale_loop_profile.  Drop
>   the exit edge parameter, since the code now handles multiple exits.
>   Adjust the caller ...
>   (vect_transform_loop): ... here.
>
>gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/117790
>   * gcc.dg/vect/vect-early-break-profile-2.c: New test.
>
>
>-  if (entry_count.nonzero_p ())
>-set_edge_probability_and_rescale_others
>-  (exit_e,
>-   entry_count.probability_in (loop->header->count / vf));
>-  /* Avoid producing very large exit probability when we do not have
>- sensible profile.  */
>-  else if (exit_e->probability < profile_probability::always () / (vf * 2))

This handles the relatively common case where we decide to vectorize a
loop with, say, a factor of 32 and have no profile feedback.
In this case, if the loop trip count is unknown early on, we will
estimate it to iterate a few times (approx. 3-5, as that is the average
iteration count of a random loop based on some measurements).

The fact that we want to vectorize by a factor of 32 implies that the
vectorizer does not take this info seriously and its heuristics know better.
In this case we do not want to drop the loop to 0 iterations, as that
would result in poor code layout and regalloc.

I don't think you kept this logic in the new code?
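
For reference, the removed lines quoted around this comment can be modeled roughly as follows.  This is only a sketch with plain doubles; vect_exit_probability and its parameters are hypothetical, and the cap at 1 assumes probability_in saturates as described in the review of patch 2/4:

```cpp
#include <cassert>

// Toy model of the old exit-probability update in
// scale_profile_for_vect_loop, using doubles instead of profile_count
// and profile_probability.
double
vect_exit_probability (double entry_count, double header_count,
                       double exit_prob, int vf)
{
  assert (vf > 0);
  if (entry_count > 0)
    {
      // Known entry count: the header now runs VF times less often, and
      // the exit is taken once per entry.  Cap the ratio at 1.
      double scaled_header = header_count / vf;
      return scaled_header > entry_count ? entry_count / scaled_header : 1.0;
    }
  // No sensible profile: scale the exit probability by VF only while it
  // is below 1 / (2 * VF), so that it cannot become very large.
  if (exit_prob < 1.0 / (vf * 2))
    return exit_prob * vf;
  return exit_prob;
}
```

With no entry count, an exit probability of 0.25 and vf = 32, the probability is left at 0.25 rather than being multiplied up.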

Honza
>-set_edge_probability_and_rescale_others (exit_e, exit_e->probability * 
>vf);
>-  loop->latch->count = single_pred_edge (loop->latch)->count ();
>-
>-  scale_loop_profile (loop, profile_probability::always () / vf,
>-get_likely_max_loop_iterations_int (loop));
>+  const auto likely_max_niters = get_likely_max_loop_iterations_int (loop);
>+  scale_loop_profile_hold_exit_counts (loop,
>+ profile_probability::always () / vf,
>+ likely_max_niters);



[PATCH] ipa-prop: Extend the tailc IPA-VRP hack to LTO [PR119614]

2025-04-15 Thread Jakub Jelinek
Hi!

Here is my attempt at the PR119614 LTO fix.
Of course, if Martin can come up with something cleaner, let's go with that
instead.

This patch just remembers when ipa_record_return_value_range was set
to a singleton range with CONSTANT_CLASS_P value and propagates that value
through LTO to ltrans where ipa_return_value_range used by tailc pass can
consume it.
Initially I wanted to store it in cgraph_node as a tree, but haven't figured
out how to stream that tree out/in, so this patch stores it as an attribute
instead, which is streamed automatically.
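
The lookup order this gives ipa_return_value_range can be sketched on toy data as below.  Range, Summary, SingletonAttrs and return_value_range are hypothetical stand-ins: the summary map plays the role of ipa_return_value_sum (which may be absent in ltrans) and the attribute map plays the role of the streamed "singleton retval" attribute:

```cpp
#include <cassert>
#include <map>
#include <string>

struct Range { long lo, hi; };
typedef std::map<std::string, Range> Summary;
typedef std::map<std::string, long> SingletonAttrs;

// Prefer the recorded summary when it exists; otherwise fall back to the
// singleton value carried by the attribute.
bool
return_value_range (Range &range, const Summary *summary,
                    const SingletonAttrs &attrs, const std::string &fn)
{
  if (summary)
    {
      Summary::const_iterator it = summary->find (fn);
      if (it != summary->end ())
        {
          range = it->second;
          return true;
        }
    }
  SingletonAttrs::const_iterator a = attrs.find (fn);
  if (a != attrs.end ())
    {
      // A singleton range has identical lower and upper bounds.
      range = Range { a->second, a->second };
      return true;
    }
  return false;
}
```

The sketch leaves out the NaN handling the real patch needs for floating-point singletons.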

Bootstrapped/regtested on x86_64-linux and i686-linux.

2025-04-14  Jakub Jelinek  

PR tree-optimization/119614
* ipa-prop.cc: Include attribs.h.
(ipa_record_return_value_range): Set "singleton retval" attribute if
the recorded range is singleton with CONSTANT_CLASS_P value, or
remove it otherwise.
(ipa_return_value_range): Use "singleton retval" attribute and create
singleton range from it as fallback.

* g++.dg/lto/pr119614_0.C: New test.

--- gcc/ipa-prop.cc.jj  2025-04-10 17:14:31.689344793 +0200
+++ gcc/ipa-prop.cc 2025-04-14 08:02:15.083339571 +0200
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.
 #include "gimple-range.h"
 #include "value-range-storage.h"
 #include "vr-values.h"
+#include "attribs.h"
 
 /* Function summary where the parameter infos are actually stored. */
 ipa_node_params_t *ipa_node_params_sum = NULL;
@@ -6158,6 +6159,21 @@ ipa_record_return_value_range (value_ran
   ipa_return_value_sum->disable_insertion_hook ();
 }
   ipa_return_value_sum->get_create (n)->vr = ipa_get_value_range (val);
+  tree valr;
+  if (flag_lto || flag_wpa)
+{
+  if (val.singleton_p (&valr)
+ && CONSTANT_CLASS_P (valr)
+ && !tree_expr_nan_p (valr))
+   DECL_ATTRIBUTES (current_function_decl)
+ = tree_cons (get_identifier ("singleton retval"), valr,
+  DECL_ATTRIBUTES (current_function_decl));
+  else
+   DECL_ATTRIBUTES (current_function_decl)
+ = remove_attribute ("singleton retval",
+ DECL_ATTRIBUTES (current_function_decl));
+}
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Recording return range ");
@@ -6172,7 +6188,7 @@ bool
 ipa_return_value_range (value_range &range, tree decl)
 {
   cgraph_node *n = cgraph_node::get (decl);
-  if (!n || !ipa_return_value_sum)
+  if (!n || (!ipa_return_value_sum && !flag_ltrans))
 return false;
   enum availability avail;
   n = n->ultimate_alias_target (&avail);
@@ -6180,11 +6196,21 @@ ipa_return_value_range (value_range &ran
 return false;
   if (n->decl != decl && !useless_type_conversion_p (TREE_TYPE (decl), 
TREE_TYPE (n->decl)))
 return false;
-  ipa_return_value_summary *v = ipa_return_value_sum->get (n);
-  if (!v)
-return false;
-  v->vr->get_vrange (range);
-  return true;
+  if (ipa_return_value_sum)
+if (ipa_return_value_summary *v = ipa_return_value_sum->get (n))
+  {
+   v->vr->get_vrange (range);
+   return true;
+  }
+  if (tree attr = lookup_attribute ("singleton retval", DECL_ATTRIBUTES 
(n->decl)))
+{
+  value_range vr (TREE_VALUE (attr), TREE_VALUE (attr));
+  if (is_a <frange> (vr))
+   (as_a <frange> (vr)).clear_nan ();
+  range = vr;
+  return true;
+}
+  return false;
 }
 
 /* Reset all state within ipa-prop.cc so that we can rerun the compiler
--- gcc/testsuite/g++.dg/lto/pr119614_0.C.jj2025-04-14 08:06:08.774121960 
+0200
+++ gcc/testsuite/g++.dg/lto/pr119614_0.C   2025-04-07 08:42:35.629686614 
+0200
@@ -0,0 +1,34 @@
+// PR tree-optimization/119614
+// { dg-lto-do link }
+// { dg-lto-options { { -O2 -fPIC -flto -flto-partition=max } } }
+// { dg-require-effective-target shared }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target musttail }
+// { dg-extra-ld-options "-shared" }
+
+struct S {} b;
+char *foo ();
+int e, g;
+void bar ();
+void corge (S);
+
+[[gnu::noinline]] static char *
+baz ()
+{
+  bar ();
+  return 0;
+}
+
+const char *
+qux ()
+{
+  if (e)
+{
+  S a = b;
+  corge (a);
+  if (g)
+return 0;
+  [[gnu::musttail]] return baz ();
+}
+  return foo ();
+}

Jakub



Re: COBOL: Is anything stalled because of me?

2025-04-15 Thread Jakub Jelinek
On Tue, Apr 15, 2025 at 10:47:13AM -0500, Robert Dubner wrote:
> Speaking purely casually:  I thought that that COBOL would be released with 
> documented limited capability.  "Yeah, it works on x86_64-linux and 
> aarch64-linux.  More to come.".  We knew that we didn't know how to 
> cross-compile, and we knew that other platforms would have to come, in time.

What is definitely known not to work is big endian targets, cross
compilation from big endian hosts to little endian targets, 32-bit targets,
and cross compilation from 32-bit hosts.  I'm afraid we can live with it for
the 15 release.

What is still missing are web page updates.  The repository in that case
is ssh://gcc.gnu.org/git/gcc-wwwdocs.git and e.g. https://gcc.gnu.org/
lists in News (left column)
"Modula-2 front end added [2022-12-14]
The Modula-2 programming language front end has been added to GCC.
This front end was contributed by Gaius Mulley."
so we want something like that for COBOL too, and then in
https://gcc.gnu.org/gcc-15/changes.html a note that the COBOL FE has been
added, perhaps with the limitations for this release.
See e.g. https://gcc.gnu.org/gcc-13/changes.html which mentioned the
addition of Modula-2.

Jakub



[pushed] c++: constexpr, trivial, and non-alias target [PR111075]

2025-04-15 Thread Jason Merrill
Tested the testcase fix with a Darwin cross-compiler.
Regression tested x86_64-pc-linux-gnu.
Applying to trunk.

-- 8< --

On Darwin and other targets with !can_alias_cdtor, we instead go to
maybe_thunk_body, which builds a thunk function that calls the general
constructor.  And then cp_fold tries to constant-evaluate that call, and we
ICE because we don't expect to ever be asked to constant-evaluate a call to
a trivial function.

No new test because this fixes g++.dg/torture/tail-padding1.C on affected
targets.

PR c++/111075

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Allow trivial
call from a thunk.
---
 gcc/cp/constexpr.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index dc59f59aa3f..4346b29abc6 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3103,6 +3103,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  we can only get a trivial function here with -fno-elide-constructors.  */
   gcc_checking_assert (!trivial_fn_p (fun)
   || !flag_elide_constructors
+  /* Or it's a call from maybe_thunk_body (111075).  */
+  || (TREE_CODE (t) == CALL_EXPR ? CALL_FROM_THUNK_P (t)
+  : AGGR_INIT_FROM_THUNK_P (t))
   /* We don't elide constructors when processing
  a noexcept-expression.  */
   || cp_noexcept_operand);

base-commit: 7f56a8e8ad1c33d358e9e09fcbaf263c2caba1b9
-- 
2.49.0



Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 2:23 PM H.J. Lu  wrote:
>
> On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak  wrote:
> >
> > On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu  wrote:
> > >
> > > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
> > > pushed in red-zone.  Since
> > >
> > > commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
> > > Author: H.J. Lu 
> > > Date:   Sun Apr 13 12:20:42 2025 -0700
> > >
> > > APX: Don't use red-zone with 32 GPRs and no caller-saved registers
> > >
> > > disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
> > > 31 .cfi_restore directives.
> >
> > Hm, did you also account for RED_ZONE_RESERVE? The last 8-byte slot is
> > reserved for internal use by the compiler.
>
> There is no red-zone in this case.
>
> > Uros.
> >
> > >
> > > PR target/119784
> > > * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
> > > directives.

OK.

Thanks,
Uros.

> > >
> > > Signed-off-by: H.J. Lu 
> > > ---
> > >  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> > > b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > index fefe2e6d6fc..fa1acc7a142 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> > > @@ -66,7 +66,7 @@ void foo (void *frame)
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
> > >  /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
> > > -/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
> > > +/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } 
> > > } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } 
> > > } */
> > >  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } 
> > > } */
> > > --
> > > 2.49.0
> > >
>
>
>
> --
> H.J.


[COMMITTED] Docs: Address -fivopts, -O0, and -Q confusion [PR71094]

2025-04-15 Thread Sandra Loosemore
There's a blurb at the top of the "Optimize Options" node telling
people that most optimization options are completely disabled at -O0
and a similar blurb in the entry for -Og, but nothing at the entry for
-O0.  Since this is a continuing point of confusion it seems wise to
duplicate the information in all the places users are likely to look
for it.

gcc/ChangeLog
PR tree-optimization/71094
* doc/invoke.texi (Optimize Options): Document that -fivopts is
enabled at -O1 and higher.  Add blurb about -O0 causing GCC to
completely ignore most optimization options.
---
 gcc/doc/invoke.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b99da94dca1..0b6644b0315 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12746,6 +12746,7 @@ complexity than at @option{-O}.
 -fipa-pure-const
 -fipa-reference
 -fipa-reference-addressable
+-fivopts
 -fmerge-constants
 -fmove-loop-invariants
 -fmove-loop-stores
@@ -12854,6 +12855,13 @@ by @option{-O2} and also turns on the following 
optimization flags:
 Reduce compilation time and make debugging produce the expected
 results.  This is the default.
 
+At @option{-O0}, GCC completely disables most optimization passes;
+they are not run even if you explicitly enable them on the command
+line, or are listed by @option{-Q --help=optimizers} as being enabled by
+default.  Many optimizations performed by GCC depend on code analysis
+or canonicalization passes that are enabled by @option{-O}, and it would
+not be useful to run individual optimization passes in isolation.
+
 @opindex Os
 @item -Os
 Optimize for size.  @option{-Os} enables all @option{-O2} optimizations
@@ -14306,6 +14314,7 @@ Enabled by default at @option{-O1} and higher.
 @item -fivopts
 Perform induction variable optimizations (strength reduction, induction
 variable merging and induction variable elimination) on trees.
+Enabled by default at @option{-O1} and higher.
 
 @opindex ftree-parallelize-loops
 @item -ftree-parallelize-loops=n
-- 
2.34.1



[PATCH STAGE 4] aarch64: Disable sysreg feature gating

2025-04-15 Thread Alice Carlotti
This applies to the sysreg read/write intrinsics __arm_[wr]sr*.  It does
not depend on changes to Binutils, because GCC converts recognised
sysreg names to an encoding-based form, which is already ungated in Binutils.

We have, however, agreed to make an equivalent change in Binutils (which
would then disable feature gating for sysreg accesses in inline
assembly), but this has not yet been posted upstream.

In the future we may introduce a new flag to re-enable some checking,
but these checks could not be comprehensive because many system
registers depend on architecture features that don't have corresponding
GCC/GAS --march options.  This would also depend on addressing numerous
inconsistencies in the existing list of sysreg feature dependencies.

---

Ok for master now? And how about backporting to gcc 14? I do recognise that
this is late in stage 4, sorry - it slipped through the gaps of being
Binutils-adjacent work with a different deadline.

Thanks,
Alice



gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_valid_sysreg_name_p): Remove feature check.
(aarch64_retrieve_sysreg): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/rwsr-ungated.c: New test.


diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
4e801146c60a52c7ef6f8c0f92b1b922e729c234..433ec975d7e4e9d7130fe49eac37f4ebfb880416
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -31073,8 +31073,6 @@ aarch64_valid_sysreg_name_p (const char *regname)
   const sysreg_t *sysreg = aarch64_lookup_sysreg_map (regname);
   if (sysreg == NULL)
 return aarch64_is_implem_def_reg (regname);
-  if (sysreg->arch_reqs)
-return bool (aarch64_isa_flags & sysreg->arch_reqs);
   return true;
 }
 
@@ -31098,8 +31096,6 @@ aarch64_retrieve_sysreg (const char *regname, bool 
write_p, bool is128op)
   if ((write_p && (sysreg->properties & F_REG_READ))
   || (!write_p && (sysreg->properties & F_REG_WRITE)))
 return NULL;
-  if ((~aarch64_isa_flags & sysreg->arch_reqs) != 0)
-return NULL;
   return sysreg->encoding;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c 
b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
new file mode 100644
index 
..d67a42673733cdb128fd62d465fa122037ae531d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/rwsr-ungated.c
@@ -0,0 +1,13 @@
+/* Test that __arm_[r,w]sr intrinsics aren't gated (by default).  */
+
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a" } */
+
+#include <arm_acle.h>
+
+uint64_t
+foo (uint64_t a)
+{
+  __arm_wsr64 ("zcr_el1", a);
+  return __arm_rsr64 ("smcr_el1");
+}



[pushed] configure, Darwin: Recognise new naming for Xcode ld.

2025-04-15 Thread Iain Sandoe
Tested on i686, x86_64 and aarch64 Darwin, plus x86_64, aarch64 and
powerpc64le Linux, pushed to trunk, thanks
Iain

--- 8< ---

The latest editions of Xcode have altered the identity reported by 'ld -v'
(again).  This means that GCC configure no longer detects the version.

Fixed by adding the new name to the set checked.
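
For readers who find the sed/awk pipeline in the diff hard to follow, its effect can be sketched in C++ as below.  ld64_version is a hypothetical helper, and the sample 'ld -v' lines used with it are illustrative assumptions rather than captured output:

```cpp
#include <string>

// Mirror of the pipeline: for each known marker, drop everything up to
// and including its last occurrence (sed's ".*" is greedy), then keep the
// first whitespace-delimited word (awk '{print $1}').  Lines without any
// marker yield an empty string, as the egrep filter would drop them.
std::string
ld64_version (const std::string &line)
{
  static const char *const markers[] = { "ld64-", "dyld-", "PROJECT:ld-" };
  std::string rest = line;
  bool matched = false;
  for (const char *m : markers)
    {
      std::string::size_type pos = rest.rfind (m);
      if (pos != std::string::npos)
        {
          rest = rest.substr (pos + std::string (m).size ());
          matched = true;
        }
    }
  if (!matched)
    return std::string ();
  std::string::size_type b = rest.find_first_not_of (" \t");
  if (b == std::string::npos)
    return std::string ();
  std::string::size_type e = rest.find_first_of (" \t", b);
  return rest.substr (b, e == std::string::npos ? e : e - b);
}
```

Under these assumptions, a line ending in "PROJECT:ld-1115.7.3" gives "1115.7.3", while a line ending in "PROJECT:ld64-609.8" is still handled by the older ld64- marker.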

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Recognise PROJECT:ld-.nn.aa as an identifier
for Darwin's static linker.

Signed-off-by: Iain Sandoe 
---
 gcc/configure| 7 ---
 gcc/configure.ac | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 821f8b44bc6..16965953f05 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -3948,7 +3948,7 @@ if test x"${DEFAULT_LINKER+set}" = x"set"; then
 as_fn_error $? "cannot execute: $DEFAULT_LINKER: check --with-ld or env. 
var. DEFAULT_LINKER" "$LINENO" 5
   elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep GNU > /dev/null; then
 gnu_ld_flag=yes
-  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep ld64- > /dev/null; then
+  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep 'PROJECT:ld\(64\)*-' > 
/dev/null; then
 ld64_flag=yes
   fi
 
@@ -32730,8 +32730,9 @@ $as_echo "$gcc_cv_ld64_major" >&6; }
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker version" >&5
 $as_echo_n "checking linker version... " >&6; }
 if test x"${gcc_cv_ld64_version}" = x; then
-  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld' \
-  | sed -e 's/.*ld64-//' -e 's/.*dyld-//'| awk '{print $1}'`
+  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld|PROJECT:ld' \
+  | sed -e 's/.*ld64-//' -e 's/.*dyld-//' -e 's/.*PROJECT:ld-//' \
+  | awk '{print $1}'`
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_version" >&5
 $as_echo "$gcc_cv_ld64_version" >&6; }
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 3d0a4e6f8f5..9f67e62950a 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -358,7 +358,7 @@ if test x"${DEFAULT_LINKER+set}" = x"set"; then
 AC_MSG_ERROR([cannot execute: $DEFAULT_LINKER: check --with-ld or env. 
var. DEFAULT_LINKER])
   elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep GNU > /dev/null; then
 gnu_ld_flag=yes
-  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep ld64- > /dev/null; then
+  elif $DEFAULT_LINKER -v < /dev/null 2>&1 | grep 'PROJECT:ld\(64\)*-' > 
/dev/null; then
 ld64_flag=yes
   fi
   AC_DEFINE_UNQUOTED(DEFAULT_LINKER,"$DEFAULT_LINKER",
@@ -6418,8 +6418,9 @@ if test x"$ld64_flag" = x"yes"; then
 # If the version was not specified, try to find it.
 AC_MSG_CHECKING(linker version)
 if test x"${gcc_cv_ld64_version}" = x; then
-  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld' \
-  | sed -e 's/.*ld64-//' -e 's/.*dyld-//'| awk '{print $1}'`
+  gcc_cv_ld64_version=`$gcc_cv_ld -v 2>&1 | $EGREP 'ld64|dyld|PROJECT:ld' \
+  | sed -e 's/.*ld64-//' -e 's/.*dyld-//' -e 's/.*PROJECT:ld-//' \
+  | awk '{print $1}'`
 fi
 AC_MSG_RESULT($gcc_cv_ld64_version)
 
-- 
2.39.2 (Apple Git-143)



[PATCH] Fortran: pure subroutine with pure procedure as dummy [PR106948]

2025-04-15 Thread Harald Anlauf

Dear all,

the testcase in the PR shows a case where the pureness of a function
is not properly determined, even though the function is resolved, and
its attributes clearly show that it is pure, because gfc_pure_function
relies on isym or esym being set.  This does not happen here, probably
because the function is used as a dummy here.

The least invasive fix seems to be to look at the symbol's attributes
when isym or esym is not set.

Regression testing led to additional redundant error messages for two
testcases, so I opted to restrict the change to the case of functions
as dummy arguments, making this patch very safe.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 5ebb5bb438e8ccf6ea30559604a9f27a75dea0ef Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 15 Apr 2025 20:43:05 +0200
Subject: [PATCH] Fortran: pure subroutine with pure procedure as dummy
 [PR106948]

	PR fortran/106948

gcc/fortran/ChangeLog:

	* resolve.cc (gfc_pure_function): If a function has been resolved,
	but esym is not yet set, look at its attributes to see whether it
	is pure or elemental.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pure_formal_proc_4.f90: New test.
---
 gcc/fortran/resolve.cc|  7 +++
 .../gfortran.dg/pure_formal_proc_4.f90| 49 +++
 2 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index cdf043b6411..410ff685906 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -3190,6 +3190,13 @@ gfc_pure_function (gfc_expr *e, const char **name)
 	 || e->value.function.isym->elemental;
   *name = e->value.function.isym->name;
 }
+  else if (e->symtree && e->symtree->n.sym && e->symtree->n.sym->attr.dummy)
+{
+  /* The function has been resolved, but esym is not yet set.
+	 This can happen with functions as dummy argument.  */
+  pure = e->symtree->n.sym->attr.pure || e->symtree->n.sym->attr.elemental;
+  *name = e->symtree->n.sym->name;
+}
   else
 {
   /* Implicit functions are not pure.  */
diff --git a/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90 b/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90
new file mode 100644
index 000..92640e2d2f4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pure_formal_proc_4.f90
@@ -0,0 +1,49 @@
+! { dg-do compile }
+! PR fortran/106948 - check that passing of PURE procedures works
+!
+! Contributed by Jim Feng
+
+module a
+  implicit none
+
+  interface new
+pure module subroutine b(x, f)
+  integer, intent(inout) :: x
+  interface
+pure function f(x) result(r)
+  real, intent(in) :: x
+  real :: r
+end function f
+  end interface
+end subroutine b
+  end interface new
+end module a
+
+submodule(a) a_b
+  implicit none
+
+contains
+  module procedure b
+x = int(f(real(x)) * 0.15)
+  end procedure b
+end submodule a_b
+
+program test
+  use a
+  implicit none
+
+  integer :: x
+
+  x = 100
+  call new(x, g)
+  print *, x
+
+contains
+
+  pure function g(y) result(r)
+real, intent(in) :: y
+real :: r
+
+r = sqrt(y)
+  end function g
+end program test
-- 
2.43.0



Re: [GCC16,RFC,V2 05/14] aarch64: add new definition for post-index stg

2025-04-15 Thread Richard Sandiford
Indu Bhagat  writes:
> Using post-index stg is a faster way of memory tagging/untagging.
>
> TBD:
>   - Currently generated by in the aarch64 backend.  Not sure if this
> is the right way to do it.
>   - Also not clear how to weave in the generation of stzg.

Similarly to patch 4, I think we should rewrite the existing stg pattern
to use the same kind of approach that I mentioned in response to patch 2,
then extend the predicate and constraint to support PRE_MODIFY and
POST_MODIFY addresses.

Thanks,
Richard

>
> ChangeLog:
>   * gcc/config/aarch64/aarch64.md
>
> ---
>
> [New in RFC V2]
> ---
>  gcc/config/aarch64/aarch64.md | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 175aed3146ac..3cb773a77ad8 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8475,6 +8475,21 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +;; STG with post-index writeback.
> +(define_insn "*stg_post"
> +  [(set (mem:QI (unspec:DI
> +  [(plus:DI (match_operand:DI 1 "register_operand" "=rk")
> +(const_int 0))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> +  (const_int 56)) (const_int 15)))
> +(set (match_dup 1)
> + (plus:DI (match_dup 1) (match_operand:DI 2 
> "aarch64_granule16_simm9" "i")))]
> +  "TARGET_MEMTAG"
> +  "stg\\t%0, [%1], #%2"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  ;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
>  ;; once, without zero initialization.
>  (define_insn "st2g"


RE: COBOL: Is anything stalled because of me?

2025-04-15 Thread Robert Dubner



> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, April 15, 2025 13:54
> To: Robert Dubner 
> Cc: 'Jeff Law' ; gcc-patches@gcc.gnu.org; 'James K. Lowden' 
> Subject: Re: COBOL: Is anything stalled because of me?
> 
> On Tue, Apr 15, 2025 at 10:47:13AM -0500, Robert Dubner wrote:
> > Speaking purely casually:  I thought that that COBOL would be released
> > with documented limited capability.  "Yeah, it works on x86_64-linux and
> > aarch64-linux.  More to come.".  We knew that we didn't know how to
> > cross-compile, and we knew that other platforms would have to come, in
> > time.
> 
> What is definitely known not to work is big endian targets, cross
> compilation from big endian hosts to little endian targets, 32-bit
> targets, cross compilation from 32-bit hosts, I'm afraid we can live with
> it for the 15 release.

I am afraid we're going to have to.

> 
> What is still missing are web page updates, the repository in that case
> is ssh://gcc.gnu.org/git/gcc-wwwdocs.git and e.g https://gcc.gnu.org/
> lists in News (left column)
> "Modula-2 front end added [2022-12-14]
> The Modula-2 programming language front end has been added to GCC.
> This front end was contributed by Gaius Mulley."
> so we want something like that for COBOL too, then in
> https://gcc.gnu.org/gcc-15/changes.html something that COBOL FE has been
> added and perhaps the limitations for this release.
> See e.g. https://gcc.gnu.org/gcc-13/changes.html which mentioned the
> addition of Modula-2.

Jim has been taking the lead on documentation.  He's eager to get to it.
He's been attending to some pressing family matters that require his
attention.

Thank you very much for the summary.

> 
>   Jakub



Re: [GCC16,RFC,V2 04/14] aarch64: add new definition for post-index st2g

2025-04-15 Thread Richard Sandiford
Indu Bhagat  writes:
> Using post-index st2g is a faster way of memory tagging/untagging.
> Because a post-index 'st2g tag, [addr], #32' is equivalent to:
>stg tag, addr, #0
>stg tag, addr, #16
>add addr, addr, #32
>
> TBD:
>   - Currently generated by in the aarch64 backend.  Not sure if this is
> the right way to do it.

If we do go for the "aarch64_granule_memory_operand" approach that
I described for patch 3, then that predicate (and the associated constraint)
could handle PRE_MODIFY and POST_MODIFY addresses, which would remove
the need for separate patterns.
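
The equivalence quoted at the top of this mail can be checked on a toy model of tag memory.  This is only a sketch: TagMemory, tag_of, stg, st2g_post and stg_stg_add are hypothetical names, and the model keeps one 4-bit tag per 16-byte granule, taking the tag from bits 56..59 of the register as the (x >> 56) & 15 in the RTL pattern does:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// One allocation tag per 16-byte granule, keyed by granule base address.
typedef std::map<std::uint64_t, unsigned> TagMemory;
const std::uint64_t GRANULE = 16;

// Extract the logical tag from bits 56..59 of a tagged register value.
unsigned tag_of (std::uint64_t reg) { return (reg >> 56) & 15; }

// stg tag, [addr, #offset]: tag a single granule.
void
stg (TagMemory &mem, std::uint64_t tag_reg, std::uint64_t addr,
     std::uint64_t offset)
{
  assert ((addr + offset) % GRANULE == 0);
  mem[addr + offset] = tag_of (tag_reg);
}

// Post-index st2g tag, [addr], #imm: tag two granules, then write back.
void
st2g_post (TagMemory &mem, std::uint64_t tag_reg, std::uint64_t &addr,
           std::int64_t imm)
{
  assert (addr % GRANULE == 0);
  mem[addr] = tag_of (tag_reg);
  mem[addr + GRANULE] = tag_of (tag_reg);
  addr += imm;
}

// The three-instruction sequence from the quoted description.
void
stg_stg_add (TagMemory &mem, std::uint64_t tag_reg, std::uint64_t &addr)
{
  stg (mem, tag_reg, addr, 0);
  stg (mem, tag_reg, addr, 16);
  addr += 32;
}
```

Running both forms from the same starting address with an immediate of 32 should leave identical tag memory and the same final address, which is the property the post-index pattern relies on.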

>   - Also not clear how to weave in the generation of stz2g.

I think stz2g could be:

(set (match_operand:OI 0 "aarch64_granule_memory_operand" "+")
 (unspec_volatile:OI
   [(const_int 0)
(match_operand:DI 1 "register_operand" "rk")]
   UNSPECV...))

I think in practice stz2g will need a separate pattern from st2g,
rather than being an alternative of the same pattern.  (That's because
the suggested pattern for st2g uses a (match_dup 0), which isn't subject
to constraint matching.)

Thanks,
Richard

>
> ChangeLog:
>   * gcc/config/aarch64/aarch64.md
>
> ---
> [New in RFC V2]
> ---
>  gcc/config/aarch64/aarch64.md | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index d3223e275c51..175aed3146ac 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8495,6 +8495,26 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +;; ST2G with post-index writeback.
> +(define_insn "*st2g_post"
> +  [(set (mem:QI (unspec:DI
> +  [(plus:DI (match_operand:DI 1 "register_operand" "=&rk")
> +(const_int 0))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
> +  (const_int 56)) (const_int 15)))
> +   (set (mem:QI (unspec:DI
> +  [(plus:DI (match_dup 1) (const_int -16))]
> +  UNSPEC_TAG_SPACE))
> + (and:QI (lshiftrt:DI (match_dup 0)
> +  (const_int 56)) (const_int 15)))
> +(set (match_dup 1)
> + (plus:DI (match_dup 1) (match_operand:DI 2 
> "aarch64_granule16_simm9" "i")))]
> +  "TARGET_MEMTAG"
> +  "st2g\\t%0, [%1], #%2"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  ;; Load/Store 64-bit (LS64) instructions.
>  (define_insn "ld64b"
>[(set (match_operand:V8DI 0 "register_operand" "=r")


Re: [PATCH] c++: Properly mangle CONST_DECL without a INTEGER_CST value [PR116511]

2025-04-15 Thread Simon Martin
Hi Jason,

On Thu Apr 10, 2025 at 10:42 PM CEST, Jason Merrill wrote:
> On 9/6/24 7:15 AM, Simon Martin wrote:
>> We ICE upon the following *valid* code when mangling the requires
>> clause
>> 
>> === cut here ===
>> template  struct s1 {
>>enum { e1 = 1 };
>> };
>> template  struct s2 {
>>enum { e1 = s1::e1 };
>>s2() requires(0 != e1) {}
>> };
>> s2<8> a;
>> === cut here ===
>> 
>> The problem is that the mangler wrongly assumes that the DECL_INITIAL of
>> a CONST_DECL is always an INTEGER_CST, and blindly passes it to
>> write_integer_cst.
>> 
>> I assume we should be able to actually compute the value of e1 and use
>> it when mangling, however from my investigation, it seems to be a pretty
>> involved change.
>> 
>> What's clear however is that we should not try to write a non-literal as
>> a literal. This patch adds a utility function to determine whether a
>> tree is a literal as per the definition in the ABI, and uses it to only
>> call write_template_arg_literal when we actually have a literal in hand.
>> 
>> Note that I had to change the expectation of an existing test, that was
>> expecting "[...]void (AF::*)(){}[...]" and now gets an equivalent
>> "[...](void (AF::*)())0[...]" (and FWIW is what clang and icx give; see
>> https://godbolt.org/z/hnjdeKEhW).
>
> Unfortunately we need to provide backward bug compatibility for 
> -fabi-version=14, so this change needs to check abi_version_at_least (15).
Good point, ack.

>> +/* Determine whether T is a literal per section 5.1.6.1 of the CXX ABI.  */
>> +
>> +static bool
>> +literal_p (const tree t)
>> +{
>> +  if ((TREE_TYPE (t) && NULLPTR_TYPE_P (TREE_TYPE (t)))
>
> This looks wrong; a random expression with type nullptr_t is not a 
> literal, and can be instantiation-dependent.  And I don't see any test 
> of mangling such a thing.
TBH I think there might be more than just this wrong with this patch :-)

I have been flip-flopping between "it's wrong to just mangle the
expression" and "but I don't think we can do much better" and never
settled on one; that's why I never pinged this 6+ month old patch.

Is the approach this took actually valid? I think that in an ideal
world, the enum value would have been tsubst'd (or we'd have all we need
to tsubst it) when we mangle, and I tried to hook things up so that it
happens, but I never succeeded.

Simon



Re: [PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Marek Polacek
On Tue, Apr 15, 2025 at 06:46:26PM +, Qing Zhao wrote:
> This is the 2nd version of the patch, the change is to replace "FALSE" with
> "false" per Marek's comments.
> 
> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
> In c_fully_fold, it assumes that operands of function calls have already
> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
> operands are not fully folded. Therefore the C FE specific operator is
> passed to middle-end.
> 
> In order to fix this issue, fully fold the parameters before building the
> call to .ACCESS_WITH_SIZE.
> 
> Bootstrapped and regression tested on both x86 and aarch64.
> Okay for trunk?

LGTM now, thanks.
 
> Thanks.
> 
> Qing
> 
> =
> 
>   PR c/119717
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>   parameters for call to .ACCESS_WITH_SIZE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr119717.c: New test.
> ---
>  gcc/c/c-typeck.cc   |  8 ++--
>  gcc/testsuite/gcc.dg/pr119717.c | 24 
>  2 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 3870e8a1558..55d896e02df 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
> loc, tree ref,
>gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>/* The result type of the call is a pointer to the flexible array type.  */
>tree result_type = c_build_pointer_type (TREE_TYPE (ref));
> +  tree first_param
> += c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
> +  tree second_param
> += c_fully_fold (counted_by_ref, false, NULL);
>  
>tree call
>  = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>   result_type, 6,
> - array_to_pointer_conversion (loc, ref),
> - counted_by_ref,
> + first_param,
> + second_param,
>   build_int_cst (integer_type_node, 1),
>   build_int_cst (counted_by_type, 0),
>   build_int_cst (integer_type_node, -1),
> diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
> new file mode 100644
> index 000..e5eedc567b3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr119717.c
> @@ -0,0 +1,24 @@
> +/* PR c/119717  */
> +/* { dg-additional-options "-std=c23" } */
> +/* { dg-do compile } */
> +
> +struct annotated {
> +  unsigned count;
> +  [[gnu::counted_by(count)]] char array[];
> +};
> +
> +[[gnu::noinline,gnu::noipa]]
> +static unsigned
> +size_of (bool x, struct annotated *a)
> +{
> +  char *p = (x ? a : 0)->array;
> +  return __builtin_dynamic_object_size (p, 1);
> +}
> +
> +int main()
> +{
> +  struct annotated *p = __builtin_malloc(sizeof *p);
> +  p->count = 0;
> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
> +  return 0;
> +}
> -- 
> 2.31.1
> 

Marek



Re: [PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

2025-04-15 Thread Tobias Burnus

Alex wrote:

Tested on x86_64-pc-linux-gnu, this is only a library addition (and a
few tests) so it shouldn't cause any major impacts.  I also tested
libgomp C to ensure the conditional compile was working.


Namely, the change is only to omp.h(.in) - effective for
__cplusplus >= 201103L.

Note that the following is an OpenMP 5.0 feature that for some reason
was missed when implementing omp_alloc / omp_free support.

   omp::allocator:: ... 

where ... is the name of a predefined allocator (with omp_ and _alloc 
stripped).

[Support for omp::allocator::null_allocator is a (semi-accidental)
OpenMP 6.0 feature, where omp_null_allocator implies that the allocator
of default-allocator-var ICV is used.]

The main use case of this feature is to make it easy to use those
allocators with containers from the STL like:

  std::vector<int, omp::allocator::cgroup_mem<int>> var;

where cgroup_mem uses low latency memory on AMD and Nvidia GPU devices,
which is faster than the normal allocator.
(→ https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html
for cgroup_mem )
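
For readers less familiar with what a container requires from such a type, the following is a minimal sketch of a C++11-style allocator plugged into std::vector. The names tracking_allocator, live_bytes and sum_with_custom_allocator are hypothetical and are not part of omp.h; the sketch uses std::malloc and std::free where an OpenMP-aware allocator would presumably go through omp_alloc and omp_free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical illustration only; this is not the omp.h implementation.
// Tracks how many bytes are currently handed out by the allocator.
static std::size_t live_bytes = 0;

template<typename T>
struct tracking_allocator
{
  using value_type = T;

  tracking_allocator() noexcept = default;

  // Converting constructor so containers can rebind to other element types.
  template<typename U>
  tracking_allocator(const tracking_allocator<U>&) noexcept {}

  T* allocate(std::size_t n)
  {
    // A memory-space-specific allocator would request its memory here.
    void* p = std::malloc(n * sizeof(T));
    if (!p)
      throw std::bad_alloc();
    live_bytes += n * sizeof(T);
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t n) noexcept
  {
    assert(live_bytes >= n * sizeof(T));
    live_bytes -= n * sizeof(T);
    std::free(p);
  }
};

// All instances are interchangeable, so they always compare equal.
template<typename T, typename U>
bool operator==(const tracking_allocator<T>&,
                const tracking_allocator<U>&) noexcept
{ return true; }

template<typename T, typename U>
bool operator!=(const tracking_allocator<T>&,
                const tracking_allocator<U>&) noexcept
{ return false; }

// Fill a vector that uses the custom allocator with 1..n and return the sum.
inline int sum_with_custom_allocator(int n)
{
  std::vector<int, tracking_allocator<int>> v;
  for (int i = 1; i <= n; ++i)
    v.push_back(i);
  int sum = 0;
  for (int x : v)
    sum += x;
  return sum;
}
```

The container only needs value_type, allocate, deallocate and the equality operators; everything else is supplied by std::allocator_traits.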

* * *

LGTM. Thanks for the patch!

Tobias


Re: [PATCH] discriminators: Fix assigning discriminators on edge [PR113546]

2025-04-15 Thread Andrew Pinski
On Sun, Mar 16, 2025 at 11:43 AM Jeff Law  wrote:
>
>
>
> On 3/15/25 9:01 PM, Andrew Pinski wrote:
> > The problem here is there was a compare debug since the discriminators
> > would still take into account debug statements. For the edge we would look
> > at the first statement after the labels and that might have been a debug 
> > statement.
> > So we need to skip over debug statements otherwise we could get different
> > discriminators # with and without -g.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> >   PR middle-end/113546
> >
> > gcc/ChangeLog:
> >
> >   * tree-cfg.cc (first_non_label_stmt): Rename to ...
> >   (first_non_label_nondebug_stmt): This and use 
> > gsi_start_nondebug_after_labels_bb.
> >   (assign_discriminators): Update call to first_non_label_nondebug_stmt.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * c-c++-common/torture/pr113546-1.c: New test.
> OK.

Now backported/pushed to GCC 14.

Thanks,
Andrew Pinski

>
> Jeff


Re: [RFC] [C]New syntax for the argument of counted_by attribute for C language

2025-04-15 Thread Kees Cook
On Tue, Apr 15, 2025 at 09:07:44PM +0200, Martin Uecker wrote:
> On Tuesday, 15.04.2025 at 14:50 +0200, Michael Matz wrote:
> > Hello,
> ...
> 
> > > struct A {
> > >   int *buf __counted_by(len); // 'len' *must* be in the struct.
> > >   int len;
> > > };
> > 
> > ... means that we would have to implement general delayed parsing for 
> > expressions in C parsers. 
> 
> I have to agree with Michael.  This was the main reason
> we rejected the original approach.  
> 
> I also think consistency with general syntax for arrays in structs
> is far more important for C than consistency for the special case of
> having only one identifier in counted_by.

Okay, so I think the generally recognized way forward is with two
attributes:

counted_by(struct_member)

and

counted_by_expr(type struct_member; ...; expression)

This leaves flexible array members with counted_by unchanged from
current behavior.
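
To make the relationship that the attribute describes concrete, here is a hand-written sketch of the check it is meant to enable, using the shape of the 'struct A' example quoted above. The attribute itself is deliberately left out, checked_at is a hypothetical helper, and the sketch is written as C++ only so that it can be compiled stand-alone; it says nothing about what the compiler will generate.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as the quoted 'struct A': 'len' is the number of ints that
// 'buf' points to.  No attribute is used here; the pairing is by convention.
struct A
{
  int* buf;
  int len;
};

// Hypothetical helper: bounds-checked element access.  Returns nullptr when
// idx is outside [0, len), otherwise a pointer to the element.
inline int* checked_at(const A& a, int idx)
{
  if (a.buf == nullptr || idx < 0 || idx >= a.len)
    return nullptr;
  return a.buf + idx;
}
```

The point of the annotation is that the pairing of 'buf' and 'len' becomes visible to the compiler instead of living only in helper functions like this one.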

Questions I am left with:

1) When applying counted_by to pointer members, are out-of-order member
declarations expected to be handled? As in, is this expected to be valid?

struct foo {
struct bar *p __attribute__((counted_by(count)));
int count;
};

1.A) If it is _not_ valid, is it valid to use it when the member has
been declared earlier? Such as:

struct foo {
int count;
struct bar *p __attribute__((counted_by(count)));
};

1.B) If "1" isn't valid, but "1.A" is valid, I would expect that the way to
allow the member ordering in "1" is through counted_by_expr? For example:

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; 
count)));
int count;
};

1.C) If "1" isn't valid, and "1.A" isn't valid, then counted_by of
pointer members must always use counted_by_expr. Is that expected?
(I ask because it seems like a potentially weird case where member order
forces choosing between two differently named attributes. It'd be really
nice if "1" could be valid.)


2) For all counted_by of pointer members, I understand this to only be
about the parsing step, not further analysis where the full sizes of
all objects will need to be known. Which means that this is valid:

struct bar; // empty declaration

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; 
count)));
int count;
};
...
// defined after being referenced by counted_by_expr above
struct bar {
int a, b, c;
struct foo *p;
};

Is that correct?


3) It seems it will be possible to provide a "singleton" alias to
indicate that a given pointer member is not an array of objects, but
rather a pointer to a single object instance:

struct bar {
int a, b, c;
struct foo *p __attribute__((counted_by_expr(1)));
};

Is that correct? (This will be useful once we can apply counted_by to
function arguments...)


4) If there are type mismatches between the counted_by_expr struct
member declaration and the later actual struct member declaration, I
assume that will be a hard error. For example, this would fail to compile:

struct foo {
struct bar *p __attribute__((counted_by_expr(int count; 
count)));
unsigned long count;
};

Is that correct? It feels like if we're already able to do this analysis,
then "1" should be possible also. Perhaps I'm misunderstanding something
about the parser.


Thanks!

-Kees

-- 
Kees Cook


Re: [PATCH v2] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao
Thanks.

Pushed to trunk.

Qing

> On Apr 15, 2025, at 14:56, Marek Polacek  wrote:
> 
> On Tue, Apr 15, 2025 at 06:46:26PM +, Qing Zhao wrote:
>> This is the 2nd version of the patch, the change is to replace "FALSE" with
>> "false" per Marek's comments.
>> 
>> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
>> In c_fully_fold, it assumes that operands of function calls have already
>> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
>> operands are not fully folded. Therefore the C FE specific operator is
>> passed to middle-end.
>> 
>> In order to fix this issue, fully fold the parameters before building the
>> call to .ACCESS_WITH_SIZE.
>> 
>> Bootstrapped and regression tested on both x86 and aarch64.
>> Okay for trunk?
> 
> LGTM now, thanks.
> 
>> Thanks.
>> 
>> Qing
>> 
>> =
>> 
>> PR c/119717
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>> parameters for call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/pr119717.c: New test.
>> ---
>> gcc/c/c-typeck.cc   |  8 ++--
>> gcc/testsuite/gcc.dg/pr119717.c | 24 
>> 2 files changed, 30 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index 3870e8a1558..55d896e02df 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
>> loc, tree ref,
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>>   /* The result type of the call is a pointer to the flexible array type.  */
>>   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
>> +  tree first_param
>> += c_fully_fold (array_to_pointer_conversion (loc, ref), false, NULL);
>> +  tree second_param
>> += c_fully_fold (counted_by_ref, false, NULL);
>> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> result_type, 6,
>> - array_to_pointer_conversion (loc, ref),
>> - counted_by_ref,
>> + first_param,
>> + second_param,
>> build_int_cst (integer_type_node, 1),
>> build_int_cst (counted_by_type, 0),
>> build_int_cst (integer_type_node, -1),
>> diff --git a/gcc/testsuite/gcc.dg/pr119717.c 
>> b/gcc/testsuite/gcc.dg/pr119717.c
>> new file mode 100644
>> index 000..e5eedc567b3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr119717.c
>> @@ -0,0 +1,24 @@
>> +/* PR c/119717  */
>> +/* { dg-additional-options "-std=c23" } */
>> +/* { dg-do compile } */
>> +
>> +struct annotated {
>> +  unsigned count;
>> +  [[gnu::counted_by(count)]] char array[];
>> +};
>> +
>> +[[gnu::noinline,gnu::noipa]]
>> +static unsigned
>> +size_of (bool x, struct annotated *a)
>> +{
>> +  char *p = (x ? a : 0)->array;
>> +  return __builtin_dynamic_object_size (p, 1);
>> +}
>> +
>> +int main()
>> +{
>> +  struct annotated *p = __builtin_malloc(sizeof *p);
>> +  p->count = 0;
>> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
>> +  return 0;
>> +}
>> -- 
>> 2.31.1
>> 
> 
> Marek




[PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Luc Grosheintz
This implements std::extents from <mdspan> according to N4950 and
contains partial progress towards PR107761.

If an extent changes its type, there's a precondition in the standard,
that the value is representable in the target integer type. This commit
uses direct initialization to perform the conversion, without any
additional checks.

The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
For rank-zero extents this precondition is always violated and results in
calling __builtin_trap. For all other specializations it's checked via
__glibcxx_assert.
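
As a rough stand-alone illustration of the storage scheme used in the patch (only dynamic extents are stored, and a prefix count maps an extent index to its slot), consider the sketch below. The names dyn_ext, dynamic_index and extent_at are hypothetical stand-ins, not the identifiers from the patch, and the sketch does no precondition checking.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for std::dynamic_extent so the sketch also compiles before C++20.
constexpr std::size_t dyn_ext = static_cast<std::size_t>(-1);

// For each r in [0, N], count the dynamic extents before position r.
// The last entry is therefore the total number of dynamic extents.
template<std::size_t N>
constexpr std::array<std::size_t, N + 1>
dynamic_index(const std::array<std::size_t, N>& exts)
{
  std::array<std::size_t, N + 1> ret{};
  std::size_t dyn = 0;
  for (std::size_t i = 0; i < N; ++i)
    {
      ret[i] = dyn;
      if (exts[i] == dyn_ext)
        ++dyn;
    }
  ret[N] = dyn;
  return ret;
}

// Look up extent r: static extents come from the compile-time list, dynamic
// ones from the compact run-time array.  Precondition: r < N.
template<std::size_t N, std::size_t D>
constexpr std::size_t
extent_at(const std::array<std::size_t, N>& exts,
          const std::array<std::size_t, D>& dynamic_values, std::size_t r)
{
  return exts[r] == dyn_ext ? dynamic_values[dynamic_index(exts)[r]]
                            : exts[r];
}
```

For a static extent list of {4, dynamic, dynamic} and stored values {7, 9}, the lookup yields 4, 7 and 9 for r = 0, 1, 2.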

PR libstdc++/107761

libstdc++-v3/ChangeLog:

* include/std/mdspan (extents): New class.
* src/c++23/std.cc.in: Add 'using std::extents'.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/std/mdspan  | 304 +++
 libstdc++-v3/src/c++23/std.cc.in |   6 +-
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index 4094a416d1e..72ca3445d15 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -33,6 +33,10 @@
 #pragma GCC system_header
 #endif
 
+#include 
+#include 
+#include 
+
 #define __glibcxx_want_mdspan
 #include 
 
@@ -41,6 +45,306 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+  namespace __mdspan
+  {
+template
+  class __array
+  {
+  public:
+   constexpr _Tp&
+ operator[](size_t __n) noexcept
+ {
+   return _M_elems[__n];
+ }
+
+   constexpr const _Tp&
+ operator[](size_t __n) const noexcept
+ {
+   return _M_elems[__n];
+ }
+
+  private:
+   array<_Tp, _Nm> _M_elems;
+  };
+
+template
+  class __array<_Tp, 0>
+  {
+  public:
+   constexpr _Tp&
+ operator[](size_t __n) noexcept
+ {
+   __builtin_trap();
+ }
+
+   constexpr const _Tp&
+ operator[](size_t __n) const noexcept
+ {
+   __builtin_trap();
+ }
+  };
+
+template
+  class _ExtentsStorage
+  {
+  public:
+   static constexpr bool
+   _M_is_dyn(size_t __ext) noexcept
+   { return __ext == dynamic_extent; }
+
+   template
+ static constexpr _IndexType
+ _M_int_cast(const _OIndexType& __other) noexcept
+ { return _IndexType(__other); }
+
+   static constexpr size_t _S_rank = sizeof...(_Extents);
+   static constexpr array _S_exts{_Extents...};
+
+   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
+   // of dynamic extents up to (and not including) __r.
+   //
+   // If __r is the index of a dynamic extent, then
+   // _S_dynamic_index[__r] is the index of that extent in
+   // _M_dynamic_extents.
+   static constexpr auto _S_dynamic_index = [] consteval
+   {
+ array __ret;
+ size_t __dyn = 0;
+ for(size_t __i = 0; __i < _S_rank; ++__i)
+   {
+ __ret[__i] = __dyn;
+ __dyn += _M_is_dyn(_S_exts[__i]);
+   }
+ __ret[_S_rank] = __dyn;
+ return __ret;
+   }();
+
+   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
+
+   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
+   // index of the __r-th dynamic extent in _S_exts.
+   static constexpr auto _S_dynamic_index_inv = [] consteval
+   {
+ array __ret;
+ for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
+   if (_M_is_dyn(_S_exts[__i]))
+ __ret[__r++] = __i;
+ return __ret;
+   }();
+
+   static constexpr size_t
+   _M_static_extent(size_t __r) noexcept
+   { return _S_exts[__r]; }
+
+   constexpr _IndexType
+   _M_extent(size_t __r) const noexcept
+   {
+ auto __se = _S_exts[__r];
+ if (__se == dynamic_extent)
+   return _M_dynamic_extents[_S_dynamic_index[__r]];
+ else
+   return __se;
+   }
+
+  private:
+   template
+ constexpr void
+ _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
+ {
+   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
+ {
+   size_t __di = __i;
+   if constexpr (_OtherRank != _S_rank_dynamic)
+ __di = _S_dynamic_index_inv[__i];
+   _M_dynamic_extents[__i] = _M_int_cast(__get_extent(__di));
+ }
+ }
+
+  public:
+   constexpr
+   _ExtentsStorage() noexcept = default;
+
+   template
+ constexpr
+ _ExtentsStorage(const _ExtentsStorage<_OIndexType, _OExtents...>&
+ __other) noexcept
+ {
+   _M_init_dynamic_extents<_S_rank>([&__other](auto __i)
+ { return __other._M_extent(__i); });
+ }
+
+   template
+ constexpr
+ _ExtentsStorage

Re: [PATCH] [PR119765] testsuite: adjust amd64-abi-9.c to check both ms and sysv ABIs

2025-04-15 Thread LIU Hao

On 2025-4-14 04:10, Peter Damianov wrote:

diff --git a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c 
b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
index 9b2cd7e7b49..827215be3e2 100644
--- a/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
+++ b/gcc/testsuite/gcc.target/i386/amd64-abi-9.c
@@ -1,18 +1,46 @@
  /* { dg-do compile { target { ! ia32 } } } */
  /* { dg-options "-O2 -mno-sse -mno-skip-rax-setup" } */
+
+// For sysv abi, eax holds the number of XMM registers used in the call.
+// Since sse is disabled, check that it is zeroed
  /* { dg-final { scan-assembler-times "xorl\[\\t \]*\\\%eax,\[\\t \]*%eax" 2 } 
} */
  
-void foo (const char *, ...);

+// For ms abi, the argument should go in edx
+/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%edx" 2 } 
} */


is this a superfluous `\[` ? --^^


+
+// For sysv abi, the argument should go in esi
+/* { dg-final { scan-assembler-times "movl\[\\t \]*\\\$20,\[\\t \[]*%esi" 2 } 
} */
+
+


ditto.


--
Best regards,
LIU Hao




Re: [PATCH] c++: Prune lambda captures from more places [PR119755]

2025-04-15 Thread Jason Merrill

On 4/15/25 2:56 AM, Nathaniel Shead wrote:

On Mon, Apr 14, 2025 at 05:33:05PM -0400, Jason Merrill wrote:

On 4/13/25 6:32 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/lambda.cc   | 22 ++
   gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
   gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
   3 files changed, 46 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
   create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index f0a54b60275..d01bb04cd32 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1858,6 +1858,14 @@ prune_lambda_captures (tree body)
 cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));
+  gcc_assert (bind_expr
+ && (TREE_CODE (bind_expr) == BIND_EXPR
+ /* FIXME: In a noexcept lambda we never prune captures
+(PR119764); when we do we need to handle this case
+for modules streaming.  */


The attached patch seems to fix that, with the result that your patch
crashes.



Thanks.  And yup, crashing was deliberate here as I wasn't 100% sure
what the tree would look like for this case after an appropriate fix.

One quick question about your patch, since it could in theory affect ABI
(the size of the lambdas change) should the pruning of such lambdas be
dependent on an ABI version check?
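
For context, the kind of lambda under discussion looks roughly like the sketch below: a default capture whose body only reads a constant variable. The function names are hypothetical. As far as I understand it, reading 'n' here is not an odr-use, so the closure does not need a member for it; the closure size is implementation-specific, which is why the sketch returns it rather than assuming a value.

```cpp
#include <cassert>
#include <cstddef>

// Call a lambda that names a constant variable under a default capture.
inline int call_lambda_with_constant()
{
  const int n = 42;
  // 'n' is usable in constant expressions, so reading its value is not an
  // odr-use and no closure member is required for it.
  auto f = [=] { return n; };
  return f();
}

// Size of the closure type of the noexcept variant.  The value depends on
// the compiler and on whether the unneeded capture was pruned.
inline std::size_t noexcept_closure_size()
{
  const int n = 42;
  auto f = [=]() noexcept { return n; };
  return sizeof f;
}
```

Comparing noexcept_closure_size() across compiler versions is one way to observe whether a pruning change affected the closure layout.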


Indeed, perhaps this is too late in the 15 cycle for such a change.


Otherwise here's an updated patch that relies on your patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk along
with yours?  (Or if the potential ABI concerns mean that your change
isn't appropriate for GCC15, would the old version of my patch still be
OK for GCC15 to get 'import std' working again for C++26?)


For 15 please adjust this patch to be more fault-tolerant:


-- >8 --

Currently, pruned lambda captures are still leftover in the function's
BLOCK and topmost BIND_EXPR; this doesn't cause any issues for normal
compilation, but does break modules streaming as we try to reconstruct a
FIELD_DECL that no longer exists on the type itself.

PR c++/119755

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Remove pruned capture from
function's BLOCK_VARS and BIND_EXPR_VARS.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-10_a.H: New test.
* g++.dg/modules/lambda-10_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
---
  gcc/cp/lambda.cc   | 19 +++
  gcc/testsuite/g++.dg/modules/lambda-10_a.H | 17 +
  gcc/testsuite/g++.dg/modules/lambda-10_b.C |  7 +++
  3 files changed, 43 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/lambda-10_b.C

diff --git a/gcc/cp/lambda.cc b/gcc/cp/lambda.cc
index c6308b941d3..7bb88a900d5 100644
--- a/gcc/cp/lambda.cc
+++ b/gcc/cp/lambda.cc
@@ -1862,6 +1862,11 @@ prune_lambda_captures (tree body)
  
cp_walk_tree_without_duplicates (&body, mark_const_cap_r, &const_vars);
  
+  tree bind_expr = expr_single (DECL_SAVED_TREE (lambda_function (lam)));

+  if (bind_expr && TREE_CODE (bind_expr) == MUST_NOT_THROW_EXPR)
+bind_expr = expr_single (TREE_OPERAND (bind_expr, 0));
+  gcc_assert (bind_expr && TREE_CODE (bind_expr) == BIND_EXPR);


i.e. here clear bind_expr if it isn't a BIND_EXPR...


tree *fieldp = &TYPE_FIELDS (LAMBDA_EXPR_CLOSURE (lam));
for (tree *capp = &LAMBDA_EXPR_CAPTURE_LIST (lam); *capp; )
  {
@@ -1883,6 +1888,20 @@ prune_lambda_captures (tree body)
fieldp = &DECL_CHAIN (*fieldp);
  *fieldp = DECL_CHAIN (*fieldp);
  
+	  /* And out of the bindings for the function.  */

+ tree *blockp = &BLOCK_VARS (current_binding_level->blocks);
+ while (*blockp != DECL_EXPR_DECL (**use))
+   blockp = &DECL_CHAIN (*blockp);
+ *blockp = DECL_CHAIN (*blockp);
+
+ /* And maybe out of the vars declared in the containing
+BIND_EXPR, if it's listed there.  */
+ tree *bindp = &BIND_EX

[PATCH v4 18/20] Fix FMV return type ambiguation

2025-04-15 Thread Alfie Richards
Add logic for the case of two FMV annotated functions with identical
signature other than the return type.

Previously this was ignored; this changes the behavior to emit a diagnostic.

gcc/cp/ChangeLog:
PR c++/119498
* decl.cc (duplicate_decls): Change logic to not always exclude FMV
annotated functions in cases of return type non-ambiguation.
---
 gcc/cp/decl.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4a374fa29e3..6494944e3ba 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -2022,8 +2022,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
}
  /* For function versions, params and types match, but they
 are not ambiguous.  */
- else if ((!DECL_FUNCTION_VERSIONED (newdecl)
-   && !DECL_FUNCTION_VERSIONED (olddecl))
+ else if (((!DECL_FUNCTION_VERSIONED (newdecl)
+&& !DECL_FUNCTION_VERSIONED (olddecl))
+   || !comptypes (TREE_TYPE (TREE_TYPE (newdecl)),
+  TREE_TYPE (TREE_TYPE (olddecl)),
+  COMPARE_STRICT))
   /* Let constrained hidden friends coexist for now, we'll
  check satisfaction later.  */
   && !member_like_constrained_friend_p (newdecl)
-- 
2.34.1



[committed] libstdc++: Do not define __cpp_lib_ranges_iota in <ranges>

2025-04-15 Thread Jonathan Wakely
In r14-7153-gadbc46942aee75 we removed a duplicate definition of
__glibcxx_want_ranges_iota from <ranges>, but __cpp_lib_ranges_iota
should not be defined in <ranges> at all.
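
From the user's side, the feature test macro is expected to be visible after including <numeric> (or <version>), since that is where ranges::iota is declared. A small sketch of code that relies on this, with a fallback for older standards, could look as follows; first_n is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return a vector holding 0, 1, ..., n-1.
inline std::vector<int> first_n(std::size_t n)
{
  std::vector<int> v(n);
#if defined(__cpp_lib_ranges_iota) && __cpp_lib_ranges_iota >= 202202L
  // C++23 library: ranges::iota is available from <numeric>.
  std::ranges::iota(v, 0);
#else
  // Older standard or library: fall back to the classic algorithm.
  std::iota(v.begin(), v.end(), 0);
#endif
  return v;
}
```

Both branches produce the same contents, so the macro only selects which spelling is used.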

libstdc++-v3/ChangeLog:

* include/std/ranges (__glibcxx_want_ranges_iota): Do not
define.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/ranges | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 7a339c51368..9300c364a16 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -64,7 +64,6 @@
 #define __glibcxx_want_ranges_chunk
 #define __glibcxx_want_ranges_chunk_by
 #define __glibcxx_want_ranges_enumerate
-#define __glibcxx_want_ranges_iota
 #define __glibcxx_want_ranges_join_with
 #define __glibcxx_want_ranges_repeat
 #define __glibcxx_want_ranges_slide
-- 
2.49.0



[committed] libstdc++: Do not declare namespace ranges in <numeric> unconditionally

2025-04-15 Thread Jonathan Wakely
Move namespace ranges inside the feature test macro guard, because
'ranges' is not a reserved name before C++20.

libstdc++-v3/ChangeLog:

* include/std/numeric (ranges): Only declare namespace for C++23
and later.
(ranges::iota_result): Fix indentation.
* testsuite/17_intro/names.cc: Check ranges is not used as an
identifier before C++20.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/std/numeric | 8 +++-
 libstdc++-v3/testsuite/17_intro/names.cc | 4 
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index 4d36fcd36d9..490963ee46d 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -732,12 +732,11 @@ namespace __detail
   /// @} group numeric_ops
 #endif // C++17
 
+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
 namespace ranges
 {
-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
-
   template
-  using iota_result = out_value_result<_Out, _Tp>;
+using iota_result = out_value_result<_Out, _Tp>;
 
   struct __iota_fn
   {
@@ -762,9 +761,8 @@ namespace ranges
   };
 
   inline constexpr __iota_fn iota{};
-
-#endif // __glibcxx_ranges_iota
 } // namespace ranges
+#endif // __glibcxx_ranges_iota
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 4458325e52b..f67818db425 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -142,6 +142,10 @@
 #define try_emplace (
 #endif
 
+#if __cplusplus < 202002L
+#define ranges (
+#endif
+
 // These clash with newlib so don't use them.
 # define __lockablecannot be used as an identifier
 # define __null_sentinel   cannot be used as an identifier
-- 
2.49.0



[PUSHED/14 6/6] testcase: Add testcase for already fixed PR [PR118476]

2025-04-15 Thread Andrew Pinski
This testcase was fixed by r15-3052-gc7b76a076cb2c6ded but is
a testcase that failed in a different fashion and a much older
failure than the one added with r15-3052.

Pushed as obvious after a quick test.

PR tree-optimization/118476

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr118476-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit d45a6502d1ec87d43f1a39f87cca58f1e28369c8)
---
 gcc/testsuite/gcc.dg/torture/pr118476-1.c | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr118476-1.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr118476-1.c 
b/gcc/testsuite/gcc.dg/torture/pr118476-1.c
new file mode 100644
index 000..33509403b61
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr118476-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+/* PR tree-optimization/118476 */
+
+typedef unsigned long long poly64x1 
__attribute__((__vector_size__(1*sizeof(long long))));
+
+poly64x1 vext_p64(poly64x1 a, poly64x1 b, const int n)
+{
+  poly64x1 r = a;
+  unsigned src = (unsigned)n;
+  long long t = b[0];
+  r[0] = (src < 1) ? a[src] : t;
+  return r;
+}
-- 
2.43.0



Re: [PATCH v2] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Jonathan Wakely

A few spelling and grammar fixes, and whitespace tweaks, but the only
significant thing is to qualify some calls to prevent ADL ...


On 14/04/25 16:13 +0200, Tomasz Kamiński wrote:

This patch implements formatter specialization for input_ranges and
range_formatter class form P2286R8, as adjusted by P2585R1. The formatter


"form" should be "from"


for pair/tuple is not yet provided, making maps not formattable.

To indicate partial support we define __glibcxx_format_ranges macro
value 1, without defining __cpp_lib_format_ranges.


That was already pushed in an earlier commit, but this sounds like
it's done here.


This introduces an new _M_format_range member to internal __formatter_str,
that formats range as _CharT as string, according the to the format spec.


"the to the" should be "to the"


This function transform any contiguous range into basic_string_view direclty,


"directly"


by computing size if necessary. Otherwise, for ranges for which size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format it content.


Should be "its content"



In case when padding is specified, this is handled by firstly formatting
the content of the range to the temporary string object. However, this can be
only implemented if the iterator of the basic_format_context is internal
type-erased iterator used by implementation. Otherwise a new 
basic_format_context
would need to be created, which would require rebinding of handles stored in
the arguments: note that format spec for element type could retrive any format


"retrive" should be "retrieve"


argument from format context, visit and and user handle to format it.


"and and"


As basic_format_context provide no user-facing constructor, the user are not 
able


"the user are not" should be "users are not"


to cosntructor object of that type with arbitrally iterators.


"cosntructor" should be "construct"
"arbitrally" should be "arbitrary"



The signatures of the user-facing parse and format method of the provided


"method" should be "methods"


formatters deviate from the standard by constraining types of params:
* _CharT is constrained __formatter::__char
* basic_format_parse_context<_CharT> for parse argument
* basic_format_context<_Out, _CharT> for format second argument
The standard specifies last three of above as unconstrained types. This types


"This types" should be "These types"


are later passed to possibly user-provided formatter specializations, that are
required via formattable concept to only accept above types.

Finally, the formatter specialization is implemented
without using specialization of range-default-formatter exposition only
template as base class, while providing same functionality.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, 
_Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__Stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink) Use __Stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.
---
Adjusted the commit message and added test for result of formattable
check for ranges of types that are not formattable.

libstdc++-v3/include/std/format   | 511 --
libstdc++-v3/src/c++23/std.cc.in  |   6 +
.../23_containers/vector/bool/format.cc   |   6 +
.../testsuite/std/format/formatter/lwg3944.cc |   4 +-
.../std/format/formatter/requirements.cc  |  14 +-
.../std/format/ranges/format_kind.cc  |  94 
.../testsuite/std/format/ranges/formatter.cc  | 145 +
.../testsuite/std/format/ranges/sequence.cc   | 190 +++
.../testsuite/std/format/ranges/string.cc | 226 
9 files changed, 1131 insertions(+), 65 deletions(-)
create mode 100644 libstdc++-v3/testsuite/std/format/ranges/format_kind.cc
create mode 100644 libstdc++-v3/

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-15 Thread Kyrylo Tkachov


> On 15 Apr 2025, at 15:42, Richard Biener  wrote:
> 
> On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov  wrote:
>> 
>> Hi Honza,
>> 
>>> On 13 Apr 2025, at 23:19, Jan Hubicka  wrote:
>>> 
 +@opindex fipa-reorder-for-locality
 +@item -fipa-reorder-for-locality
 +Group call chains close together in the binary layout to improve code
 +locality.  This option is incompatible with an explicit
 +@option{-flto-partition=} option since it enforces a custom partitioning
 +scheme.
>>> 
>>> Please also cross-link this with -fprofile-reorder-functions and
>>> -freorder-functions, which do a similar thing.
>>> If you see how, please clean up the description of the other two so the
>>> user is not confused.
>>> 
>>> Perhaps say that -freorder-functions only partitions functions into
>>> never-executed/cold/normal/hot and -fprofile-reorder-functions is aiming
>>> for program startup optimization (it reorders by the measured first time
>>> the function is executed).  By accident it seems to kind of work for
>>> locality.
>> 
>> Yeah, the option names are quite similar aren't they?
>> I’ve attempted to disambiguate them a bit in their description.
>> I’m attaching a diff from the previous version (as the full updated patch) 
>> to make it easier to see what’s adjusted.
>> 
>> 
>>> 
 +
 +/* Helper function to accumulate call counts.  */
 +static bool
 +accumulate_profile_counts_after_cloning (cgraph_node *node, void *data)
 +{
 +  struct profile_stats *stats = (struct profile_stats *) data;
 +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
 +{
 +  if (e->caller == stats->target)
 + {
 +  if (stats->rec_count.compatible_p (e->count.ipa ()))
 +stats->rec_count += e->count.ipa ();
 + }
 +  else
 + {
 +  if (stats->nonrec_count.compatible_p (e->count.ipa ()))
 +stats->nonrec_count += e->count.ipa ();
 + }
>>> In case part of the profile is missing (which may happen if one unit has -O0
>>> or so), we may have uninitialized counts. Uninitialized counts are
>>> compatible with everything, but any arithmetic with them will produce an
>>> uninitialized result, which will likely confuse code later.  So I would
>>> skip edges with uninitialized counts.
>>> 
>>> On the other hand ipa counts are always compatible, so compatible_p
>>> should be redundant. The main reason for the existence of compatible_p is
>>> that we can have local profiles that are 0 or unknown at IPA level.  The
>>> ipa () conversion turns all counts into IPA counts and those are
>>> compatible with each other.
>>> 
>>> I suppose the compatible_p test is there since the code ICEd in the past,
>>> but I think it was because of a missing ipa () conversion.
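
To illustrate the point about uninitialized counts, here is a minimal, self-contained sketch. It does not use GCC's profile_count; `toy_count`, `add`, `sum_naive` and `sum_skipping_uninitialized` are hypothetical names, and an empty std::optional merely stands in for an uninitialized count. In the real code the equivalent guard would presumably be an initialized_p () check on the edge count, which is an assumption here.

```cpp
#include <optional>
#include <vector>

// Toy stand-in for a profile count: an empty optional models "uninitialized".
using toy_count = std::optional<long>;

// Any arithmetic involving an uninitialized count yields an uninitialized
// result, mirroring the behaviour described in the review comment.
inline toy_count add(toy_count a, toy_count b)
{
  if (!a || !b)
    return std::nullopt;
  return *a + *b;
}

// Naive accumulation: a single uninitialized edge poisons the whole sum.
inline toy_count sum_naive(const std::vector<toy_count>& edges)
{
  toy_count total = 0;
  for (const toy_count& e : edges)
    total = add(total, e);
  return total;
}

// Suggested behaviour: skip edges whose count is uninitialized.
inline toy_count sum_skipping_uninitialized(const std::vector<toy_count>& edges)
{
  toy_count total = 0;
  for (const toy_count& e : edges)
    if (e)
      total = add(total, e);
  return total;
}
```

With edges {10, uninitialized, 5}, the naive sum comes out uninitialized while the skipping variant keeps the usable total of 15.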
>>> 
>>> 
 +}
 +  return false;
 +}
 +
 +/* NEW_NODE is a previously created clone of ORIG_NODE already present in
 +   current partition.  EDGES contains newly redirected edges to NEW_NODE.
 +   Adjust profile information for both nodes and the edge.  */
 +
 +static void
 +adjust_profile_info_for_non_self_rec_edges (auto_vec<cgraph_edge *> &edges,
 +cgraph_node *new_node,
 +cgraph_node *orig_node)
 +{
 +  profile_count orig_node_count = orig_node->count.ipa ();
 +  profile_count edge_count = profile_count::zero ();
 +  profile_count final_new_count = profile_count::zero ();
 +  profile_count final_orig_count = profile_count::zero ();
 +
 +  for (unsigned i = 0; i < edges.length (); ++i)
 +edge_count += edges[i]->count.ipa ();
>>> Here I would again skip uninitialized.  It is probably legal for -O0
>>> function to end up in partition.
 +
 +  final_orig_count = orig_node_count - edge_count;
 +
 +  /* NEW_NODE->count was adjusted for other callers when the clone was
 + first created.  Just add the new edge count.  */
 +  if (new_node->count.compatible_p (edge_count))
 +final_new_count = new_node->count + edge_count;
>>> And here compatible_p should be unnecessary.
 +/* Accumulate frequency of all edges from EDGE->caller to EDGE->callee.  */
 +
 +static sreal
 +accumulate_incoming_edge_frequency (cgraph_edge *edge)
 +{
 +  sreal count = 0;
 +  struct cgraph_edge *e;
 +  for (e = edge->callee->callers; e; e = e->next_caller)
 +{
 +  /* Make a local decision about all edges for EDGE->caller but not 
 the
 + other nodes already in the partition.  Their edges will be visited
 + later or may have been visited before and not fit the
 + cut-off criteria.  */
 +  if (e->caller == edge->caller)
 + {
 +  profile_count caller_count = e->caller->inlined_to
 + ? e->caller->inlined_to->count
 + : e->caller->count;
 +  if (e->count.compatible_p (caller_count))
>>> Here again the compatibility check should not be necessary, since the counts
>>> belong to one function body (after inlining) and should be compatible.
>>> inliner 

COBOL: Is anything stalled because of me?

2025-04-15 Thread Robert Dubner
Speaking purely casually:  I thought that COBOL would be released with 
documented limited capability.  "Yeah, it works on x86_64-linux and 
aarch64-linux.  More to come.".  We knew that we didn't know how to 
cross-compile, and we knew that other platforms would have to come, in time.

It never occurred to me that significant efforts would be made to fix all 
that in a month.

More formally:  I am very aware that I have not been as responsive here as 
maybe I should have been.  I plead incapacitation due to inundation.

If I have missed anything; if anybody is waiting for me, please remind me. 
And if I have missed pings, I apologize; they've just been hidden in the 
deluge.

Thanks.

> -Original Message-
> From: Jeff Law 
> Sent: Tuesday, April 15, 2025 10:32
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] libgcobol: mark riscv64-*-linux* as supported target
>
>
>
> On 4/15/25 7:57 AM, Andreas Schwab wrote:
> > * configure.tgt: Set LIBGCOBOL_SUPPORTED for riscv64-*-linux* with
> > 64-bit multilib.
> Can't say I'm happy with the amount of Cobol related churn at this phase
> in our cycle.  But this should be exceedingly safe.  So OK.
>
> jeff



Re: Mark const parameters passed by invisible reference as readonly in the function body

2025-04-15 Thread Jason Merrill

On 3/30/25 6:12 PM, Jan Hubicka wrote:

Hi,
I noticed that this patch got forgotten and I think it may be useful to
solve this next stage 1.


cp_apply_type_quals_to_decl drops 'const' if the type has mutable members.
Unfortunately TREE_READONLY on the PARM_DECL isn't helpful in the case of an
invisiref parameter.


But maybe classes with mutable
members are never POD and thus always runtime initialized?


No.


C++ frontend has

/* Nonzero means that this type contains a mutable member.  */
#define CLASSTYPE_HAS_MUTABLE(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->has_mutable)
#define TYPE_HAS_MUTABLE_P(NODE) (cp_has_mutable_p (NODE))

but it is not exported to middle-end.

However still this is quite special situation since the object is passed
using invisible reference, so I wonder if in this situation a copy is
constructed so the callee can possibly overwrite the mutable fields?


The object bound to the invisible reference is usually a copy, mutable
doesn't make a difference.

If I understand situation right, in the following testcase:

struct foo
{
   mutable int a;
   void bar() const;
   ~foo()
   {
 if (a != 42)
   __builtin_abort ();
   }
};
__attribute__ ((noinline))
void test(const struct foo a)
{
 int b = a.a;
 a.bar();
 if (a.a != b)
   __builtin_printf ("optimize me away");
}

We can not assume that value of a.a was not changed by bar because a is
mutable, but otherwise it is safe to optimize out the final check.
If that is so, I think we want to let middle-end know that a type has
mutable field and use it here, right?


Ah, yes, that makes sense.

Jason
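
The testcase above can be turned into a small self-contained check of the language-level behaviour under discussion. This is only a sketch: the body given to bar () is hypothetical (the original leaves it undefined), and `value_changed` is an illustrative name. The user-provided copy constructor and destructor make the type non-trivially copyable, which is what typically causes it to be passed by invisible reference.

```cpp
struct foo
{
  mutable int a;
  explicit foo (int v) : a (v) {}
  foo (const foo &o) : a (o.a) {}   // non-trivial copy
  ~foo () {}                        // user-provided destructor
  // Hypothetical body: a const member function may still write to a
  // mutable member, even when called on a const object.
  void bar () const { a = 42; }
};

// The parameter is const, yet a.a may legitimately differ after a.bar (),
// so the comparison cannot be folded away for types with mutable members.
bool value_changed (const foo a)
{
  int b = a.a;
  a.bar ();
  return a.a != b;
}
```

Calling value_changed (foo (1)) returns true, which shows why treating the parameter's storage as readonly is only safe when the type has no mutable members.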



Re: [PATCH] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Marek Polacek
On Mon, Apr 14, 2025 at 08:28:55PM +, Qing Zhao wrote:
> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
> In c_fully_fold, it assumes that operands of function calls have already
> been folded. However, when we build a call to .ACCESS_WITH_SIZE, its
> operands are not fully folded, so the C FE specific operator is
> passed to the middle-end.
> 
> In order to fix this issue, fully fold the parameters before building the
> call to .ACCESS_WITH_SIZE.
> 
> I am doing the bootstrap and regression testing on both X86 and aarch64 now.
> Okay for trunk if testing goes well?
> 
> thanks.
> 
> Qing
> 
>   PR c/119717
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>   parameters for call to .ACCESS_WITH_SIZE.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr119717.c: New test.
> ---
>  gcc/c/c-typeck.cc   |  8 ++--
>  gcc/testsuite/gcc.dg/pr119717.c | 24 
>  2 files changed, 30 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 3870e8a1558..dd176d96a41 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
> loc, tree ref,
>gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>/* The result type of the call is a pointer to the flexible array type.  */
>tree result_type = c_build_pointer_type (TREE_TYPE (ref));
> +  tree first_param
> += c_fully_fold (array_to_pointer_conversion (loc, ref), FALSE, NULL);
> +  tree second_param
> += c_fully_fold (counted_by_ref, FALSE, NULL);

Why FALSE?  Just use false.  You can also use nullptr rather than NULL now.
  
>tree call
>  = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>   result_type, 6,
> - array_to_pointer_conversion (loc, ref),
> - counted_by_ref,
> + first_param,
> + second_param,
>   build_int_cst (integer_type_node, 1),
>   build_int_cst (counted_by_type, 0),
>   build_int_cst (integer_type_node, -1),
> diff --git a/gcc/testsuite/gcc.dg/pr119717.c b/gcc/testsuite/gcc.dg/pr119717.c
> new file mode 100644
> index 000..e5eedc567b3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr119717.c
> @@ -0,0 +1,24 @@
> +/* PR c/119717  */
> +/* { dg-additional-options "-std=c23" } */
> +/* { dg-do compile } */
> +
> +struct annotated {
> +  unsigned count;
> +  [[gnu::counted_by(count)]] char array[];
> +};
> +
> +[[gnu::noinline,gnu::noipa]]
> +static unsigned
> +size_of (bool x, struct annotated *a)
> +{
> +  char *p = (x ? a : 0)->array;
> +  return __builtin_dynamic_object_size (p, 1);
> +}
> +
> +int main()
> +{
> +  struct annotated *p = __builtin_malloc(sizeof *p);
> +  p->count = 0;
> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
> +  return 0;
> +}
> -- 
> 2.31.1
> 

Marek



[PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread H.J. Lu
ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
pushed in red-zone.  Since

commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
Author: H.J. Lu 
Date:   Sun Apr 13 12:20:42 2025 -0700

APX: Don't use red-zone with 32 GPRs and no caller-saved registers

disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
31 .cfi_restore directives.

PR target/119784
* gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
directives.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
index fefe2e6d6fc..fa1acc7a142 100644
--- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
+++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
@@ -66,7 +66,7 @@ void foo (void *frame)
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
 /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
-/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
+/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
 /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
-- 
2.49.0



[PATCH v4] libstdc++: Implement formatter for ranges and range_formatter [PR109162]

2025-04-15 Thread Tomasz Kamiński
This patch implements formatter specialization for input_ranges and
range_formatter class from P2286R8, as adjusted by P2585R1. The formatter
for pair/tuple is not yet provided, making maps not formattable.

This introduces a new _M_format_range member to the internal __formatter_str,
that formats a range of _CharT as a string, according to the format spec.
This function transforms any contiguous range into basic_string_view directly,
by computing size if necessary. Otherwise, for ranges for which size can be
computed (forward_range or sized_range) we use a stack buffer, if they are
sufficiently small. Finally, we create a basic_string<_CharT> from the range,
and format its content.

When padding is specified, this is handled by first formatting
the content of the range to a temporary string object. However, this can
only be implemented if the iterator of the basic_format_context is the internal
type-erased iterator used by the implementation. Otherwise a new
basic_format_context would need to be created, which would require rebinding
of handles stored in the arguments: note that the format spec for the element
type could retrieve any format argument from the format context, visit it, and
use a handle to format it.
As basic_format_context provides no user-facing constructor, users are not
able to construct an object of that type with arbitrary iterators.

The signatures of the user-facing parse and format methods of the provided
formatters deviate from the standard by constraining the types of parameters:
* _CharT is constrained to __formatter::__char
* basic_format_parse_context<_CharT> for the parse argument
* basic_format_context<_Out, _CharT> for the second format argument
The standard specifies all three of the above as unconstrained types. These
types are later passed to possibly user-provided formatter specializations,
which are required via the formattable concept to accept only the above types.

Finally, the formatter specialization is implemented without using a
specialization of the exposition-only range-default-formatter template as
a base class, while providing the same functionality.

PR libstdc++/109162

libstdc++-v3/ChangeLog:

* include/std/format (__format::__has_debug_format, _Pres_type::_Pres_seq)
(_Pres_type::_Pres_str, __format::__stackbuf_size): Define.
(_Separators::_S_squares, _Separators::_S_parens, _Separators::_S_comma)
(_Separators::_S_colon): Define additional constants.
(_Spec::_M_parse_fill_and_align): Define overload accepting
list of excluded characters for fill, and forward existing overload.
(__formatter_str::_M_format_range): Define.
(__format::_Buf_sink): Use __stackbuf_size for size of array.
(__format::__is_map_formattable, std::range_formatter)
(std::formatter<_Rg, _CharT>): Define.
* src/c++23/std.cc.in (std::format_kind, std::range_format)
(std::range_formatter): Export.
* testsuite/std/format/formatter/lwg3944.cc: Guarded tests with
__glibcxx_format_ranges.
* testsuite/std/format/formatter/requirements.cc: Adjusted for standard
behavior.
* testsuite/23_containers/vector/bool/format.cc: Test vector 
formatting.
* testsuite/std/format/ranges/format_kind.cc: New test.
* testsuite/std/format/ranges/formatter.cc: New test.
* testsuite/std/format/ranges/sequence.cc: New test.
* testsuite/std/format/ranges/string.cc: New test.

Reviewed-by: Jonathan Wakely 
Signed-off-by: Tomasz Kamiński 
---
Fixed another double spacing error.

 libstdc++-v3/include/std/format   | 505 --
 libstdc++-v3/src/c++23/std.cc.in  |   6 +
 .../23_containers/vector/bool/format.cc   |   6 +
 .../testsuite/std/format/formatter/lwg3944.cc |   4 +-
 .../std/format/formatter/requirements.cc  |  14 +-
 .../std/format/ranges/format_kind.cc  |  94 
 .../testsuite/std/format/ranges/formatter.cc  | 145 +
 .../testsuite/std/format/ranges/sequence.cc   | 190 +++
 .../testsuite/std/format/ranges/string.cc | 226 
 9 files changed, 1125 insertions(+), 65 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/format_kind.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/formatter.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/sequence.cc
 create mode 100644 libstdc++-v3/testsuite/std/format/ranges/string.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23f00970840..096dda4f989 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -97,6 +97,10 @@ namespace __format
 #define _GLIBCXX_WIDEN_(C, S) ::std::__format::_Widen(S, L##S)
 #define _GLIBCXX_WIDEN(S) _GLIBCXX_WIDEN_(_CharT, S)
 
+  // Size for stack located buffer
+  template<typename _CharT>
+  constexpr size_t __stackbuf_size = 32 * sizeof(void*) / sizeof(_CharT);
+
   // Type-erased character sinks.
   template<typename _CharT> class _Sink;
   template<typename _CharT> class _Fixedbuf_sink;
@@ -47

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, April 15, 2025 12:50 PM
> To: Tamar Christina 
> Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org;
> nd 
> Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> [PR119351]
> 
> On Tue, 15 Apr 2025, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Sandiford 
> > > Sent: Tuesday, April 15, 2025 10:52 AM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> > > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> > > [PR119351]
> > >
> > > Tamar Christina  writes:
> > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > index
> > >
> 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1c
> > > e591f7c38b76d07f 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> > > loop_vinfo,
> > > >return false;
> > > >  }
> > > >
> > > > +  /* With early break vectorization we don't know whether the accesses 
> > > > will
> stay
> > > > + inside the loop or not.  TODO: The early break adjustment code 
> > > > can be
> > > > + implemented the same way for vectorizable_linear_induction.  
> > > > However
> we
> > > > + can't test this today so reject it.  */
> > > > +  if (niters_skip != NULL_TREE
> > > > +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > > > +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > > > +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +{
> > > > +  if (dump_enabled_p ())
> > > > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > > +"Peeling for alignement using masking is not 
> > > > supported"
> > > > +" for nonlinear induction when using early 
> > > > breaks.\n");
> > > > +  return false;
> > > > +}
> > > > +
> > > >return true;
> > > >  }
> > >
> > > FTR, I was wondering here whether we should predict this in advance and
> > > instead drop down to peeling for alignment without masks.  It probably
> > > isn't worth the effort though.
> >
> > We could move the check into vect_use_loop_mask_for_alignment_p where
> > rejecting it there would get it to fall back to scalar peeling.  That seems
> > simple enough if that's preferable.
> 
> The above is preferable IMO (short of fixing up that case, but with
> a testcase).
> 

I wasn't able to make a testcase before, as any non-linear induction feeding a
load becomes a gather load, which we block outright well before getting here.
I couldn't think of an example where it wouldn't be; even a gapped load
(e.g. += 2) became one.

Thanks,
Tamar

> Richard.
> 
> > Cheers,
> > Tamar
> > >
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index
> > >
> 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0
> > > 057d2fa46245d04d9 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info
> > > loop_vinfo,
> > > >LOOP_VINFO_MASK_SKIP_NITERS
> > > (loop_vinfo));
> > > >   peel_mul = gimple_build_vector_from_val (&init_stmts,
> > > >step_vectype, 
> > > > peel_mul);
> > > > +
> > > > + /* If early break then we have to create a new PHI which we 
> > > > can use as
> > > > +   an offset to adjust the induction reduction in early exits. 
> > > >  */
> > > > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +   {
> > > > + auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS 
> > > > (loop_vinfo);
> > > > + tree ty_skip_niters = TREE_TYPE (skip_niters);
> > > > + tree break_lhs_phi = NULL_TREE;
> > > > + break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > > > +vect_scalar_var,
> > > > +"pfa_iv_offset");
> > > > + gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > > > + add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > > > + add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > > > +  loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > > > +
> > > > + /* Rewrite all the early exit usages.  */
> > > > + tree phi_lhs = PHI_RESULT (phi);
> > > > + imm_use_iterator iter;
> > > > + use_operand_p use_p;
> > > > + gimple *use_stmt;
> > > > +
> > > > + FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > > > +   {
> > > > + use_stmt = USE_STMT (use_p);
> > > > + if (!flow_bb_inside_loop_p (iv_loop, gimple_bb 
> > > > 

[PATCH v4 08/20] Add get_clone_versions and get_version functions.

2025-04-15 Thread Alfie Richards
This is a reimplementation of get_target_clone_attr_len,
get_attr_str, and separate_attrs using string_slice and auto_vec to make
memory management and use simpler.

Adds get_target_version helper function to get the target_version string
from a decl.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Change to use
get_clone_attr_versions.

gcc/ChangeLog:

* tree.cc (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_target_version): New function.
* tree.h (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_target_version): New function.
---
 gcc/c-family/c-attribs.cc |  4 ++-
 gcc/tree.cc   | 59 +++
 gcc/tree.h| 11 
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..5dff489fcca 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6132,7 +6132,9 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
}
}
 
-  if (get_target_clone_attr_len (args) == -1)
+  auto_vec<string_slice> versions = get_clone_attr_versions (args, NULL);
+
+  if (versions.length () == 1)
{
  warning (OPT_Wattributes,
   "single %<target_clones%> attribute is ignored");
diff --git a/gcc/tree.cc b/gcc/tree.cc
index eccfcc89da4..fdcdfb336bc 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -15372,6 +15372,65 @@ get_target_clone_attr_len (tree arglist)
   return str_len_sum;
 }
 
+/* Returns an auto_vec of string_slices containing the version strings from
+   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
+
+auto_vec<string_slice>
+get_clone_attr_versions (const tree arglist, int *default_count)
+{
+  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
+  auto_vec<string_slice> versions;
+
+  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
+  string_slice separators = string_slice (separator_str);
+
+  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
+{
+  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg)));
+  while (str.is_valid ())
+   {
+ string_slice attr = string_slice::tokenize (&str, separators);
+ attr = attr.strip ();
+
+ if (attr == "default" && default_count)
+   (*default_count)++;
+ versions.safe_push (attr);
+   }
+}
+  return versions;
+}
+
+/* Returns an auto_vec of string_slices containing the version strings from
+   the target_clones attribute from DECL.  DEFAULT_COUNT is incremented for each
+   default version found.  */
+auto_vec<string_slice>
+get_clone_versions (const tree decl, int *default_count)
+{
+  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
+  if (!attr)
+    return auto_vec<string_slice> ();
+  tree arglist = TREE_VALUE (attr);
+  return get_clone_attr_versions (arglist, default_count);
+}
+
+/* If DECL has a target_version attribute, returns a string_slice containing
+   the attribute value.  Otherwise, returns string_slice::invalid.
+   Only works for target_version due to target attributes allowing multiple
+   string arguments to specify one target.  */
+string_slice
+get_target_version (const tree decl)
+{
+  gcc_assert (!TARGET_HAS_FMV_TARGET_ATTRIBUTE);
+
+  tree attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
+
+  if (!attr)
+return string_slice::invalid ();
+
+  return string_slice (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))))
+    .strip ();
+}
+
 void
 tree_cc_finalize (void)
 {
diff --git a/gcc/tree.h b/gcc/tree.h
index 99f26177628..a89f3cf7189 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "tree-core.h"
 #include "options.h"
+#include "vec.h"
 
 /* Convert a target-independent built-in function code to a combined_fn.  */
 
@@ -7052,4 +7053,14 @@ extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
 extern int get_target_clone_attr_len (tree);
 
+/* Returns the version string for a decl with target_version attribute.
+   Returns an invalid string_slice if no attribute is present.  */
+extern string_slice get_target_version (const tree);
+/* Returns a vector of the version strings from a target_clones attribute on
+   a decl.  Can also record the number of default versions found.  */
+extern auto_vec<string_slice> get_clone_versions (const tree, int * = NULL);
+/* Returns a vector of the version strings from a target_clones attribute
+   directly.  */
+extern auto_vec<string_slice> get_clone_attr_versions (const tree, int *);
+
 #endif  /* GCC_TREE_H  */
-- 
2.34.1
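
For readers without the string_slice patches at hand, the tokenize-and-strip loop in get_clone_attr_versions can be approximated with std::string_view. This is a sketch only: `strip` and `split_versions` are hypothetical names, and the separator is assumed to be ',' (the real code takes it from TARGET_CLONES_ATTR_SEPARATOR).

```cpp
#include <string_view>
#include <vector>

// Trim leading and trailing blanks from a non-owning view.
inline std::string_view strip(std::string_view s)
{
  const char* blanks = " \t\n";
  auto first = s.find_first_not_of(blanks);
  if (first == std::string_view::npos)
    return {};
  auto last = s.find_last_not_of(blanks);
  return s.substr(first, last - first + 1);
}

// Split one attribute argument into version strings.  Each "default" entry
// bumps *default_count when a counter is supplied.
inline std::vector<std::string_view>
split_versions(std::string_view arg, char sep = ',',
               int* default_count = nullptr)
{
  std::vector<std::string_view> versions;
  while (true)
    {
      auto pos = arg.find(sep);
      std::string_view token = strip(arg.substr(0, pos));
      if (token == "default" && default_count)
        ++*default_count;
      versions.push_back(token);
      if (pos == std::string_view::npos)
        break;
      arg.remove_prefix(pos + 1);
    }
  return versions;
}
```

As with string_slice, the returned views do not own their characters, so the input string must outlive the vector. A result of length 1 corresponds to the "single target_clones attribute is ignored" case in the c-attribs.cc hunk.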



Re: [PATCH v2] RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].

2025-04-15 Thread Jeff Law




On 4/9/25 6:08 AM, Robin Dapp wrote:

Hi,

when lifting up a vsetvl into a block we currently don't consider the
block's transparency with respect to the vsetvl as in other parts of the
pass.  This patch does not perform the lift when transparency is not
guaranteed.

This condition is more restrictive than necessary as we can still
perform a vsetvl lift if the conflicting register is only ever used
in vsetvls and no regular insns but given how late we are in the GCC 15
cycle it seems better to defer this.  Therefore
gcc.target/riscv/rvv/vsetvl/avl_single-68.c is XFAILed for now.

This issue was found in OpenCV where it manifests as a runtime error.
Zhijin Zeng debugged PR119547 and provided an initial patch.

V2 now uses the transparency property rather than the manual approach
before, both because it is cleaner and also because it helps with the go
ICE in PR119533.

Regtested on rv64gcv_zvl512b.

Regards
Robin

Reported-By: 曾治金 

 PR target/119547

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info):
 Do not perform lift if block is not transparent.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/avl_single-68.c: xfail.
 * g++.target/riscv/rvv/autovec/pr119547.C: New test.
 * g++.target/riscv/rvv/autovec/pr119547-2.C: New test.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-10.c: Adjust.

OK for the trunk.

jeff



Re: [PATCH] c++: wrong targs printed in hard satisfaction error [PR99214]

2025-04-15 Thread Jason Merrill

On 4/13/25 1:56 PM, Patrick Palka wrote:

Alternatively, rather than passing the most general template + args to
push_tinst_level, we can pass the partially instantiated template +
innermost args via just:

gcc/cp/ChangeLog:

* constraint.cc (satisfy_declaration_constraints): Pass the
original T and ARGS to push_tinst_level.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 2f1678ce4ff9..52768972da43 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2704,6 +2704,8 @@ satisfy_declaration_constraints (tree t, sat_info info)
  static tree
  satisfy_declaration_constraints (tree t, tree args, sat_info info)
  {
+  tree orig_t = t, orig_args = args;
+
/* Update the declaration for diagnostics.  */
info.in_decl = t;
  
@@ -2732,7 +2734,7 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info)

tree result = boolean_true_node;
if (tree norm = get_normalized_constraints_from_decl (t, info.noisy ()))
  {
-  if (!push_tinst_level (t, args))
+  if (!push_tinst_level (orig_t, orig_args))
return result;
tree pattern = DECL_TEMPLATE_RESULT (t);
push_to_top_level ();

So that for diagnostic20.C in question we emit:

   In substitution of '... void A::f() [with U = char]'.

compared to (with the previous approach)

   In substitution of '... void A::f() [with U = char; T = int]'.

or (wrongly, with the status quo)

   In substitution of '... void A::f() [with U = int]'

Would this be preferable?  I'd be good with either.


This approach certainly seems tidier; OK.

Jason


On Wed, 9 Apr 2025, Patrick Palka wrote:


On Wed, 9 Apr 2025, Patrick Palka wrote:


On Wed, 5 Mar 2025, Jason Merrill wrote:


On 3/5/25 10:13 AM, Patrick Palka wrote:

On Tue, 4 Mar 2025, Jason Merrill wrote:


On 3/4/25 2:49 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

In the three-parameter version of satisfy_declaration_constraints, when
't' isn't the most general template, then 't' won't correspond with
'args' after we augment the latter via add_outermost_template_args, and
so the instantiation context that we push via push_tinst_level isn't
quite correct: 'args' is a complete set of template arguments, but 't'
is not necessarily the most general template.  This manifests as
misleading diagnostic context lines when issuing a hard error (or a
constraint recursion error) that occurred during satisfaction, e.g. for
the below testcase without this patch we emit:
 In substitution of '... void A::f() [with U = int]'
and with this patch we emit:
 In substitution of '... void A::f() [with U = char; T = int]'.

This patch fixes this by always passing the most general template to
push_tinst_level.


That sounds good, but getting it by passing it back from
get_normalized_constraints_from_decl seems confusing; I'd think we should
calculate it in parallel to changing args to correspond to that template.


Hmm, won't that mean duplicating the template adjustment logic in
get_normalized_constraints_from_decl, which seems undesirable?  The
function has many callers, some of which are for satisfaction where
targs are involved, and the rest are for subsumption where no targs are
involved, so I don't see a clean way of refactoring the code to avoid
duplication of the template adjustment logic.  Right now the targ
adjustment logic is unfortunately duplicated across both overloads
of satisfy_declaration_constraints and it seems undesirable to add
more duplication.


Fair enough.  Incidentally, I wonder why the two-parm overload doesn't call
the three-parm overload?


Maybe one way to reduce the duplication would be to go the other way and
move the targ adjustment logic to get_normalized_constraints_from_decl
as well (so that it has two out-parameters, 'gen_d' and 'gen_args').
The proposed patch then would be an incremental step towards that.


That makes sense, passing back something suitable for
add_outermost_template_args.


I tried combining the two overloads, and/or moving the targ adjustment
logic to get_normalized_constraints_from_decl, but I couldn't arrive at
a formulation that worked and I was happy with (i.e. didn't lead to more
code duplication than the original approach).

In the meantime I noticed that this bug is more pervasive than I
thought, and leads to wrong diagnostic context lines printed even in the
case of ordinary satisfaction failure -- however the wrong diagnostic
lines are more annoying/noticeable during a hard error or constraint
recursion where there's likely no other useful diagnostic lines that
might have the correct args printed.

So I adjusted the testcase in the original patch accordingly.  Could the
following go in for now?

I also attached a diff of the output of all our concepts testcases
currently, before/after this patch.  Each change seems like a clear
improvement/correction to me.


Oops, that was not a complete diff of all the concepts

Re: [PATCH] c: Fully fold each parameter for call to .ACCESS_WITH_SIZE [PR119717]

2025-04-15 Thread Qing Zhao


> On Apr 14, 2025, at 16:35, Marek Polacek  wrote:
> 
> On Mon, Apr 14, 2025 at 08:28:55PM +, Qing Zhao wrote:
>> C_MAYBE_CONST_EXPR is a C FE operator that will be removed by c_fully_fold.
>> In c_fully_fold, it assumes that operands of function calls have already
>> been folded. However, when we build call to .ACCESS_WITH_SIZE, all its
>> operands are not fully folded.  Therefore the C FE specific operator is
>> passed to middle-end.
>> 
>> In order to fix this issue, fully fold the parameters before building the
>> call to .ACCESS_WITH_SIZE.
>> 
>> I am doing the bootstrap and regression testing on both X86 and aarch64 now.
>> Okay for trunk if testing goes well?
>> 
>> thanks.
>> 
>> Qing
>> 
>> PR c/119717
>> 
>> gcc/c/ChangeLog:
>> 
>> * c-typeck.cc (build_access_with_size_for_counted_by): Fully fold the
>> parameters for call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.dg/pr119717.c: New test.
>> ---
>> gcc/c/c-typeck.cc   |  8 ++--
>> gcc/testsuite/gcc.dg/pr119717.c | 24 
>> 2 files changed, 30 insertions(+), 2 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr119717.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index 3870e8a1558..dd176d96a41 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -3013,12 +3013,16 @@ build_access_with_size_for_counted_by (location_t 
>> loc, tree ref,
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>>   /* The result type of the call is a pointer to the flexible array type.  */
>>   tree result_type = c_build_pointer_type (TREE_TYPE (ref));
>> +  tree first_param
>> += c_fully_fold (array_to_pointer_conversion (loc, ref), FALSE, NULL);
>> +  tree second_param
>> += c_fully_fold (counted_by_ref, FALSE, NULL);
> 
> Why FALSE?  Just use false.  You can also use nullptr rather than NULL now.

Just replaced FALSE with false.
I am keeping NULL to be consistent with other calls to c_fully_fold in the same 
file.

And I am testing the new version now.

Thanks a lot.

(With FALSE, the compilation went fine…)

Qing
> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> result_type, 6,
>> - array_to_pointer_conversion (loc, ref),
>> - counted_by_ref,
>> + first_param,
>> + second_param,
>> build_int_cst (integer_type_node, 1),
>> build_int_cst (counted_by_type, 0),
>> build_int_cst (integer_type_node, -1),
>> diff --git a/gcc/testsuite/gcc.dg/pr119717.c 
>> b/gcc/testsuite/gcc.dg/pr119717.c
>> new file mode 100644
>> index 000..e5eedc567b3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr119717.c
>> @@ -0,0 +1,24 @@
>> +/* PR c/119717  */
>> +/* { dg-additional-options "-std=c23" } */
>> +/* { dg-do compile } */
>> +
>> +struct annotated {
>> +  unsigned count;
>> +  [[gnu::counted_by(count)]] char array[];
>> +};
>> +
>> +[[gnu::noinline,gnu::noipa]]
>> +static unsigned
>> +size_of (bool x, struct annotated *a)
>> +{
>> +  char *p = (x ? a : 0)->array;
>> +  return __builtin_dynamic_object_size (p, 1);
>> +}
>> +
>> +int main()
>> +{
>> +  struct annotated *p = __builtin_malloc(sizeof *p);
>> +  p->count = 0;
>> +  __builtin_printf ("the bdos whole is %ld\n", size_of (0, p));
>> +  return 0;
>> +}
>> -- 
>> 2.31.1
>> 
> 
> Marek




Re: [PATCH] MATCH: Fix patterns of type (a != b) and (a == b) [PR117760]

2025-04-15 Thread Jeff Law




On 4/15/25 12:24 AM, Eikansh Gupta wrote:

The patterns can be simplified as shown below:

(a != b) & ((a|b) != 0)  -> (a != b)
(a != b) | ((a|b) != 0)  -> ((a|b) != 0)

A similar simplification applies to (a == b).  This patch adds
simplifications for the above patterns.  The forwprop pass was rewriting
the patterns into another form that was not being simplified, so the
patch also adds simplifications for those forms.
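
The identities above can be checked by brute force.  The following is only
a sketch, not part of the patch: the function names are made up for
illustration, and the two (a == b) forms are derived here by analogy
rather than quoted from the patch.  It compares each left-hand side with
its simplified form for every pair of 8-bit values.

```cpp
#include <cstdint>

// Left-hand sides of the two (a != b) rewrites listed in the patch description.
constexpr bool ne_and (std::uint8_t a, std::uint8_t b)
{ return (a != b) & ((a | b) != 0); }
constexpr bool ne_or (std::uint8_t a, std::uint8_t b)
{ return (a != b) | ((a | b) != 0); }

// Dual (a == b) forms; derived here by analogy, not quoted from the patch.
constexpr bool eq_and (std::uint8_t a, std::uint8_t b)
{ return (a == b) & ((a | b) == 0); }
constexpr bool eq_or (std::uint8_t a, std::uint8_t b)
{ return (a == b) | ((a | b) == 0); }

// Returns true if every left-hand side agrees with its simplified form
// for all pairs of 8-bit operands.
bool rewrites_hold ()
{
  for (unsigned i = 0; i < 256; ++i)
    for (unsigned j = 0; j < 256; ++j)
      {
        const std::uint8_t a = static_cast<std::uint8_t> (i);
        const std::uint8_t b = static_cast<std::uint8_t> (j);
        if (ne_and (a, b) != (a != b))       return false;
        if (ne_or (a, b) != ((a | b) != 0))  return false;
        if (eq_and (a, b) != ((a | b) == 0)) return false;
        if (eq_or (a, b) != (a == b))        return false;
      }
  return true;
}
```

The key observation is that a != b implies (a | b) != 0, and dually
(a | b) == 0 implies a == b, so one operand of each & or | is redundant.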

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR 117760

gcc/ChangeLog:

* match.pd ((a != b) and/or ((a | b) != 0)): New pattern.
   ((a == b) and/or (a | b) == 0): New pattern.
   ((a == b) & (a | b) == 0): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr117760-1.c: New test.
* gcc.dg/tree-ssa/pr117760-2.c: New test.
* gcc.dg/tree-ssa/pr117760.c: New test.

Deferring to gcc-16 stage1.

jeff



Re: [GCC16,RFC,V2 02/14] aarch64: add new define_insn for subg

2025-04-15 Thread Richard Sandiford
Hi,

Indu Bhagat  writes:
> subg (Subtract with Tag) is an Armv8.5-A memory tagging (MTE)
> instruction.  It can be used to subtract an immediate value scaled by
> the tag granule from the address in the source register.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (subg): New definition.

In my previous comment about this patch:

  https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668669.html

I hadn't realised that the pattern follows the existing "addg" pattern.
But...

> ---
>  gcc/config/aarch64/aarch64.md | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 031e621c98a1..0c7aebb838cd 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -8416,6 +8416,23 @@
>[(set_attr "type" "memtag")]
>  )
>  
> +(define_insn "subg"
> +  [(set (match_operand:DI 0 "register_operand" "=rk")
> + (ior:DI
> +  (and:DI (minus:DI (match_operand:DI 1 "register_operand" "rk")
> +   (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
> +  (const_int -1080863910568919041)) ;; 0xf0ff...
> +  (ashift:DI
> +   (unspec:QI
> +[(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
> + (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
> +UNSPEC_GEN_TAG)
> +   (const_int 56))))]
> +  "TARGET_MEMTAG"
> +  "subg\\t%0, %1, #%2, #%3"
> +  [(set_attr "type" "memtag")]
> +)
> +
>  (define_insn "subp"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (minus:DI

...subtractions of constants are canonically expressed using (plus ...)
of a negative number, rather than (minus ...) of a positive number.
So I think we should instead add subg to the existing addg pattern.
That is, in:

(define_insn "addg"
  [(set (match_operand:DI 0 "register_operand" "=rk")
(ior:DI
 (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
  (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
 (const_int -1080863910568919041)) ;; 0xf0ff...
 (ashift:DI
  (unspec:QI
   [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
(match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
   UNSPEC_GEN_TAG)
  (const_int 56))))]
  "TARGET_MEMTAG"
  "addg\\t%0, %1, #%2, #%3"
  [(set_attr "type" "memtag")]
)

the aarch64_granule16_uimm6 would be replaced with a predicate that
accepts all multiples of 16 in the range [-1008, 1008].  Then the
output pattern would generate an addg or subg instruction based on
whether operand 2 is negative.
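
To make the suggestion concrete, here is a rough sketch of the selection
logic outside of the .md file.  It is not GCC code: granule16_simm_p and
emit_addg_or_subg are hypothetical names, the tag offset is not
range-checked, and the real change would be a new predicate plus a C
output block in the define_insn.

```cpp
#include <string>

// Hypothetical stand-in for the proposed predicate: accept every multiple
// of 16 in the range [-1008, 1008].
bool granule16_simm_p (long offset)
{
  return offset >= -1008 && offset <= 1008 && offset % 16 == 0;
}

// Pick the mnemonic from the sign of the offset and print its magnitude.
// Returns an empty string if the offset is rejected by the predicate.
std::string emit_addg_or_subg (const std::string &dst, const std::string &src,
                               long offset, unsigned tag_offset)
{
  if (!granule16_simm_p (offset))
    return std::string ();
  const char *mnemonic = offset < 0 ? "subg" : "addg";
  const long magnitude = offset < 0 ? -offset : offset;
  return std::string (mnemonic) + "\t" + dst + ", " + src
         + ", #" + std::to_string (magnitude)
         + ", #" + std::to_string (tag_offset);
}
```

With this shape, an offset of -32 would print as "subg x0, x1, #32, #0",
and the (plus ...) form stays canonical in the RTL.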

Thanks,
Richard


[PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

2025-04-15 Thread Alex
Tested on x86_64-pc-linux-gnu, this is only a library addition (and a
few tests) so it shouldn't cause any major impacts.  I also tested
libgomp C to ensure the conditional compile was working.

Okay for trunk?
From 1ef3fe0a1f026689e64963ec9ab0b04b7e6b1bde Mon Sep 17 00:00:00 2001
From: waffl3x 
Date: Tue, 15 Apr 2025 04:12:55 -0600
Subject: [PATCH] OpenMP: omp.h omp::allocator C++ Allocator interface

The implementation of each allocator is simplified by inheriting from
__detail::__allocator_templ.  At the moment, none of the implementations
diverge in any way, simply passing in the allocator handle to be used when
an allocation is made.  In the future, const_mem will need special handling
added to it to support constant memory space.
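
As a rough illustration of that structure (not the code in omp.h.in), the
sketch below uses a hypothetical mem_handle enum in place of the OpenMP
allocator handles and std::malloc in place of omp_aligned_alloc, so it
builds without libgomp.  It also folds the comparison into a single
operator== for brevity, where the patch uses separate overloads.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

// Hypothetical stand-in for the OpenMP allocator handles.
enum class mem_handle { default_mem, low_lat_mem };

// Shared base: each allocator only differs in the handle it passes along.
template<typename T, mem_handle Handle>
struct allocator_templ
{
  using value_type = T;

  T *allocate (std::size_t n)
  {
    // Reject requests whose byte count would overflow size_t.
    if (std::numeric_limits<std::size_t>::max () / sizeof (T) < n)
      throw std::bad_array_new_length ();
    // Assumption: malloc stands in for omp_aligned_alloc and ignores Handle.
    void *p = std::malloc (n * sizeof (T));
    if (!p)
      throw std::bad_alloc ();
    return static_cast<T *> (p);
  }

  void deallocate (T *p, std::size_t) noexcept
  { std::free (p); }
};

// Two allocators compare equal exactly when they use the same handle.
template<typename T, mem_handle H, typename U, mem_handle UH>
constexpr bool operator== (const allocator_templ<T, H> &,
                           const allocator_templ<U, UH> &) noexcept
{ return H == UH; }

// The named allocators are thin wrappers that only fix the handle.
template<typename T>
struct default_mem : allocator_templ<T, mem_handle::default_mem> {};

template<typename T>
struct low_lat_mem : allocator_templ<T, mem_handle::low_lat_mem> {};
```

Keeping the named allocators as single-parameter templates is what lets
std::allocator_traits rebind them without an explicit rebind member.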

libgomp/ChangeLog:

	* omp.h.in: Add omp::allocator::* and ompx::allocator::* allocators.
	(__detail::__allocator_templ):
	New struct template.
	(null_allocator): New struct template.
	(default_mem): Likewise.
	(large_cap_mem): Likewise.
	(const_mem): Likewise.
	(high_bw_mem): Likewise.
	(low_lat_mem): Likewise.
	(cgroup_mem): Likewise.
	(pteam_mem): Likewise.
	(thread_mem): Likewise.
	(ompx::allocator::gnu_pinned_mem): Likewise.
	* testsuite/libgomp.c++/allocator-1.C: New test.
	* testsuite/libgomp.c++/allocator-2.C: New test.

Signed-off-by: waffl3x 
---
 libgomp/omp.h.in| 132 
 libgomp/testsuite/libgomp.c++/allocator-1.C | 158 
 libgomp/testsuite/libgomp.c++/allocator-2.C | 132 
 3 files changed, 422 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c++/allocator-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/allocator-2.C

diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index d5e8be46e94..8d17db1da9a 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -432,4 +432,136 @@ extern const char *omp_get_uid_from_device (int) __GOMP_NOTHROW;
 }
 #endif
 
+#if __cplusplus >= 201103L
+
+/* std::__throw_bad_alloc and std::__throw_bad_array_new_length.  */
+#include 
+
+namespace omp
+{
+namespace allocator
+{
+
+namespace __detail
+{
+
+template
+struct __allocator_templ
+{
+  using value_type = __T;
+  using pointer = __T*;
+  using const_pointer = const __T*;
+  using size_type = __SIZE_TYPE__;
+  using difference_type = __PTRDIFF_TYPE__;
+
+  __T*
+  allocate (size_type __n)
+  {
+if (__SIZE_MAX__ / sizeof(__T) < __n)
+  std::__throw_bad_array_new_length ();
+void *__p = omp_aligned_alloc (alignof(__T), __n * sizeof(__T), __Handle);
+if (!__p)
+  std::__throw_bad_alloc ();
+return static_cast<__T*>(__p);
+  }
+
+  void
+  deallocate (__T *__p, size_type) __GOMP_NOTHROW
+  {
+omp_free (static_cast(__p), __Handle);
+  }
+};
+
+template
+constexpr bool
+operator== (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __Handle>&) __GOMP_NOTHROW
+{
+  return true;
+}
+
+template
+constexpr bool
+operator== (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __UHandle>&) __GOMP_NOTHROW
+{
+  return false;
+}
+
+template
+constexpr bool
+operator!= (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __Handle>&) __GOMP_NOTHROW
+{
+  return false;
+}
+
+template
+constexpr bool
+operator!= (const __allocator_templ<__T, __Handle>&,
+	const __allocator_templ<__U, __UHandle>&) __GOMP_NOTHROW
+{
+  return true;
+}
+
+} /* namespace __detail */
+
+template
+struct null_allocator
+  : __detail::__allocator_templ<__T, omp_null_allocator> {};
+
+template
+struct default_mem
+  : __detail::__allocator_templ<__T, omp_default_mem_alloc> {};
+
+template
+struct large_cap_mem
+  : __detail::__allocator_templ<__T, omp_large_cap_mem_alloc> {};
+
+template
+struct const_mem
+  : __detail::__allocator_templ<__T, omp_const_mem_alloc> {};
+
+template
+struct high_bw_mem
+  : __detail::__allocator_templ<__T, omp_high_bw_mem_alloc> {};
+
+template
+struct low_lat_mem
+  : __detail::__allocator_templ<__T, omp_low_lat_mem_alloc> {};
+
+template
+struct cgroup_mem
+  : __detail::__allocator_templ<__T, omp_cgroup_mem_alloc> {};
+
+template
+struct pteam_mem
+  : __detail::__allocator_templ<__T, omp_pteam_mem_alloc> {};
+
+template
+struct thread_mem
+  : __detail::__allocator_templ<__T, omp_thread_mem_alloc> {};
+
+} /* namespace allocator */
+
+} /* namespace omp */
+
+namespace ompx
+{
+
+namespace allocator
+{
+
+template
+struct gnu_pinned_mem
+  : omp::allocator::__detail::__allocator_templ<__T, ompx_gnu_pinned_mem_alloc> {};
+
+} /* namespace allocator */
+
+} /* namespace ompx */
+
+#endif /* __cplusplus */
+
 #endif /* _OMP_H */
diff --git a/libgomp/testsuite/libgomp.c++/allocator-1.C b/libgomp/testsuite/libgomp.c++/allocator-1.C
new file mode 100644
index 000..725beade0c8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/allocator-1.C
@@ -0,0 +1,158 @@
+// { dg-do run }
+
+#include 
+#include 
+#include 
+
+template class Alloc>
+void test (T const initial_value = T())
+{
+  using Allocator 

Regenerate common.opt.urls

2025-04-15 Thread Kyrylo Tkachov
Pushing as obvious.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

* common.opt.urls: Regenerate.



0001-Regenerate-common.opt.urls.patch
Description: 0001-Regenerate-common.opt.urls.patch


Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu  wrote:
>
> ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers
> pushed in red-zone.  Since
>
> commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801
> Author: H.J. Lu 
> Date:   Sun Apr 13 12:20:42 2025 -0700
>
> APX: Don't use red-zone with 32 GPRs and no caller-saved registers
>
> disabled red-zone, update gcc.target/i386/apx-interrupt-1.c to expect
> 31 .cfi_restore directives.

Hm, did you also account for RED_ZONE_RESERVE? The last 8-byte slot is
reserved for internal use by the compiler.

Uros.

>
> PR target/119784
> * gcc.target/i386/apx-interrupt-1.c: Expect 31 .cfi_restore
> directives.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/testsuite/gcc.target/i386/apx-interrupt-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> index fefe2e6d6fc..fa1acc7a142 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
> @@ -66,7 +66,7 @@ void foo (void *frame)
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 132, -120} 1 } } */
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 131, -128} 1 } } */
>  /* { dg-final { scan-assembler-times {\t\.cfi_offset 130, -136} 1 } } */
> -/* { dg-final { scan-assembler-times ".cfi_restore" 15} } */
> +/* { dg-final { scan-assembler-times ".cfi_restore" 31 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)ax" 1 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)bx" 1 } } */
>  /* { dg-final { scan-assembler-times "pop(?:l|q)\[\\t \]*%(?:e|r)cx" 1 } } */
> --
> 2.49.0
>


Re: [PATCH v5 2/2] i386: Enable -mnop-mcount for -fpic with PLTs

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:26 PM Ard Biesheuvel  wrote:
>
> From: Ard Biesheuvel 
>
> -mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs
> are being used, given that the instruction encodings are identical, only
> the target may resolve differently depending on how the linker decides
> to incorporate the object file.
>
> So relax the option check, and add a test to ensure that 5-byte NOPs are
> emitted when -mnop-mcount is being used.
>
> Signed-off-by: Ard Biesheuvel 
>
> gcc/ChangeLog:
>
> PR target/119386
> * config/i386/i386-options.cc: Permit -mnop-mcount when using
>   -fpic with PLTs.
>
> gcc/testsuite/ChangeLog:
>
> PR target/119386
> * gcc.target/i386/pr119386-3.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.cc|  4 ++--
>  gcc/testsuite/gcc.target/i386/pr119386-3.c | 10 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-3.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index a9fac011f3d..964449fa8cd 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -2828,8 +2828,8 @@ ix86_option_override_internal (bool main_args_p,
>if (flag_nop_mcount)
>  error ("%<-mnop-mcount%> is not compatible with this target");
>  #endif
> -  if (flag_nop_mcount && flag_pic)
> -error ("%<-mnop-mcount%> is not implemented for %<-fPIC%>");
> +  if (flag_nop_mcount && flag_pic && !flag_plt)
> +error ("%<-mnop-mcount%> is not implemented for %<-fno-plt%>");
>
>/* Accept -msseregparm only if at least SSE support is enabled.  */
>if (TARGET_SSEREGPARM_P (opts->x_target_flags)
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-3.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-3.c
> new file mode 100644
> index 000..287410b951a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-3.c
> @@ -0,0 +1,10 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -pg -mnop-mcount" } */
> +/* { dg-final { scan-assembler ".byte\[ \t\]+0x0f, 0x1f, 0x44, 0x00, 0x00" } 
> } */
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> --
> 2.49.0.504.g3bcea36a83-goog
>


Re: [PATCH] combine: Correct comment about combine_validate_cost

2025-04-15 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
> Noticed while investigating a regression for cris-elf with
> r15-9239-g4d7a634f6d4102 "combine: Allow 2->2 combinations,
> but with a tweak [PR116398]" (to-be-reported).
>
> The comment was introduced when breaking out the
> combine_validate_cost function, in r0-59417-g64b8935d4809f3.
>
> I thought about wordsmithing to keep the "polarity" of the
> statement, but "are equal to or cheaper than" didn't read
> well.
>
> Ok to commit?

OK, thanks.

Richard

> -- >8 --
> The *code* has been the same since forever, but this
> comment, at a critical path, is misleading: if the new cost
> is the same (like, when doing an identity replacement), then
> combine_validate_cost returns true.
>
>   * combine.cc (try_combine): Correct comment about
>   combine_validate_cost.
> ---
>  gcc/combine.cc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 5f085187cfef..c2c1d50ca49f 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -4129,8 +4129,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>   }
>  }
>  
> -  /* Only allow this combination if insn_cost reports that the
> - replacement instructions are cheaper than the originals.  */
> +  /* Reject this combination if insn_cost reports that the replacement
> + instructions are more expensive than the originals.  */
>if (!combine_validate_cost (i0, i1, i2, i3, newpat, newi2pat, other_pat))
>  {
>undo_all ();


Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel  wrote:
>
> From: Ard Biesheuvel 
>
> Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling
> __fentry__") updated the logic that emits mcount() / __fentry__() calls
> into function prologues when profiling is enabled, to avoid GOT-based
> indirect calls when a direct call would suffice.
>
> There are two problems with that change:
> - it relies on -mdirect-extern-access rather than -fno-plt to decide
>   whether or not a direct [PLT based] call is appropriate;
> - for the PLT case, it falls through to x86_print_call_or_nop(), which
>   does not emit the @PLT suffix, resulting in the wrong relocation being
>   used (R_X86_64_PC32 instead of R_X86_64_PLT32)
>
> Fix this by testing flag_plt instead of ix86_direct_extern_access, and
> updating x86_print_call_or_nop() to take flag_pic and flag_plt into
> account. This also ensures that -mnop-mcount works as expected when
> emitting the PLT based profiling calls.
>
> While at it, fix the 32-bit logic as well, and issue a PLT call unless
> PLTs are explicitly disabled.
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119386
>
> Signed-off-by: Ard Biesheuvel 
>
> gcc/ChangeLog:
>
> PR target/119386
> * config/i386/i386.cc (x86_print_call_or_nop): Add @PLT suffix
> where appropriate.
> (x86_function_profiler): Fall through to x86_print_call_or_nop()
> for PIC codegen when flag_plt is set.
>
> gcc/testsuite/ChangeLog:
>
> PR target/119386
> * gcc.target/i386/pr119386-1.c: New test.
> * gcc.target/i386/pr119386-2.c: New test.

OK if there are no further comments in the next day or two.

BTW: Do you have commit rights?

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc| 12 ++--
>  gcc/testsuite/gcc.target/i386/pr119386-1.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr119386-2.c | 12 
>  3 files changed, 32 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr119386-2.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 4f8380c4a58..20059b775b9 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23158,6 +23158,12 @@ x86_print_call_or_nop (FILE *file, const char 
> *target)
>if (flag_nop_mcount || !strcmp (target, "nop"))
>  /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */
>  fprintf (file, "1:" ASM_BYTE "0x0f, 0x1f, 0x44, 0x00, 0x00\n");
> +  else if (!TARGET_PECOFF && flag_pic)
> +{
> +  gcc_assert (flag_plt);
> +
> +  fprintf (file, "1:\tcall\t%s@PLT\n", target);
> +}
>else
>  fprintf (file, "1:\tcall\t%s\n", target);
>  }
> @@ -23321,7 +23327,7 @@ x86_function_profiler (FILE *file, int labelno 
> ATTRIBUTE_UNUSED)
>   break;
> case CM_SMALL_PIC:
> case CM_MEDIUM_PIC:
> - if (!ix86_direct_extern_access)
> + if (!flag_plt)
> {
>   if (ASSEMBLER_DIALECT == ASM_INTEL)
> fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n",
> @@ -23352,7 +23358,9 @@ x86_function_profiler (FILE *file, int labelno 
> ATTRIBUTE_UNUSED)
>  "\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER 
> "\n",
>  LPREFIX, labelno);
>  #endif
> -  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +  if (flag_plt)
> +   x86_print_call_or_nop (file, mcount_name);
> +  else if (ASSEMBLER_DIALECT == ASM_INTEL)
> fprintf (file, "1:\tcall\t[DWORD PTR %s@GOT[ebx]]\n", mcount_name);
>else
> fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-1.c
> new file mode 100644
> index 000..9a0dc64b5b9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-1.c
> @@ -0,0 +1,10 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -pg" } */
> +/* { dg-final { scan-assembler "call\[ \t\]+mcount@PLT" } } */
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr119386-2.c 
> b/gcc/testsuite/gcc.target/i386/pr119386-2.c
> new file mode 100644
> index 000..3ea978ecfdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr119386-2.c
> @@ -0,0 +1,12 @@
> +/* PR target/119386 */
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fpic -fno-plt -pg" } */
> +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOTPCREL\\(" { target 
> { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "call\[ \t\]+\\*mcount@GOT\\(" { target ia32 
> } } } */
> +
> +
> +int
> +main ()
> +{
> +  return 0;
> +}
> --
> 2.49.0.504.g3bcea36a83-goog
>
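
For readers following the thread, the selection logic that the patched
x86_print_call_or_nop implements can be modelled roughly as below.  This
is a standalone sketch, not GCC code: profile_flags and print_call_or_nop
are hypothetical names, the flags are plain booleans, and ASM_BYTE is
assumed to expand to "\t.byte\t".

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the options the patched function consults.
struct profile_flags
{
  bool nop_mcount;  // -mnop-mcount
  bool pic;         // -fpic / -fPIC
  bool plt;         // true unless -fno-plt
  bool pecoff;      // PE/COFF target
};

// Returns the line that would be printed for the profiling call.
std::string print_call_or_nop (const profile_flags &f,
                               const std::string &target)
{
  if (f.nop_mcount || target == "nop")
    // 5-byte nop, the same length as the call it replaces.
    return "1:\t.byte\t0x0f, 0x1f, 0x44, 0x00, 0x00\n";
  if (!f.pecoff && f.pic)
    {
      // The -fno-plt case is handled by the caller via the GOT.
      assert (f.plt);
      return "1:\tcall\t" + target + "@PLT\n";
    }
  return "1:\tcall\t" + target + "\n";
}
```

The point of the model is that the nop and both call forms occupy the same
slot, so -mnop-mcount keeps working once the PIC path goes through here.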


[committed v3] libstdc++: Fix std::string construction from volatile char* [PR119748]

2025-04-15 Thread Jonathan Wakely
My recent r15-9381-g648d5c26e25497 change assumes that a contiguous
iterator with the correct value_type can be converted to a const charT*
but that's not true for volatile charT*. The optimization should only be
done if it can be converted to the right pointer type.

Additionally, some generic loops for non-contiguous iterators need an
explicit cast to deal with iterator reference types that do not bind to
the const charT& parameter of traits_type::assign.
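
The two points above can be illustrated outside the library with a small
sketch.  This is not the libstdc++ implementation: copy_chars,
copy_chars_impl and from_volatile are hypothetical names, and the dispatch
uses std::is_convertible rather than the concepts used in the patch.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// A volatile char* is a contiguous iterator with value_type char, but it
// does not convert to const char*, so it must not take the bulk-copy path.
static_assert (!std::is_convertible<volatile char *, const char *>::value,
               "volatile char* must not convert to const char*");
static_assert (std::is_convertible<char *, const char *>::value,
               "char* converts to const char*");

// Fast path: the iterator converts to const char*, so copy in bulk.
inline void copy_chars_impl (char *dest, const char *first, const char *last,
                             std::true_type)
{
  std::memcpy (dest, first, static_cast<std::size_t> (last - first));
}

// Generic path: copy element by element, converting explicitly to char.
template<typename Iter>
void copy_chars_impl (char *dest, Iter first, Iter last, std::false_type)
{
  for (; first != last; ++first, (void) ++dest)
    std::char_traits<char>::assign (*dest, static_cast<char> (*first));
}

template<typename Iter>
void copy_chars (char *dest, Iter first, Iter last)
{
  copy_chars_impl (dest, first, last,
                   std::is_convertible<Iter, const char *> {});
}

// Builds a string from a volatile range via the generic path.
std::string from_volatile (volatile char *first, volatile char *last)
{
  std::string out (static_cast<std::size_t> (last - first), '\0');
  copy_chars (&out[0], first, last);
  return out;
}
```

Testing convertibility of the iterator itself, rather than only its
value_type, is what keeps volatile char* out of the bulk-copy path.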

libstdc++-v3/ChangeLog:

PR libstdc++/119748
* include/bits/basic_string.h (_S_copy_chars): Only optimize for
contiguous iterators that are convertible to const charT*. Use
explicit conversion to charT after dereferencing iterator.
(_S_copy_range): Likewise for contiguous ranges.
* include/bits/basic_string.tcc (_M_construct): Use explicit
conversion to charT after dereferencing iterator.
* include/bits/cow_string.h (_S_copy_chars): Likewise.
(basic_string(from_range_t, R&&, const Allocator&)): Likewise.
Only optimize for contiguous iterators that are convertible to
const charT*.
* testsuite/21_strings/basic_string/cons/char/119748.cc: New
test.
* testsuite/21_strings/basic_string/cons/wchar_t/119748.cc:
New test.

Reviewed-by: Tomasz Kaminski 
---

Changes in v3:
- Fixed commit message to not talk about iterator references that aren't
  implicitly convertible to value_type.
- Used testsuite_iterators.h for new tests (after enabling the
  test_container(T(&)[N]) constructor for C++98).

Tested x86_64-linux. Pushed to trunk.

The static_cast parts would be OK to backport too.

 libstdc++-v3/include/bits/basic_string.h  | 24 +
 libstdc++-v3/include/bits/basic_string.tcc|  3 +-
 libstdc++-v3/include/bits/cow_string.h| 17 ++---
 .../basic_string/cons/char/119748.cc  | 35 +++
 .../basic_string/cons/wchar_t/119748.cc   |  7 
 5 files changed, 73 insertions(+), 13 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/cons/char/119748.cc
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/119748.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 9c431c765ab..c90bd099b63 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -488,8 +488,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  is_same<_IterBase, const _CharT*>>::value)
_S_copy(__p, std::__niter_base(__k1), __k2 - __k1);
 #if __cpp_lib_concepts
- else if constexpr (contiguous_iterator<_Iterator>
-  && is_same_v, _CharT>)
+ else if constexpr (requires {
+  requires contiguous_iterator<_Iterator>;
+  { std::to_address(__k1) }
+-> convertible_to;
+})
{
  const auto __d = __k2 - __k1;
  (void) (__k1 + __d); // See P3349R1
@@ -499,7 +502,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
  else
 #endif
  for (; __k1 != __k2; ++__k1, (void)++__p)
-   traits_type::assign(*__p, *__k1); // These types are off.
+   traits_type::assign(*__p, static_cast<_CharT>(*__k1));
}
 #pragma GCC diagnostic pop
 
@@ -527,12 +530,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
static constexpr void
_S_copy_range(pointer __p, _Rg&& __rg, size_type __n)
{
- if constexpr (ranges::contiguous_range<_Rg>
- && is_same_v, _CharT>)
+ if constexpr (requires {
+ requires ranges::contiguous_range<_Rg>;
+ { ranges::data(std::forward<_Rg>(__rg)) }
+   -> convertible_to;
+   })
_S_copy(__p, ranges::data(std::forward<_Rg>(__rg)), __n);
  else
-   for (auto&& __e : __rg)
- traits_type::assign(*__p++, std::forward(__e));
+   {
+ auto __first = ranges::begin(__rg);
+ const auto __last = ranges::end(__rg);
+ for (; __first != __last; ++__first)
+   traits_type::assign(*__p++, static_cast<_CharT>(*__first));
+   }
}
 #endif
 
diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
b/libstdc++-v3/include/bits/basic_string.tcc
index 02230aca5d2..bca55bc5658 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -210,7 +210,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_data(__another);
_M_capacity(__capacity);
  }
-   traits_type::assign(_M_data()[__len++], *__beg);
+   traits_type::assign(_M_data()[__len++],
+   static_cast<_CharT>(*__beg));
++__beg;
  }
 
di

[PATCH v2 4/4] libstdc++: Add tests for std::extents.

2025-04-15 Thread Luc Grosheintz
A prior commit added std::extents; this commit adds the tests. The bulk
is focused on testing the constructors. These are split into three
groups:

1. the ctor from other extents and the copy ctor,
2. the ctor from a pack of integer-like objects,
3. the ctor from shapes, i.e. span and array.

For each group check that the ctor:
* produces an object with the expected values for extent,
* is implicit if and only if required,
* is constexpr,
* doesn't change the rank of the extent.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/extents/assign.cc: New test.
* testsuite/23_containers/mdspan/extents/class_properties.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
* testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc: New 
test.
* testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc: 
New test.
* testsuite/23_containers/mdspan/extents/custom_integer.cc: New test.
* testsuite/23_containers/mdspan/extents/deduction_guide.cc: New test.
* testsuite/23_containers/mdspan/extents/dextents.cc: New test.
* testsuite/23_containers/mdspan/extents/extent.cc: New test.
* testsuite/23_containers/mdspan/extents/ops_eq.cc: New test.

Signed-off-by: Luc Grosheintz 
---
 .../23_containers/mdspan/extents/assign.cc| 29 ++
 .../mdspan/extents/class_properties.cc| 62 +
 .../23_containers/mdspan/extents/ctor_copy.cc | 75 +++
 .../mdspan/extents/ctor_copy_constexpr.cc | 20 
 .../23_containers/mdspan/extents/ctor_ints.cc | 58 
 .../mdspan/extents/ctor_ints_constexpr.cc | 12 +++
 .../mdspan/extents/ctor_shape_all_extents.cc  | 61 +
 .../mdspan/extents/ctor_shape_constexpr.cc| 23 +
 .../extents/ctor_shape_dynamic_extents.cc | 91 +++
 .../mdspan/extents/custom_integer.cc  | 87 ++
 .../mdspan/extents/deduction_guide.cc | 34 +++
 .../23_containers/mdspan/extents/dextents.cc  | 11 +++
 .../23_containers/mdspan/extents/extent.cc| 24 +
 .../23_containers/mdspan/extents/ops_eq.cc| 58 
 14 files changed, 645 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/deduction_guide.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/dextents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/extent.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ops_eq.cc

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
new file mode 100644
index 000..3bc32361a7b
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
@@ -0,0 +1,29 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+constexpr auto dyn = std::dynamic_extent;
+
+static_assert(std::is_nothrow_assignable_v,
+  std::extents>);
+
+int
+main()
+{
+  auto e1 = std::extents();
+  auto e2 = std::extents();
+
+  e2 = e1;
+  VERIFY(e2 == e1);
+
+  auto e5 = std::extents();
+  e5 = e1;
+  VERIFY(e5 == e1);
+
+  auto e3 = std::extents(1, 2);
+  auto e4 = std::extents(3, 4);
+  e3 = e4;
+  VERIFY(e3 == e4);
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
new file mode 100644
index 000..548900a7f44
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
@@ -0,0 +1,62 @@
+// { dg-do compile { target c++23 } }
+#include 
+
+

[PATCH v2 1/4] libstdc++: Setup internal FTM for mdspan.

2025-04-15 Thread Luc Grosheintz
Use the FTM infrastructure to create an internal feature-testing macro
for partial availability of mdspan, which is then used to hide the
contents of the header mdspan when compiling against a standard prior to
C++23.
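
As a rough illustration of the gating described above, the sketch below
mimics the pattern with a stand-in macro.  The macro name
sketch_glibcxx_mdspan and the namespace sketch are hypothetical; only the
__cplusplus >= 202100L condition is taken from the regenerated version.h
hunk in this patch.

```cpp
// Hypothetical stand-in for the internal macro; the real one is generated
// into bits/version.h from version.def.
#if __cplusplus >= 202100L
# define sketch_glibcxx_mdspan 1L
#endif

namespace sketch
{
#ifdef sketch_glibcxx_mdspan
  // The header contents are visible only when the macro is defined.
  constexpr bool mdspan_contents_visible = true;
#else
  // For older language levels the header body is effectively empty.
  constexpr bool mdspan_contents_visible = false;
#endif
}
```

With this shape, including the header under an older -std= setting still
compiles, it just declares nothing.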

libstdc++-v3/ChangeLog:

* include/bits/version.def: Add internal feature testing macro
__glibcxx_mdspan.
* include/bits/version.h: Regenerate.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/include/bits/version.def | 9 +
 libstdc++-v3/include/bits/version.h   | 9 +
 2 files changed, 18 insertions(+)

diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
index 0afaf0dec24..b2aaabff6d2 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -998,6 +998,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = mdspan;
+  no_stdname = true; // FIXME: remove
+  values = {
+v = 1; // FIXME: 202207
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = ssize;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
index 980fee641e9..9ee1e0e980d 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1115,6 +1115,15 @@
 #endif /* !defined(__cpp_lib_span) && defined(__glibcxx_want_span) */
 #undef __glibcxx_want_span
 
+#if !defined(__cpp_lib_mdspan)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_mdspan 1L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_mdspan)
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_mdspan) && defined(__glibcxx_want_mdspan) */
+#undef __glibcxx_want_mdspan
+
 #if !defined(__cpp_lib_ssize)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_ssize 201902L
-- 
2.48.1



[PATCH] tailc: Fix up musttail calls vs. -fsanitize=thread [PR119801]

2025-04-15 Thread Jakub Jelinek
Hi!

Calls with musttail attribute don't really work with -fsanitize=thread in
GCC.  The problem is that TSan instrumentation adds
  __tsan_func_entry (__builtin_return_address (0));
calls at the start of each instrumented function and
  __tsan_func_exit ();
call at the end of those, and the latter stands in the way of normal tail calls
as well as musttail tail calls.

Looking at what LLVM does, for normal calls -fsanitize=thread also prevents
tail calls like in GCC (well, the __tsan_func_exit () call itself can be
tail called in GCC (and from what I see not in clang)).
But for [[clang::musttail]] calls it arranges to move the
__tsan_func_exit () before the musttail call instead of after it.

The following patch handles it similarly.  If we detect a
__builtin_tsan_func_exit () call in a function instrumented for
-fsanitize=thread, we process it normally (so that the call can be tail
called in a function returning void) but set a flag that the builtin has
been seen (only for cfun->has_musttail
in the diag_musttail phase).  And then let tree_optimize_tail_calls_1
call find_tail_calls again in a new mode where the __tsan_func_exit ()
call is ignored and so we are able to find calls before it, but only
accept that if the call before it is actually a musttail.  For C++ it needs
to verify that EH cleanup if any also has the __tsan_func_exit () call
and if all goes well, the musttail call is registered for tailcalling with
a flag that it has __tsan_func_exit () after it and when optimizing that
we emit __tsan_func_exit (); call before the musttail tail call (or musttail
tail recursion).
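
To make the ordering issue concrete, here is a toy model, not GCC or TSan
runtime code.  ShadowStackSketch and the sketch_* functions are hypothetical
names; the model only assumes that a function entry hook pushes a frame and
a function exit hook pops one.  It compares emitting the exit hook after the
call with hoisting it before the call, as the patch does for musttail calls.

```cpp
// Toy model of the bookkeeping done by function entry/exit hooks.
struct ShadowStackSketch
{
  int depth = 0;
  int max_depth = 0;
};

constexpr void sketch_func_entry (ShadowStackSketch &s)
{
  ++s.depth;
  if (s.depth > s.max_depth)
    s.max_depth = s.depth;
}

constexpr void sketch_func_exit (ShadowStackSketch &s)
{
  --s.depth;
}

// Exit hook after the call: the call is not in tail position, frames nest.
constexpr int countdown_exit_after (ShadowStackSketch &s, int n)
{
  sketch_func_entry (s);
  int r = n == 0 ? 0 : countdown_exit_after (s, n - 1);
  sketch_func_exit (s);
  return r;
}

// Exit hook hoisted before the call: the call is the last action.
constexpr int countdown_exit_before (ShadowStackSketch &s, int n)
{
  sketch_func_entry (s);
  sketch_func_exit (s);
  return n == 0 ? 0 : countdown_exit_before (s, n - 1);
}

constexpr int max_depth_exit_after (int n)
{
  ShadowStackSketch s{};
  countdown_exit_after (s, n);
  return s.max_depth;
}

constexpr int max_depth_exit_before (int n)
{
  ShadowStackSketch s{};
  countdown_exit_before (s, n);
  return s.max_depth;
}
```

In the first variant the modelled depth grows with the recursion depth; in
the second it stays at one frame, which is what allows the call to remain a
real tail call.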

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-04-15  Jakub Jelinek  

PR sanitizer/119801
* sanitizer.def (BUILT_IN_TSAN_FUNC_EXIT): Use BT_FN_VOID rather
than BT_FN_VOID_PTR.
* tree-tailcall.cc: Include attribs.h and asan.h.
(struct tailcall): Add has_tsan_func_exit member.
(empty_eh_cleanup): Add eh_has_tsan_func_exit argument, set what
it points to to 1 if there is exactly one __tsan_func_exit call
and ignore that call otherwise.  Adjust recursive call.
(find_tail_calls): Add RETRY_TSAN_FUNC_EXIT argument, pass it
to recursive calls.  When seeing __tsan_func_exit call with
RETRY_TSAN_FUNC_EXIT 0, set it to -1.  If RETRY_TSAN_FUNC_EXIT
is 1, initially ignore __tsan_func_exit calls.  Adjust
empty_eh_cleanup caller.  When looking through stmts after the call,
ignore exactly one __tsan_func_exit call but remember it in
t->has_tsan_func_exit.  Diagnose if EH cleanups didn't have
__tsan_func_exit and normal path did or vice versa.
(optimize_tail_call): Emit __tsan_func_exit before the tail call
or tail recursion.
(tree_optimize_tail_calls_1): Adjust find_tail_calls callers.  If
find_tail_calls changes retry_tsan_func_exit to -1, set it to 1
and call it again with otherwise the same arguments.

* c-c++-common/tsan/pr119801.c: New test.

--- gcc/sanitizer.def.jj	2025-04-14 19:30:31.804837079 +0200
+++ gcc/sanitizer.def   2025-04-15 09:48:23.752349037 +0200
@@ -247,7 +247,7 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_ENTRY, "__tsan_func_entry",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_FUNC_EXIT, "__tsan_func_exit",
- BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+ BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_VPTR_UPDATE, "__tsan_vptr_update",
  BT_FN_VOID_PTR_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_READ1, "__tsan_read1",
--- gcc/tree-tailcall.cc.jj 2025-04-14 19:30:31.976834786 +0200
+++ gcc/tree-tailcall.cc	2025-04-15 10:12:48.879501238 +0200
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.
 #include "symbol-summary.h"
 #include "ipa-cp.h"
 #include "ipa-prop.h"
+#include "attribs.h"
+#include "asan.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -122,6 +124,9 @@ struct tailcall
   /* True if it is a call to the current function.  */
   bool tail_recursion;
 
+  /* True if there is __tsan_func_exit call after the call.  */
+  bool has_tsan_func_exit;
+
   /* The return value of the caller is mult * f + add, where f is the return
  value of the call.  */
   tree mult, add;
@@ -504,7 +509,7 @@ maybe_error_musttail (gcall *call, const
Search at most CNT basic blocks (so that we don't need to do trivial
loop discovery).  */
 static bool
-empty_eh_cleanup (basic_block bb, int cnt)
+empty_eh_cleanup (basic_block bb, int *eh_has_tsan_func_exit, int cnt)
 {
   if (EDGE_COUNT (bb->succs) > 1)
 return false;
@@ -515,6 +520,14 @@ empty_eh_cleanup (basic_block bb, int cn
   gimple *g = gsi_stmt (gs

Re: [PATCH v2 2/4] libstdc++: Add header mdspan to the build-system.

2025-04-15 Thread Tomasz Kaminski
On Tue, Apr 15, 2025 at 10:43 AM Luc Grosheintz 
wrote:

> Creates a nearly empty header mdspan and adds it to the build-system and
> Doxygen config file.
>
> libstdc++-v3/ChangeLog:
>
> * doc/doxygen/user.cfg.in: Add .
> * include/Makefile.am: Ditto.
> * include/Makefile.in: Ditto.
> * include/precompiled/stdc++.h: Ditto.
> * include/std/mdspan: New file.
>
> Signed-off-by: Luc Grosheintz 
>
LGTM. As mentioned before, a separate approval for merging is needed.

> ---
>  libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
>  libstdc++-v3/include/Makefile.am  |  1 +
>  libstdc++-v3/include/Makefile.in  |  1 +
>  libstdc++-v3/include/precompiled/stdc++.h |  1 +
>  libstdc++-v3/include/std/mdspan   | 48 +++
>  5 files changed, 52 insertions(+)
>  create mode 100644 libstdc++-v3/include/std/mdspan
>
> diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in
> b/libstdc++-v3/doc/doxygen/user.cfg.in
> index 19ae67a67ba..e926c6707f6 100644
> --- a/libstdc++-v3/doc/doxygen/user.cfg.in
> +++ b/libstdc++-v3/doc/doxygen/user.cfg.in
> @@ -880,6 +880,7 @@ INPUT  = 
> @srcdir@/doc/doxygen/doxygroups.cc
> \
>   include/list \
>   include/locale \
>   include/map \
> + include/mdspan \
>   include/memory \
>   include/memory_resource \
>   include/mutex \
> diff --git a/libstdc++-v3/include/Makefile.am
> b/libstdc++-v3/include/Makefile.am
> index 537774c2668..1140fa0dffd 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -38,6 +38,7 @@ std_freestanding = \
> ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> +   ${std_srcdir}/mdspan \
> ${std_srcdir}/memory \
> ${std_srcdir}/numbers \
> ${std_srcdir}/numeric \
> diff --git a/libstdc++-v3/include/Makefile.in
> b/libstdc++-v3/include/Makefile.in
> index 7b96b2207f8..c96e981acd6 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -396,6 +396,7 @@ std_freestanding = \
> ${std_srcdir}/generator \
> ${std_srcdir}/iterator \
> ${std_srcdir}/limits \
> +   ${std_srcdir}/mdspan \
> ${std_srcdir}/memory \
> ${std_srcdir}/numbers \
> ${std_srcdir}/numeric \
> diff --git a/libstdc++-v3/include/precompiled/stdc++.h
> b/libstdc++-v3/include/precompiled/stdc++.h
> index f4b312d9e47..e7d89c92704 100644
> --- a/libstdc++-v3/include/precompiled/stdc++.h
> +++ b/libstdc++-v3/include/precompiled/stdc++.h
> @@ -228,6 +228,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> new file mode 100644
> index 000..4094a416d1e
> --- /dev/null
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -0,0 +1,48 @@
> +//  -*- C++ -*-
> +
> +// Copyright (C) 2025 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// .
> +
> +/** @file mdspan
> + *  This is a Standard C++ Library header.
> + */
> +
> +#ifndef _GLIBCXX_MDSPAN
> +#define _GLIBCXX_MDSPAN 1
> +
> +#ifdef _GLIBCXX_SYSHDR
> +#pragma GCC system_header
> +#endif
> +
> +#define __glibcxx_want_mdspan
> +#include 
> +
> +#ifdef __glibcxx_mdspan
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +_GLIBCXX_END_NAMESPACE_VERSION
> +}
> +#endif
> +#endif
> --
> 2.48.1
>
>


Re: [PATCH v2 4/4] libstdc++: Add tests for std::extents.

2025-04-15 Thread Tomasz Kaminski
On Tue, Apr 15, 2025 at 10:55 AM Luc Grosheintz 
wrote:

> A prior commit added std::extents, this commit adds the tests. The bulk
> is focussed on testing the constructors. These are split into three
> groups:
>
> 1. the ctor from other extents and the copy ctor,
> 2. the ctor from a pack of integer-like objects,
> 3. the ctor from shapes, i.e. span and array.
>
> For each group check that the ctor:
> * produces an object with the expected values for extent,
> * is implicit if and only if required,
> * is constexpr,
> * doesn't change the rank of the extent.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/23_containers/mdspan/extents/assign.cc: New test.
> * testsuite/23_containers/mdspan/extents/class_properties.cc: New
> test.
> * testsuite/23_containers/mdspan/extents/ctor_copy.cc: New test.
> * testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc:
> New test.
> * testsuite/23_containers/mdspan/extents/ctor_ints.cc: New test.
> * testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc:
> New test.
> *
> testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc: New test.
> * testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc:
> New test.
> *
> testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc: New
> test.
> * testsuite/23_containers/mdspan/extents/custom_integer.cc: New
> test.
> * testsuite/23_containers/mdspan/extents/deduction_guide.cc: New
> test.
> * testsuite/23_containers/mdspan/extents/dextents.cc: New test.
> * testsuite/23_containers/mdspan/extents/extent.cc: New test.
> * testsuite/23_containers/mdspan/extents/ops_eq.cc: New test.
>
The DejaGnu configuration that we use for testing has some overhead
(another file is compiled) for each separate test file, so I would suggest
merging some of these test cases.

>
> Signed-off-by: Luc Grosheintz 
> ---
>  .../23_containers/mdspan/extents/assign.cc| 29 ++
>  .../mdspan/extents/class_properties.cc| 62 +
>  .../23_containers/mdspan/extents/ctor_copy.cc | 75 +++
>  .../mdspan/extents/ctor_copy_constexpr.cc | 20 
>  .../23_containers/mdspan/extents/ctor_ints.cc | 58 
>  .../mdspan/extents/ctor_ints_constexpr.cc | 12 +++
>  .../mdspan/extents/ctor_shape_all_extents.cc  | 61 +
>  .../mdspan/extents/ctor_shape_constexpr.cc| 23 +
>  .../extents/ctor_shape_dynamic_extents.cc | 91 +++
>  .../mdspan/extents/custom_integer.cc  | 87 ++
>  .../mdspan/extents/deduction_guide.cc | 34 +++
>  .../23_containers/mdspan/extents/dextents.cc  | 11 +++
>  .../23_containers/mdspan/extents/extent.cc| 24 +
>  .../23_containers/mdspan/extents/ops_eq.cc| 58 
>  14 files changed, 645 insertions(+)
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/deduction_guide.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/dextents.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/extent.cc
>  create mode 100644
> libstdc++-v3/testsuite/23_containers/mdspan/extents/ops_eq.cc
>
> diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
> b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
> new file mode 100644
> index 000..3bc32361a7b
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
> @@ -0,0 +1,29 @@
> +// { dg-do run { target c++23 } }
> +#include 
> +
> +#include 
> +
> +constexpr auto dyn = std::dynamic_extent;
> +
> +static_assert(std::is_nothrow_assignable_v,
> +  std::extents>);
> +
> +int
> +main()
> +{
> +  auto e1 = std::extents();
> +  auto e2 = std::extents();
> +
> +  e2 = e1;
> +  VERIFY(e2 == e1);
> +
> +  auto e5 = std::extents();
> +  e5 = e1;
> +  VERIFY(e5 == e1);
> +
> +  auto e3 = std::extents(1, 2);
> +  auto e

[PATCH v4 09/20] Add assembler_name to cgraph_function_version_info.

2025-04-15 Thread Alfie Richards
Add the assembler_name member to cgraph_function_version_info to store
the base assembler name of the function set, before FMV mangling. This is
used in later patches for refactoring FMV mangling.
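
A minimal sketch of why keeping the unmangled name is convenient, assuming
versioned symbols have the form base '.' suffix as in the x86 test
expectations elsewhere in this series (for example _Z3foov.sse4.2).
VersionInfoSketch and is_mangled_form_of are hypothetical names, not part of
cgraph.h.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical record: the base name is stored once, before any mangling.
struct VersionInfoSketch
{
  std::string_view assembler_name;  // e.g. "_Z3foov"
  std::string_view version_suffix;  // e.g. "sse4.2"
};

// True if SYMBOL is exactly assembler_name + '.' + version_suffix.
constexpr bool
is_mangled_form_of (const VersionInfoSketch &v, std::string_view symbol)
{
  const std::size_t base = v.assembler_name.size ();
  return symbol.size () == base + 1 + v.version_suffix.size ()
	 && symbol.substr (0, base) == v.assembler_name
	 && symbol[base] == '.'
	 && symbol.substr (base + 1) == v.version_suffix;
}
```

Because a suffix such as sse4.2 itself contains a dot, recovering the base
name by splitting the mangled symbol at the last dot would give the wrong
answer, whereas the stored base name needs no parsing at all.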

gcc/ChangeLog:

* cgraph.cc (cgraph_node::insert_new_function_version): Record
assembler_name.
* cgraph.h (struct cgraph_function_version_info): Add assembler_name.
---
 gcc/cgraph.cc | 1 +
 gcc/cgraph.h  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index feaeebec40b..23f7748e49e 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -187,6 +187,7 @@ cgraph_node::insert_new_function_version (void)
   version_info_node = NULL;
   version_info_node = ggc_cleared_alloc ();
   version_info_node->this_node = this;
+  version_info_node->assembler_name = DECL_ASSEMBLER_NAME (this->decl);
 
   if (cgraph_fnver_htab == NULL)
 cgraph_fnver_htab = hash_table::create_ggc (2);
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 6759505bf33..4a4fb7302b1 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -856,6 +856,9 @@ struct GTY((for_user)) cgraph_function_version_info {
  dispatcher. The dispatcher decl is an alias to the resolver
  function decl.  */
   tree dispatcher_resolver;
+
+  /* The assembly name of the function set before version mangling.  */
+  tree assembler_name;
 };
 
 #define DEFCIFCODE(code, type, string) CIF_ ## code,
-- 
2.34.1



[PATCH v4 00/20] FMV refactor and ACLE compliance.

2025-04-15 Thread Alfie Richards
Hi all,

Another update to this series.

This patch changes the version info structure to be sorted by
priority. This allows easier reasoning for optimisations and prevents having to
calculate the priority of functions repeatedly.
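
As a sketch of the idea only: if the versions are kept ordered, the priority
is computed once per version and later code can simply walk the list.
VersionSketch and sort_by_priority are hypothetical, and the choice of
highest priority first is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical version record with a precomputed priority.
struct VersionSketch
{
  int priority;
  int id;
};

// Insertion sort, highest priority first; stable for equal priorities.
template <std::size_t N>
constexpr std::array<VersionSketch, N>
sort_by_priority (std::array<VersionSketch, N> v)
{
  for (std::size_t i = 1; i < N; ++i)
    {
      VersionSketch key = v[i];
      std::size_t j = i;
      while (j > 0 && v[j - 1].priority < key.priority)
	{
	  v[j] = v[j - 1];
	  --j;
	}
      v[j] = key;
    }
  return v;
}
```

With the list in this order, a dispatcher can test versions front to back
and stop at the first match without comparing priorities again.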

The other change is that the target_clones pass was split in two. This is
because the target_clones pass now dispatches the target_versions and
target_clones, and different versions may have arbitrarily different bodies.
Therefore, running passes like efvp before dispatching allowed some invalid
optimisations.
However, as Alice Carlotti (alice.carlo...@arm.com) pointed out offline, the
target_clones pass was likely placed this late because that is valid for
target_clones, where all the versions have the same body.
So I split it in two. In the early stage complicated cases where there are
multiple decls are expanded and dispatched. In the later stages, the simple
case of a lone target_clones decl is dispatched (as is always the case
for TARGET_HAS_FMV_TARGET_ATTRIBUTE targets).

Regression tested and bootstrapped for aarch64-none-linux-gnu
and x86_64-unknown-linux-gnu.

Cross compiled and checked FMV tests for riscv and powerpc.

Hoping for GCC16 stage 1 for this.

I have a Forgejo PR if reviewers want to try using that for review:
https://forge.sourceware.org/gcc/gcc-TEST/pulls/49

Kind regards,
Alfie

Change log
==========

V4:
- Changed version_info structure to be sorted by priority
- Split the target_clones pass into early/late stages
- Split out fix for PR c++/119498

V3: https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679488.html
- Added reject target_clones version logic and hook
- Added pretty print for string_slice
- Refactored merging and conflict logic in front end
- Improved diagnostics

V2: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675960.html
- Changed recording of assembly name to be done in version info initialisation
- Changed behaviour for a lone default decl

V1: https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674973.html
- Initial

Alfie Richards (18):
  Add string_slice class.
  Remove unnecessary `record` argument from maybe_version_functions.
  Update is_function_default_version to work with target_version (Approved).
  Refactor record_function_versions.
  Change make_attribute to take string_slice (Approved).
  Add get_clone_versions and get_target_version functions.
  Add assembler_name to cgraph_function_version_info.
  Add dispatcher_resolver_function and is_target_clone flags to
cgraph_node.
  Add clone_identifier function.
  Refactor FMV name mangling.
  Refactor riscv target parsing to take string_slice.
  Add reject_target_clone hook for filtering target_clone versions.
  Change target_version semantics to follow ACLE specification.
  Refactor FMV frontend conflict and merging logic and hooks.
  Support mixing of target_clones and target_version.
  Fix FMV return type ambiguation
  Add diagnostic tests for Aarch64 FMV.
  Remove FMV beta warning.

Alice Carlotti (2):
  Add PowerPC FMV symbol tests.
  Add x86 FMV symbol tests

 gcc/attribs.cc| 170 ---
 gcc/attribs.h |   5 +-
 gcc/c-family/c-attribs.cc |  33 +-
 gcc/c-family/c-format.cc  |   7 +
 gcc/c-family/c-format.h   |   1 +
 gcc/cgraph.cc |  80 ++--
 gcc/cgraph.h  |  29 +-
 gcc/cgraphclones.cc   |  16 +-
 gcc/cgraphunit.cc |   9 +
 gcc/config/aarch64/aarch64.cc | 273 +---
 gcc/config/aarch64/aarch64.opt|   2 +-
 gcc/config/i386/i386-features.cc  | 141 +++---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-target-attr.cc |  14 +-
 gcc/config/riscv/riscv.cc | 267 +--
 gcc/config/rs6000/rs6000.cc   | 150 +--
 gcc/cp/call.cc|  10 +
 gcc/cp/class.cc   |  19 +-
 gcc/cp/cp-gimplify.cc |  11 +-
 gcc/cp/cp-tree.h  |   4 +-
 gcc/cp/decl.cc|  90 +++-
 gcc/cp/decl2.cc   |   2 +-
 gcc/cp/typeck.cc  |  10 +
 gcc/doc/invoke.texi   |   5 +-
 gcc/doc/tm.texi   |  16 +-
 gcc/doc/tm.texi.in|   2 +
 gcc/hooks.cc  |  13 +
 gcc/hooks.h   |   4 +
 gcc/ipa.cc|  11 +
 gcc/multiple_target.cc| 421 ++
 gcc/passes.def|   3 +-
 gcc/pretty-print.cc   |  10 +
 gcc/target.def

[PATCH v4 02/20] Add x86 FMV symbol tests

2025-04-15 Thread Alfie Richards
From: Alice Carlotti 

This is for testing the x86 mangling of FMV versioned function
assembly names.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv-symbols1.C: New test.
* g++.target/i386/mv-symbols2.C: New test.
* g++.target/i386/mv-symbols3.C: New test.
* g++.target/i386/mv-symbols4.C: New test.
* g++.target/i386/mv-symbols5.C: New test.
* g++.target/i386/mvc-symbols1.C: New test.
* g++.target/i386/mvc-symbols2.C: New test.
* g++.target/i386/mvc-symbols3.C: New test.
* g++.target/i386/mvc-symbols4.C: New test.

Co-authored-by: Alfie Richards 
---
 gcc/testsuite/g++.target/i386/mv-symbols1.C  | 68 
 gcc/testsuite/g++.target/i386/mv-symbols2.C  | 56 
 gcc/testsuite/g++.target/i386/mv-symbols3.C  | 44 +
 gcc/testsuite/g++.target/i386/mv-symbols4.C  | 50 ++
 gcc/testsuite/g++.target/i386/mv-symbols5.C  | 56 
 gcc/testsuite/g++.target/i386/mvc-symbols1.C | 44 +
 gcc/testsuite/g++.target/i386/mvc-symbols2.C | 29 +
 gcc/testsuite/g++.target/i386/mvc-symbols3.C | 35 ++
 gcc/testsuite/g++.target/i386/mvc-symbols4.C | 23 +++
 9 files changed, 405 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols4.C
 create mode 100644 gcc/testsuite/g++.target/i386/mv-symbols5.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/i386/mvc-symbols4.C

diff --git a/gcc/testsuite/g++.target/i386/mv-symbols1.C b/gcc/testsuite/g++.target/i386/mv-symbols1.C
new file mode 100644
index 000..1290299aea5
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/mv-symbols1.C
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target("arch=slm")))
+int foo ()
+{
+  return 3;
+}
+
+__attribute__((target("sse4.2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target("sse4.2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target("arch=slm")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target("default")))
+int foo (int)
+{
+  return 2;
+}
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mvc-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3foovv\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3foovv, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3foovv,_Z3foov\.resolver\n" 1 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tcall\t_Z7_Z3fooii\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z7_Z3fooii, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z7_Z3fooii,_Z3fooi\.resolver\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/i386/mv-symbols2.C b/gcc/testsuite/g++.target/i386/mv-symbols2.C
new file mode 100644
index 000..8b75565d78d
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/mv-symbols2.C
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target("default")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target("arch=slm")))
+int foo ()
+{
+  return 3;
+}
+
+__attribute__((target("sse4.2")))
+int foo ()
+{
+  return 5;
+}
+
+__attribute__((target("sse4.2")))
+int foo (int)
+{
+  return 6;
+}
+
+__attribute__((target("arch=slm")))
+int foo (int)
+{
+  return 4;
+}
+
+__attribute__((target("default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* When updating any of the symbol names in these tests, make sure to also
+   update any tests for their absence in mvc-symbolsN.C */
+
+/* { dg-final { scan-assembler-times "\n_Z3foov:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.arch_slm:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.sse4.2:\n" 1 } } */
+/* { dg-final { scan-assembl

[PATCH v4 17/20] Support mixing of target_clones and target_version.

2025-04-15 Thread Alfie Richards
Add support for an FMV set defined by a combination of target_clones and
target_version definitions.

Additionally, change is_function_default_version to consider a function
declaration annotated with target_clones containing default to be a
default version.
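
A rough sketch of the check described above, assuming the clone list is
available as a single comma-separated string.  count_default_versions and
is_default_version_sketch are hypothetical helpers; the real code obtains
the count from get_clone_versions, whose exact interface is not shown here.

```cpp
#include <cstddef>
#include <string_view>

// Count the entries of a comma-separated list that are exactly "default".
constexpr int
count_default_versions (std::string_view clones)
{
  int num_defaults = 0;
  while (true)
    {
      const std::size_t comma = clones.find (',');
      if (clones.substr (0, comma) == "default")
	++num_defaults;
      if (comma == std::string_view::npos)
	break;
      clones.remove_prefix (comma + 1);
    }
  return num_defaults;
}

// A target_clones declaration counts as the default version if it
// contains at least one "default" entry.
constexpr bool
is_default_version_sketch (std::string_view clones)
{
  return count_default_versions (clones) > 0;
}
```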

Lastly, add support for the case that a target_clone has all versions filtered
out and therefore the declaration should be removed. This is relevant as now
the default could be defined in a target_version, so a target_clones no longer
necessarily contains the default.
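
Removing a declaration while iterating over all functions needs some care.
The sketch below shows the usual pattern of fetching the successor before
visiting a node, so that the current node can be unlinked safely.
NodeSketch, unlink_unwanted and demo_remove_while_walking are hypothetical
and only model a singly linked list; they are not the symbol table API.

```cpp
// Hypothetical list node; KEEP is false for a declaration whose versions
// were all filtered out.
struct NodeSketch
{
  int id;
  bool keep;
  NodeSketch *next;
};

// Walk the list and unlink unwanted nodes.  Returns the number kept.
constexpr int
unlink_unwanted (NodeSketch *&head)
{
  int kept = 0;
  NodeSketch **link = &head;
  for (NodeSketch *node = head, *next = nullptr; node; node = next)
    {
      next = node->next;  // Precompute before NODE may be unlinked.
      if (node->keep)
	{
	  link = &node->next;
	  ++kept;
	}
      else
	{
	  *link = next;
	  node->next = nullptr;
	}
    }
  return kept;
}

// Three nodes, the middle one is removed; encodes the result as
// kept * 10 + (1 if the first node now points at the third).
constexpr int
demo_remove_while_walking ()
{
  NodeSketch c{3, true, nullptr};
  NodeSketch b{2, false, &c};
  NodeSketch a{1, true, &b};
  NodeSketch *head = &a;
  int kept = unlink_unwanted (head);
  return kept * 10 + (a.next == &c ? 1 : 0);
}
```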

This takes advantage of refactoring done in previous patches changing
how target_clones are expanded and how conflicting decls are handled.

gcc/ChangeLog:

* attribs.cc (is_function_default_version): Update to handle
target_clones.
* cgraph.h (FOR_EACH_FUNCTION_REMOVABLE): New macro.
* multiple_target.cc (expand_target_clones): Update logic to delete
empty target_clones and modify diagnostic.
(ipa_target_clone): Update to use
FOR_EACH_FUNCTION_REMOVABLE.

gcc/c-family/ChangeLog:

* c-attribs.cc: Add support for target_version and target_clone mixing.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-and-mvc1.C: New test.
* g++.target/aarch64/mv-and-mvc2.C: New test.
* g++.target/aarch64/mv-and-mvc3.C: New test.
* g++.target/aarch64/mv-and-mvc4.C: New test.
---
 gcc/attribs.cc| 10 -
 gcc/c-family/c-attribs.cc |  9 +---
 gcc/cgraph.h  |  7 
 gcc/multiple_target.cc| 24 +--
 .../g++.target/aarch64/mv-and-mvc1.C  | 38 +
 .../g++.target/aarch64/mv-and-mvc2.C  | 29 +
 .../g++.target/aarch64/mv-and-mvc3.C  | 41 +++
 .../g++.target/aarch64/mv-and-mvc4.C  | 38 +
 8 files changed, 183 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 06785eaa136..2ca82674f7c 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1242,7 +1242,8 @@ make_dispatcher_decl (const tree decl)
With the target attribute semantics, returns true if the function is marked
as default with the target version.
With the target_version attribute semantics, returns true if the function
-   is either not annotated, or annotated as default.  */
+   is either not annotated, annotated as default, or is a target_clone
+   containing the default declaration.  */
 
 bool
 is_function_default_version (const tree decl)
@@ -1259,6 +1260,13 @@ is_function_default_version (const tree decl)
 }
   else
 {
+  if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl)))
+   {
+ int num_defaults = 0;
+ get_clone_versions (decl, &num_defaults);
+ return num_defaults > 0;
+   }
+
   attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
   if (!attr)
return true;
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index b5287f0da06..a4e657d9ffd 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -249,13 +249,6 @@ static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
   ATTR_EXCL ("always_inline", true, true, true),
   ATTR_EXCL ("target", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
 TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE),
-  ATTR_EXCL ("target_version", true, true, true),
-  ATTR_EXCL (NULL, false, false, false),
-};
-
-static const struct attribute_spec::exclusions 
attr_target_version_exclusions[] =
-{
-  ATTR_EXCL ("target_clones", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
@@ -543,7 +536,7 @@ const struct attribute_spec c_common_gnu_attributes[] =
  attr_target_exclusions },
   { "target_version", 1, 1, true, false, false, false,
  handle_target_version_attribute,
- attr_target_version_exclusions },
+ NULL },
   { "target_clones",  1, -1, true, false, false, false,
  handle_target_clones_attribute,
  attr_target_clones_exclusions },
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0eed6a9d46d..fb89a7b5919 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -3093,6 +3093,13 @@ symbol_table::next_function_with_gimple_body 
(cgraph_node *node)
for ((node) = symtab->first_function (); (node); \
(node) = symtab->next_function ((node)))
 
+/* Walk all functions but precompute so a node can be deleted if needed.  */
+#define FOR_EACH_FUNCTION_REMOVABLE(node) \
+   cg

[PATCH v4 13/20] Refactor riscv target parsing to take string_slice.

2025-04-15 Thread Alfie Richards
This is a quick refactor of the riscv target processing code
to take a string_slice rather than a decl.

The reason for this is to enable it to work with target_clones
where merging logic requires reasoning about each version string
individually in the front end.

This refactor primarily serves just to get this working. Ideally the
logic here would be further refactored, as currently there is no way to
check whether a parse fails without emitting an error.
This makes things difficult for later patches, which intend to emit a
warning and ignore unrecognised/unparsed target_clones values rather
than erroring; that cannot currently be achieved with the current riscv
code.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_process_target_version_str): New
function.
* config/riscv/riscv-target-attr.cc (riscv_process_target_attr): 
Refactor to take
string_slice.
(riscv_process_target_version_str): Ditto.
* config/riscv/riscv.cc (parse_features_for_version): Refactor to take
string_slice.
(riscv_compare_version_priority): Ditto.
(dispatch_function_versions): Change to pass location.
---
 gcc/config/riscv/riscv-protos.h   |  2 ++
 gcc/config/riscv/riscv-target-attr.cc | 14 +---
 gcc/config/riscv/riscv.cc | 50 ++-
 3 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2bedd878a04..1efe45d63e6 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -813,6 +813,8 @@ riscv_option_valid_attribute_p (tree, tree, tree, int);
 extern bool
 riscv_option_valid_version_attribute_p (tree, tree, tree, int);
 extern bool
+riscv_process_target_version_str (string_slice, location_t);
+extern bool
 riscv_process_target_version_attr (tree, location_t);
 extern void
 riscv_override_options_internal (struct gcc_options *);
diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 1d968655f95..d3f06fb15d4 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -354,11 +354,11 @@ num_occurrences_in_str (char c, char *str)
and update the global target options space.  */
 
 bool
-riscv_process_target_attr (const char *args,
+riscv_process_target_attr (string_slice args,
   location_t loc,
   const struct riscv_attribute_info *attrs)
 {
-  size_t len = strlen (args);
+  size_t len = args.size ();
 
   /* No need to emit warning or error on empty string here, generic code 
already
  handle this case.  */
@@ -369,7 +369,7 @@ riscv_process_target_attr (const char *args,
 
   std::unique_ptr buf (new char[len+1]);
   char *str_to_check = buf.get ();
-  strcpy (str_to_check, args);
+  strncpy (str_to_check, args.begin (), args.size ());
 
   /* Used to catch empty spaces between semi-colons i.e.
  attribute ((target ("attr1;;attr2"))).  */
@@ -391,8 +391,7 @@ riscv_process_target_attr (const char *args,
 
   if (num_attrs != num_semicolons + 1)
 {
-  error_at (loc, "malformed % attribute",
-   args);
+  error_at (loc, "malformed % attribute", &args);
   return false;
 }
 
@@ -513,6 +512,11 @@ riscv_process_target_version_attr (tree args, location_t 
loc)
   return riscv_process_target_attr (str, loc, riscv_target_version_attrs);
 }
 
+bool
+riscv_process_target_version_str (string_slice str, location_t loc)
+{
+  return riscv_process_target_attr (str, loc, riscv_target_version_attrs);
+}
 
 /* Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P.  This is used to
process attribute ((target_version ("..."))).  */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 946658e0d5e..ddeb321cb44 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -13092,31 +13092,22 @@ riscv_c_mode_for_floating_type (enum tree_index ti)
   return default_mode_for_floating_type (ti);
 }
 
-/* This parses the attribute arguments to target_version in DECL and modifies
-   the feature mask and priority required to select those targets.  */
-static void
-parse_features_for_version (tree decl,
+/* This parses STR and modifies the feature mask and priority required to
+   select those targets.  */
+static bool
+parse_features_for_version (string_slice version_str,
+   location_t loc,
struct riscv_feature_bits &res,
int &priority)
 {
-  tree version_attr = lookup_attribute ("target_version",
-   DECL_ATTRIBUTES (decl));
-  if (version_attr == NULL_TREE)
+  gcc_assert (version_str.is_valid ());
+  if (version_str == "default")
 {
   res.length = 0;
   priority = 0;
-  return;
+  return true;
 }
 
-  const char *version_string = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE
-   

[PATCH v4 04/20] Remove unnecessary `record` argument from maybe_version_functions.

2025-04-15 Thread Alfie Richards
Previously, the `record` argument in maybe_version_functions allowed the
call to cgraph_node::record_function_versions to be skipped.  However,
this was only skipped when both decls were already marked as versioned,
in which case we trigger the early exit in record_function_versions
instead. Therefore, the argument is unnecessary.

gcc/cp/ChangeLog:

* class.cc (add_method): Remove argument.
* cp-tree.h (maybe_version_functions): Ditto.
* decl.cc (decls_match): Ditto.
(maybe_version_functions): Ditto.
---
 gcc/cp/class.cc  |  2 +-
 gcc/cp/cp-tree.h |  2 +-
 gcc/cp/decl.cc   | 13 +
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 2b694b98e56..93f1a1bdd81 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -1402,7 +1402,7 @@ add_method (tree type, tree method, bool via_using)
   /* If these are versions of the same function, process and
 move on.  */
   if (TREE_CODE (fn) == FUNCTION_DECL
- && maybe_version_functions (method, fn, true))
+ && maybe_version_functions (method, fn))
continue;
 
   if (DECL_INHERITED_CTOR (method))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 55f986e25c1..898054c2891 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7122,7 +7122,7 @@ extern void determine_local_discriminator (tree, tree = 
NULL_TREE);
 extern bool member_like_constrained_friend_p   (tree);
 extern bool fns_correspond (tree, tree);
 extern int decls_match (tree, tree, bool = true);
-extern bool maybe_version_functions(tree, tree, bool);
+extern bool maybe_version_functions(tree, tree);
 extern bool validate_constexpr_redeclaration   (tree, tree);
 extern bool merge_default_template_args(tree, tree, bool);
 extern tree duplicate_decls(tree, tree,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 4e97093b134..9cb56eac4a9 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1216,9 +1216,7 @@ decls_match (tree newdecl, tree olddecl, bool 
record_versions /* = true */)
  && targetm.target_option.function_versions (newdecl, olddecl))
{
  if (record_versions)
-   maybe_version_functions (newdecl, olddecl,
-(!DECL_FUNCTION_VERSIONED (newdecl)
- || !DECL_FUNCTION_VERSIONED (olddecl)));
+   maybe_version_functions (newdecl, olddecl);
  return 0;
}
 }
@@ -1285,11 +1283,11 @@ maybe_mark_function_versioned (tree decl)
 }
 
 /* NEWDECL and OLDDECL have identical signatures.  If they are
-   different versions adjust them and return true.
-   If RECORD is set to true, record function versions.  */
+   different versions adjust them, record function versions, and return
+   true.  */
 
 bool
-maybe_version_functions (tree newdecl, tree olddecl, bool record)
+maybe_version_functions (tree newdecl, tree olddecl)
 {
   if (!targetm.target_option.function_versions (newdecl, olddecl))
 return false;
@@ -1312,8 +1310,7 @@ maybe_version_functions (tree newdecl, tree olddecl, bool 
record)
   maybe_mark_function_versioned (newdecl);
 }
 
-  if (record)
-cgraph_node::record_function_versions (olddecl, newdecl);
+  cgraph_node::record_function_versions (olddecl, newdecl);
 
   return true;
 }
-- 
2.34.1



[PATCH v4 20/20] Remove FMV beta warning.

2025-04-15 Thread Alfie Richards
This patch removes the warning for target_version and target_clones
in aarch64 as it is now spec compliant.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_process_target_version_attr):
Remove warning.
* config/aarch64/aarch64.opt: Mark -Wno-experimental-fmv-target
deprecated.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Remove option.
* g++.target/aarch64/mv-and-mvc-error1.C: Ditto.
* g++.target/aarch64/mv-and-mvc-error2.C: Ditto.
* g++.target/aarch64/mv-and-mvc-error3.C: Ditto.
* g++.target/aarch64/mv-and-mvc1.C: Ditto.
* g++.target/aarch64/mv-and-mvc2.C: Ditto.
* g++.target/aarch64/mv-and-mvc3.C: Ditto.
* g++.target/aarch64/mv-and-mvc4.C: Ditto.
* g++.target/aarch64/mv-error1.C: Ditto.
* g++.target/aarch64/mv-error2.C: Ditto.
* g++.target/aarch64/mv-error3.C: Ditto.
* g++.target/aarch64/mv-error4.C: Ditto.
* g++.target/aarch64/mv-error5.C: Ditto.
* g++.target/aarch64/mv-error6.C: Ditto.
* g++.target/aarch64/mv-error7.C: Ditto.
* g++.target/aarch64/mv-error8.C: Ditto.
* g++.target/aarch64/mv-pragma.C: Ditto.
* g++.target/aarch64/mv-symbols1.C: Ditto.
* g++.target/aarch64/mv-symbols10.C: Ditto.
* g++.target/aarch64/mv-symbols11.C: Ditto.
* g++.target/aarch64/mv-symbols12.C: Ditto.
* g++.target/aarch64/mv-symbols13.C: Ditto.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mv-symbols6.C: Ditto.
* g++.target/aarch64/mv-symbols7.C: Ditto.
* g++.target/aarch64/mv-symbols8.C: Ditto.
* g++.target/aarch64/mv-symbols9.C: Ditto.
* g++.target/aarch64/mvc-error1.C: Ditto.
* g++.target/aarch64/mvc-error2.C: Ditto.
* g++.target/aarch64/mvc-symbols1.C: Ditto.
* g++.target/aarch64/mvc-symbols2.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mvc-symbols4.C: Ditto.
* g++.target/aarch64/mv-warning1.C: Removed.
* g++.target/aarch64/mvc-warning1.C: Removed.
---
 gcc/config/aarch64/aarch64.cc| 9 -
 gcc/config/aarch64/aarch64.opt   | 2 +-
 gcc/doc/invoke.texi  | 5 +
 gcc/testsuite/g++.target/aarch64/mv-1.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error1.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error2.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error3.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc1.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc2.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc3.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-and-mvc4.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error1.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error2.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error3.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error4.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error5.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error6.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error7.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-error8.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-pragma.C | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols1.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols10.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols11.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols12.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols13.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols2.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols3.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols4.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols5.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols6.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols7.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols8.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-symbols9.C   | 1 -
 gcc/testsuite/g++.target/aarch64/mv-warning1.C   | 9 -
 gcc/testsuite/g++.target/aarch64/mvc-error1.C| 1 -
 gcc/testsuite/g++.target/aarch64/mvc-error2.C| 1 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols1.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols2.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols3.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mvc-symbols4.C  | 1 -
 gcc/testsuite/g++.target/aarch64/mvc-warning1.C  | 1 -
 41 files changed, 2 insertions(+), 60 deletions(-)
 delete mode 100644 gcc/testsuite/g++.target/aarch64/mv-warning1.C

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aar

[PATCH v4 19/20] Add diagnostic tests for Aarch64 FMV.

2025-04-15 Thread Alfie Richards
Add tests covering many FMV errors for AArch64, including
redeclaration and mixing target_clones and target_version.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-and-mvc-error1.C: New test.
* g++.target/aarch64/mv-and-mvc-error2.C: New test.
* g++.target/aarch64/mv-and-mvc-error3.C: New test.
* g++.target/aarch64/mv-error1.C: New test.
* g++.target/aarch64/mv-error2.C: New test.
* g++.target/aarch64/mv-error3.C: New test.
* g++.target/aarch64/mv-error4.C: New test.
* g++.target/aarch64/mv-error5.C: New test.
* g++.target/aarch64/mv-error6.C: New test.
* g++.target/aarch64/mv-error7.C: New test.
* g++.target/aarch64/mv-error8.C: New test.
* g++.target/aarch64/mvc-error1.C: New test.
* g++.target/aarch64/mvc-error2.C: New test.
* g++.target/aarch64/mvc-warning1.C: Modified test.
---
 .../g++.target/aarch64/mv-and-mvc-error1.C| 10 +
 .../g++.target/aarch64/mv-and-mvc-error2.C| 10 +
 .../g++.target/aarch64/mv-and-mvc-error3.C|  9 
 gcc/testsuite/g++.target/aarch64/mv-error1.C  | 19 +
 gcc/testsuite/g++.target/aarch64/mv-error2.C  | 10 +
 gcc/testsuite/g++.target/aarch64/mv-error3.C  | 13 
 gcc/testsuite/g++.target/aarch64/mv-error4.C  | 10 +
 gcc/testsuite/g++.target/aarch64/mv-error5.C  |  9 
 gcc/testsuite/g++.target/aarch64/mv-error6.C  | 21 +++
 gcc/testsuite/g++.target/aarch64/mv-error7.C  | 12 +++
 gcc/testsuite/g++.target/aarch64/mv-error8.C  | 13 
 gcc/testsuite/g++.target/aarch64/mvc-error1.C | 10 +
 gcc/testsuite/g++.target/aarch64/mvc-error2.C | 10 +
 .../g++.target/aarch64/mvc-warning1.C | 12 +--
 14 files changed, 166 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-and-mvc-error3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error4.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error5.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error6.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error7.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mv-error8.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mvc-error1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/mvc-error2.C

diff --git a/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error1.C 
b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error1.C
new file mode 100644
index 000..00d3826f757
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error1.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
+
+__attribute__ ((target_version ("dotprod"))) int
+foo () { return 3; } /* { dg-message "previous definition" } */
+
+__attribute__ ((target_clones ("dotprod", "sve"))) int
+foo () { return 1; } /* { dg-error "conflicting .dotprod. versions" } */
diff --git a/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error2.C 
b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error2.C
new file mode 100644
index 000..bf8a4112a21
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error2.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
+
+__attribute__ ((target_version ("default"))) int
+foo () { return 1; } /* { dg-message "old declaration" } */
+
+__attribute__ ((target_clones ("dotprod", "sve"))) float
+foo () { return 3; } /* { dg-error "ambiguating new declaration of" } */
diff --git a/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error3.C 
b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error3.C
new file mode 100644
index 000..3233a98d1ad
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-and-mvc-error3.C
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options "-Wno-experimental-fmv-target" } */
+
+float foo () { return 1; } /* { dg-message "previous definition" } */
+
+__attribute__ ((target_clones ("default", "dotprod", "sve"))) float
+foo () { return 3; } /* { dg-error "conflicting .default. versions" } */
diff --git a/gcc/testsuite/g++.target/aarch64/mv-error1.C 
b/gcc/testsuite/g++.target/aarch64/mv-error1.C
new file mode 100644
index 000..0b9642c9ab6
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/mv-error1.C
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" 

[PATCH v4 12/20] Refactor FMV name mangling.

2025-04-15 Thread Alfie Richards
This patch is an overhaul of how FMV name mangling works. Previously
mangling logic was duplicated in several places across both target
specific and independent code. This patch changes this such that all
mangling is done in targetm.mangle_decl_assembler_name (including for the
dispatched symbol and dispatcher resolver).

This allows previous hacks to be removed, such as the one where the default
decl's mangled assembler name was unmangled in order to remangle all versions,
the resolver and the dispatched symbol.

This introduces a change (shown in the test changes) to the assembler name of
the dispatched symbol for an x86 versioned function set. Previously it used the
function name mangled twice. This was hard to reproduce without hacks I
wasn't comfortable with. Therefore, the mangling is changed to instead append
".ifunc", which matches clang's behavior.

This change also refactors expand_target_clones to use
targetm.mangle_decl_assembler_name for mangling and get_clone_versions.
It is modified such that if the target_clones decl is in an FMV structure,
the ordering is preserved once expanded. This is used later for ACLE semantics
and target_clones/target_version mixing.

gcc/ChangeLog:

* attribs.cc (make_dispatcher_decl): Move duplicated cgraph logic into
this function and change to use targetm.mangle_decl_assembler_name for
mangling.
* cgraph.cc (delete_function_version): Made public static member of
cgraph_node.
* cgraph.h (delete_function_version): Ditto.
* config/aarch64/aarch64.cc (aarch64_parse_fmv_features): Change to
support string_slice.
(aarch64_process_target_version_attr): Ditto.
(get_feature_mask_for_version): Ditto.
(aarch64_mangle_decl_assembler_name): Add logic for mangling dispatched
symbol and resolver.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use
aarch64_mangle_decl_assembler_name for mangling.
(aarch64_generate_version_dispatcher_body): Remove remangling.
(aarch64_get_function_versions_dispatcher): Refactor to remove
duplicated cgraph logic.
* config/i386/i386-features.cc (is_valid_asm_symbol): Moved from
multiple_target.cc.
(create_new_asm_name): Ditto.
(ix86_mangle_function_version_assembler_name): Refactor to use
clone_identifier and to mangle default.
(ix86_mangle_decl_assembler_name): Add logic for mangling dispatched
symbol and resolver.
(ix86_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
(make_resolver_func): Refactor to use ix86_mangle_decl_assembler_name
for mangling.
* config/riscv/riscv.cc (riscv_mangle_decl_assembler_name): Add logic
for FMV mangling.
(get_suffixed_assembler_name): Removed.
(make_resolver_func): Refactor to use riscv_mangle_decl_assembler_name
for mangling.
(riscv_generate_version_dispatcher_body): Remove unnecessary remangling.
(riscv_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
* config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): New
function.
(rs6000_get_function_versions_dispatcher): Remove duplicated cgraph
logic.
(make_resolver_func): Refactor to use rs6000_mangle_decl_assembler_name
for mangling.
(is_valid_asm_symbol): Move from multiple_target.cc.
(create_new_asm_name): Ditto.
(rs6000_mangle_function_version_assembler_name): New function.
* multiple_target.cc (create_dispatcher_calls): Remove mangling code.
(get_attr_str): Removed.
(separate_attrs): Ditto.
(is_valid_asm_symbol): Moved to target specific.
(create_new_asm_name): Ditto.
(expand_target_clones): Refactor to use
targetm.mangle_decl_assembler_name for mangling and be more general.
* tree.cc (get_target_clone_attr_len): Removed.
* tree.h (get_target_clone_attr_len): Removed.

gcc/cp/ChangeLog:

* decl.cc (maybe_mark_function_versioned): Change to insert function
version and therefore record assembler name.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv-symbols1.C: Update x86 FMV mangling.
* g++.target/i386/mv-symbols3.C: Ditto.
* g++.target/i386/mv-symbols4.C: Ditto.
* g++.target/i386/mv-symbols5.C: Ditto.
---
 gcc/attribs.cc  |  45 +++-
 gcc/cgraph.cc   |   4 +-
 gcc/cgraph.h|   2 +
 gcc/config/aarch64/aarch64.cc   | 163 +---
 gcc/config/i386/i386-features.cc| 108 +---
 gcc/config/riscv/riscv.cc   | 110 +++-
 gcc/config/rs6000/rs6000.cc | 115 +++--
 gcc/cp/decl.cc  |   7 +
 gcc/multiple_target.cc  | 262 +++-
 gcc/testsu

[PATCH v4 11/20] Add clone_identifier function.

2025-04-15 Thread Alfie Richards
This is similar to clone_function_name and its siblings but takes an
identifier tree node rather than a function declaration.

This is to be used in conjunction with the identifier node stored in
cgraph_function_version_info::assembler_name to mangle FMV functions in
later patches.
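
As a rough illustration of the naming scheme described above, here is a
minimal standalone sketch. It is not the GCC implementation: it uses
std::string instead of identifier tree nodes, and the function name
clone_identifier_sketch and its parameters are hypothetical. In GCC the
separator comes from symbol_table::symbol_suffix_separator and the prefix
depends on target macros; here both are plain arguments.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for clone_identifier: build PREFIX + ID +
// SEPARATOR + SUFFIX.  The separator and prefix are assumptions passed
// in by the caller, whereas GCC derives them from the target.
std::string
clone_identifier_sketch (const std::string &id, const std::string &suffix,
			 char separator = '.',
			 const std::string &prefix = "")
{
  // An empty identifier would produce a name starting with the separator.
  assert (!id.empty ());
  return prefix + id + separator + suffix;
}
```

The point of taking the identifier rather than a decl is that the caller
can pass any stored name, for instance one recorded before mangling.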

gcc/ChangeLog:

* cgraph.h (clone_identifier): New function.
* cgraphclones.cc (clone_identifier): New function.
(clone_function_name): Refactor to use clone_identifier.
---
 gcc/cgraph.h|  1 +
 gcc/cgraphclones.cc | 16 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 55812cc09a2..d6d8e066da6 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -2630,6 +2630,7 @@ tree clone_function_name (const char *name, const char 
*suffix,
 tree clone_function_name (tree decl, const char *suffix,
  unsigned long number);
 tree clone_function_name (tree decl, const char *suffix);
+tree clone_identifier (tree decl, const char *suffix);
 
 void tree_function_versioning (tree, tree, vec *,
   ipa_param_adjustments *,
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 5332a433317..6b650849a63 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -557,6 +557,14 @@ clone_function_name (tree decl, const char *suffix)
   /* For consistency this needs to behave the same way as
  ASM_FORMAT_PRIVATE_NAME does, but without the final number
  suffix.  */
+  return clone_identifier (identifier, suffix);
+}
+
+/* Return a new clone of ID ending with the string SUFFIX.  */
+
+tree
+clone_identifier (tree id, const char *suffix)
+{
   char *separator = XALLOCAVEC (char, 2);
   separator[0] = symbol_table::symbol_suffix_separator ();
   separator[1] = 0;
@@ -565,15 +573,11 @@ clone_function_name (tree decl, const char *suffix)
 #else
   const char *prefix = "";
 #endif
-  char *result = ACONCAT ((prefix,
-  IDENTIFIER_POINTER (identifier),
-  separator,
-  suffix,
-  (char*)0));
+  char *result = ACONCAT (
+(prefix, IDENTIFIER_POINTER (id), separator, suffix, (char *) 0));
   return get_identifier (result);
 }
 
-
 /* Create callgraph node clone with new declaration.  The actual body will be
copied later at compilation stage.  The name of the new clone will be
constructed from the name of the original node, SUFFIX and NUM_SUFFIX.
-- 
2.34.1



[PATCH v4 03/20] Add string_slice class.

2025-04-15 Thread Alfie Richards
The string_slice inherits from array_slice and is used to refer to a
substring of an array that is memory managed elsewhere without modifying
the underlying array.

This is useful, for example, when referring to a substring of an
attribute in the syntax tree.

Adds some minimal helper functions for string_slice,
such as a strtok alternative, equality operators, strcmp, and a function
to strip whitespace from the beginning and end of a string_slice.
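
To make the idea concrete, here is a minimal standalone sketch of a
non-owning slice with a strtok-like tokenizer, strip and equality. It is
only an illustration under assumptions: the class name slice_sketch and
the exact signatures are hypothetical and do not reproduce the
string_slice added to vec.h, which inherits from array_slice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_slice: a (pointer, length) view of
// characters owned by someone else.  Nothing is copied or modified.
class slice_sketch
{
public:
  slice_sketch () : m_begin (nullptr), m_size (0) {}
  slice_sketch (const char *s)
    : m_begin (s), m_size (s ? std::strlen (s) : 0) {}
  slice_sketch (const char *s, std::size_t n) : m_begin (s), m_size (n) {}

  const char *begin () const { return m_begin; }
  std::size_t size () const { return m_size; }
  bool is_valid () const { return m_begin != nullptr; }

  // Return a slice with leading and trailing spaces and tabs dropped.
  slice_sketch strip () const
  {
    const char *b = m_begin;
    const char *e = m_begin + m_size;
    while (b < e && (*b == ' ' || *b == '\t'))
      ++b;
    while (e > b && (e[-1] == ' ' || e[-1] == '\t'))
      --e;
    return slice_sketch (b, static_cast<std::size_t> (e - b));
  }

  // Return the text before the first delimiter in *REST and advance
  // *REST past that delimiter.  Once the input is used up, *REST
  // becomes invalid and further calls return an invalid slice.
  static slice_sketch tokenize (slice_sketch *rest, const char *delims)
  {
    assert (rest != nullptr);
    if (!rest->is_valid ())
      return slice_sketch ();
    const char *b = rest->m_begin;
    const char *e = b + rest->m_size;
    const char *p = b;
    while (p < e && !is_delim (*p, delims))
      ++p;
    slice_sketch token (b, static_cast<std::size_t> (p - b));
    if (p == e)
      *rest = slice_sketch ();
    else
      *rest = slice_sketch (p + 1, static_cast<std::size_t> (e - (p + 1)));
    return token;
  }

private:
  static bool is_delim (char c, const char *delims)
  {
    for (; *delims; ++delims)
      if (*delims == c)
	return true;
    return false;
  }

  const char *m_begin;
  std::size_t m_size;
};

// Two slices are equal when they have the same length and bytes.
inline bool
operator== (const slice_sketch &a, const slice_sketch &b)
{
  return a.size () == b.size ()
	 && (a.size () == 0
	     || std::memcmp (a.begin (), b.begin (), a.size ()) == 0);
}
```

Unlike strtok, this never writes a NUL into the underlying buffer, which
is what makes it usable on attribute strings held in the syntax tree.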

gcc/c-family/ChangeLog:

* c-format.cc (local_string_slice_node): New node type.
(asm_fprintf_char_table): New entry.
(init_dynamic_diag_info): Add support for string_slice.
* c-format.h (T_STRING_SLICE): New node type.

gcc/ChangeLog:

* pretty-print.cc (format_phase_2): Add support for string_slice.
* vec.cc (string_slice::tokenize): New method.
(strcmp): New implementation for string_slice.
(string_slice::strip): New method.
(test_string_slice_initializers): New test.
(test_string_slice_tokenize): Ditto.
(test_string_slice_strcmp): Ditto.
(test_string_slice_equality): Ditto.
(test_string_slice_inequality): Ditto.
(test_string_slice_invalid): Ditto.
(test_string_slice_strip): Ditto.
(vec_cc_tests): Add new tests.
* vec.h (class string_slice): New class.
(strcmp): New implementation for string_slice.
---
 gcc/c-family/c-format.cc |   7 ++
 gcc/c-family/c-format.h  |   1 +
 gcc/pretty-print.cc  |  10 ++
 gcc/vec.cc   | 207 +++
 gcc/vec.h|  45 +
 5 files changed, 270 insertions(+)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 211d20dd25b..dd650d9d520 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -70,6 +70,7 @@ static GTY(()) tree local_event_ptr_node;
 static GTY(()) tree local_pp_element_ptr_node;
 static GTY(()) tree local_gimple_ptr_node;
 static GTY(()) tree local_cgraph_node_ptr_node;
+static GTY(()) tree local_string_slice_node;
 static GTY(()) tree locus;
 
 static bool decode_format_attr (const_tree, tree, tree, function_format_info *,
@@ -770,6 +771,7 @@ static const format_char_info asm_fprintf_char_table[] =
   { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q",  "c",  NULL }, \
   { "r",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "","//cR",   NULL 
}, \
   { "@",   1, STD_C89, { T_EVENT_PTR,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"",   NULL }, \
+  { "B",   1, STD_C89, { T_STRING_SLICE,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q", "",   NULL }, \
   { "e",   1, STD_C89, { T_PP_ELEMENT_PTR,   BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "\"", NULL }, \
   { "<",   0, STD_C89, NOARGUMENTS, "",  "<",   NULL }, \
   { ">",   0, STD_C89, NOARGUMENTS, "",  ">",   NULL }, \
@@ -5211,6 +5213,11 @@ init_dynamic_diag_info (void)
   || local_cgraph_node_ptr_node == void_type_node)
 local_cgraph_node_ptr_node = get_named_type ("cgraph_node");
 
+  /* Similar to the above but for string_slice*.  */
+  if (!local_string_slice_node
+  || local_string_slice_node == void_type_node)
+local_string_slice_node = get_named_type ("string_slice");
+
   /* Similar to the above but for diagnostic_event_id_t*.  */
   if (!local_event_ptr_node
   || local_event_ptr_node == void_type_node)
diff --git a/gcc/c-family/c-format.h b/gcc/c-family/c-format.h
index 323338cb8e7..d44d3862d83 100644
--- a/gcc/c-family/c-format.h
+++ b/gcc/c-family/c-format.h
@@ -317,6 +317,7 @@ struct format_kind_info
 #define T89_G   { STD_C89, NULL, &local_gimple_ptr_node }
 #define T_CGRAPH_NODE   { STD_C89, NULL, &local_cgraph_node_ptr_node }
 #define T_EVENT_PTR{ STD_C89, NULL, &local_event_ptr_node }
+#define T_STRING_SLICE{ STD_C89, NULL, &local_string_slice_node }
 #define T_PP_ELEMENT_PTR{ STD_C89, NULL, &local_pp_element_ptr_node }
 #define T89_T   { STD_C89, NULL, &local_tree_type_node }
 #define T89_V  { STD_C89, NULL, T_V }
diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index abd6c0b528f..aacd43420dd 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -2035,6 +2035,16 @@ format_phase_2 (pretty_printer *pp,
pp_string (pp, va_arg (*text.m_args_ptr, const char *));
  break;
 
+   case 'B':
+ {
+   string_slice s = *va_arg (*text.m_args_ptr, string_slice *);
+   if (quote)
+ pp_quoted_string (pp, s.begin (), s.size ());
+   else
+ pp_string_n (pp, s.begin (), s.size ());
+   break;
+ }
+
case 'p':
  pp_pointer (pp, va_arg (*text.m_args_ptr, void *));
  

[PATCH v4 06/20] Refactor record_function_versions.

2025-04-15 Thread Alfie Richards
Rename record_function_versions to add_function_version, and make it
explicit that it adds a single version to the function structure.

Additionally, change the insertion point to always maintain priority ordering
of the versions.

This allows for removing logic for moving the default to the first
position which was duplicated across target specific code and enables
easier reasoning about function sets.
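
The insertion rule can be sketched on a plain doubly linked list. This is
a simplified model, not the cgraph code: version_sketch and
add_version_sketch are hypothetical names, an integer priority field
stands in for targetm.compare_version_priority, and the extra
target_clones condition from the patch is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for cgraph_function_version_info.
struct version_sketch
{
  version_sketch (const std::string &n, int p, bool d)
    : name (n), priority (p), is_default (d), prev (nullptr), next (nullptr)
  {}

  std::string name;
  int priority;		// stand-in for the target's priority comparison
  bool is_default;
  version_sketch *prev;
  version_sketch *next;
};

// Insert V into the list that ANY belongs to and return the new head.
// The default version always goes first.  Any other version walks past
// the default and past every node it compares greater than.
version_sketch *
add_version_sketch (version_sketch *any, version_sketch *v)
{
  assert (any && v && !v->prev && !v->next);

  // Go to the start of the list.
  version_sketch *head = any;
  while (head->prev)
    head = head->prev;

  version_sketch *before = nullptr;
  version_sketch *after = head;
  if (!v->is_default)
    while (after && (v->priority > after->priority || after->is_default))
      {
	before = after;
	after = after->next;
      }

  // Link V between BEFORE and AFTER.
  v->prev = before;
  v->next = after;
  if (before)
    before->next = v;
  if (after)
    after->prev = v;
  return before ? head : v;
}
```

Because every insertion keeps the list ordered, callers no longer need a
separate pass to move the default version to the front.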

gcc/ChangeLog:

* cgraph.cc (cgraph_node::record_function_versions): Refactor and
rename to...
(cgraph_node::add_function_version): new function.
* cgraph.h (cgraph_node::record_function_versions): Refactor and
rename to...
(cgraph_node::add_function_version): new function.
* config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
Remove reordering.
* config/i386/i386-features.cc (ix86_get_function_versions_dispatcher):
Remove reordering.
* config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
Remove reordering.
* config/rs6000/rs6000.cc (rs6000_get_function_versions_dispatcher):
Remove reordering.

gcc/cp/ChangeLog:

* decl.cc (maybe_version_functions): Change record_function_versions
call to add_function_version.
---
 gcc/cgraph.cc| 75 +++-
 gcc/cgraph.h |  6 +--
 gcc/config/aarch64/aarch64.cc| 34 +++
 gcc/config/i386/i386-features.cc | 33 +++---
 gcc/config/riscv/riscv.cc| 38 +++-
 gcc/config/rs6000/rs6000.cc  | 35 +++
 gcc/cp/decl.cc   |  8 +++-
 7 files changed, 78 insertions(+), 151 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 6ae6a97f6f5..feaeebec40b 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -231,45 +231,60 @@ cgraph_node::delete_function_version_by_decl (tree decl)
   decl_node->remove ();
 }
 
-/* Record that DECL1 and DECL2 are semantically identical function
+/* Add decl to the structure of semantically identical function versions.
+   The node is inserted at the point maintaining the priority ordering on the
versions.  */
 void
-cgraph_node::record_function_versions (tree decl1, tree decl2)
+cgraph_node::add_function_version (cgraph_function_version_info *fn_v,
+  tree decl)
 {
-  cgraph_node *decl1_node = cgraph_node::get_create (decl1);
-  cgraph_node *decl2_node = cgraph_node::get_create (decl2);
-  cgraph_function_version_info *decl1_v = NULL;
-  cgraph_function_version_info *decl2_v = NULL;
-  cgraph_function_version_info *before;
-  cgraph_function_version_info *after;
-
-  gcc_assert (decl1_node != NULL && decl2_node != NULL);
-  decl1_v = decl1_node->function_version ();
-  decl2_v = decl2_node->function_version ();
-
-  if (decl1_v != NULL && decl2_v != NULL)
-return;
-
-  if (decl1_v == NULL)
-decl1_v = decl1_node->insert_new_function_version ();
+  cgraph_node *decl_node = cgraph_node::get_create (decl);
+  cgraph_function_version_info *decl_v = NULL;
 
-  if (decl2_v == NULL)
-decl2_v = decl2_node->insert_new_function_version ();
+  gcc_assert (decl_node != NULL);
 
-  /* Chain decl2_v and decl1_v.  All semantically identical versions
- will be chained together.  */
+  decl_v = decl_node->function_version ();
 
-  before = decl1_v;
-  after = decl2_v;
+  /* If the nodes are already linked, skip.  */
+  if (decl_v != NULL && (decl_v->next || decl_v->prev))
+return;
 
-  while (before->next != NULL)
-before = before->next;
+  if (decl_v == NULL)
+decl_v = decl_node->insert_new_function_version ();
+
+  gcc_assert (decl_v);
+  gcc_assert (fn_v);
+
+  /* Go to start of the FMV structure.  */
+  while (fn_v->prev)
+fn_v = fn_v->prev;
+
+  cgraph_function_version_info *insert_point_before = NULL;
+  cgraph_function_version_info *insert_point_after = fn_v;
+
+  /* Find the insertion point for the new version to maintain ordering.
+ The default node must always go at the beginning.  */
+  if (!is_function_default_version (decl))
+while (insert_point_after
+  && (targetm.compare_version_priority
+(decl, insert_point_after->this_node->decl) > 0
+  || is_function_default_version
+   (insert_point_after->this_node->decl)
+  || lookup_attribute
+   ("target_clones",
+DECL_ATTRIBUTES (insert_point_after->this_node->decl))))
+  {
+   insert_point_before = insert_point_after;
+   insert_point_after = insert_point_after->next;
+  }
 
-  while (after->prev != NULL)
-after= after->prev;
+  decl_v->prev = insert_point_before;
+  decl_v->next= insert_point_after;
 
-  before->next = after;
-  after->prev = before;
+  if (insert_point_before)
+insert_point_before->next = decl_v;
+  if (insert_point_after)
+insert_point_after->prev = decl_v;
 }
 
 /* Initialize callgraph dump file.  

[PATCH v4 08/20] Add get_clone_versions and get_target_version functions.

2025-04-15 Thread Alfie Richards
This is a reimplementation of get_target_clone_attr_len,
get_attr_str, and separate_attrs using string_slice and auto_vec to
simplify memory management and use.

Also adds a get_target_version helper function to get the target_version
string from a decl.
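
As a rough illustration of the splitting this patch performs, here is a standalone sketch that uses std::string in place of GCC's string_slice and auto_vec. The names strip and split_versions are hypothetical, and the ',' separator is an assumption standing in for TARGET_CLONES_ATTR_SEPARATOR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Removes leading and trailing spaces and tabs.
static std::string strip(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Splits every attribute argument on SEP, strips each token, and counts how
// many tokens equal "default" when DEFAULT_COUNT is non-null.
std::vector<std::string> split_versions(const std::vector<std::string>& args,
                                        int* default_count, char sep = ',') {
    std::vector<std::string> versions;
    for (const std::string& arg : args) {
        std::size_t pos = 0;
        while (true) {
            std::size_t end = arg.find(sep, pos);
            std::size_t len =
                (end == std::string::npos) ? std::string::npos : end - pos;
            std::string tok = strip(arg.substr(pos, len));
            if (tok == "default" && default_count)
                ++*default_count;
            versions.push_back(tok);
            if (end == std::string::npos)
                break;
            pos = end + 1;
        }
    }
    return versions;
}
```

Returning the tokens in a vector means the caller no longer has to size a buffer up front, which is the kind of bookkeeping the old helpers required.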

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Change to use
get_clone_attr_versions.

gcc/ChangeLog:

* tree.cc (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_target_version): New function.
* tree.h (get_clone_versions): New function.
(get_clone_attr_versions): New function.
(get_target_version): New function.
---
 gcc/c-family/c-attribs.cc |  4 ++-
 gcc/tree.cc   | 59 +++
 gcc/tree.h| 11 
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..5dff489fcca 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6132,7 +6132,9 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
}
}
 
-  if (get_target_clone_attr_len (args) == -1)
+  auto_vec<string_slice> versions = get_clone_attr_versions (args, NULL);
+
+  if (versions.length () == 1)
{
  warning (OPT_Wattributes,
   "single % attribute is ignored");
diff --git a/gcc/tree.cc b/gcc/tree.cc
index eccfcc89da4..fdcdfb336bc 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -15372,6 +15372,65 @@ get_target_clone_attr_len (tree arglist)
   return str_len_sum;
 }
 
+/* Returns an auto_vec of string_slices containing the version strings from
+   ARGLIST.  DEFAULT_COUNT is incremented for each default version found.  */
+
+auto_vec<string_slice>
+get_clone_attr_versions (const tree arglist, int *default_count)
+{
+  gcc_assert (TREE_CODE (arglist) == TREE_LIST);
+  auto_vec<string_slice> versions;
+
+  static const char separator_str[] = {TARGET_CLONES_ATTR_SEPARATOR, 0};
+  string_slice separators = string_slice (separator_str);
+
+  for (tree arg = arglist; arg; arg = TREE_CHAIN (arg))
+{
+  string_slice str = string_slice (TREE_STRING_POINTER (TREE_VALUE (arg)));
+  while (str.is_valid ())
+   {
+ string_slice attr = string_slice::tokenize (&str, separators);
+ attr = attr.strip ();
+
+ if (attr == "default" && default_count)
+   (*default_count)++;
+ versions.safe_push (attr);
+   }
+}
+  return versions;
+}
+
+/* Returns an auto_vec of string_slices containing the version strings from
+   the target_clones attribute from DECL.  DEFAULT_COUNT is incremented for each
+   default version found.  */
+auto_vec<string_slice>
+get_clone_versions (const tree decl, int *default_count)
+{
+  tree attr = lookup_attribute ("target_clones", DECL_ATTRIBUTES (decl));
+  if (!attr)
+return auto_vec<string_slice> ();
+  tree arglist = TREE_VALUE (attr);
+  return get_clone_attr_versions (arglist, default_count);
+}
+
+/* If DECL has a target_version attribute, returns a string_slice containing the
+   attribute value.  Otherwise, returns string_slice::invalid.
+   Only works for target_version due to target attributes allowing multiple
+   string arguments to specify one target.  */
+string_slice
+get_target_version (const tree decl)
+{
+  gcc_assert (!TARGET_HAS_FMV_TARGET_ATTRIBUTE);
+
+  tree attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
+
+  if (!attr)
+return string_slice::invalid ();
+
+  return string_slice (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))))
+  .strip ();
+}
+
 void
 tree_cc_finalize (void)
 {
diff --git a/gcc/tree.h b/gcc/tree.h
index 99f26177628..a89f3cf7189 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "tree-core.h"
 #include "options.h"
+#include "vec.h"
 
 /* Convert a target-independent built-in function code to a combined_fn.  */
 
@@ -7052,4 +7053,14 @@ extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 
 extern int get_target_clone_attr_len (tree);
 
+/* Returns the version string for a decl with target_version attribute.
+   Returns an invalid string_slice if no attribute is present.  */
+extern string_slice get_target_version (const tree);
+/* Returns a vector of the version strings from a target_clones attribute on
+   a decl.  Can also record the number of default versions found.  */
+extern auto_vec<string_slice> get_clone_versions (const tree, int * = NULL);
+/* Returns a vector of the version strings from a target_clones attribute
+   directly.  */
+extern auto_vec<string_slice> get_clone_attr_versions (const tree, int *);
+
 #endif  /* GCC_TREE_H  */
-- 
2.34.1



Re: [PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Tomasz Kaminski
On Tue, Apr 15, 2025 at 10:43 AM Luc Grosheintz wrote:

> This implements std::extents from  according to N4950 and
> contains partial progress towards PR107761.
>
> If an extent changes its type, there's a precondition in the standard,
> that the value is representable in the target integer type. This commit
> uses direct initialization to perform the conversion, without any
> additional checks.
>
> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> For extents this precondition is always violated and results in
> calling __builtin_trap. For all other specializations it's checked via
> __glibcxx_assert.
>
> PR libstdc++/107761
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (extents): New class.
> * src/c++23/std.cc.in: Add 'using std::extents'.
>
> Signed-off-by: Luc Grosheintz 
> ---
>  libstdc++-v3/include/std/mdspan  | 304 +++
>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>  2 files changed, 309 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 4094a416d1e..72ca3445d15 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -33,6 +33,10 @@
>  #pragma GCC system_header
>  #endif
>
> +#include 
> +#include 
> +#include 
> +
>  #define __glibcxx_want_mdspan
>  #include 
>
> @@ -41,6 +45,306 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +  namespace __mdspan
> +  {
> +template
> +  class __array
> +  {
> +  public:
> +   constexpr _Tp&
> + operator[](size_t __n) noexcept
> + {
> +   return _M_elems[__n];
> + }
> +
> +   constexpr const _Tp&
> + operator[](size_t __n) const noexcept
> + {
> +   return _M_elems[__n];
> + }
> +
> +  private:
> +   array<_Tp, _Nm> _M_elems;
> +  };
> +
> +template
> +  class __array<_Tp, 0>
> +  {
> +  public:
> +   constexpr _Tp&
> + operator[](size_t __n) noexcept
> + {
> +   __builtin_trap();
> + }
> +
> +   constexpr const _Tp&
> + operator[](size_t __n) const noexcept
> + {
> +   __builtin_trap();
> + }
> +  };
> +
> +template
> +  class _ExtentsStorage
> +  {
> +  public:
> +   static constexpr bool
> +   _M_is_dyn(size_t __ext) noexcept
> +   { return __ext == dynamic_extent; }
> +
> +   template
> + static constexpr _IndexType
> + _M_int_cast(const _OIndexType& __other) noexcept
> + { return _IndexType(__other); }
> +
> +   static constexpr size_t _S_rank = sizeof...(_Extents);
> +   static constexpr array _S_exts{_Extents...};
> +
> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> +   // of dynamic extents up to (and not including) __r.
> +   //
> +   // If __r is the index of a dynamic extent, then
> +   // _S_dynamic_index[__r] is the index of that extent in
> +   // _M_dynamic_extents.
> +   static constexpr auto _S_dynamic_index = [] consteval
> +   {
> + array __ret;
> + size_t __dyn = 0;
> + for(size_t __i = 0; __i < _S_rank; ++__i)
> +   {
> + __ret[__i] = __dyn;
> + __dyn += _M_is_dyn(_S_exts[__i]);
> +   }
> + __ret[_S_rank] = __dyn;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t _S_rank_dynamic =
> _S_dynamic_index[_S_rank];
> +
> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is
> the
> +   // index of the __r-th dynamic extent in _S_exts.
> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> +   {
> + array __ret;
> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> +   if (_M_is_dyn(_S_exts[__i]))
> + __ret[__r++] = __i;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t
> +   _M_static_extent(size_t __r) noexcept
> +   { return _S_exts[__r]; }
> +
> +   constexpr _IndexType
> +   _M_extent(size_t __r) const noexcept
> +   {
> + auto __se = _S_exts[__r];
> + if (__se == dynamic_extent)
> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> + else
> +   return __se;
> +   }
> +
> +  private:
> +   template
> + constexpr void
> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> + {
> +   for(size_t __i = 0; __i < _S_rank_dynamic; ++__i)
> + {
> +   size_t __di = __i;
> +   if constexpr (_OtherRank != _S_rank_dynamic)
> + __di = _S_dynamic_index_inv[__i];
> +   _M_dynamic_extents[__i] = _M_int_cast(__get_extent(__di));
> + }
> + }
> +
> +  public:
> +   constexpr
> +   _ExtentsStorage() noexce

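The index tables described in the comments of the quoted patch (_S_dynamic_index and _S_dynamic_index_inv) can be illustrated with a small runtime sketch. The patch computes them at compile time with consteval lambdas; the version below uses std::vector purely for readability, and kDyn, dynamic_index, dynamic_index_inv, and extent are hypothetical names, with kDyn standing in for std::dynamic_extent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for std::dynamic_extent.
const std::size_t kDyn = static_cast<std::size_t>(-1);

// For r in [0, rank], entry r is the number of dynamic extents before r.
std::vector<std::size_t> dynamic_index(const std::vector<std::size_t>& exts) {
    std::vector<std::size_t> ret(exts.size() + 1);
    std::size_t dyn = 0;
    for (std::size_t i = 0; i < exts.size(); ++i) {
        ret[i] = dyn;
        if (exts[i] == kDyn)
            ++dyn;
    }
    ret[exts.size()] = dyn;  // the last entry is the dynamic rank
    return ret;
}

// Entry r is the position in EXTS of the r-th dynamic extent.
std::vector<std::size_t> dynamic_index_inv(const std::vector<std::size_t>& exts) {
    std::vector<std::size_t> ret;
    for (std::size_t i = 0; i < exts.size(); ++i)
        if (exts[i] == kDyn)
            ret.push_back(i);
    return ret;
}

// Looks up extent R: static extents come from EXTS, dynamic ones from the
// compact DYN_VALUES array that holds only the dynamic extents.
std::size_t extent(const std::vector<std::size_t>& exts,
                   const std::vector<std::size_t>& dyn_values, std::size_t r) {
    assert(r < exts.size());
    if (exts[r] == kDyn)
        return dyn_values[dynamic_index(exts)[r]];
    return exts[r];
}
```

Storing only the dynamic extents keeps the object small, while the two tables translate between a position in the full extent list and a position in the compact storage.
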
[PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
Hi All,

The following example:

#define N 512
#define START 2
#define END 505

int x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  for (signed int i = START; i < END; ++i)
{
  if (x[i] == 0)
return i;
}
  return -1;
}

generates incorrect code with fixed-length SVE because for early break we need
to know which value to start the scalar loop with if we take an early exit.

Historically this means that we take the first element of every induction.
This is because there's an assumption in place that, even with masked loops, the
masks come from a whilel* instruction.

As such we reduce using a BIT_FIELD_REF <, 0>.

When PFA was added this assumption was correct for non-masked loops; however, we
assumed that PFA for VLA wouldn't work for now, and disabled it using the
alignment requirement checks.  We also expected VLS to PFA using scalar loops.

However, as this PR shows, for VLS the vectorizer can, and in some
circumstances does, choose to peel using masks by masking the first iteration
of the loop with an additional alignment mask.

When this is done, the first elements of the predicate can be inactive. In this
example element 1 is inactive based on the calculated misalignment, hence the
-1 value in the first vector IV element.

When we reduce using BIT_FIELD_REF we get the wrong value.

This patch fixes this by creating a new scalar PHI that keeps track of whether
we are in the first iteration of the loop (with the additional masking) or
whether we have taken a loop iteration already.

The generated sequence:

pre-header:
  bb1:
i_1 = 

header:
  bb2:
i_2 = PHI 
…

early-exit:
  bb3:
i_3 = iv_step * i_2 + PHI

This eliminates the need to do an expensive mask-based reduction.
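
To make the failure mode concrete, here is a scalar simulation of a masked, alignment-peeled vector loop with an early exit. It is only a model under stated assumptions: VF is fixed at 8 lanes, the IV step is 1, the array is aligned to VF elements, and restart_index and scalar_tail are hypothetical helpers. The offset variable plays the role of the new pfa_iv_offset PHI: it holds the number of skipped lanes during the first iteration and zero afterwards.

```cpp
#include <cassert>
#include <vector>

const int VF = 8;  // assumed number of lanes per vector iteration

// Returns the index at which a scalar loop would restart after the vector
// loop takes an early exit, or -1 if no active lane ever sees a zero.
// With APPLY_OFFSET false this models the old behaviour of simply taking the
// first IV element; with it true, the first-iteration offset is added.
int restart_index(const std::vector<int>& x, int start, int end,
                  bool apply_offset) {
    int skip = start % VF;    // lanes masked off in the first iteration
    int base = start - skip;  // aligned address the vector loop starts from
    int offset = skip;        // models pfa_iv_offset: skip, then 0
    for (int lane0 = base; lane0 < end; lane0 += VF) {
        bool hit = false;
        for (int j = 0; j < VF; ++j) {
            int i = lane0 + j;
            bool active = i >= start && i < end;  // alignment and tail masks
            if (active && x[i] == 0)
                hit = true;
        }
        if (hit)
            return apply_offset ? lane0 + offset : lane0;
        offset = 0;  // a full iteration has been taken
    }
    return -1;
}

// The scalar loop that finishes the search from FROM.
int scalar_tail(const std::vector<int>& x, int from, int end) {
    for (int i = from; i < end; ++i)
        if (x[i] == 0)
            return i;
    return -1;
}
```

With START = 2 and zeros at x[0] and x[5], the uncorrected restart index is 0, so the scalar loop wrongly reports index 0, which lies outside the iteration range. Adding the offset restarts at 2 and the scalar loop finds index 5.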

This fixes gromacs with one OpenMP thread. But with > 1 there is still an issue.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/119351
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Reject PFA
with masking with non-linear IVs.
* tree-vect-loop.cc (vectorizable_induction): Support PFA for masking.

gcc/testsuite/ChangeLog:

PR tree-optimization/119351
* gcc.target/aarch64/sve/peel_ind_10.c: New test.
* gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_5.c: New test.
* gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_6.c: New test.
* gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_7.c: New test.
* gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_8.c: New test.
* gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_9.c: New test.
* gcc.target/aarch64/sve/peel_ind_9_run.c: New test.

---
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c 
b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
new file mode 100644
index 
..b7a7bc5cb0cfdfdb74adb120c54ba15019832cf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10.c
@@ -0,0 +1,24 @@
+/* Fix for PR119351 alignment peeling with vectors and VLS.  */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -msve-vector-bits=256 --param 
aarch64-autovec-preference=sve-only -fdump-tree-vect-details" } */
+
+#define N 512
+#define START 0
+#define END 505
+ 
+int x[N] __attribute__((aligned(32)));
+
+int __attribute__((noipa))
+foo (int start)
+{
+  for (unsigned int i = start; i < END; ++i)
+{
+  if (x[i] == 0)
+return i;
+}
+  return -1;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "pfa_iv_offset" "vect" } } */
+/* { dg-final { scan-tree-dump "Alignment of access forced using peeling" 
"vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c 
b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
new file mode 100644
index 
..6169aebcc40cc1553f30c1af61ccec91b51cdb42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_10_run.c
@@ -0,0 +1,17 @@
+/* Fix for PR119351 alignment peeling with vectors and VLS.  */
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-Ofast --param aarch64-autovec-preference=sve-only" } */
+/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw 
} } */
+/* { dg-additional-options "-msve-vector-bits=128" { target aarch64_sve128_hw 
} } */
+
+#include "peel_ind_10.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  int res = foo (START);
+  asm volatile ("");
+  if (res != START)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_5.c 
b/gcc/testsuit

[PATCH] ipa-cp: Fix up ipcp_print_widest_int

2025-04-15 Thread Jakub Jelinek
On Mon, Mar 31, 2025 at 03:34:07PM +0200, Martin Jambor wrote:
> This patch just introduces a form of dumping of widest ints that only
> have zeros in the lowest 128 bits so that instead of printing
> thousands of f's the output looks like:
> 
>Bits: value = 0x, mask = all ones folled by 
> 0x
> 
> and then makes sure we use the function not only to print bits but
> also to print masks where values like these can also occur.

Shouldn't that be "followed by" instead?
The widest_int checks also seem quite expensive (especially for
large widest_ints). I think the first one can just compare with == -1,
the second one can use wi::arshift (value, 128) == -1, and the zero extension
can be done using wi::zext.

Anyway, I wonder if it wouldn't be better to use something shorter;
the variant patch uses a 0xf..f prefix before the 128-bit hexadecimal
number (maybe we could also special-case the even more common case where
bits 64+ are all ones).  Or it could be a 0xf*f prefix.  Or printing such
numbers as -0x prefixed negative, though that is not a good idea for masks.

Jakub
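
The 0xf..f idea can be tried out on a smaller scale. The sketch below is only an analogy: it uses a 64-bit integer in place of widest_int and treats the low 32 bits as the "interesting" part instead of the low 128 bits. format_wide is a hypothetical name, and the unsigned shift replaces wi::arshift to avoid relying on signed right-shift behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats VALUE in the style of the variant patch: "-1" for all ones,
// "0xf..f" plus the zero-padded low 32 bits when every bit above them is
// set, and plain hexadecimal otherwise.
std::string format_wide(std::int64_t value) {
    if (value == -1)
        return "-1";

    std::uint64_t bits = static_cast<std::uint64_t>(value);
    char buf[32];
    if ((bits >> 32) == 0xffffffffu) {
        // Pad to eight digits so the position of the low bits stays clear.
        std::snprintf(buf, sizeof buf, "0xf..f%08llx",
                      static_cast<unsigned long long>(bits & 0xffffffffu));
        return buf;
    }
    std::snprintf(buf, sizeof buf, "0x%llx",
                  static_cast<unsigned long long>(bits));
    return buf;
}
```

The zero padding matters for the same reason as in the variant patch: without it, a value whose low part starts with zero digits would be indistinguishable from one whose ones extend further down.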
2025-04-15  Jakub Jelinek  

* ipa-cp.cc (ipcp_print_widest_int): Fix a typo, folled -> followed.
Simplify wide_int check for -1 or all ones above least significant
128 bits.

--- gcc/ipa-cp.cc.jj2025-04-15 07:55:18.369479825 +0200
+++ gcc/ipa-cp.cc   2025-04-15 11:37:03.059964475 +0200
@@ -313,14 +313,12 @@ ipcp_lattice::print (FILE * f,
 static void
 ipcp_print_widest_int (FILE *f, const widest_int &value)
 {
-  if (wi::eq_p (wi::bit_not (value), 0))
+  if (value == -1)
 fprintf (f, "-1");
-  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
- wi::sub (wi::lshift (1, 128),
-  1))), 0))
+  else if (wi::arshift (value, 128) == -1)
 {
-  fprintf (f, "all ones folled by ");
-  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
+  fprintf (f, "all ones followed by ");
+  print_hex (wi::zext (value, 128), f);
 }
   else
 print_hex (value, f);
2025-04-15  Jakub Jelinek  

* ipa-cp.cc (ipcp_print_widest_int): Print values with all ones in
bits 128+ with "0xf..f" prefix instead of "all ones folled by ".
Simplify wide_int check for -1 or all ones above least significant
128 bits.

--- gcc/ipa-cp.cc.jj2025-04-15 07:55:18.369479825 +0200
+++ gcc/ipa-cp.cc   2025-04-15 11:54:45.369704056 +0200
@@ -313,14 +313,20 @@ ipcp_lattice::print (FILE * f,
 static void
 ipcp_print_widest_int (FILE *f, const widest_int &value)
 {
-  if (wi::eq_p (wi::bit_not (value), 0))
+  if (value == -1)
 fprintf (f, "-1");
-  else if (wi::eq_p (wi::bit_not (wi::bit_or (value,
- wi::sub (wi::lshift (1, 128),
-  1))), 0))
+  else if (wi::arshift (value, 128) == -1)
 {
-  fprintf (f, "all ones folled by ");
-  print_hex (wi::bit_and (value, wi::sub (wi::lshift (1, 128), 1)), f);
+  char buf[35];
+  widest_int v = wi::zext (value, 128);
+  size_t len;
+  print_hex (v, buf);
+  len = strlen (buf + 2);
+  if (len == 32)
+   fprintf (f, "0xf..f");
+  else
+   fprintf (f, "0xf..f%0*d", (int) (32 - len), 0);
+  fputs (buf + 2, f);
 }
   else
 print_hex (value, f);


Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Richard Sandiford
Tamar Christina  writes:
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1ce591f7c38b76d07f
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info loop_vinfo,
>return false;
>  }
>  
> +  /* With early break vectorization we don't know whether the accesses will 
> stay
> + inside the loop or not.  TODO: The early break adjustment code can be
> + implemented the same way for vectorizable_linear_induction.  However we
> + can't test this today so reject it.  */
> +  if (niters_skip != NULL_TREE
> +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "Peeling for alignement using masking is not supported"
> +  " for nonlinear induction when using early breaks.\n");
> +  return false;
> +}
> +
>return true;
>  }

FTR, I was wondering here whether we should predict this in advance and
instead drop down to peeling for alignment without masks.  It probably
isn't worth the effort though.

> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0057d2fa46245d04d9
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>  LOOP_VINFO_MASK_SKIP_NITERS 
> (loop_vinfo));
> peel_mul = gimple_build_vector_from_val (&init_stmts,
>  step_vectype, peel_mul);
> +
> +   /* If early break then we have to create a new PHI which we can use as
> + an offset to adjust the induction reduction in early exits.  */
> +   if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> +   tree ty_skip_niters = TREE_TYPE (skip_niters);
> +   tree break_lhs_phi = NULL_TREE;
> +   break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> +  vect_scalar_var,
> +  "pfa_iv_offset");
> +   gphi *nphi = create_phi_node (break_lhs_phi, bb);
> +   add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> +   add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> +loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> +
> +   /* Rewrite all the early exit usages.  */
> +   tree phi_lhs = PHI_RESULT (phi);
> +   imm_use_iterator iter;
> +   use_operand_p use_p;
> +   gimple *use_stmt;
> +
> +   FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> + {
> +   use_stmt = USE_STMT (use_p);
> +   if (!flow_bb_inside_loop_p (iv_loop, gimple_bb (use_stmt))
> +   && is_a  (use_stmt))
> + {
> +   auto gsi = gsi_last_bb (use_stmt->bb);
> +   for (auto exit : get_loop_exit_edges (iv_loop))
> + if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
> + && bb == exit->src)
> +   {
> + /* Now create the PHI for the outside loop usage to
> +retrieve the value for the offset counter.  */
> + tree rphi_lhs = make_ssa_name (ty_skip_niters);
> + gphi *rphi
> +   = create_phi_node (rphi_lhs, use_stmt->bb);
> + for (unsigned i = 0; i < gimple_phi_num_args (rphi);
> +  i++)
> +   SET_PHI_ARG_DEF (rphi, i, PHI_RESULT (nphi));
> +
> + tree tmp = make_ssa_name (TREE_TYPE (phi_lhs));
> + tree stmt_lhs = PHI_RESULT (use_stmt);
> + imm_use_iterator iter2;
> + gimple *use_stmt2;
> + use_operand_p use2_p;
> +
> + /* Now rewrite all the usages first.  */
> + FOR_EACH_IMM_USE_STMT (use_stmt2, iter2, stmt_lhs)
> +   FOR_EACH_IMM_USE_ON_STMT (use2_p, iter2)
> + SET_USE (use2_p, tmp);
> +
> + /* And then generate the adjustment to avoid the
> +update code from updating this new usage.  The
> +multiplicaiton is to get the original IV and the
> +downwards counting IV correct.  */

typo: multipl

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, April 15, 2025 10:52 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de
> Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS
> [PR119351]
> 
> Tamar Christina  writes:
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> 56a4e9a8b63f3cae0bf596bf5d22893887dc80e8..0722679d6e66e5dd5af4ec1c
> e591f7c38b76d07f 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -2195,6 +2195,22 @@ vect_can_peel_nonlinear_iv_p (loop_vec_info
> loop_vinfo,
> >return false;
> >  }
> >
> > +  /* With early break vectorization we don't know whether the accesses 
> > will stay
> > + inside the loop or not.  TODO: The early break adjustment code can be
> > + implemented the same way for vectorizable_linear_induction.  However 
> > we
> > + can't test this today so reject it.  */
> > +  if (niters_skip != NULL_TREE
> > +  && vect_use_loop_mask_for_alignment_p (loop_vinfo)
> > +  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> > +  && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +"Peeling for alignement using masking is not supported"
> > +" for nonlinear induction when using early breaks.\n");
> > +  return false;
> > +}
> > +
> >return true;
> >  }
> 
> FTR, I was wondering here whether we should predict this in advance and
> instead drop down to peeling for alignment without masks.  It probably
> isn't worth the effort though.

We could move the check into vect_use_loop_mask_for_alignment_p, where
rejecting it would make it fall back to scalar peeling.  That seems simple
enough if that's preferable.

Cheers,
Tamar
> 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index
> 9413dcef702597ab27165e676546b190e2bd36ba..6dcdee19bb250993d8cc6b0
> 057d2fa46245d04d9 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10678,6 +10678,104 @@ vectorizable_induction (loop_vec_info
> loop_vinfo,
> >LOOP_VINFO_MASK_SKIP_NITERS
> (loop_vinfo));
> >   peel_mul = gimple_build_vector_from_val (&init_stmts,
> >step_vectype, peel_mul);
> > +
> > + /* If early break then we have to create a new PHI which we can use as
> > +   an offset to adjust the induction reduction in early exits.  */
> > + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +   {
> > + auto skip_niters = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
> > + tree ty_skip_niters = TREE_TYPE (skip_niters);
> > + tree break_lhs_phi = NULL_TREE;
> > + break_lhs_phi = vect_get_new_vect_var (ty_skip_niters,
> > +vect_scalar_var,
> > +"pfa_iv_offset");
> > + gphi *nphi = create_phi_node (break_lhs_phi, bb);
> > + add_phi_arg (nphi, skip_niters, pe, UNKNOWN_LOCATION);
> > + add_phi_arg (nphi, build_zero_cst (ty_skip_niters),
> > +  loop_latch_edge (iv_loop), UNKNOWN_LOCATION);
> > +
> > + /* Rewrite all the early exit usages.  */
> > + tree phi_lhs = PHI_RESULT (phi);
> > + imm_use_iterator iter;
> > + use_operand_p use_p;
> > + gimple *use_stmt;
> > +
> > + FOR_EACH_IMM_USE_FAST (use_p, iter, phi_lhs)
> > +   {
> > + use_stmt = USE_STMT (use_p);
> > + if (!flow_bb_inside_loop_p (iv_loop, gimple_bb (use_stmt))
> > + && is_a  (use_stmt))
> > +   {
> > + auto gsi = gsi_last_bb (use_stmt->bb);
> > + for (auto exit : get_loop_exit_edges (iv_loop))
> > +   if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo)
> > +   && bb == exit->src)
> > + {
> > +   /* Now create the PHI for the outside loop usage to
> > +  retrieve the value for the offset counter.  */
> > +   tree rphi_lhs = make_ssa_name (ty_skip_niters);
> > +   gphi *rphi
> > + = create_phi_node (rphi_lhs, use_stmt->bb);
> > +   for (unsigned i = 0; i < gimple_phi_num_args (rphi);
> > +i++)
> > + SET_PHI_ARG_DEF (rphi, i, PHI_RESULT (nphi));
> > +
> > +   tree tmp = make_ssa_name (TREE_TYPE (phi_lhs));
> > +   tree stmt_lhs = PHI_RESULT (use_stmt);
> > +   imm_use_iterator iter2;
> > +   gimple *use_stmt2;
> > +   use_operand_p use2_p;
> > +
> > +   /* Now rewrite all

Re: [PATCH v1 0/4] Implement extents from the mdspan header.

2025-04-15 Thread Luc Grosheintz

The second iteration of this patch series is available here:
https://gcc.gnu.org/pipermail/libstdc++/2025-April/060988.html

Thank you for the reviews.

On 4/9/25 9:23 AM, Luc Grosheintz wrote:

Hi,

This is a patch series that implements std::extents from <mdspan>.

I've never contributed to GCC or libstdc++. Hence, any comments about
obvious issues, e.g. too few/many tests, not following conventions
w.r.t. checking prerequisites, use of 'friend class', choice of uglified
names that could clash with other of our uglified names, etc. will be
very well received. In a similar vein let me know if you want the patch
series split differently.

If you deem this an effective way of implementing <mdspan>, I would work
on std::layout_*, std::default_accessor and eventually std::mdspan.

Each commit was tested with 'make check-target-libstdc++-v3' on x86_64.

Thank you!

Luc Grosheintz (4):
   libstdc++: Setup internal FTM for mdspan.
   libstdc++: Add header mdspan to the build-system.
   libstdc++: Implement std::extents [PR107761].
   libstdc++: Add tests for std::extents.

  libstdc++-v3/doc/doxygen/user.cfg.in  |   1 +
  libstdc++-v3/include/Makefile.am  |   1 +
  libstdc++-v3/include/Makefile.in  |   1 +
  libstdc++-v3/include/bits/version.def |   9 +
  libstdc++-v3/include/bits/version.h   |  10 +
  libstdc++-v3/include/precompiled/stdc++.h |   4 +
  libstdc++-v3/include/std/mdspan   | 448 ++
  libstdc++-v3/src/c++23/std.cc.in  |   6 +-
  .../mdspan/extents/assign_copy.cc |  26 +
  .../mdspan/extents/assign_copy_01_neg.cc  |  15 +
  .../mdspan/extents/class_traits.cc|  20 +
  .../23_containers/mdspan/extents/ctor_copy.cc |  34 ++
  .../mdspan/extents/ctor_copy_01_neg.cc|  10 +
  .../mdspan/extents/ctor_copy_02_neg.cc|  10 +
  .../mdspan/extents/ctor_copy_constexpr.cc |  20 +
  .../mdspan/extents/ctor_copy_implicit_00.cc   |  28 ++
  .../mdspan/extents/ctor_copy_implicit_01.cc   |  22 +
  .../extents/ctor_copy_implicit_02_neg.cc  |  10 +
  .../extents/ctor_copy_implicit_03_neg.cc  |   9 +
  .../extents/ctor_copy_implicit_04_neg.cc  |   9 +
  .../mdspan/extents/ctor_ints_00.cc|  30 ++
  .../mdspan/extents/ctor_ints_01.cc|  24 +
  .../mdspan/extents/ctor_ints_02_neg.cc|   9 +
  .../mdspan/extents/ctor_ints_03_neg.cc|   9 +
  .../mdspan/extents/ctor_ints_constexpr.cc |  12 +
  .../extents/ctor_ints_implicit_00_neg.cc  |   9 +
  .../extents/ctor_ints_implicit_01_neg.cc  |   9 +
  .../mdspan/extents/ctor_shape_00.cc   |  35 ++
  .../mdspan/extents/ctor_shape_01.cc   |  17 +
  .../mdspan/extents/ctor_shape_02_neg.cc   |  10 +
  .../mdspan/extents/ctor_shape_03_neg.cc   |  11 +
  .../mdspan/extents/ctor_shape_constexpr.cc|  23 +
  .../mdspan/extents/ctor_shape_implicit_00.cc  |  42 ++
  .../mdspan/extents/ctor_shape_implicit_01.cc  |  19 +
  .../extents/ctor_shape_implicit_02_neg.cc |  11 +
  .../extents/ctor_shape_implicit_03_neg.cc |  12 +
  .../mdspan/extents/ctor_shape_implicit_04.cc  |  24 +
  .../mdspan/extents/custom_integer.cc  |  86 
  .../mdspan/extents/deduction_guide_00.cc  |  23 +
  .../mdspan/extents/deduction_guide_01_neg.cc  |  10 +
  .../23_containers/mdspan/extents/dextents.cc  |  11 +
  .../23_containers/mdspan/extents/extent.cc|  27 ++
  .../mdspan/extents/index_type.cc  |  14 +
  .../23_containers/mdspan/extents/ops_eq.cc|  58 +++
  .../23_containers/mdspan/extents/rank.cc  |  10 +
  .../mdspan/extents/rank_dynamic.cc|  11 +
  .../mdspan/extents/rank_return_type.cc|  14 +
  .../23_containers/mdspan/extents/rank_type.cc |   7 +
  .../23_containers/mdspan/extents/size_type.cc |  16 +
  .../23_containers/mdspan/extents/sizeof.cc|  10 +
  .../mdspan/extents/static_extent.cc   |  15 +
  51 files changed, 1310 insertions(+), 1 deletion(-)
  create mode 100644 libstdc++-v3/include/std/mdspan
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/assign_copy.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/assign_copy_01_neg.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/class_traits.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_01_neg.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_02_neg.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_implicit_00.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_implicit_01.cc
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_implicit_

[PATCH v2 2/4] libstdc++: Add header mdspan to the build-system.

2025-04-15 Thread Luc Grosheintz
Creates a nearly empty header mdspan and adds it to the build-system and
Doxygen config file.

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Add .
* include/Makefile.am: Ditto.
* include/Makefile.in: Ditto.
* include/precompiled/stdc++.h: Ditto.
* include/std/mdspan: New file.

Signed-off-by: Luc Grosheintz 
---
 libstdc++-v3/doc/doxygen/user.cfg.in  |  1 +
 libstdc++-v3/include/Makefile.am  |  1 +
 libstdc++-v3/include/Makefile.in  |  1 +
 libstdc++-v3/include/precompiled/stdc++.h |  1 +
 libstdc++-v3/include/std/mdspan   | 48 +++
 5 files changed, 52 insertions(+)
 create mode 100644 libstdc++-v3/include/std/mdspan

diff --git a/libstdc++-v3/doc/doxygen/user.cfg.in b/libstdc++-v3/doc/doxygen/user.cfg.in
index 19ae67a67ba..e926c6707f6 100644
--- a/libstdc++-v3/doc/doxygen/user.cfg.in
+++ b/libstdc++-v3/doc/doxygen/user.cfg.in
@@ -880,6 +880,7 @@ INPUT  = @srcdir@/doc/doxygen/doxygroups.cc \
  include/list \
  include/locale \
  include/map \
+ include/mdspan \
  include/memory \
  include/memory_resource \
  include/mutex \
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 537774c2668..1140fa0dffd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -38,6 +38,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 7b96b2207f8..c96e981acd6 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -396,6 +396,7 @@ std_freestanding = \
${std_srcdir}/generator \
${std_srcdir}/iterator \
${std_srcdir}/limits \
+   ${std_srcdir}/mdspan \
${std_srcdir}/memory \
${std_srcdir}/numbers \
${std_srcdir}/numeric \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h b/libstdc++-v3/include/precompiled/stdc++.h
index f4b312d9e47..e7d89c92704 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -228,6 +228,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
new file mode 100644
index 000..4094a416d1e
--- /dev/null
+++ b/libstdc++-v3/include/std/mdspan
@@ -0,0 +1,48 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2025 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file mdspan
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_MDSPAN
+#define _GLIBCXX_MDSPAN 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#define __glibcxx_want_mdspan
+#include 
+
+#ifdef __glibcxx_mdspan
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+#endif
+#endif
-- 
2.48.1



Re: [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-15 Thread Richard Sandiford
Tejas Belagod  writes:
> The operand order to gen_vcond_mask call in the vec_extract pattern is wrong.
> Fix the order where predicate is operand 3.
>
> Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?
>
> gcc/ChangeLog
>
>   * config/aarch64/aarch64-sve.md (vec_extract): Fix operand
>   order to gen_vcond_mask_*.

Thanks, LGTM too.

Richard

> ---
>  gcc/config/aarch64/aarch64-sve.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 3dbd65986ec..d4af3706294 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -3133,9 +3133,9 @@
>"TARGET_SVE"
>{
>  rtx tmp = gen_reg_rtx (mode);
> -emit_insn (gen_vcond_mask_ (tmp, operands[1],
> -  CONST1_RTX (mode),
> -  CONST0_RTX (mode)));
> +emit_insn (gen_vcond_mask_ (tmp, CONST1_RTX (mode),
> +  CONST0_RTX (mode),
> +  operands[1]));
>  emit_insn (gen_vec_extract (operands[0], tmp, operands[2]));
>  DONE;
>}


[committed] libstdc++: Enable __gnu_test::test_container constructor for C++98

2025-04-15 Thread Jonathan Wakely
The only reason this constructor wasn't defined for C++98 is that it
uses constructor delegation, but that isn't necessary.

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_iterators.h (test_container): Define
array constructor for C++98 as well.
---

I don't remember why I defined this using a delegating constructor when
it's actually fewer characters to call the base constructor directly.
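
As a minimal sketch of why the delegation was unnecessary, consider the
stand-in types below. The names `bounds_sketch` and `container_sketch` are
hypothetical and only loosely mirror the testsuite helper; the layout is an
assumption. The array constructor initializes `bounds` in its own
mem-initializer list, which is valid in C++98, whereas delegating to the
two-pointer constructor would require C++11.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the bounds type used by the testsuite helper.
template<typename T>
  struct bounds_sketch
  {
    T* first;
    T* last;

    bounds_sketch(T* f, T* l) : first(f), last(l) { }
  };

// Hypothetical stand-in for test_container.
template<typename T>
  struct container_sketch
  {
    bounds_sketch<T> bounds;

    container_sketch(T* f, T* l) : bounds(f, l) { }

    // Initializes the member directly instead of delegating to the
    // constructor above, so this also compiles as C++98.
    template<std::size_t N>
      explicit
      container_sketch(T (&arr)[N]) : bounds(arr, arr + N) { }

    std::ptrdiff_t
    size() const
    { return bounds.last - bounds.first; }
  };
```

Constructing `container_sketch<int>` from an `int[3]` array then yields a
range of three elements without any C++11 feature.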

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/testsuite/util/testsuite_iterators.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index 0df6dcc5af5..20539ecaca6 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -610,12 +610,10 @@ namespace __gnu_test
 test_container(T* _first, T* _last) : bounds(_first, _last)
 { }
 
-#if __cplusplus >= 201103L
 template
   explicit
-  test_container(T (&arr)[N]) : test_container(arr, arr+N)
+  test_container(T (&arr)[N]) : bounds(arr, arr+N)
   { }
-#endif
 
 ItType
 it(int pos)
-- 
2.49.0



[PATCH v2 0/4] Implement extents from the mdspan header.

2025-04-15 Thread Luc Grosheintz
This is the second version of:
https://gcc.gnu.org/pipermail/libstdc++/2025-April/060914.html

Following the comments from Jonathan, I reorganized the tests. Using the
very effective tricks pointed out by Tomasz led to a much more concise
implementation of `_ExtentsStorage`. If desired, this could be taken
further and the helper `_ExtentsStorage` could be completely merged into
`extents`.

Changes since v1:
  * All `_neg` tests are replaced by `static_assert`s. Additionally,
some other small tests are merged into related files.
  * Assigning the elements of `_M_dynamic_extents` in a for loop
eliminates the need for `index_sequence`.
  * The `consteval` trick enables removing all template meta-programming
helpers needed to create `_S_dynamic_index{,_inv}`.
  * The trick to use `initializer_list` removes the need for the ctor
`_ExtentsStorage(Integrals...)`.
  * Introduces a concept for checking if `_OIndexType` is suitable, i.e.
convertible & nothrow constructible; and replaces `typename` with
the concept.
  * The internal namespace for details is renamed to `__mdspan`.
  * Regenerates `include/bits/version.h` to respect `no_stdname`.
  * Fixes the bug in `extents::size_type`.
  * Changes the integer conversion `_M_int_cast` as suggested.
  * Fixed `#include ` in `precompiled/stdc++.h`.
  * Replaced the `array` used for storing the dynamic
extents with a custom type `__mdspan::__array`. See Point 2
below.
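
The lambda-based computation of the dynamic index table mentioned in the
list above can be sketched as follows. This is an illustration under
assumptions, not the patch itself: `extents_index_sketch` and `dyn_sketch`
are hypothetical names, `dyn_sketch` stands in for `std::dynamic_extent`,
and a plain constexpr lambda is used so the sketch also builds as C++17
(the patch uses `consteval`, which additionally forces compile-time
evaluation).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for std::dynamic_extent, so the sketch has no C++20 dependency.
constexpr std::size_t dyn_sketch = static_cast<std::size_t>(-1);

template<std::size_t... Extents>
  struct extents_index_sketch
  {
    static constexpr std::size_t rank = sizeof...(Extents);
    static constexpr std::array<std::size_t, rank> exts{Extents...};

    // dynamic_index[r] is the number of dynamic extents strictly before r;
    // dynamic_index[rank] is the total number of dynamic extents.
    static constexpr auto dynamic_index = []() {
      std::array<std::size_t, rank + 1> ret{};
      std::size_t dyn = 0;
      for (std::size_t i = 0; i < rank; ++i)
	{
	  ret[i] = dyn;
	  dyn += (exts[i] == dyn_sketch) ? 1 : 0;
	}
      ret[rank] = dyn;
      return ret;
    }();

    static constexpr std::size_t rank_dynamic = dynamic_index[rank];
  };
```

A single immediately-invoked lambda replaces the recursive template helpers
that would otherwise be needed to build the table.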

I'd like to point out the following:

1. When calling the ctor `extents(IntLike{}, IntLike{})` the
   user-defined conversion operator of `IntLike` will be called twice
   when creating the `initializer_list`, even though ultimately we don't
   need either of the two values, because both extents are static.

2. In v1 the implementation used `array` to store the dynamic
   extents when there are none. Therefore, `[[no_unique_address]]` had
   no effect. In v2, I've replaced the `array` with a custom type
   that works with `[[no_unique_address]]`.
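
A small sketch of the effect described in Point 2, using hypothetical names
(`empty_store_sketch`, `holder_sketch`): a member of a genuinely empty class
type can share its address with other members when marked
`[[no_unique_address]]`, while a zero-length `std::array` is not guaranteed
to be an empty class. The exact sizes are ABI-dependent, so the sketch only
relies on the holder with the empty type being no larger.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical empty storage type, standing in for the zero-length
// specialization of the custom array helper.
struct empty_store_sketch { };

// Holder whose storage member may overlap with other members when the
// storage type is an empty class.
template<typename Store>
  struct holder_sketch
  {
    [[no_unique_address]] Store store;
    int payload;
  };

typedef holder_sketch<empty_store_sketch> holder_with_empty_sketch;
typedef holder_sketch<std::array<int, 0> > holder_with_array_sketch;
```

With GCC and libstdc++ one would expect `holder_with_empty_sketch` to be the
size of an `int`, but that is an expectation about one ABI rather than a
guarantee.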

The patches were tested with `make check-target-libstdc++-v3`.

Thank you Tomasz Kaminski and Jonathan Wakely for your helpful review!

Luc Grosheintz (4):
  libstdc++: Setup internal FTM for mdspan.
  libstdc++: Add header mdspan to the build-system.
  libstdc++: Implement std::extents [PR107761].
  libstdc++: Add tests for std::extents.

 libstdc++-v3/doc/doxygen/user.cfg.in  |   1 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   9 +
 libstdc++-v3/include/bits/version.h   |   9 +
 libstdc++-v3/include/precompiled/stdc++.h |   1 +
 libstdc++-v3/include/std/mdspan   | 352 ++
 libstdc++-v3/src/c++23/std.cc.in  |   6 +-
 .../23_containers/mdspan/extents/assign.cc|  29 ++
 .../mdspan/extents/class_properties.cc|  62 +++
 .../23_containers/mdspan/extents/ctor_copy.cc |  75 
 .../mdspan/extents/ctor_copy_constexpr.cc |  20 +
 .../23_containers/mdspan/extents/ctor_ints.cc |  58 +++
 .../mdspan/extents/ctor_ints_constexpr.cc |  12 +
 .../mdspan/extents/ctor_shape_all_extents.cc  |  61 +++
 .../mdspan/extents/ctor_shape_constexpr.cc|  23 ++
 .../extents/ctor_shape_dynamic_extents.cc |  91 +
 .../mdspan/extents/custom_integer.cc  |  87 +
 .../mdspan/extents/deduction_guide.cc |  34 ++
 .../23_containers/mdspan/extents/dextents.cc  |  11 +
 .../23_containers/mdspan/extents/extent.cc|  24 ++
 .../23_containers/mdspan/extents/ops_eq.cc|  58 +++
 22 files changed, 1024 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/std/mdspan
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/assign.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/class_properties.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_copy_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_ints_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_all_extents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_constexpr.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ctor_shape_dynamic_extents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/custom_integer.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/deduction_guide.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/dextents.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/extent.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/extents/ops_eq.cc

-- 
2.

Re: [PATCH v2 3/4] libstdc++: Implement std::extents [PR107761].

2025-04-15 Thread Tomasz Kaminski
On Tue, Apr 15, 2025 at 10:43 AM Luc Grosheintz wrote:

> This implements std::extents from  according to N4950 and
> contains partial progress towards PR107761.
>
> If an extent changes its type, there's a precondition in the standard,
> that the value is representable in the target integer type. This commit
> uses direct initialization to perform the conversion, without any
> additional checks.
>
> The precondition for 'extents::{static_,}extent' is that '__r < rank()'.
> For extents this precondition is always violated and results in
> calling __builtin_trap. For all other specializations it's checked via
> __glibcxx_assert.
>
> PR libstdc++/107761
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (extents): New class.
> * src/c++23/std.cc.in: Add 'using std::extents'.
>
> Signed-off-by: Luc Grosheintz 
>
Looks really good, thanks. A bunch of small suggestions.

> ---
>  libstdc++-v3/include/std/mdspan  | 304 +++
>  libstdc++-v3/src/c++23/std.cc.in |   6 +-
>  2 files changed, 309 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> index 4094a416d1e..72ca3445d15 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -33,6 +33,10 @@
>  #pragma GCC system_header
>  #endif
>
> +#include 
> +#include 
> +#include 
> +
>  #define __glibcxx_want_mdspan
>  #include 
>
> @@ -41,6 +45,306 @@
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +  namespace __mdspan
> +  {
> +template
> +  class __array
>
Any reason for using __array here, instead of std::array?
We already need to pull that header, because of the constructors.

> +  {
> +  public:
> +   constexpr _Tp&
> + operator[](size_t __n) noexcept
> + {
> +   return _M_elems[__n];
> + }
> +
> +   constexpr const _Tp&
> + operator[](size_t __n) const noexcept
> + {
> +   return _M_elems[__n];
> + }
> +
> +  private:
> +   array<_Tp, _Nm> _M_elems;
> +  };
> +
> +template
> +  class __array<_Tp, 0>
> +  {
> +  public:
> +   constexpr _Tp&
> + operator[](size_t __n) noexcept
> + {
> +   __builtin_trap();
> + }
> +
> +   constexpr const _Tp&
> + operator[](size_t __n) const noexcept
> + {
> +   __builtin_trap();
> + }
> +  };
> +
> +template
>
I would use NTTP of the std::array here, and eliminate internal _S_exts.
Your __array is not structural, because it has private members.

> +  class _ExtentsStorage
> +  {
> +  public:
> +   static constexpr bool
> +   _M_is_dyn(size_t __ext) noexcept
> +   { return __ext == dynamic_extent; }
> +
> +   template
> + static constexpr _IndexType
> + _M_int_cast(const _OIndexType& __other) noexcept
> + { return _IndexType(__other); }
> +
> +   static constexpr size_t _S_rank = sizeof...(_Extents);
> +   static constexpr array _S_exts{_Extents...};
> +
> +   // For __r in [0, _S_rank], _S_dynamic_index[__r] is the number
> +   // of dynamic extents up to (and not including) __r.
> +   //
> +   // If __r is the index of a dynamic extent, then
> +   // _S_dynamic_index[__r] is the index of that extent in
> +   // _M_dynamic_extents.
> +   static constexpr auto _S_dynamic_index = [] consteval
> +   {
> + array __ret;
> + size_t __dyn = 0;
> + for(size_t __i = 0; __i < _S_rank; ++__i)
> +   {
> + __ret[__i] = __dyn;
> + __dyn += _M_is_dyn(_S_exts[__i]);
> +   }
> + __ret[_S_rank] = __dyn;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t _S_rank_dynamic = _S_dynamic_index[_S_rank];
> +
> +   // For __r in [0, _S_rank_dynamic), _S_dynamic_index_inv[__r] is the
> +   // index of the __r-th dynamic extent in _S_exts.
> +   static constexpr auto _S_dynamic_index_inv = [] consteval
> +   {
> + array __ret;
> + for (size_t __i = 0, __r = 0; __i < _S_rank; ++__i)
> +   if (_M_is_dyn(_S_exts[__i]))
> + __ret[__r++] = __i;
> + return __ret;
> +   }();
> +
> +   static constexpr size_t
> +   _M_static_extent(size_t __r) noexcept
> +   { return _S_exts[__r]; }
> +
> +   constexpr _IndexType
> +   _M_extent(size_t __r) const noexcept
> +   {
> + auto __se = _S_exts[__r];
> + if (__se == dynamic_extent)
> +   return _M_dynamic_extents[_S_dynamic_index[__r]];
> + else
> +   return __se;
> +   }
> +
> +  private:
>
This private is not needed here.

> +   template
> + constexpr void
> + _M_init_dynamic_extents(_GetOtherExtent __get_extent) noexcept
> + {
> +   for(size_t __i = 0; __i < _S_rank_dynamic; 

[committed] testsuite: Fix up ipa/pr119318.c test [PR119318]

2025-04-15 Thread Jakub Jelinek
Hi!

dg-additional-options followed by dg-options is ignored.  I've added the
-w from there to dg-options and removed dg-additional-options.

Tested on x86_64-linux, committed to trunk as obvious.

2025-04-15  Jakub Jelinek  

PR ipa/119318
* gcc.dg/ipa/pr119318.c: Remove dg-additional-options, add -w to
dg-options.

--- gcc/testsuite/gcc.dg/ipa/pr119318.c.jj  2025-04-14 16:01:28.082745338 +0200
+++ gcc/testsuite/gcc.dg/ipa/pr119318.c 2025-04-15 12:16:53.298801324 +0200
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target int128 } */
-/* { dg-additional-options "-Wno-psabi -w" } */
-/* { dg-options "-Wno-psabi -O2" } */
+/* { dg-options "-Wno-psabi -w -O2" } */
 
 typedef unsigned V __attribute__((vector_size (64)));
 typedef unsigned __int128 W __attribute__((vector_size (64)));

Jakub



[PATCH v1] FMV: Redirect to specific target

2025-04-15 Thread Alfie Richards
Hi all,

This is an optimisation similar to the one discussed in [1] and posted in [2].

This is slightly stronger as it makes use of the callee version information
*and* caller information, enabling slightly more cases to be covered.

This also means it can replace most of the cases that the previous optimisation
covered, where two version sets implement the same set of versions. Some cases
will be dropped where there are genuinely higher priority versions that could
be selected, but in my opinion that's okay.

This requires my FMV patch series, mostly because it relies on the function
versions being sorted in priority order, but it also uses some helper functions.

Any FMV target would benefit from implementing
TARGET_OPTION_VERSION_A_IMPLIES_VERSION_B to enable more redirection cases,
but there is a default implementation which just checks for matching target/
target_version attribute values.

This is reg tested on aarch64 and ix86 linux gnu.
(Notably this includes gcc/testsuite/g++.target/i386/pr82625.C which tests
the previous optimisation).

I've made a forgejo PR here if reviewers want to try that:
https://forge.sourceware.org/alfie.richards/gcc-TEST/pulls/2

[1] https://patchwork.sourceware.org/comment/197172/
[2] https://gcc.gnu.org/pipermail/gcc-patches/2025-April/680876.html

Kind regards,
Alfie Richards

-- >8 --

Adds an optimisation in FMV to redirect to a specific target if possible.

A call is redirected to a specific target if both:
- the caller can always call the callee version
- and, it is possible to rule out all higher priority versions of the callee
  fmv set. That is established either by the callee being the highest priority
  version, or each higher priority version of the callee implying that, were it
  resolved, a higher priority version of the caller would have been selected.
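
The two conditions above can be sketched with a toy model in which each
version is the set of CPU features it requires. Everything here is an
assumption made for illustration (`feature_set`, `implies_sketch`,
`redirect_target_sketch`, and the bitmask encoding); it is one reading of
the rule, not the logic in multiple_target.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a version is the set of CPU features it requires.
typedef std::uint32_t feature_set;

// "a implies b": every machine that can select a can also run b.
inline bool
implies_sketch (feature_set a, feature_set b)
{ return (a & b) == b; }

// Both vectors are sorted highest priority first.  Returns the index of
// the callee version the call can be bound to, or -1 to keep the
// dispatcher.
inline int
redirect_target_sketch (const std::vector<feature_set> &caller_versions,
			std::size_t caller_idx,
			const std::vector<feature_set> &callee_versions)
{
  const feature_set caller = caller_versions[caller_idx];
  for (std::size_t i = 0; i < callee_versions.size (); ++i)
    {
      // Condition 1: the caller can always call this callee version.
      if (!implies_sketch (caller, callee_versions[i]))
	continue;
      // Condition 2: each higher priority callee version must imply some
      // higher priority caller version, which would then have been
      // selected instead of this caller.
      for (std::size_t j = 0; j < i; ++j)
	{
	  bool ruled_out = false;
	  for (std::size_t k = 0; k < caller_idx && !ruled_out; ++k)
	    ruled_out = implies_sketch (callee_versions[j],
					caller_versions[k]);
	  if (!ruled_out)
	    return -1;
	}
      return static_cast<int> (i);
    }
  return -1;
}
```

For example, with callee versions {A+B, A, default} and caller versions
{A, default}, the default caller can be bound to the default callee,
because both higher priority callee versions imply the caller's A version.
The A caller cannot be bound to the A callee, since A+B might still be
selected at run time.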

For this logic, introduces the new TARGET_OPTION_VERSION_A_IMPLIES_VERSION_B
hook. Adds a full implementation for AArch64, and a weaker default version
for other targets.

This allows the target to replace the previous optimisation as the new one is
able to cover the same case where two function sets implement the same versions.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_version_a_implies_version_b): New
function.
(TARGET_OPTION_VERSION_A_IMPLIES_VERSION_B): New define.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add documentation for version_a_implies_version_b.
* multiple_target.cc (redirect_to_specific_clone): Add new optimisation
logic.
(ipa_target_clone): Add
* target.def: Remove TARGET_HAS_FMV_TARGET_ATTRIBUTE check.
* attribs.cc (version_a_implies_version_b): New function.
* attribs.h (version_a_implies_version_b): New function.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/fmv-selection1.C: New test.
* g++.target/aarch64/fmv-selection2.C: New test.
* g++.target/aarch64/fmv-selection3.C: New test.
* g++.target/aarch64/fmv-selection4.C: New test.
* g++.target/aarch64/fmv-selection5.C: New test.
* g++.target/aarch64/fmv-selection6.C: New test.
---
 gcc/attribs.cc| 16 
 gcc/attribs.h |  1 +
 gcc/config/aarch64/aarch64.cc | 26 +
 gcc/doc/tm.texi   |  4 +
 gcc/doc/tm.texi.in|  2 +
 gcc/multiple_target.cc| 96 ---
 gcc/target.def|  9 ++
 .../g++.target/aarch64/fmv-selection1.C   | 40 
 .../g++.target/aarch64/fmv-selection2.C   | 40 
 .../g++.target/aarch64/fmv-selection3.C   | 25 +
 .../g++.target/aarch64/fmv-selection4.C   | 30 ++
 .../g++.target/aarch64/fmv-selection5.C   | 28 ++
 .../g++.target/aarch64/fmv-selection6.C   | 27 ++
 13 files changed, 311 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection4.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection5.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection6.C

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 2ca82674f7c..66c77904404 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1095,6 +1095,22 @@ common_function_versions (string_slice fn1 ATTRIBUTE_UNUSED,
   gcc_unreachable ();
 }
 
+bool
+version_a_implies_version_b (tree fn1, tree fn2)
+{
+  const char *attr_name = TARGET_HAS_FMV_TARGET_ATTRIBUTE
+ ? "target"
+ : "target_version";
+
+  tree attr1 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn1));
+  tree attr2 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn2));
+
+  if (!attr1 || !attr2

[PATCH v4 07/20] Change make_attribute to take string_slice.

2025-04-15 Thread Alfie Richards
gcc/ChangeLog:

* attribs.cc (make_attribute): Change arguments.
* attribs.h (make_attribute): Change arguments.

Approved by Richard Sandiford.
---
 gcc/attribs.cc | 16 +---
 gcc/attribs.h  |  2 +-
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index f6667839c01..3fce9d62525 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1076,21 +1076,15 @@ apply_tm_attr (tree fndecl, tree attr)
it to CHAIN.  */
 
 tree
-make_attribute (const char *name, const char *arg_name, tree chain)
+make_attribute (string_slice name, string_slice arg_name, tree chain)
 {
-  tree attr_name;
-  tree attr_arg_name;
-  tree attr_args;
-  tree attr;
-
-  attr_name = get_identifier (name);
-  attr_arg_name = build_string (strlen (arg_name), arg_name);
-  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
-  attr = tree_cons (attr_name, attr_args, chain);
+  tree attr_name = get_identifier_with_length (name.begin (), name.size ());
+  tree attr_arg_name = build_string (arg_name.size (), arg_name.begin ());
+  tree attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
+  tree attr = tree_cons (attr_name, attr_args, chain);
   return attr;
 }
 
-
 /* Common functions used for target clone support.  */
 
 /* Comparator function to be used in qsort routine to sort attribute
diff --git a/gcc/attribs.h b/gcc/attribs.h
index 4b946390f76..b8b6838599c 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -45,7 +45,7 @@ extern bool cxx11_attribute_p (const_tree);
 extern tree get_attribute_name (const_tree);
 extern tree get_attribute_namespace (const_tree);
 extern void apply_tm_attr (tree, tree);
-extern tree make_attribute (const char *, const char *, tree);
+extern tree make_attribute (string_slice, string_slice, tree);
 extern bool attribute_ignored_p (tree);
 extern bool attribute_ignored_p (const attribute_spec *const);
 extern bool any_nonignored_attribute_p (tree);
-- 
2.34.1



[PATCH v4 01/20] Add PowerPC FMV symbol tests.

2025-04-15 Thread Alfie Richards
From: Alice Carlotti 

This tests the mangling of function assembly names when annotated with
target_clones attributes.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mvc-symbols1.C: New test.
* g++.target/powerpc/mvc-symbols2.C: New test.
* g++.target/powerpc/mvc-symbols3.C: New test.
* g++.target/powerpc/mvc-symbols4.C: New test.

Co-authored-by: Alfie Richards 
---
 .../g++.target/powerpc/mvc-symbols1.C | 47 +++
 .../g++.target/powerpc/mvc-symbols2.C | 35 ++
 .../g++.target/powerpc/mvc-symbols3.C | 41 
 .../g++.target/powerpc/mvc-symbols4.C | 29 
 4 files changed, 152 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols3.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mvc-symbols4.C

diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
new file mode 100644
index 000..9424382bf14
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols1.C
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "cpu=power6", "cpu=power6x")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("cpu=power6x", "cpu=power6", "default")))
+int foo (int)
+{
+  return 2;
+}
+
+int bar()
+{
+  return foo ();
+}
+
+int bar(int x)
+{
+  return foo (x);
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl _Z3foov\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\tbl _Z3fooi\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6\n" 0 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3fooi\.cpu_power6x\n" 1 } } */
diff --git a/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
new file mode 100644
index 000..edf54480efd
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/mvc-symbols2.C
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O0" } */
+
+__attribute__((target_clones("default", "cpu=power6", "cpu=power6x")))
+int foo ()
+{
+  return 1;
+}
+
+__attribute__((target_clones("cpu=power6x", "cpu=power6", "default")))
+int foo (int)
+{
+  return 2;
+}
+
+/* { dg-final { scan-assembler-times "\n_Z3foov\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3foov\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3foov, @gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3foov,_Z3foov\.resolver\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.default\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.quad\t_Z3foov\.cpu_power6x\n" 0 } } */
+
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.default:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.cpu_power6x:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n_Z3fooi\.resolver:\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.type\t_Z3fooi, 
@gnu_indirect_function\n" 1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.set\t_Z3fooi,_Z3fooi\.resolver\n" 
1 } } */
+/* { dg-final { scan-assembler-times "\n\t\.q

[PATCH v4 05/20] Update is_function_default_version to work with target_version.

2025-04-15 Thread Alfie Richards
Notably this respects target_version semantics where an unannotated
function can be the default version.

gcc/ChangeLog:

* attribs.cc (is_function_default_version): Add target_version logic.

Approved by Richard Sandiford.
---
 gcc/attribs.cc | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 56dd18c2fa8..f6667839c01 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1279,18 +1279,31 @@ make_dispatcher_decl (const tree decl)
   return func_decl;
 }
 
-/* Returns true if DECL is multi-versioned using the target attribute, and this
-   is the default version.  This function can only be used for targets that do
-   not support the "target_version" attribute.  */
+/* Returns true if DECL is a multiversioned default.
+   With the target attribute semantics, returns true if the function is marked
+   as default with the target version.
+   With the target_version attribute semantics, returns true if the function
+   is either not annotated, or annotated as default.  */
 
 bool
 is_function_default_version (const tree decl)
 {
-  if (TREE_CODE (decl) != FUNCTION_DECL
-  || !DECL_FUNCTION_VERSIONED (decl))
+  tree attr;
+  if (TREE_CODE (decl) != FUNCTION_DECL)
 return false;
-  tree attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
-  gcc_assert (attr);
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+{
+  if (!DECL_FUNCTION_VERSIONED (decl))
+   return false;
+  attr = lookup_attribute ("target", DECL_ATTRIBUTES (decl));
+  gcc_assert (attr);
+}
+  else
+{
+  attr = lookup_attribute ("target_version", DECL_ATTRIBUTES (decl));
+  if (!attr)
+   return true;
+}
   attr = TREE_VALUE (TREE_VALUE (attr));
   return (TREE_CODE (attr) == STRING_CST
  && strcmp (TREE_STRING_POINTER (attr), "default") == 0);
-- 
2.34.1



[PATCH v4 14/20] Add reject_target_clone hook for filtering target_clone versions.

2025-04-15 Thread Alfie Richards
This patch introduces the TARGET_REJECT_FUNCTION_CLONE_VERSION hook
which is used to determine if a target_clones version string parses.

If true is returned, a warning is emitted and from then on the version
is ignored.

This is as specified in the Arm C Language Extensions. The purpose of this
is to allow some portability of code using target_clones attributes.
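
The intended behaviour can be sketched as a simple filter. The names below
(`reject_version_sketch`, `filter_versions_sketch`) and the list of known
version strings are hypothetical stand-ins, not the hook or the front-end
code: versions that fail to parse are dropped and reported, and the
remaining ones are kept.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical predicate standing in for the target hook: returns true
// when the version string does not parse.  Only three names are treated
// as known here.
inline bool
reject_version_sketch (const std::string &v)
{ return !(v == "default" || v == "sve" || v == "sve2"); }

// Keeps the versions that parse and records a message for each rejected
// one, which a front end would report as a warning rather than an error.
inline std::vector<std::string>
filter_versions_sketch (const std::vector<std::string> &versions,
			std::vector<std::string> &warnings)
{
  std::vector<std::string> kept;
  for (std::size_t i = 0; i < versions.size (); ++i)
    if (reject_version_sketch (versions[i]))
      warnings.push_back ("invalid target_clones version '"
			  + versions[i] + "' ignored");
    else
      kept.push_back (versions[i]);
  return kept;
}
```

Under this scheme, source code that names a version unknown to an older
compiler still builds there, with that one version skipped.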

Currently this is only properly implemented for the AArch64 backend.

For riscv, which is the only other backend that uses target_version
semantics, a partial implementation is present: the hook is used to
check parsing, but a failed parse emits errors rather than warnings.
A refactor of the riscv parsing logic would be required to enable this
functionality fully.

This fixes PR 118339, where parse failures could cause an ICE on AArch64.

gcc/ChangeLog:

PR target/118339
* target.def: Add reject_target_clone_version hook.
* tree.cc (get_clone_attr_versions): Add filter and location argument.
(get_clone_versions): Update call to get_clone_attr_versions.
* tree.h (get_clone_attr_versions): Add filter and location argument.
* config/aarch64/aarch64.cc (aarch64_reject_target_clone_version):
New function
(TARGET_REJECT_FUNCTION_CLONE_VERSION): New define.
* config/riscv/riscv.cc (riscv_reject_target_clone_version):
New function.
(TARGET_REJECT_FUNCTION_CLONE_VERSION): New define.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in: Add documentation for new hook.
* hooks.h (hook_stringslice_locationt_false): New function.
* hooks.cc (hook_stringslice_locationt_false): New function.

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_target_clones_attribute): Update to emit warnings
for rejected versions.
---
 gcc/c-family/c-attribs.cc | 26 +-
 gcc/config/aarch64/aarch64.cc | 20 
 gcc/config/riscv/riscv.cc | 18 ++
 gcc/doc/tm.texi   |  5 +
 gcc/doc/tm.texi.in|  2 ++
 gcc/hooks.cc  |  6 ++
 gcc/hooks.h   |  3 +++
 gcc/target.def|  8 
 gcc/tree.cc   | 12 ++--
 gcc/tree.h|  8 ++--
 10 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5dff489fcca..b5287f0da06 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6132,12 +6132,28 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
}
}
 
-  auto_vec<string_slice> versions= get_clone_attr_versions (args, NULL);
-
-  if (versions.length () == 1)
-   {
+  int num_defaults = 0;
+  auto_vec<string_slice> versions= get_clone_attr_versions (args,
+ &num_defaults,
+ DECL_SOURCE_LOCATION (*node),
+ false);
+
+  for (auto v : versions)
+   if (targetm.reject_function_clone_version
+ (v, DECL_SOURCE_LOCATION (*node)))
  warning (OPT_Wattributes,
-  "single %<target_clones%> attribute is ignored");
+  "invalid %<target_clones%> version %qB ignored",
+  &v);
+
+  /* Lone target_clones version is always ignored for target attr semantics.
+Only ignore under target_version semantics if it is a default
+version.  */
+  if (versions.length () == 1 && (TARGET_HAS_FMV_TARGET_ATTRIBUTE
+ || num_defaults == 1))
+   {
+ if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+   warning (OPT_Wattributes,
+"single %<target_clones%> attribute is ignored");
  *no_add_attrs = true;
}
   else
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 99e351fb65b..43ac50c7734 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -31229,6 +31229,23 @@ aarch64_expand_reversed_crc_using_pmull (scalar_mode crc_mode,
 }
 }
 
+bool
+aarch64_reject_target_clone_version (string_slice str,
+location_t loc ATTRIBUTE_UNUSED)
+{
+  str = str.strip ();
+
+  if (str == "default")
+return false;
+
+  enum aarch_parse_opt_result parse_res;
+  auto isa_flags = aarch64_asm_isa_flags;
+  parse_res = aarch64_parse_fmv_features (str, &isa_flags, NULL, NULL);
+
+  /* Reject any version which does not parse.  */
+  return parse_res != AARCH_PARSE_OK;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -32052,6 +32069,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_OPTION_FUNCTION_VERSIONS
 #define TARGET_OPTION_FUNCTION_VERSIONS aarch64_common_function_versions
 
+#undef TARGET_REJECT_FUNCTION_CLONE_VERSION
+#define TARGET_REJECT_FUNCTION_CLONE_VERSION aarch64_reject_target_clone_version
+
 #undef TARGET_COMPARE_VERSION_PRIORITY
 #def

[PATCH v4 15/20] Change target_version semantics to follow ACLE specification.

2025-04-15 Thread Alfie Richards
This patch changes the semantics of target_version and target_clones attributes
to match the behavior described in the Arm C Language Extensions (ACLE).

The changes to behavior are:

- The scope and signature of an FMV function set is now that of the default
  version.
- The FMV resolver is now created at the location of the default version
  implementation. Previously this was at the first call to an FMV function.
- When a TU has a single annotated function version, it gets mangled.
  - This includes a lone annotated default version.
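
For readers less familiar with the attribute, a minimal source-level sketch of a version set follows. The function name foo, its return values, and the guard macro SKETCH_HAVE_TARGET_VERSION are hypothetical and exist only for this example; the fallback branch is there so the snippet still builds on compilers or targets without target_version support.

```cpp
#include <cassert>

// Enable the versioned definitions only where the attribute is available.
#if defined(__aarch64__) && defined(__has_attribute)
#  if __has_attribute(target_version)
#    define SKETCH_HAVE_TARGET_VERSION 1
#  endif
#endif

#ifdef SKETCH_HAVE_TARGET_VERSION
// Default version: under the semantics described above, this declaration
// determines the scope and signature of the whole version set.
__attribute__ ((target_version ("default")))
int foo () { return 1; }

// Additional version, eligible for selection at run time when SVE is
// available.
__attribute__ ((target_version ("sve")))
int foo () { return 2; }
#else
// Fallback for targets or compilers without target_version: plain function.
int foo () { return 1; }
#endif
```

A caller simply calls foo (); which definition runs depends on the hardware, so the result here is either 1 or 2.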

This only affects targets with TARGET_HAS_FMV_TARGET_ATTRIBUTE set to false.
Currently that is aarch64 and riscv.

This is achieved by:

- Skipping the existing FMV dispatching code at C++ gimplification and instead
  making use of the target_clones dispatching code in multiple_targets.cc.
  (This fixes PR target/118313 for aarch64 and riscv).
- Splitting the target_clones pass in two, an early and a late pass: the early
  pass handles cases where multiple declarations are used to define a version,
  and the late pass handles targets with target semantics and cases where an
  FMV set is defined by a single target_clones decl.
- Changing the logic in add_candidates and resolve_address of overloaded
  function to prevent resolution of any version except a default version.
  (thus making the default version determine scope and signature of the
  versioned function set).
- Adding logic for dispatching a lone annotated default version in
  multiple_targets.cc
  - As an annotated default version gets mangled, an alias is created from the
    dispatched symbol to the default version, as no ifunc resolution is
    required in this case (i.e. an alias from `_Z3foov_` to `_Z3foov.default`).
- Adding logic to symbol_table::remove_unreachable_nodes and analyze_functions
  so that a reference to the default function version also implies a possible
  reference to the other versions (so they shouldn't be deleted and do need to
  be analyzed).
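
The last point can be pictured with a small standalone model. This is only a sketch and not the code in ipa.cc or cgraphunit.cc; the type toy_node and the function mark_reachable are hypothetical names for this example. Reaching a default version during the walk also marks the other versions in its set:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a call-graph node; this is not GCC's
// cgraph_node.
struct toy_node
{
  std::vector<std::size_t> callees;   // nodes referenced directly
  std::vector<std::size_t> versions;  // for a default version: its siblings
  bool is_default_version = false;
};

// Mark everything reachable from ROOT.  Reaching a default version also
// marks the other versions in its set, because a dispatched call may end
// up in any of them.
static std::vector<bool>
mark_reachable (const std::vector<toy_node> &nodes, std::size_t root)
{
  std::vector<bool> reachable (nodes.size (), false);
  std::vector<std::size_t> worklist { root };
  while (!worklist.empty ())
    {
      std::size_t n = worklist.back ();
      worklist.pop_back ();
      if (reachable[n])
        continue;
      reachable[n] = true;
      for (std::size_t c : nodes[n].callees)
        worklist.push_back (c);
      if (nodes[n].is_default_version)
        for (std::size_t v : nodes[n].versions)
          worklist.push_back (v);
    }
  return reachable;
}
```

In this model, a caller that references only the default version keeps the non-default versions alive, while a node that nothing references stays unmarked.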

gcc/ChangeLog:

PR target/118313
* cgraphunit.cc (analyze_functions): Add logic for target version
dependencies.
* ipa.cc (symbol_table::remove_unreachable_nodes): Ditto.
* multiple_target.cc (create_dispatcher_calls): Change to support
target version semantics.
(ipa_target_clone): Change to dispatch all function sets in
target_version semantics, and to have early and late pass.
(is_simple_target_clones_case): New function.
* config/aarch64/aarch64.cc (aarch64_get_function_versions_dispatcher):
Refactor with the assumption that the DECL node will be default.
* config/riscv/riscv.cc (riscv_get_function_versions_dispatcher):
Refactor with the assumption that the DECL node will be default.
* passes.def: Split target_clones pass into early and late version.

gcc/cp/ChangeLog:

PR target/118313
* call.cc (add_candidates): Change to not resolve non-default versions in
target_version semantics.
* class.cc (resolve_address_of_overloaded_function): Ditto.
* cp-gimplify.cc (cp_genericize_r): Change logic to not apply for
target_version semantics.
* decl.cc (start_decl): Change to mark and therefore mangle all
target_version decls.
(start_preparsed_function): Ditto.
* typeck.cc (cp_build_function_call_vec): Add error for calling
unresolvable non-default node in target_version semantics.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-1.C: Change for target_version semantics.
* g++.target/aarch64/mv-symbols2.C: Ditto.
* g++.target/aarch64/mv-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols4.C: Ditto.
* g++.target/aarch64/mv-symbols5.C: Ditto.
* g++.target/aarch64/mvc-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols2.C: Ditto.
* g++.target/riscv/mv-symbols3.C: Ditto.
* g++.target/riscv/mv-symbols4.C: Ditto.
* g++.target/riscv/mv-symbols5.C: Ditto.
* g++.target/riscv/mvc-symbols3.C: Ditto.
* g++.target/aarch64/mv-symbols10.C: New test.
* g++.target/aarch64/mv-symbols11.C: New test.
* g++.target/aarch64/mv-symbols12.C: New test.
* g++.target/aarch64/mv-symbols13.C: New test.
* g++.target/aarch64/mv-symbols6.C: New test.
* g++.target/aarch64/mv-symbols7.C: New test.
* g++.target/aarch64/mv-symbols8.C: New test.
* g++.target/aarch64/mv-symbols9.C: New test.
---
 gcc/cgraphunit.cc |   9 ++
 gcc/config/aarch64/aarch64.cc |  43 ++
 gcc/config/riscv/riscv.cc |  43 ++
 gcc/cp/call.cc|  10 ++
 gcc/cp/class.cc   |  13 +-
 gcc/cp/cp-gimplify.cc |  11 +-
 gcc/cp/decl.cc|  14 ++
 gcc/cp/typeck.cc  |  10 ++
 gcc/ipa.cc 

[PATCH v4 10/20] Add dispatcher_resolver_function and is_target_clone flags to cgraph_node.

2025-04-15 Thread Alfie Richards
These are needed to correctly mangle FMV declarations.

gcc/ChangeLog:

* cgraph.h (struct cgraph_node): Add dispatcher_resolver_function and
is_target_clone.
---
 gcc/cgraph.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4a4fb7302b1..55812cc09a2 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -907,7 +907,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   used_as_abstract_origin (false),
   lowered (false), process (false), frequency (NODE_FREQUENCY_NORMAL),
   only_called_at_startup (false), only_called_at_exit (false),
-  tm_clone (false), dispatcher_function (false), calls_comdat_local (false),
+  tm_clone (false), dispatcher_function (false),
+  dispatcher_resolver_function (false), is_target_clone (false),
+  calls_comdat_local (false),
   icf_merged (false), nonfreeing_fn (false), merged_comdat (false),
   merged_extern_inline (false), parallelized_function (false),
   split_part (false), indirect_call_target (false), local (false),
@@ -1465,6 +1467,12 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned tm_clone : 1;
   /* True if this decl is a dispatcher for function versions.  */
   unsigned dispatcher_function : 1;
+  /* True if this decl is a resolver for function versions.  */
+  unsigned dispatcher_resolver_function : 1;
+  /* True if this is part of a multiversioned set and this version comes from a
+ target_clone attribute.  Or if this is a dispatched symbol or resolver
+ and the default version comes from a target_clones.  */
+  unsigned is_target_clone : 1;
   /* True if this decl calls a COMDAT-local function.  This is set up in
  compute_fn_summary and inline_call.  */
   unsigned calls_comdat_local : 1;
-- 
2.34.1



[r15-9487 Regression] FAIL: gcc.dg/completion-2.c (test for excess errors) on Linux/x86_64

2025-04-15 Thread haochen.jiang
On Linux/x86_64,

6d9fdf4bf57353f9260a2e0c8774854fb50f5128 is the first bad commit
commit 6d9fdf4bf57353f9260a2e0c8774854fb50f5128
Author: Kyrylo Tkachov 
Date:   Thu Feb 27 09:24:10 2025 -0800

Locality cloning pass: -fipa-reorder-for-locality

caused

FAIL: gcc.dg/completion-2.c expected multiline pattern lines 5-11
FAIL: gcc.dg/completion-2.c (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-9487/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/completion-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/completion-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/completion-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/completion-2.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact
me at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the
command line might help.)
(However, please make sure that there are no potential problems with AVX512.)

