date:20250709

Re: [PATCH 0/1] [RFC][AutoFDO]: Source filename tracking in GCOV

2025-07-09 Thread Kugan Vivekanandarajah

Hi Honza,

> On 8 Jul 2025, at 10:31 pm, Jan Hubicka  wrote:
> 
>> Hi Honza,
>> 
>>> On 8 Jul 2025, at 2:26 am, Jan Hubicka  wrote:
>>> 
>>> Hi,
>>> as discussed also on the autofdo pull request, LLVM solves the same
>>> problem using -funique-internal-linkage-names
>>> https://reviews.llvm.org/D73307
>>> 
>>> All non-public functions get their symbols renamed with a .__uniq.
>>> suffix.
>> 
>> How is __uniq. added to static symbols in the profile?
> 
> The patch does three things:
> 1) extends the ipa-visibility pass to rename all non-public function
>    symbols, adding the __uniq suffix.
>    This skips those marked as used so asm statements can work.
> 2) makes dwarf2out always add the DW_AT_linkage_name attribute to
>    DW_TAG_inlined_subroutine DIEs
> 3) extends auto-profile to accept profiles with unique names
>    when building without unique names and vice versa.
> 
> I think it is pretty much what LLVM does except that I compute hash
> based on object file name while LLVM uses filename of the outer
> translation unit (which is easy to change, I just wanted to have
> something functional to see how it works in practice).
> 
> There is a comment on the pull request comment I added
> https://github.com/google/autofdo/pull/244#issuecomment-3046121191
> So it seems that LLVM folks are not that happy with uniq suffixes since
> they break asm statements in the Linux kernel.  I originally thought
> renaming was done in DWARF only, but indeed renaming all static symbols
> is quite radical.
> 
> Their proposal
> https://discourse.llvm.org/t/rfc-keep-globalvalue-guids-stable/84801
> seems to be equivalent to what we have as profile_id.  It is a 64-bit
> identifier of a function that should be stable across builds and (modulo
> conflicts) unique within the translated program.  Currently it is assigned
> only to functions that may be used as indirect call targets and is used
> by normal FDO for resolving cross-unit indirect calls.
> 
> One option would be to use profile IDs in auto-profiles too.  I guess
> they can be streamed to dwarf via an extension as 64bit IDs. But it is
> not clear to me that it is what LLVM folks work on and if it will
> eventually get upstreamed.
> 
> If we want to finish your solution (adding file names in create_gcov), I
> think we need to solve the following:
> 1) extend dwarf2out to add DW_AT_linkage_name attributes for all
>    function symbols.  This is easy to do.
> 2) verify that create_gcov can safely determine symbols with public
>    or static linkage (even inlined ones; there is a DW_AT_public
>    attribute) and stream file names only for public linkage symbols
> 3) instead of streaming the filename of the file containing the symbol,
>    stream the filename of the corresponding translation unit.
> 
> I would say that the advantage of profile id is probably shorter gcov
> files, advantage of streaming filename:symbol_name pairs is that the
> profile info is easier to read.  What do you think?

Thanks for the clarification.  Since LLVM has been using the __uniq suffix and
this is optional (controlled by a flag), IMO we could go with your patch.

Thanks,
Kugan
> 
> Honza
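
To make the renaming scheme discussed in this thread concrete, here is a minimal C sketch of giving a static function name a __uniq suffix derived from a file name. The helper names (fnv1a_64, make_uniq_name), the use of a 64-bit FNV-1a hash and the hexadecimal layout of the suffix are all assumptions made for illustration; nothing here is claimed to match what the GCC patch or LLVM actually computes.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in hash (64-bit FNV-1a).  The real patch may hash
   the file name in a completely different way.  */
static uint64_t
fnv1a_64 (const char *s)
{
  uint64_t h = 0xcbf29ce484222325ULL;
  for (; *s; s++)
    {
      h ^= (unsigned char) *s;
      h *= 0x100000001b3ULL;
    }
  return h;
}

/* Write "<name>.__uniq.<hash of file>" into BUF.
   Return 0 on success, -1 if BUF is too small.  */
static int
make_uniq_name (char *buf, size_t len, const char *name, const char *file)
{
  int n = snprintf (buf, len, "%s.__uniq.%016llx", name,
                    (unsigned long long) fnv1a_64 (file));
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this sketch, two static functions called "helper" in foo.c and bar.c end up with different symbol names, while the same function hashed from the same file name gets the same name on every build, which is the property a profile needs in order to match symbols across builds.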

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-09 Thread Robin Dapp


> This patch would like to introduce the combine of vec_dup + vssub.vv
> into vssub.vx on the cost value of GR2VR.  The late-combine will take
> place if the cost of GR2VR is zero, or reject the combine if non-zero
> like 1, 2, 15 in test.  There will be two cases for the combine:


Jeff has already pre-acked this so it's good to go.  I'm seeing a few failures 
in the CI, though.  Could you check if they're real?



--
Regards
Robin
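
For readers unfamiliar with the pattern, the following C sketch shows the kind of source loop the quoted description is about: every element has the same scalar subtracted with signed saturation (vssub is the signed saturating subtract). A vectorizer would first broadcast the scalar (vec_duplicate) and use the vector-vector form; the combine folds the broadcast into the vector-scalar form when moving a value from a general register to a vector register (GR2VR) is costed at zero. The function names are hypothetical, and no claim is made that GCC turns exactly this code into vssub.vx.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed saturating subtraction: clamp x - y to the int32_t range.  */
static int32_t
sat_sub_s32 (int32_t x, int32_t y)
{
  int64_t d = (int64_t) x - (int64_t) y;
  if (d > INT32_MAX)
    return INT32_MAX;
  if (d < INT32_MIN)
    return INT32_MIN;
  return (int32_t) d;
}

/* Subtract the same scalar B from every element of IN.  B is the value
   that would be broadcast for the .vv form, or passed directly in a
   general register for the .vx form.  */
static void
vec_sat_sub_scalar (int32_t *out, const int32_t *in, int32_t b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = sat_sub_s32 (in[i], b);
}
```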

Re: [PATCH] cobol: Implement CXXFLAGS_FOR_COBOL.

2025-07-09 Thread Andreas Schwab

On Jul 08 2025, Robert Dubner wrote:

> Are you suggesting that I can somehow apply a specific set of flags when
> compiling, for example,
>
>   gcc/cobol/genapi.cc
>
> If so, how could I do that?

There are some examples in gcc/Makefile.in.

CFLAGS-cobol/genapi.o = ...

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH 2/3] tree: Add 7 and 8 argument TREE_CHECK_* and TREE_NOT_CHECK_*

2025-07-09 Thread Richard Biener

On Wed, Jul 9, 2025 at 4:36 AM Alex (Waffl3x)  wrote:
>

Adding extra checks like this is OK.

Thanks,
Richard.

Re: [PATCH 3/3] middle-end/121005 Add checks for TREE_LANG_FLAG_*

2025-07-09 Thread Richard Biener

On Wed, Jul 9, 2025 at 4:37 AM Alex (Waffl3x)  wrote:
>

 /* Nonzero in a _DECL if the use of the name is defined as an
unavailable feature by __attribute__((unavailable)).  */
 #define TREE_UNAVAILABLE(NODE) \
-  ((NODE)->base.u.bits.unavailable_flag)
+  ((TREE_CHECK_BITS_AVAILABLE (NODE))->base.u.bits.unavailable_flag)

the comment suggests we should use DECL_MINIMAL_CHECK (...) which I think
covers bits availability.

Otherwise this looks good to me.

Thanks,
Richard.
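
The review comment above is about routing a flag access through a checking macro instead of reading the bit directly. A minimal, self-contained C sketch of that pattern follows. The toy_node type, toy_check and the TOY_* macros are hypothetical stand-ins invented for this example; they are not GCC's tree, DECL_MINIMAL_CHECK or TREE_CHECK_BITS_AVAILABLE.

```c
#include <stdio.h>
#include <stdlib.h>

enum toy_code { TOY_EXPR, TOY_DECL };

struct toy_node
{
  enum toy_code code;
  unsigned unavailable_flag : 1;
};

/* Abort with a diagnostic when NODE does not have the WANT code,
   otherwise hand NODE back so the caller can keep dereferencing it.  */
static struct toy_node *
toy_check (struct toy_node *node, enum toy_code want,
           const char *file, int line)
{
  if (node->code != want)
    {
      fprintf (stderr, "%s:%d: node check failed\n", file, line);
      abort ();
    }
  return node;
}

#define TOY_DECL_CHECK(NODE) toy_check ((NODE), TOY_DECL, __FILE__, __LINE__)

/* Unchecked form: silently reads the bit from any kind of node.  */
#define TOY_UNAVAILABLE_UNCHECKED(NODE) ((NODE)->unavailable_flag)

/* Checked form: the bit is only reachable through a verified declaration.  */
#define TOY_UNAVAILABLE(NODE) (TOY_DECL_CHECK (NODE)->unavailable_flag)
```

Because the checked macro still expands to an lvalue, existing uses such as TOY_UNAVAILABLE (decl) = 1 keep working; only accesses on the wrong kind of node start failing loudly.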

Re: [PATCH 1/3] tree: Add TREE_NOT_RANGE_CHECK

2025-07-09 Thread Richard Biener

On Wed, Jul 9, 2025 at 4:37 AM Alex (Waffl3x)  wrote:
>

LGTM.

Richard.

[PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB

2025-07-09 Thread Ciyan Pan

From: panciyan 

This patch adjusts the test data for unsigned vector SAT_SUB, moving it into vec_sat_data.h

Passed the rv64gcv regression test.

Signed-off-by: Ciyan Pan 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add 
vec_sat_u_sub_fmt wrap define.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add vec_sat_u_sub 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u8.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u16.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u32.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u64.c: Remove 
test data.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u8.c: Remove 
test data.

---
 .../riscv/rvv/autovec/sat/vec_sat_arith.h |  40 +++
 .../riscv/rvv/autovec/sat/vec_sat_data.h  | 252 ++
 .../rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c  |  71 +
 .../autovec/sat/vec_sat_u_sub-run-10-u16.c|  70 +
 .../autovec/sat/vec_sat_u_sub-run-10-u32.c|  70 +
 .../autovec/sat/vec_sat_u_sub-run-10-u64.c|  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-2-u8.c  |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-3-u16.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-3-u32.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-3-u64.c |  70 +
 .../rvv/autovec/sat/vec_sat_u_sub-run-3-u8.c  |  70 +

Re: [PATCH] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-09 Thread Jeff Law





On 7/9/25 3:11 AM, Robin Dapp wrote:



> Also, seems like the CI picked up the patch but didn't run it?

Yea, it's happened with a couple of mine recently, including one
yesterday.  If it's not picked up when Paul-Antoine posts an update,
then I'll throw it into my system for some degree of pre-commit testing.


jeff

Re: [PATCH] [PR target/109286] H8/300: Fix warnings about initfini sections missing attributes

2025-07-09 Thread Jeff Law





On 7/8/25 10:26 AM, Jan Dubiec wrote:

> The patch changes order of inclusions, i.e. elfos.h is included before
> target specific h8300/h8300.h, in a way similar to a few other targets.
> Thanks to this change it is possible to override macros from elfos.h in
> h8300/h8300.h, in particular .init/.fini section definitions.
>
>  PR target/109286
>
> gcc/ChangeLog:
>
>  * config.gcc: Include elfos.h before h8300/h8300.h.
>
>  * config/h8300/h8300.h (INIT_SECTION_ASM_OP): Override
>  default version from elfos.h.
>  (FINI_SECTION_ASM_OP): Ditto.
>  (ASM_DECLARE_FUNCTION_NAME): Ditto.
>  (ASM_GENERATE_INTERNAL_LABEL): Macro removed because it was
>  being overridden in elfos.h anyway.
>  (ASM_OUTPUT_SKIP): Ditto.

Thanks.  I've pushed this to the trunk.

jeff

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Jeff Law





On 7/9/25 12:27 AM, Richard Biener wrote:

> The following changes the percentage that determines how many
> stmts are allowed for backwards jump threading from 50 to 54,
> enabling the missed jump threading observed in PR109893.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
> at least backward threading is prone to profile mismatches, I've
> altered two testcases to deal with new ones to pop up (definitely
> latent issues).
>
> OK?
>
> PR tree-optimization/109893
> * params.opt (fsm-scale-path-stmts): Change from 50 to 54.
>
> * gcc.dg/tree-ssa/pr109893.c: New testcase.
> * gcc.dg/tree-prof/cmpsf-1.c: XFAIL.
> * gcc.dg/tree-ssa/pr109893.c: Remove scan on no profile
> mismatches.

My recollection is the scaling factor was set once based on some old PR
where code size exploded and wasn't really tuned further after that.  If
the new value is working better, then that's obviously fine with me.
Ideally we'd just get rid of the magic ratio.


Jeff

Re: [PATCH v2 1/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-07-09 Thread Richard Sandiford

Matthieu Longo  writes:
> The implementation of those methods relies on duck typing at compile
> time.
> The structure corresponding to the node of a doubly linked list needs
> to define attributes 'prev' and 'next' which are pointers to the type
> of a node.
> The structure wrapping the nodes and other metadata (first, last, size)
> needs to define pointers 'first' and 'last' of the node's type, and
> an integer type for 'size'.
>
> Mutative methods can be bundled together and declared at once via a
> single macro, or can be declared separately. The merge sort is bundled
> separately.
> There are 3 types of macros:
> 1. for the declaration of prototypes: to use in a header file for a
>    public declaration, or as a forward declaration in the source file
>    for private declaration.
> 2. for the declaration of the implementation: to use always in a
>    source file.
> 3. for the invocation of the functions.
>
> The methods can be declared either public or private via the second
> argument of the declaration macros.
>
> List of currently implemented methods:
> - LINKED_LIST_*:
>     - APPEND: insert a node at the end of the list.
>     - PREPEND: insert a node at the beginning of the list.
>     - INSERT_BEFORE: insert a node before the given node.
>     - POP_FRONT: remove the first node of the list.
>     - POP_BACK: remove the last node of the list.
>     - REMOVE: remove the given node from the list.
>     - SWAP: swap the two given nodes in the list.
> - LINKED_LIST_MERGE_SORT: a merge sort implementation.

Thanks for the update, LGTM.  OK for trunk (and for binutils).

Richard
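
To illustrate the duck-typing idea from the quoted description, here is a small C sketch of a macro that generates a typed append function for any node type with 'prev' and 'next' members and any wrapper type with 'first', 'last' and 'size' members. TOY_LIST_DEFN_APPEND, int_node and int_list are hypothetical names made up for this example; this is not the macro set from the patch's doubly-linked-list.h, only the same technique in miniature.

```c
#include <stddef.h>

/* Generate LTYPE##_append.  Nothing ties LTYPE and LWRAPPERTYPE to a
   common base type: the body compiles for any pair of structures that
   happen to provide the members used below.  */
#define TOY_LIST_DEFN_APPEND(LWRAPPERTYPE, LTYPE, SCOPE) \
  SCOPE void \
  LTYPE##_append (LWRAPPERTYPE *list, LTYPE *node) \
  { \
    node->next = NULL; \
    node->prev = list->last; \
    if (list->last != NULL) \
      list->last->next = node; \
    else \
      list->first = node; \
    list->last = node; \
    list->size++; \
  }

typedef struct int_node
{
  int value;
  struct int_node *prev;
  struct int_node *next;
} int_node;

typedef struct int_list
{
  int_node *first;
  int_node *last;
  size_t size;
} int_list;

/* Instantiate a file-local int_node_append (int_list *, int_node *).  */
TOY_LIST_DEFN_APPEND (int_list, int_node, static)
```

Passing a node of the wrong type to int_node_append is a compile-time error, which is the typing guarantee that a generic void-pointer list would not give.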

> ---
>  include/doubly-linked-list.h  | 447 ++
>  libiberty/Makefile.in |   1 +
>  libiberty/testsuite/Makefile.in   |  12 +-
>  libiberty/testsuite/test-doubly-linked-list.c | 269 +++
>  4 files changed, 728 insertions(+), 1 deletion(-)
>  create mode 100644 include/doubly-linked-list.h
>  create mode 100644 libiberty/testsuite/test-doubly-linked-list.c
>
> diff --git a/include/doubly-linked-list.h b/include/doubly-linked-list.h
> new file mode 100644
> index 000..3f5ea2808f9
> --- /dev/null
> +++ b/include/doubly-linked-list.h
> @@ -0,0 +1,447 @@
> +/* Manipulate doubly linked lists.
> +   Copyright (C) 2025 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see .  */
> +
> +
> +#ifndef _DOUBLY_LINKED_LIST_H
> +#define _DOUBLY_LINKED_LIST_H
> +
> +/* Doubly linked list implementation enforcing typing.
> +
> +   This implementation of doubly linked list tries to achieve the enforcement of
> +   typing similarly to C++ templates, but without encapsulation.
> +
> +   All the functions are prefixed with the type of the value: "AType_xxx".
> +   Some functions are prefixed with "_AType_xxx" and are not part of the public
> +   API, so should not be used, except for _##LTYPE##_merge_sort with a caveat
> +   (see note above its definition).
> +
> +   Each function (### is a placeholder for method name) has a macro for:
> +   (1) its invocation LINKED_LIST_###(LTYPE).
> +   (2) its prototype LINKED_LIST_DECL_###(A, A2, scope). To add in a header
> +   file, or a source file for forward declaration. 'scope' should be set
> +   respectively to 'extern', or 'static'.
> +   (3) its definition LINKED_LIST_DEFN_###(A, A2, scope). To add in a source
> +   file with the 'scope' set respectively to nothing, or 'static' depending
> +   on (2).
> +
> +   Data structures requirements:
> +   - LTYPE corresponds to the node of a doubly linked list. It needs to define
> + attributes 'prev' and 'next' which are pointers on the type of a node.
> + For instance:
> +   struct my_list_node
> +   {
> +  T value;
> +  struct my_list_node *prev;
> +  struct my_list_node *next;
> +   };
> +   - LWRAPPERTYPE is a structure wrapping the nodes and others metadata (first,
> + last, size).
> + */
> +
> +
> +/* Mutative operations:
> +- append
> +- prepend
> +- insert_before
> +- pop_front
> +- pop_back
> +- remove
> +- swap
> +   The header and body of each of those operation can be declared individually,
> +   or as a whole via LINKED_LIST_MUTATIVE_OPS_PROTOTYPE for the prototypes, and
> +   LINKED_LIST_MUTATIVE_OPS_DECL for the im

Re: [PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB

2025-07-09 Thread Jeff Law





On 7/9/25 2:35 AM, Ciyan Pan wrote:

> From: panciyan
>
> This patch adjusts the test data for unsigned vector SAT_SUB, moving it into vec_sat_data.h
>
> Passed the rv64gcv regression test.
>
> Signed-off-by: Ciyan Pan

Thanks.  I ran this through my tester since the pre-commit tester didn't
run it for some reason or another.  I also spot checked some of the
tables of data to make sure they were unchanged.  Everything looked
good, so I've pushed this to the trunk.


Thanks!

jeff

Re: [PATCH] ext-dce: Fix subreg_lsb is_constant assumption

2025-07-09 Thread Jeff Law





On 7/9/25 8:00 AM, Richard Sandiford wrote:




>> Makes me wonder if I should resurrect my aarch64_be RFS.  I changed how
>> those systems worked in the system a few years back to make it work
>> better with container based testing rather than direct chroots.  I never
>> converted aarch64_be to that setup.  It shouldn't be hard if you think
>> it's valuable.
>
> I'm not sure TBH.  The only reason I started looking at aarch64_be
> recently was to test a patch for Konstantinos.  And it turns out that
> the "before" results are really, really poor.  I think that suggests
> that no-one on the AArch64 side is testing big-endian regularly.
> (And LLVM have got away without ever implementing big-endian arm_sve.h
> support.)  So there's a danger that you'd spend a lot of your time
> triaging AArch64-specific bugs.  There again, like you say...

It's certainly not a clear cut decision.  I do test some BE stuff (m68k,
s390x, H8 and a few others), but certainly nothing BE with scalable
vectors.  And yes, I definitely found it useful to test those BE things
for Konstantinos, and CRCs and ext-dce, etc.  The gap I see is BE with
scalable vectors.


Unfortunately any data I had on stability of aarch64_be has been lost in 
the years since I did the conversion.  I don't recall spending much time 
on it and it was one of the slower things to test via qemu, so it just 
didn't seem worth the effort at the time.


The best solution would be to discover that someone has an aarch64_be 
setup working on a RPI.  But I don't see any evidence of that anywhere.





>> I can't think of another system where we'd these kinds of issues.
>
> ...it probably does have some "unique" features. :)
>
> I later came across another instance of the subreg_lsb thing, which was
> causing other ICEs.  I went ahead and installed this as obvious, given
> the approval for the earlier one.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf.

Yea, definitely OK.  Thanks.

jeff

Re: [PATCH] c++, libstdc++, v3: Implement C++26 P3068R5 - constexpr exceptions [PR117785]

2025-07-09 Thread Jason Merrill


On 7/9/25 9:30 AM, Jakub Jelinek wrote:

> On Tue, Jul 08, 2025 at 09:43:20PM -0400, Jason Merrill wrote:
>>> @@ -3066,7 +3810,12 @@ cxx_eval_call_expression (const constexp
>>>   return arg1;
>>> }
>>>  else if (cxx_dynamic_cast_fn_p (fun))
>>> -   return cxx_eval_dynamic_cast_fn (ctx, t, non_constant_p, overflow_p);
>>> +   return cxx_eval_dynamic_cast_fn (ctx, t, non_constant_p, overflow_p,
>>> +jump_target);
>>> +  else if (enum cxa_builtin kind = cxx_cxa_builtin_fn_p (fun))
>>> +   return cxx_eval_cxa_builtin_fn (ctx, t, kind, fun,
>>> +   non_constant_p, overflow_p,
>>> +   jump_target);
>>>  if (!ctx->quiet)
>>> {
>>
>> If evaluating the arguments might throw, we can't give up at this point for
>> a non-constexpr function.
>
> I don't understand this comment, at least in connection with the above
> snippet, that just handles the magic calls.


Sorry I wasn't clear, the comment was about the existing code that 
follows the end of that hunk:



  if (!ctx->quiet)
    {
      if (!lambda_static_thunk_p (fun))
        error_at (loc, "call to non-%<constexpr%> function %qD", fun);
      explain_invalid_constexpr_fn (fun);
    }
  *non_constant_p = true;
  return t;


In C++26 we can't take this early exit before cxx_bind_parameters_in_call.


>>> --- gcc/cp/decl.cc.jj   2025-05-31 00:43:13.835238535 +0200
>>> +++ gcc/cp/decl.cc  2025-05-31 00:43:32.174996782 +0200
>>> @@ -5070,6 +5070,18 @@ cxx_init_decl_processing (void)
>>> BUILT_IN_FRONTEND, NULL, NULL_TREE);
>>>  set_call_expr_flags (decl, ECF_CONST | ECF_NOTHROW | ECF_LEAF);
>>> +  if (cxx_dialect >= cxx26)
>>> +{
>>> +  tree void_ptrintftype
>>> +   = build_function_type_list (void_type_node, ptr_type_node,
>>> +   integer_type_node, NULL_TREE);
>>> +  decl = add_builtin_function ("__builtin_eh_ptr_adjust_ref",
>>
>> Instead of this, I wonder about hijacking the exception_ptr constructor and
>> destructor?  That doesn't need to be in this patch, just a thought. Do we
>> know if there's a clang/libc++ plan for constexpr exception_ptr yet?
>
> Hana has some implementation on her branch, but I don't see progress in
> trying to upstream that into LLVM.
> The reason I've added a builtin is that unlike the __cxa_* functions or say
> std::current_exception() etc. which are defined out of line (I think even in
> libc++), most of the exception_ptr class cdtors and methods are defined
> inline, for libstdc++ except for exception_ptr::_M_addref and
> exception_ptr::_M_release which are out of line.  The names of those are
> very libstdc++ specific though, libc++ uses something else.  Though the
> operations (addref and release) are used in multiple places, e.g. in the
> private ctor from void *, in copy ctor, and release from dtor, so figuring
> out what exactly to hijack seems harder and less portable and future
> standards can always add further methods to exception_ptr etc.
> https://github.com/llvm/llvm-project/compare/main...hanickadot:llvm-project:hana/P3068-constexpr-exceptions#diff-58be87d6aa8658f15e1c1f3fa40acb71d078896d857cf98033b32f9fa294c320
> is what I can see for exception_ptr, but whatever builtins they choose
> in the end I guess depends on the upstreaming.  And they can choose to use
> the same ones as GCC or libc++ can conditionalize that.


Aha, she has

  __constexpr_exception_refcount_inc(__ptr_);
  __constexpr_exception_refcount_dec(__ptr_);

so a similar approach, just with two builtins rather than one.  So yeah, 
never mind for now, but we might try to coordinate builtin naming with them.


Jason

RE: [PATCH] cobol: Implement CXXFLAGS_FOR_COBOL.

2025-07-09 Thread Robert Dubner

With respect, this is another example of "I have been unable to make it
work."

The gcc/Makefile.in has this line in it:

$(foreach file,$(ALL_HOST_FRONTEND_OBJS),$(eval CFLAGS-$(file) += -DIN_GCC_FRONTEND))

At the point where gcc/cobol files are compiled, the environment variable
has this value:

CFLAGS-cobol/genapi.o is -DIN_GCC_FRONTEND

An attempt to override that with

make CFLAGS-cobol/genapi.o=-DHARMLESS

has no effect on the value of the CFLAGS-cobol/genapi.o

I commented out the $(foreach... line and tried to set the variable with
the make command.  

Even then, CFLAGS-cobol/genapi.o did not seem to be set as a result.  

I do not know why my attempts to set CFLAGS-cobol/genapi.o are failing.


> -----Original Message-----
> From: Andreas Schwab 
> Sent: Wednesday, July 9, 2025 03:23
> To: Robert Dubner 
> Cc: Rainer Orth ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] cobol: Implement CXXFLAGS_FOR_COBOL.
> 
> On Jul 08 2025, Robert Dubner wrote:
> 
> > Are you suggesting that I can somehow apply a specific set of flags when
> > compiling, for example,
> >
> > gcc/cobol/genapi.cc
> >
> > If so, how could I do that?
> 
> There are some examples in gcc/Makefile.in.
> 
> CFLAGS-cobol/genapi.o = ...
> 
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

RE: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-09 Thread Li, Pan2

Thanks Robin.

The failure "FAIL: gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto
-ffat-lto-objects (test for excess errors)" in the linux-rv64gcv-lp64d
testsuite log should not be a real failure (it is reported as a build
failure, but the run succeeds).

It passed locally as below, so will commit it soon.

Executing on host: 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/  
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output   
-mno-vector-strict-align -ftree-vectorize -fno-tree-loop-distribute-patterns 
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details  -lm  -o 
./vect-strided-a-u8-i2-gap.exe(timeout = 3000)
spawn -ignore SIGHUP 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/ 
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output 
-mno-vector-strict-align -ftree-vectorize -fno-tree-loop-distribute-patterns 
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o 
./vect-strided-a-u8-i2-gap.exe
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c (test for excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-strided-a-u8-i2-gap.exe
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c execution test
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect 
"vectorized 2 loops" 1
Executing on host: 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/  
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
  -march=rv64gcv -mabi=lp64d -mcmodel=medlow   -fdiagnostics-plain-output  
-flto -ffat-lto-objects -mno-vector-strict-align -ftree-vectorize 
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details  -lm  -o ./vect-strided-a-u8-i2-gap.exe
(timeout = 3000)
spawn -ignore SIGHUP 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-linux-stage2/gcc/ 
/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
 -march=rv64gcv -mabi=lp64d -mcmodel=medlow -fdiagnostics-plain-output -flto 
-ffat-lto-objects -mno-vector-strict-align -ftree-vectorize 
-fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 
-fdump-tree-vect-details -lm -o ./vect-strided-a-u8-i2-gap.exe
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects (test for 
excess errors)
spawn riscv64-unknown-linux-gnu-run ./vect-strided-a-u8-i2-gap.exe
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects execution 
test
PASS: gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects  
scan-tree-dump-times vect "vectorized 2 loops" 1

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Wednesday, July 9, 2025 3:10 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Liu, Hongtao 
; Robin Dapp 
Subject: Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vssub.vv to 
vssub.vx on GR2VR cost

> This patch would like to introduce the combine of vec_dup + vssub.vv
> into vssub.vx on the cost value of GR2VR.  The late-combine will take
> place if the cost of GR2VR is zero, or reject the combine if non-zero
> like 1, 2, 15 in test.  There will be two cases for the combine:

Jeff has already pre-acked this so it's good to go.  I'm seeing a few failures 
in the CI, though.  Could you check if they're real?


-- 
Regards
 Robin

RE: [PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB

2025-07-09 Thread Li, Pan2

Thanks Jeff and Kito, LGTM.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Wednesday, July 9, 2025 10:33 PM
To: Ciyan Pan ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; richard.guent...@gmail.com; tamar.christ...@arm.com; 
juzhe.zh...@rivai.ai; Li, Pan2 ; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB



On 7/9/25 2:35 AM, Ciyan Pan wrote:
> From: panciyan 
> 
> This patch moves the test data for unsigned vector SAT_SUB into vec_sat_data.h.
> 
> Passed the rv64gcv regression test.
> 
> Signed-off-by: Ciyan Pan 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add 
> vec_sat_u_sub_fmt wrap define.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add vec_sat_u_sub 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u16.c: 
> Remove test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u32.c: 
> Remove test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u64.c: 
> Remove test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u8.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u16.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u32.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u64.c: Remove 
> test data.
>  * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u8.c: Remove 
> test data.
Thanks.  I ran this through my tester since the pre-commit tester didn't 
run it for some reason or another.  I also spot checked some of the 
tables of data to make sure they were unchanged.  Everything looked 
good, so I've pushed this to the trunk.

Thanks!

jeff

Re: [PATCH] RISC-V: Enable zvfh for vector-scalar half-float run tests

2025-07-09 Thread Jeff Law

On 7/8/25 9:17 AM, Paul-Antoine Arras wrote:

zvfh is not enabled at the testsuite level. It has to be enabled on a testcase
by testcase basis. This was correctly done for compile tests but not for run
tests. This patch fixes it.
Also, to ensure correct results with half-precision floats, MAX_RELATIVE_DIFF is
set according to the type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: Set
MAX_RELATIVE_DIFF depending on type.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: Enable zvfh.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.

Thanks.  I've pushed this to the trunk for you.

Jeff

Re: [PATCH] aarch64: Implement sme2+faminmax extension.

2025-07-09 Thread Kyrylo Tkachov

Hi Alfie,

> On 7 Jul 2025, at 10:46, Alfie Richards  wrote:
> 
> Hello all,
> 
> This patch implements the couple of amin/amax instructions that are part of
> SME2 + faminmax.
> 
> Regression tested and bootstrapped for AArch64.
> 
> Thanks,
> Alfie
> 
> -- >8 --
> 
> Implements the sme2+faminmax svamin and svamax intrinsics.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sme.md (@aarch64_sme_):
> New patterns.
> * config/aarch64/aarch64-sve-builtins-sme.def (svamin): New intrinsics.
> (svamax): New intrinsics.
> * config/aarch64/aarch64-sve-builtins-sve2.cc (class faminmaximpl): New
> class.
> (svamin): New function.
> (svamax): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: New test.
> * gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: New test.
> ---
> gcc/config/aarch64/aarch64-sme.md |  18 +++
> .../aarch64/aarch64-sve-builtins-sme.def  |   5 +
> .../aarch64/aarch64-sve-builtins-sve2.cc  |  44 +-
> .../aarch64/sme2/acle-asm/amax_f16_x2.c   |  97 +
> .../aarch64/sme2/acle-asm/amax_f16_x4.c   | 128 +
> .../aarch64/sme2/acle-asm/amax_f32_x2.c   |  96 +
> .../aarch64/sme2/acle-asm/amax_f32_x4.c   | 129 ++
> .../aarch64/sme2/acle-asm/amax_f64_x2.c   |  96 +
> .../aarch64/sme2/acle-asm/amax_f64_x4.c   | 128 +
> .../aarch64/sme2/acle-asm/amin_f16_x2.c   |  96 +
> .../aarch64/sme2/acle-asm/amin_f16_x4.c   | 128 +
> .../aarch64/sme2/acle-asm/amin_f32_x2.c   |  96 +
> .../aarch64/sme2/acle-asm/amin_f32_x4.c   | 128 +
> .../aarch64/sme2/acle-asm/amin_f64_x2.c   |  96 +
> .../aarch64/sme2/acle-asm/amin_f64_x4.c   | 128 +
> 15 files changed, 1409 insertions(+), 4 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c
> create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sme.md 
> b/gcc/config/aarch64/aarch64-sme.md
> index b8bb4cc14b6..bfe368e80b5 100644
> --- a/gcc/config/aarch64/aarch64-sme.md
> +++ b/gcc/config/aarch64/aarch64-sme.md
> @@ -38,6 +38,7 @@
> ;;  Binary arithmetic on ZA tile
> ;;  Binary arithmetic on ZA slice
> ;;  Binary arithmetic, writing to ZA slice
> +;;  Absolute minimum/maximum
> ;;
> ;; == Ternary arithmetic
> ;;  [INT] Dot product
> @@ -1264,6 +1265,23 @@ (define_insn "*aarch64_sme_single__plus"
>   "\tza.[%w0, %1, vgx], %2, %3."
> )
> 
> +;; -
> +;;  Absolute minimum/maximum
> +;; -
> +;; Includes:
> +;; - svamin (SME2+faminmax)
> +;; - svamin (SME2+faminmax)
> +;; -
> +
> +(define_insn "@aarch64_sme_"
> +  [(set (match_operand:SVE_Fx24 0 "register_operand" "=Uw")
> + (unspec:SVE_Fx24 [(match_operand:SVE_Fx24 1 "register_operand" "%0")
> +  (match_operand:SVE_Fx24 2 "register_operand" "Uw")]
> + FAMINMAX_UNS))]
> +  "TARGET_SME2 && TARGET_FAMINMAX"
> +  "\t%0, %1, %2"
> +)
> +
> ;; =
> ;; == Ternary arithmetic
> ;; =
> diff --git a/gcc/config/aarch64/aarch64-sve-

RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Richard Biener

On Wed, 9 Jul 2025, Tamar Christina wrote:

> > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > +first and second operands of the comparison, respectively.  Operand 3
> > > +is the @code{code_label} to jump to.
> > > +
> > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > +@item @samp{cbranch_all@var{mode}4}
> > > +Conditional branch instruction combined with a compare instruction on 
> > > vectors
> > > +where it is required that at all of the elementwise comparisons of the
> > > +two input vectors are true.
> > 
> > See above.
> > 
> > When I look at the RTL for aarch64 I wonder whether the middle-end
> > can still invert a jump (for BB reorder, for example)?  Without
> > a cbranch_none expander we have to invert during RTL expansion?
> > 
> 
> Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
> I think all states are expressible with any and all and flipping the branches
> so it shouldn't be any more restrictive than cbranch itself is today.
> 
> cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> 
> and FTR the RTL generated for AArch64 (both SVE and Adv.SIMD) will be simplified to:
> 
> (insn 23 22 24 5 (parallel [
> (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> (unspec:VNx4BI [
> (reg:VNx4BI 129)
> (const_int 0 [0x0])
> (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> (const_vector:VNx4SI repeat [
> (const_int 0 [0])
> ]))
> ] UNSPEC_PRED_Z))
> (clobber (reg:CC_NZC 66 cc))
> ]) "cbranch.c":25:10 -1
> 
> (jump_insn 27 26 28 5 (set (pc)
> (if_then_else (eq (reg:CC_Z 66 cc)
> (const_int 0 [0]))
> (label_ref 33)
> (pc))) "cbranch.c":25:10 -1
>  (int_list:REG_BR_PROB 1014686025 (nil))
> 
> The thing is we can't get rid of the unspecs as there's no concept of masking in RTL
> compares.
> We could technically do an AND (and do in some cases) but then you lose the predicate
> hint constant in the RTL which tells you whether the mask is known to be all true or not.
> This hint is crucial to allow for further optimizations.
> 
> That said the condition code, branch and compares are fully exposed.
> 
> We expand to a larger sequence than I'd like mostly because there's no support
> for conditional cbranch optabs, or even conditional vector comparisons. So the
> comparisons must be generated unpredicated by generating an all true mask, and
> later patterns merge in the AND.
> 
> The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a single loop)
> but not pure SVE, for which I take a different approach to try to avoid requiring
> a predicated version of these optabs.
> 
> I don't want to push my luck, but would you be ok with a conditional version of these
> optabs too? i.e. cond_cbranch_any and cond_cbranch_all?  This would allow us to
> immediately expand to the correct representation for both SVE and Adv.SIMD
> without having to rely on various combine patterns and cc-fusion to optimize the
> sequences later on (which has historically been a bit hit or miss if someone adds
> a new CC pattern).

Can you explain?  A conditional conditional branch makes my head hurt.
It's really a cbranch_{all,any} where the (vector) compare has an
additional mask applied?  So cbranch_cond_{all,any} would be a better
fit?  Though 'cond_' elsewhere suggests there is an else value, instead
cbranch_mask_{all,any}?  Or would that not capture things exactly?
cbranch is compare-and-branch, so masked-compare-and-branch aka
mcbranch[_{all,any}]?

And writing this, and not trying to remember everything said, it
appears that 'cbranch' itself (for vectors) becomes redundant?
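
For readers following the any/all/none discussion, the relationship between the three forms can be modelled in scalar C++.  This is only a sketch of the semantics being discussed: compare_gt, branch_any, branch_all and branch_none are hypothetical names chosen for illustration, not GCC optabs or functions, and a std::array of bool stands in for the vector comparison result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar stand-in for the result of an elementwise vector comparison.
template <std::size_t N>
using mask = std::array<bool, N>;

// Elementwise "a > b", standing in for the vector compare.
template <std::size_t N>
mask<N> compare_gt (const std::array<int, N> &a, const std::array<int, N> &b)
{
  mask<N> m{};
  for (std::size_t i = 0; i < N; ++i)
    m[i] = a[i] > b[i];
  return m;
}

// Hypothetical model of the "any" form: taken if at least one lane is true.
template <std::size_t N>
bool branch_any (const mask<N> &m)
{
  for (bool lane : m)
    if (lane)
      return true;
  return false;
}

// Hypothetical model of the "all" form: taken only if every lane is true.
template <std::size_t N>
bool branch_all (const mask<N> &m)
{
  for (bool lane : m)
    if (!lane)
      return false;
  return true;
}

// "none" needs no primitive of its own in this model: it is the inverse
// of "any", which corresponds to flipping the branch targets.
template <std::size_t N>
bool branch_none (const mask<N> &m)
{
  return !branch_any (m);
}
```

In this model "none" is obtained purely by inverting "any", which matches the point made in the quoted text that any and all plus branch flipping cover every state.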

Richard.

> And the reason for both is that for Adv.SIMD there's no mask at GIMPLE level and
> we have to make it during expand.
> 
> Thanks,
> Tamar
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] c++, libstdc++, v2: Implement C++26 P3068R5 - constexpr exceptions [PR117785]

2025-07-09 Thread Jakub Jelinek

On Tue, Jul 08, 2025 at 09:43:20PM -0400, Jason Merrill wrote:
Thanks for the review.
Working on the rest (most of it is already in my working copy).

> >   case CLEANUP_POINT_EXPR:
> > {
> > > -   auto_vec<tree, 2> cleanups;
> > > +   auto_vec<tree, 4> cleanups;
> 
> What's the rationale for this increase?  It seems costly to increase the
> size of a local variable in cxx_eval_constant_expression given the deep
> recursion.  I think we need a rationale for using any internal storage at
> all in this auto_vec; all full-expressions are wrapped in
> CLEANUP_POINT_EXPR, and only some of them actually have any cleanups.
> 
> Though I don't know for sure whether this actually affects the size of the
> stack frame.

Wanted to reply to this separately.
The rationale was that for C++26 CLEANUP_EH_ONLY will push not 0 but 2 elts
to the vector, so I thought increasing it a little bit would help.
Note, there is
  constexpr_ctx new_ctx;
  tree r = t;

  tree_code tcode = TREE_CODE (t);
  switch (tcode)
...
case CLEANUP_POINT_EXPR:
  {
auto_vec cleanups;
vec *prev_cleanups = ctx->global->cleanups;
ctx->global->cleanups = &cleanups;

auto_vec save_exprs;
constexpr_ctx new_ctx = *ctx;
new_ctx.save_exprs = &save_exprs;
before this change.
Unpatched trunk needs on x86_64 frame size 0x148,
with auto_vec that is 0x158, with auto_vec (aka
auto_vec 0x138 and with auto_vec changed to
auto_vec 0xd8.
Changing
constexpr_ctx new_ctx = *ctx;
to
new_ctx = *ctx;
doesn't free anything but is a good idea anyway.

I guess one option is to decrease both auto_vec to 0, another
is outline the CLEANUP_POINT_EXPR to a separate function (and perhaps
use [[gnu::noinline]] on it conditionally).
What do you prefer (though guess it can be done separately and I can
just drop the 2->4 change from the patch)?
And perhaps investigate more how to decrease the frame size further.
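
The outlining option mentioned above can be illustrated with a toy recursive evaluator.  This is a sketch only: the expression type, the node kinds and the 16-element scratch array are invented for illustration and are not taken from constexpr.cc.  The idea shown is that a buffer needed by one switch case is moved into a separate non-inlined helper, so that frames of the recursive function itself do not reserve space for it.

```cpp
#include <cassert>

// Toy expression tree; the node kinds are illustrative only.
enum class kind { constant, add, cleanup_point };

struct expr
{
  kind k;
  int value;        // used by kind::constant
  const expr *lhs;  // used by kind::add and kind::cleanup_point
  const expr *rhs;  // used by kind::add
};

int eval (const expr &e);

// Outlined handler.  The scratch buffer is part of this function's frame,
// so frames of eval itself do not reserve space for it.  The attribute is
// a GNU extension that asks the compiler not to inline the call.
[[gnu::noinline]] static int
eval_cleanup_point (const expr &e)
{
  assert (e.lhs);
  int scratch[16] = {};   // stands in for a vector's internal storage
  scratch[0] = eval (*e.lhs);
  return scratch[0];
}

int
eval (const expr &e)
{
  switch (e.k)
    {
    case kind::constant:
      return e.value;
    case kind::add:
      return eval (*e.lhs) + eval (*e.rhs);
    case kind::cleanup_point:
      return eval_cleanup_point (e);
    }
  return 0;
}
```

The helper's frame still stays live while its operand is evaluated, so the saving applies to the frames of all other node kinds rather than to the outlined case itself.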

Jakub

Re: [PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB

2025-07-09 Thread Jeff Law





On 7/9/25 4:51 AM, Kito Cheng wrote:

OK if Pan say OK
Note the CI failure is unrelated to Ciyan Pan's work.  Looks like 
something went goofy in the libstdc++ space.  I'm running it through my 
tester right now.


jeff

[PATCH] aarch64: Extend HVLA permutations to big-endian

2025-07-09 Thread Richard Sandiford

TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
"hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
This matching was conditional on !BYTES_BIG_ENDIAN.

The ACLE code also lowered the associated SVE2.1 intrinsics into
suitable VEC_PERM_EXPRs.  This lowering was not conditional on
!BYTES_BIG_ENDIAN.

The mismatch led to lots of ICEs in the ACLE tests on big-endian
targets: we lowered to VEC_PERM_EXPRs that are not supported.

I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
SVE maps the first memory element to the least significant end of
the register for both endiannesses, so no endian correction or lane
number adjustment is necessary.

This is in some ways a bit counterintuitive.  ZIPQ1 is conceptually
"apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
matter when choosing between Advanced SIMD ZIP1 and ZIP2.  For example,
the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
endian and ZIP2 for big-endian.  But the difference between the hybrid
VLA and Advanced SIMD permute selectors is a consequence of the
difference between the SVE and Advanced SIMD element orders.
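
As a rough illustration of the ZIP1/ZIP2 point, the following scalar C++ model maps a GCC-style element order onto architectural lanes and checks which instruction a given selector corresponds to.  All names here (permute, to_lanes, arch_zip1, arch_zip2) are hypothetical helpers for the model, not GCC or ACLE functions.  The model assumes that on big-endian Advanced SIMD element i sits in lane N-1-i, while for little-endian (and for SVE on both endiannesses) element i sits in lane i.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;   // V4SI: four 32-bit elements
using vec = std::array<int, N>;

// Architectural ZIP1/ZIP2, indexed by lane (lane 0 = least significant).
vec arch_zip1 (const vec &x, const vec &y) { return {x[0], y[0], x[1], y[1]}; }
vec arch_zip2 (const vec &x, const vec &y) { return {x[2], y[2], x[3], y[3]}; }

// Two-input permute in element order: indices 0..N-1 select from a,
// indices N..2N-1 select from b.
vec permute (const vec &a, const vec &b, const std::array<std::size_t, N> &sel)
{
  vec r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
  return r;
}

// Convert element order to architectural lane order.  With reversed set,
// element i is placed in lane N-1-i (the big-endian Advanced SIMD layout
// assumed by this model); otherwise element i is placed in lane i.
vec to_lanes (const vec &elems, bool reversed)
{
  vec lanes{};
  for (std::size_t i = 0; i < N; ++i)
    lanes[reversed ? N - 1 - i : i] = elems[i];
  return lanes;
}
```

In this model the selector { 0, 4, 1, 5 } matches arch_zip1 on the inputs when lanes are not reversed, and matches arch_zip2 with the two inputs swapped when they are reversed.  The operand swap is a consequence of the model rather than something stated in the patch description.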

The same thing applies to ACLE intrinsics.  The current lowering of
svzipq1 etc. is correct for both endiannesses.  If ACLE code does:

  2x svld1_s32 + svzipq1_s32 + svst1_s32

then the byte-for-byte result is the same for both endiannesses.
On big-endian targets, this is different from using the Advanced SIMD
sequence below for each 128-bit block:

  2x LDR + ZIP1 + STR

In contrast, the byte-for-byte result of:

  2x svld1q_gather_s32 + svst1q_scatter_s32 around svzipq1_s32, i.e. 2x svld1q_gather_s32 + svzipq1_s32 + svst1q_scatter_s32

depends on endianness, since the quadword gathers and scatters use
Advanced SIMD byte ordering for each 128-bit block.  This gather/scatter
sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
sequence for both endiannesses.

Programmers writing ACLE code have to be aware of this difference
if they want to support both endiannesses.

The patch includes some new execution tests to verify the expansion
of the VEC_PERM_EXPRs.

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


gcc/
* doc/sourcebuild.texi (aarch64_sve2_hw, aarch64_sve2p1_hw): Document.
* config/aarch64/aarch64.cc (aarch64_evpc_hvla): Extend to
BYTES_BIG_ENDIAN.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sve2p1_hw):
New proc.
* gcc.target/aarch64/sve2/dupq_1.c: Extend to big-endian.  Add
noipa attributes.
* gcc.target/aarch64/sve2/extq_1.c: Likewise.
* gcc.target/aarch64/sve2/uzpq_1.c: Likewise.
* gcc.target/aarch64/sve2/zipq_1.c: Likewise.
* gcc.target/aarch64/sve2/dupq_1_run.c: New test.
* gcc.target/aarch64/sve2/extq_1_run.c: Likewise.
* gcc.target/aarch64/sve2/uzpq_1_run.c: Likewise.
* gcc.target/aarch64/sve2/zipq_1_run.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc |  1 -
 gcc/doc/sourcebuild.texi  |  6 ++
 .../gcc.target/aarch64/sve2/dupq_1.c  | 26 +++---
 .../gcc.target/aarch64/sve2/dupq_1_run.c  | 87 +++
 .../gcc.target/aarch64/sve2/extq_1.c  |  2 +-
 .../gcc.target/aarch64/sve2/extq_1_run.c  | 73 
 .../gcc.target/aarch64/sve2/uzpq_1.c  |  2 +-
 .../gcc.target/aarch64/sve2/uzpq_1_run.c  | 78 +
 .../gcc.target/aarch64/sve2/zipq_1.c  |  2 +-
 .../gcc.target/aarch64/sve2/zipq_1_run.c  | 78 +
 gcc/testsuite/lib/target-supports.exp | 17 
 11 files changed, 355 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/dupq_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/extq_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/uzpq_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/zipq_1_run.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 7960b639f90..ce25f4f6f9f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26752,7 +26752,6 @@ aarch64_evpc_hvla (struct expand_vec_perm_d *d)
   machine_mode vmode = d->vmode;
   if (!TARGET_SVE2p1
   || !TARGET_NON_STREAMING
-  || BYTES_BIG_ENDIAN
   || d->vec_flags != VEC_SVE_DATA
   || GET_MODE_UNIT_BITSIZE (vmode) > 64)
 return false;
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6c5586e4b03..85fb810d96c 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2373,6 +2373,12 @@ whether it does so by default).
 @itemx aarch64_sve1024_hw
 @itemx aarch64_sve2048_hw
 Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
+@item aarch64_sve2_hw
+AArch64 target that is able to generate and execute SVE2 code (regardless of
+whether it does so by default).
+@item aarch64_sve2p1_hw
+AArch64 target that is able to generate and execute SVE2.1 co

[PATCH] aarch64: Some fixes for SVE INDEX constants

2025-07-09 Thread Richard Sandiford

When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets.  For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates from the
least significant element.  A big-endian target would therefore
load V4SI { 0, 1, 2, 3 } using:

INDEX Z0.S, #3, #-1

rather than little-endian's:

INDEX Z0.S, #0, #1
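
The base/step adjustment can be sketched as a small standalone function.  This is a simplified model, not the code from the patch: sve_index_for_advsimd and index_series are hypothetical names, plain 64-bit arithmetic is used instead of wrapping in the element mode, and only the single-vector case is covered.  The [-16, 15] immediate range is the one checked by aarch64_sve_index_immediate_p in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct index_series
{
  std::int64_t base;
  std::int64_t step;
};

// INDEX immediates must lie in [-16, 15].
bool index_immediate_p (std::int64_t v)
{
  return v >= -16 && v <= 15;
}

// Given an Advanced SIMD series { base, base + step, ... } with NUNITS
// elements, return the operands for an SVE INDEX that loads it, or
// nullopt if they do not fit the immediate range.  On big-endian the
// series is seen from the last element to the first.
std::optional<index_series>
sve_index_for_advsimd (std::int64_t base, std::int64_t step,
                       unsigned nunits, bool big_endian)
{
  assert (nunits > 0);
  if (big_endian)
    {
      base += static_cast<std::int64_t> (nunits - 1) * step;
      step = -step;
    }
  if (!index_immediate_p (base) || !index_immediate_p (step))
    return std::nullopt;
  return index_series{base, step};
}
```

For V4SI { 0, 1, 2, 3 } this gives base 3 and step -1 on big-endian and base 0 and step 1 on little-endian, matching the two INDEX instructions shown above.  A series such as { 0, 8, 16, 24 } is accepted on little-endian but rejected on big-endian in this model, because the adjusted base of 24 is out of range.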

While there, I noticed that we would only check the first vector
in a multi-vector SVE constant, which would trigger an ICE if the
other vectors turned out to be invalid.  This is pretty difficult to
trigger at the moment, since we only allow single-register modes to be
used as frontend & middle-end vector modes, but it can be seen using
the RTL frontend.

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
function, split out from...
(aarch64_simd_valid_imm): ...here.  Account for the different
SVE and Advanced SIMD element orders on big-endian targets.
Check each vector in a structure mode.

gcc/testsuite/
* gcc.dg/rtl/aarch64/vec-series-1.c: New test.
* gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
output for this big-endian test.
* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
* gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
targets and add more tests.
* gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
of vec_init_3.c.
---
 gcc/config/aarch64/aarch64.cc |  59 -
 .../gcc.dg/rtl/aarch64/vec-series-1.c |  35 +++
 .../gcc.dg/rtl/aarch64/vec-series-2.c |  35 +++
 .../aarch64/sve/acle/general/dupq_2.c |   2 +-
 .../aarch64/sve/acle/general/dupq_4.c |   2 +-
 .../gcc.target/aarch64/sve/vec_init_3.c   | 114 +-
 .../gcc.target/aarch64/sve/vec_init_4.c   | 209 ++
 7 files changed, 446 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/vec-series-1.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/vec-series-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ce25f4f6f9f..6d5b2009b2a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23074,6 +23074,58 @@ aarch64_sve_index_immediate_p (rtx base_or_step)
  && IN_RANGE (INTVAL (base_or_step), -16, 15));
 }
 
+/* Return true if SERIES is a constant vector that can be loaded using
+   an immediate SVE INDEX, considering both SVE and Advanced SIMD modes.
+   When returning true, store the base in *BASE_OUT and the step
+   in *STEP_OUT.  */
+
+static bool
+aarch64_sve_index_series_p (rtx series, rtx *base_out, rtx *step_out)
+{
+  rtx base, step;
+  if (!const_vec_series_p (series, &base, &step)
+  || !CONST_INT_P (base)
+  || !CONST_INT_P (step))
+return false;
+
+  auto mode = GET_MODE (series);
+  auto elt_mode = as_a (GET_MODE_INNER (mode));
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (BYTES_BIG_ENDIAN && (vec_flags & VEC_ADVSIMD))
+{
+  /* On big-endian targets, architectural lane 0 holds the last element
+for Advanced SIMD and the first element for SVE; see the comment at
+the head of aarch64-sve.md for details.  This means that, from an SVE
+point of view, an Advanced SIMD series goes from the last element to
+the first.  */
+  auto i = GET_MODE_NUNITS (mode).to_constant () - 1;
+  base = gen_int_mode (UINTVAL (base) + i * UINTVAL (step), elt_mode);
+  step = gen_int_mode (-UINTVAL (step), elt_mode);
+}
+
+  if (!aarch64_sve_index_immediate_p (base)
+  || !aarch64_sve_index_immediate_p (step))
+return false;
+
+  /* If the mode spans multiple registers, check that each subseries is
+ in range.  */
+  unsigned int nvectors = aarch64_ldn_stn_vectors (mode);
+  if (nvectors != 1)
+{
+  unsigned int nunits;
+  if (!GET_MODE_NUNITS (mode).is_constant (&nunits))
+   return false;
+  nunits /= nvectors;
+  for (unsigned int i = 1; i < nvectors; ++i)
+   if (!IN_RANGE (INTVAL (base) + i * nunits * INTVAL (step), -16, 15))
+ return false;
+}
+
+  *base_out = base;
+  *step_out = step;
+  return true;
+}
+
 /* Return true if X is a valid immediate for the SVE ADD and SUB instructions
when applied to mode MODE.  Negate X first if NEGATE_P is true.  */
 
@@ -23522,13 +23574,8 @@ aarch64_simd_valid_imm (rtx op, simd_immediate_info 
*info,
 n_elts = CONST_VECTOR_NPATTERNS (op);
   else if (which == AARCH64_CHECK_MOV
   && TARGET_SVE
-  && const_vec_series_p (op, &base, &step))
+

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-09 Thread Jonathan Wakely

On Tue, 8 Jul 2025 at 21:47, Björn Schäpers  wrote:
>
> From: Björn Schäpers 
>
> I have based this on my previous (not yet landed) patch, but it only
> reuses the #ifdef to include <array>. Since std::array isn't used
> anywhere else I thought that was the right place to put it.
>
> I hope the formatting is okay.
>
> I've used wide strings for the Windows zone name and territory, since
> the Windows API returns wide strings and thus they can be compared
> directly. For the territory there exists a narrow string API, but
> internally it calls the wide string version and narrows it down. If
> desired I can switch to narrow strings, the conversion can be done by
> static_cast per character since only ASCII chars are used.

Working with wide strings seems fine, that's the native format.

I think the generated header should be written to src/c++20/ directory
though, since it doesn't need to be installed alongside the public
headers and doesn't need to be included by anything except tzdb.cc.
That would mean you can just do #include "windows_zones-map.h" in tzdb.cc


>
> -- >8 --
> On Windows there is no API to get the current time zone as IANA name,
> instead Windows has its own zones. But there exists a mapping provided
> by the Unicode Consortium. This patch adds a script to convert the XML
> file with the mapping to a lookup table and adds a Windows code path to
> use that mapping.
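
A lookup over such a table might look like the sketch below.  This is not the code from the patch: lookup_iana and sample_map are hypothetical names, the rows are a small subset copied from the table quoted further down, and treating territory "001" as the default entry for a Windows zone is an assumption about how the mapping is meant to be used.

```cpp
#include <array>
#include <cassert>
#include <string_view>

struct zone_entry
{
  std::wstring_view windows_name;
  std::wstring_view territory;
  std::string_view iana_name;
};

// A few rows copied from the generated table in the patch.
constexpr std::array<zone_entry, 5> sample_map{{
  {L"Arab Standard Time", L"001", "Asia/Riyadh"},
  {L"Arab Standard Time", L"BH", "Asia/Bahrain"},
  {L"Arab Standard Time", L"KW", "Asia/Kuwait"},
  {L"Arabian Standard Time", L"001", "Asia/Dubai"},
  {L"Arabian Standard Time", L"OM", "Asia/Muscat"},
}};

// Return the IANA name for a Windows zone and territory.  An exact
// territory match wins; otherwise fall back to the "001" row for that
// zone (assumed here to be the default).  Returns an empty view if the
// zone is unknown.
std::string_view
lookup_iana (std::wstring_view windows_name, std::wstring_view territory)
{
  std::string_view fallback;
  for (const zone_entry &e : sample_map)
    {
      if (e.windows_name != windows_name)
        continue;
      if (e.territory == territory)
        return e.iana_name;
      if (e.territory == L"001")
        fallback = e.iana_name;
    }
  return fallback;
}
```

A linear scan keeps the sketch short; since the generated table appears to be sorted by Windows name, a binary search would be a natural refinement.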
>
> libstdc++-v3/Changelog:
>
> Implement std::chrono::current_zone() for Windows
>
> * include/bits/windows_zones-map.h: New file, contains the look
> up table.
> * scripts/gen_windows_zones_map.py: New file, generates
> windows_zones-map.h.
> * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.
>
> Signed-off-by: Björn Schäpers 
> ---
>  libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
>  libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
>  libstdc++-v3/src/c++20/tzdb.cc| 102 -
>  3 files changed, 635 insertions(+), 2 deletions(-)
>  create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
>  create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
>
> diff --git a/libstdc++-v3/include/bits/windows_zones-map.h 
> b/libstdc++-v3/include/bits/windows_zones-map.h
> new file mode 100644
> index 000..7be736b063d
> --- /dev/null
> +++ b/libstdc++-v3/include/bits/windows_zones-map.h
> @@ -0,0 +1,407 @@
> +// Generated by scripts/gen_windows_zones_map.py, do not edit.
> +
> +// Copyright The GNU Toolchain Authors.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// .
> +
> +/** @file bits/windows_zones-map.h
> + *  This is an internal header file, included by other library headers.
> + *  Do not attempt to use it directly. @headername{chrono}
> + */
> +
> +#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
> +# error "This is not a public header, do not include it directly"
> +#endif
> +
> +struct windows_zone_map_entry
> +{
> +  wstring_view windows_name;
> +  wstring_view territory;
> +  string_view iana_name;
> +};
> +
> +static constexpr array windows_zone_map{
> +  {
> +{L"AUS Central Standard Time", L"001", "Australia/Darwin"},
> +{L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
> +{L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
> +{L"Alaskan Standard Time", L"001", "America/Anchorage"},
> +{L"Aleutian Standard Time", L"001", "America/Adak"},
> +{L"Altai Standard Time", L"001", "Asia/Barnaul"},
> +{L"Arab Standard Time", L"001", "Asia/Riyadh"},
> +{L"Arab Standard Time", L"BH", "Asia/Bahrain"},
> +{L"Arab Standard Time", L"KW", "Asia/Kuwait"},
> +{L"Arab Standard Time", L"QA", "Asia/Qatar"},
> +{L"Arab Standard Time", L"YE", "Asia/Aden"},
> +{L"Arabian Standard Time", L"001", "Asia/Dubai"},
> +{L"Arabian Standard Time", L"OM", "Asia/Muscat"},
> +{L"Arabian Standard Time", L"ZZ", "Etc/GMT-4"},
> +{L"Arabic Standard Time", L"001", "Asia/Baghdad"},
> +{L"Argentina Standard

[PATCH] Make the RTL frontend set REG_NREGS correctly

2025-07-09 Thread Richard Sandiford

While working on a new testcase that uses the RTL frontend,
I hit a bug where a (reg ...) that spans multiple hard registers
had REG_NREGS set to 1.  This caused various things to misbehave.
For example, if the (reg ...) in question was used as crtl->return_rtx,
only the first register in the group would be marked as live on exit.
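
The effect of a wrong register count can be shown with a toy model.  Everything here is invented for illustration: toy_hard_regno_nregs assumes fixed 16-byte registers, whereas the real hard_regno_nregs depends on the target, the register and the mode, and mark_live only mimics the idea of marking a register group as live.

```cpp
#include <bitset>
#include <cassert>

constexpr unsigned toy_reg_bytes = 16;   // assumed register size
constexpr unsigned toy_num_regs = 32;    // assumed register file size

// Toy stand-in for hard_regno_nregs: registers needed to hold a value
// of MODE_BYTES bytes, rounded up.
unsigned toy_hard_regno_nregs (unsigned mode_bytes)
{
  return (mode_bytes + toy_reg_bytes - 1) / toy_reg_bytes;
}

// Mark NREGS consecutive registers starting at REGNO as live.
std::bitset<toy_num_regs> mark_live (unsigned regno, unsigned nregs)
{
  assert (regno + nregs <= toy_num_regs);
  std::bitset<toy_num_regs> live;
  for (unsigned i = 0; i < nregs; ++i)
    live.set (regno + i);
  return live;
}
```

With a 32-byte value starting at register 0, a hard-coded count of 1 marks only register 0, while the computed count of 2 also marks register 1.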

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


gcc/
* read-rtl-function.cc (function_reader::read_rtx_operand_r): Use
hard_regno_nregs to work out REG_NREGS for hard registers.
---
 gcc/read-rtl-function.cc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/read-rtl-function.cc b/gcc/read-rtl-function.cc
index fb9c9554ea3..1f08c50cc12 100644
--- a/gcc/read-rtl-function.cc
+++ b/gcc/read-rtl-function.cc
@@ -1065,7 +1065,10 @@ function_reader::read_rtx_operand_r (rtx x)
   if (regno == -1)
 fatal_at (loc, "unrecognized register: '%s'", name.string);
 
-  set_regno_raw (x, regno, 1);
+  int nregs = 1;
+  if (HARD_REGISTER_NUM_P (regno))
+nregs = hard_regno_nregs (regno, GET_MODE (x));
+  set_regno_raw (x, regno, nregs);
 
   /* Consolidate singletons.  */
   x = consolidate_singletons (x);
-- 
2.43.0

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-09 Thread Jonathan Wakely

On Wed, 9 Jul 2025 at 15:13, Jonathan Wakely  wrote:
>
> On Tue, 8 Jul 2025 at 21:47, Björn Schäpers  wrote:
> >
> > From: Björn Schäpers 
> >
> > I have based this on my previous (not yet landed) patch, but it only
> > reuses the #ifdef to include <array>. Since std::array isn't used
> > anywhere else I thought that was the right place to put it.
> >
> > I hope the formatting is okay.
> >
> > I've used wide strings for the Windows zone name and territory, since
> > the Windows API returns wide strings and thus they can be compared
> > directly. For the territory there exists a narrow string API, but
> > internally it calls the wide string version and narrows it down. If
> > desired I can switch to narrow strings, the conversion can be done by
> > static_cast per character since only ASCII chars are used.
>
> Working with wide strings seems fine, that's the native format.
>
> I think the generated header should be written to src/c++20/ directory
> though, since it doesn't need to be installed alongside the public
> headers and doesn't need to be included by anything except tzdb.cc.
> That would mean you can just do #include "windows_zones-map.h" in tzdb.cc.
>
>
> >
> > -- >8 --
> > On Windows there is no API to get the current time zone as IANA name,
> > instead Windows has its own zones. But there exists a mapping provided
> > by the Unicode Consortium. This patch adds a script to convert the XML
> > file with the mapping to a lookup table and adds a Windows code path to
> > use that mapping.
> >
> > libstdc++-v3/Changelog:
> >
> > Implement std::chrono::current_zone() for Windows
> >
> > * include/bits/windows_zones-map.h: New file, contains the look
> > up table.
> > * scripts/gen_windows_zones_map.py: New file, generates
> > windows_zones-map.h.
> > * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.
> >
> > Signed-off-by: Björn Schäpers 
> > ---
> >  libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
> >  libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
> >  libstdc++-v3/src/c++20/tzdb.cc| 102 -
> >  3 files changed, 635 insertions(+), 2 deletions(-)
> >  create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
> >  create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
> >
> > diff --git a/libstdc++-v3/include/bits/windows_zones-map.h 
> > b/libstdc++-v3/include/bits/windows_zones-map.h
> > new file mode 100644
> > index 000..7be736b063d
> > --- /dev/null
> > +++ b/libstdc++-v3/include/bits/windows_zones-map.h
> > @@ -0,0 +1,407 @@
> > +// Generated by scripts/gen_windows_zones_map.py, do not edit.
> > +
> > +// Copyright The GNU Toolchain Authors.
> > +//
> > +// This file is part of the GNU ISO C++ Library.  This library is free
> > +// software; you can redistribute it and/or modify it under the
> > +// terms of the GNU General Public License as published by the
> > +// Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +
> > +// This library is distributed in the hope that it will be useful,
> > +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +// GNU General Public License for more details.
> > +
> > +// Under Section 7 of GPL version 3, you are granted additional
> > +// permissions described in the GCC Runtime Library Exception, version
> > +// 3.1, as published by the Free Software Foundation.
> > +
> > +// You should have received a copy of the GNU General Public License and
> > +// a copy of the GCC Runtime Library Exception along with this program;
> > +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +// .
> > +
> > +/** @file bits/windows_zones-map.h
> > + *  This is an internal header file, included by other library headers.
> > + *  Do not attempt to use it directly. @headername{chrono}
> > + */
> > +
> > +#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
> > +# error "This is not a public header, do not include it directly"
> > +#endif
> > +
> > +struct windows_zone_map_entry
> > +{
> > +  wstring_view windows_name;
> > +  wstring_view territory;
> > +  string_view iana_name;
> > +};
> > +
> > +static constexpr array windows_zone_map{
> > +  {
> > +{L"AUS Central Standard Time", L"001", "Australia/Darwin"},
> > +{L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
> > +{L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
> > +{L"Alaskan Standard Time", L"001", "America/Anchorage"},
> > +{L"Aleutian Standard Time", L"001", "America/Adak"},
> > +{L"Altai Standard Time", L"001", "Asia/Barnaul"},
> > +{L"Arab Standard Time", L"001", "Asia/Riyadh"},
> > +{L"Arab Standard Time", L"BH", "Asia/Bahrain"},
> > +{L"Arab Standard Time", L"KW", "Asia/Kuwait"},
> > +{L"Arab Standard Time", L"QA", "Asia/Qatar"},
> > +{L"Arab Standard Time", L

[PATCH] aarch64: Fix endianness of DFmode vector constants

2025-07-09 Thread Richard Sandiford

aarch64_simd_valid_imm tries to decompose a constant into a repeating
series of 64 bits, since most Advanced SIMD and SVE immediate forms
require that.  (The exceptions are handled first.)  It does this by
building up a byte-level register image, lsb first.  If the image does
turn out to repeat every 64 bits, it loads the first 64 bits into an
integer.

At this point, endianness has mostly been dealt with.  Endianness
applies to transfers between registers and memory, whereas at this
point we're dealing purely with register values.

However, one of the things we try is to bitcast the value to a float
and use FMOV.  This involves splitting the value into 32-bit chunks
(stored as longs) and passing them to real_from_target.  The problem
being fixed by this patch is that, when a value spans multiple 32-bit
chunks, real_from_target expects them to be in memory rather than
register order.  Thus index 0 is the most significant chunk if
FLOAT_WORDS_BIG_ENDIAN and the least significant chunk otherwise.
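
The chunk ordering described above can be sketched in isolation. This is a standalone model rather than the aarch64 backend code: split_for_target is a hypothetical helper, and its float_words_big parameter merely stands in for GCC's FLOAT_WORDS_BIG_ENDIAN.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Split a 64-bit register image into two 32-bit chunks, ordered the way a
// consumer expecting memory order would want them.
inline std::array<std::uint32_t, 2>
split_for_target(std::uint64_t ival, bool float_words_big)
{
  std::array<std::uint32_t, 2> chunks = {
    static_cast<std::uint32_t>(ival & 0xffffffffu),         // least significant
    static_cast<std::uint32_t>((ival >> 32) & 0xffffffffu)  // most significant
  };
  // In memory order, index 0 holds the most significant chunk when float
  // words are big-endian, so the register-order chunks must be swapped.
  if (float_words_big)
    std::swap(chunks[0], chunks[1]);
  return chunks;
}
```

For the bit pattern of the double 1.0 (0x3ff0000000000000), the nonzero chunk ends up at index 1 in the little-endian case and at index 0 in the big-endian case.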

This fixes aarch64/sve/cond_fadd_1.c and various other tests
for aarch64_be-elf.

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_simd_valid_imm): Account
for FLOAT_WORDS_BIG_ENDIAN when building a floating-point value.
---
 gcc/config/aarch64/aarch64.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6d5b2009b2a..27c315fc35e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23687,6 +23687,8 @@ aarch64_simd_valid_imm (rtx op, simd_immediate_info 
*info,
   long int as_long_ints[2];
   as_long_ints[0] = ival & 0xffffffff;
   as_long_ints[1] = (ival >> 32) & 0xffffffff;
+  if (imode == DImode && FLOAT_WORDS_BIG_ENDIAN)
+   std::swap (as_long_ints[0], as_long_ints[1]);
 
   REAL_VALUE_TYPE r;
   real_from_target (&r, as_long_ints, fmode);
-- 
2.43.0

Re: [PATCH] c++, libstdc++, v2: Implement C++26 P3068R5 - constexpr exceptions [PR117785]

2025-07-09 Thread Jason Merrill


On 7/9/25 7:34 AM, Jakub Jelinek wrote:

On Tue, Jul 08, 2025 at 09:43:20PM -0400, Jason Merrill wrote:
Thanks for the review.
Working on the rest (I already have most of it in my working copy).


   case CLEANUP_POINT_EXPR:
 {
-   auto_vec<tree, 2> cleanups;
+   auto_vec<tree, 4> cleanups;


What's the rationale for this increase?  It seems costly to increase the
size of a local variable in cxx_eval_constant_expression given the deep
recursion.  I think we need a rationale for using any internal storage at
all in this auto_vec; all full-expressions are wrapped in
CLEANUP_POINT_EXPR, and only some of them actually have any cleanups.

Though I don't know for sure whether this actually affects the size of the
stack frame.


Wanted to reply to this separately.
The rationale was that for C++26 CLEANUP_EH_ONLY will push not 0 but 2 elts
to the vector, so I thought increasing it a little bit would help.
Note, there is
   constexpr_ctx new_ctx;
   tree r = t;

   tree_code tcode = TREE_CODE (t);
   switch (tcode)
...
 case CLEANUP_POINT_EXPR:
   {
 auto_vec cleanups;
 vec *prev_cleanups = ctx->global->cleanups;
 ctx->global->cleanups = &cleanups;

 auto_vec save_exprs;
 constexpr_ctx new_ctx = *ctx;
 new_ctx.save_exprs = &save_exprs;
before this change.
Unpatched trunk needs on x86_64 frame size 0x148,
with auto_vec that is 0x158, with auto_vec (aka
auto_vec 0x138 and with auto_vec changed to
auto_vec 0xd8.
Changing
constexpr_ctx new_ctx = *ctx;
to
new_ctx = *ctx;
doesn't free anything but is a good idea anyway.

I guess one option is to decrease both auto_vec to 0, another
is to outline the CLEANUP_POINT_EXPR handling into a separate function
(and perhaps use [[gnu::noinline]] on it conditionally).
What do you prefer (though I guess it can be done separately and I can
just drop the 2->4 change from the patch)?
And perhaps investigate more how to decrease the frame size further.
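
As a rough illustration of why the inline element count affects frame size, consider the following standalone model. It is not GCC's auto_vec: inline_vec and inline_overhead are hypothetical names, and the sketch only shows that inline storage is part of the object itself, so a local variable of this type grows the enclosing stack frame accordingly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vector with N elements of inline storage.
template <typename T, std::size_t N>
struct inline_vec
{
  T *data;
  std::size_t length;
  T inline_storage[N];  // lives inside the object, hence in the frame
};

// Specialization with no inline storage: all elements would be heap-allocated.
template <typename T>
struct inline_vec<T, 0>
{
  T *data;
  std::size_t length;
};

// Extra bytes a local inline_vec<T, N> adds compared with the N == 0 variant.
template <typename T, std::size_t N>
constexpr std::size_t inline_overhead()
{
  return sizeof(inline_vec<T, N>) - sizeof(inline_vec<T, 0>);
}
```

In a deeply recursive function every level of recursion pays this overhead again, which is the cost being weighed in the discussion above.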


I don't have a strong preference between the two approaches.  Dropping 
both to 0 is certainly simpler; is there any noticeable performance 
impact?  And yes, it makes sense for this to be done separately.


Jason

Re: [Patch, Fortran, Coarray, PR88076, v1] 7/6 Add a shared memory multi process coarray library.

2025-07-09 Thread Andre Vehreschild

Hi Jerry,

good you could build Toon's code. 

Your idea of using the OpenCoarrays tests to test caf_shmem made me think about
the easiest way to do it. 

I came up with the following:

1. Pull a recent OpenCoarrays source tree from GitHub or use a clean existing
one.
2. Apply the attached patch.
3. Create a build directory for OpenCoarrays.
4. Copy or link libcaf_shmem.a from a recent gfortran build to the OpenCoarrays
build directory.
5. Do a cmake  in the build directory.
6. Now cmake --build . && ctest . should test using caf_shmem.

Notes:
- Some tests have been deactivated because they use MPI features that are not
  present in caf_shmem.
- Two tests still fail; both concern some check on cafrun. Those are OK,
  because the file has been patched and I can't find where the test is coming
  from. Unimportant!
- You need an updated patch set for caf_shmem, because the OpenCoarrays tests
  revealed quite a few issues in caf_shmem. I am currently regtesting the
  patch and will publish it as soon as it regtests cleanly.

Regards,
Andre

On Fri, 4 Jul 2025 10:51:43 -0700
Jerry D  wrote:

> On 7/4/25 5:12 AM, Andre Vehreschild wrote:
> > Hi all,
> > 
> > the attached patch goes on top of the other 6 caf_shmem coarray patches and
> > fixes missing includes, esp. on non-Linux systems. I have tested this on
> > FreeBSD, which is very time consuming due to it being fully virtualized on
> > my system.
> > 
> > Regtests ok on x86_64-pc-linux-gnu and aarch64-unknown-freebsd14.3. Ok for
> > mainline?
> > 
> > Thanks to Steve for bringing these deficiencies to my attention.
> > 
> > Regards,
> > Andre  
> 
> So far,
> 
> $ export GFORTRAN_NUM_IMAGES=9
> $ rm *.mod
> $ gfc -fcoarray=lib random-weather.f90  -lcaf_shmem
> $ ./a.out
> Decomposition information on image   6 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   1 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   4 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   5 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   9 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   3 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   8 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   2 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> Decomposition information on image   7 : there are   9 *   1 slabs; the slabs 
> are   8 *  70 grid cells in size.
> .
> .
> .
>   Time3600  Image   4  PS=   99925.0391  T=301.282928 
>U=   -51.2542686  V=24.3605309  W=  -0.296301365  Q= 
> 1.48258626E-03
>   Time3600  Image   9  PS=   99899.3047  T=299.897095 
>U=62.8683090  V=   -57.9342270  W=   0.445489585  Q= 
> 1.90666097E-03
>   Time3600  Image   1  PS=   99966.7656  T=300.011597 
>U=   -1.93229961  V=   -118.892410  W=   -6.45965934E-02  Q= 
> 2.03774264E-03
>   Time3600  Image   7  PS=   100015.938  T=300.066162 
>U=   -17.6038494  V=  -0.982973158  W=7.21789524E-02  Q= 
> 2.17592530E-03
>   Time3600  Image   2  PS=   13.477  T=300.078522 
>U=   -2.38964367  V=   -18.8026981  W=  -0.179861650  Q= 
> 1.99834118E-03
>   Time3600  Image   5  PS=   100077.422  T=300.781494 
>U=   -16.6273994  V=   -101.607895  W=   0.361649722  Q= 
> 1.7433E-03
>   Time3600  Image   3  PS=   12.391  T=299.708862 
>U=18.6304798  V=   0.391739845  W=2.24014949E-02  Q= 
> 1.96914421E-03
>   Time3600  Image   8  PS=   100074.359  T=299.516235 
>U=   -55.1445618  V=68.3090286  W=  -0.537869334  Q= 
> 2.32057413E-03
>   Time3600  Image   6  PS=   99976.4453  T=300.221924 
>U=   -1.62557888  V=1.44226456  W=   0.201509774  Q= 
> 1.97460176E-03
> $
> 
> 
> real  0m0.066s
> user  0m0.337s
> sys   0m0.107s
> 
> Definitely much faster than mpich.  I also oversubscribed the number of
> images to 30 and it ran as well.
> 
> I still need to build OpenCoarrays using this gfortran-16 and make sure it 
> passes those tests with mpich.  I will then try to test each case in the 
> OpenCoarrays suite of tests with -lcaf_shmem and see if those all work.
> 
> Any ideas on how to stress test this?  I only have 32 GB of memory here and 
> would like to see how a longer-running program does.
> 
> Regards,
> 
> Jerry


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
From fad13

[PUSHED] Fix 'main' function in 'gcc.dg/builtin-dynamic-object-size-pr120780.c'

2025-07-09 Thread Thomas Schwinge

Fix-up for commit 72e85d46472716e670cbe6e967109473b8d12d38
"tree-optimization/120780: Support object size for containing objects".
'size_t sz' is unused here, and GCC/nvptx doesn't accept this:

spawn -ignore SIGHUP [...]/nvptx-none-run 
./builtin-dynamic-object-size-pr120780.exe
error   : Prototype doesn't match for 'main' in 'input file 1 at offset 
1924', first defined in 'input file 1 at offset 1924'
nvptx-run: cuLinkAddData failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
FAIL: gcc.dg/builtin-dynamic-object-size-pr120780.c execution test

gcc/testsuite/
* gcc.dg/builtin-dynamic-object-size-pr120780.c: Fix 'main' function.
---
 gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
index 0d6593ec828..12e6c29569c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -207,7 +207,7 @@ test5 (size_t sz)
 }
 
 int
-main (size_t sz)
+main (void)
 {
   test1 (sizeof (struct container));
   test1 (sizeof (struct container) - sizeof (int));
-- 
2.34.1

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-09 Thread Mateusz Zych

Thank you for reviewing my patch and committing it!
I'm glad that I've been able to contribute to such an important project.

Thanks, Mateusz Zych

On Wed, Jul 9, 2025 at 2:17 PM Jonathan Wakely  wrote:

> On Fri, 4 Jul 2025 at 13:11, Mateusz Zych  wrote:
> >
> > Hello!
> >
> > I've updated the ChangeLog, since I forgot to do it before.
>
> Thanks, I've pushed the patch to trunk now.  I used a simpler commit
> message, without the large verbatim quotes from the standard.
>
> Thanks again for noticing the problem and contributing the fix.
>
>
> >
> > Thanks, Mateusz Zych
> >
> > On Thu, Jul 3, 2025 at 9:49 PM Mateusz Zych  wrote:
> >>
> >> Hello!
> >>
> >> I've prepared a patch, which adds all members missing from
> >> std::numeric_limits<> specializations for integer-class types.
> >>
> >> Jonathan, please let me know whether you like these changes
> >> and do not see any bugs or issues with them. From my side, I just want
> >> to say that:
> >>
> >> Since all std::numeric_limits<> specializations for integral types,
> >> defined in //libstdc++-v3/include/std/limits don't inherit from a base
> >> class providing common data members and member functions,
> >> I also didn't introduce such a base class in
> >> //libstdc++-v3/include/bits/max_size_type.h.
> >> Such implementation has quite a bit of code duplication, but it's like
> >> that on purpose, right?
> >>
> >> I didn't test the traps static data member, because I don't know how to
> >> accurately predict when this compile-time constant should be true and
> >> when it should be false.
> >> Moreover, I saw that the unit-test verifying correctness of the traps
> >> constant from std::numeric_limits<> specializations for integral types
> >> (//libstdc++-v3/testsuite/18_support/numeric_limits/traps.cc) also
> >> doesn't verify its value.
> >>
> >> In the unit-tests for integer-class types I've defined variable template
> >> verify_numeric_limits_values_not_meaningful_for<> to avoid code
> >> duplication and have clear and readable code. I hope this is OK.
> >>
> >> Thanks, Mateusz Zych
> >>
> >> On Wed, Jul 2, 2025 at 7:30 PM Jonathan Wakely wrote:
> >>>
> >>> On Wed, 2 Jul 2025 at 17:15, Mateusz Zych wrote:
> >>> >
> >>> > OK, then I’ll prepare appropriate patch with tests and send it when
> >>> > I’m done implementing it.
> >>>
> >>> That would be great, thanks. I won't push the initial patch, we can
> >>> wait for you to prepare the complete fix.
> >>>
> >>> Please note that for a more significant change, we have some legal
> >>> prerequisites for contributions, as documented at:
> >>> https://gcc.gnu.org/contribute.html#legal
> >>>
> >>> If you want to contribute under the DCO terms, please read
> >>> https://gcc.gnu.org/dco.html so that you understand exactly what the
> >>> Signed-off-by: trailer means.
> >>>
> >>> Thanks!
> >>>
>
>

RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Tamar Christina

> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, July 9, 2025 12:36 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> vec_cbranch_all [PR118974]
> 
> On Wed, 9 Jul 2025, Tamar Christina wrote:
> 
> > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > > +first and second operands of the comparison, respectively.  Operand 3
> > > > +is the @code{code_label} to jump to.
> > > > +
> > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > +@item @samp{cbranch_all@var{mode}4}
> > > > +Conditional branch instruction combined with a compare instruction on
> > > > +vectors where it is required that all of the elementwise comparisons of
> > > > +the two input vectors are true.
> > >
> > > See above.
> > >
> > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > can still invert a jump (for BB reorder, for example)?  Without
> > > a cbranch_none expander we have to invert during RTL expansion?
> > >
> >
> > Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
> > I think all states are expressible with any and all and flipping the
> > branches, so it shouldn't be any more restrictive than cbranch itself is
> > today.
> >
> > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> >
> > and FTR the RTL generated for AArch64 (both SVE and Adv.SIMD) will be
> > simplified to:
> >
> > (insn 23 22 24 5 (parallel [
> > (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > (unspec:VNx4BI [
> > (reg:VNx4BI 129)
> > (const_int 0 [0x0])
> > (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > (const_vector:VNx4SI repeat [
> > (const_int 0 [0])
> > ]))
> > ] UNSPEC_PRED_Z))
> > (clobber (reg:CC_NZC 66 cc))
> > ]) "cbranch.c":25:10 -1
> >
> > (jump_insn 27 26 28 5 (set (pc)
> > (if_then_else (eq (reg:CC_Z 66 cc)
> > (const_int 0 [0]))
> > (label_ref 33)
> > (pc))) "cbranch.c":25:10 -1
> >  (int_list:REG_BR_PROB 1014686025 (nil))
> >
> > The thing is we can't get rid of the unspecs as there's no concept of
> > masking in RTL compares.  We could technically do an AND (and do in some
> > cases) but then you lose the predicate hint constant in the RTL which
> > tells you whether the mask is known to be all true or not.
> > This hint is crucial to allow for further optimizations.
> >
> > That said the condition code, branch and compares are fully exposed.
> >
> > We expand to a larger sequence than I'd like mostly because there's no
> > support for conditional cbranch optabs, or even conditional vector
> > comparisons.  So the comparisons must be generated unpredicated by
> > generating an all true mask, and later patterns merge in the AND.
> >
> > The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a
> > single loop) but not pure SVE, for which I take a different approach to
> > try to avoid requiring a predicated version of these optabs.
> >
> > I don't want to push my luck, but would you be ok with a conditional
> > version of these optabs too? i.e. cond_cbranch_any and cond_cbranch_all?
> > This would allow us to immediately expand to the correct representation
> > for both SVE and Adv.SIMD without having to rely on various combine
> > patterns and cc-fusion to optimize the sequences later on (which has
> > historically been a bit hit or miss if someone adds a new CC pattern).
> 
> Can you explain?  A conditional conditional branch makes my head hurt.
> It's really a cbranch_{all,any} where the (vector) compare has an
> additional mask applied?  So cbranch_cond_{all,any} would be a better
> fit?

Yeah so cbranch is itself in GIMPLE

c = vec_a `cmp` vec_b
if (c {any,all} 0)
  ...

where cbranch_{all, any} represents the gimple

if (vec_a `cmp` vec_b)
  ...

cbranch_cond_{all, any} would represent

if ((vec_a `cmp` vec_b) & loop_mask)
  ...

In GIMPLE we mask most operations by ANDing the mask with the result
of the operation.  But cbranch doesn't have an LHS, so we can't wrap
the & around anything.  And because of this we rely on backend patterns
to later push the mask from the & into the operation such that we can
generate the predicated compare.
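
The three forms above can be modelled on plain arrays. This is only a sketch: the lane count, the greater-than comparison and all helper names are assumptions made for the example, and it follows the expression `(vec_a cmp vec_b) & loop_mask` literally. How inactive lanes should be treated by an "all" form is not settled in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical fixed vector length
using vec = std::array<int, kLanes>;
using mask = std::array<bool, kLanes>;

// Elementwise a > b, standing in for the generic `cmp`.
inline mask compare_gt(const vec &a, const vec &b)
{
  mask m{};
  for (std::size_t i = 0; i < kLanes; ++i)
    m[i] = a[i] > b[i];
  return m;
}

// (vec_a cmp vec_b) & loop_mask: inactive lanes are forced to false.
inline mask apply_mask(const mask &cmp, const mask &loop_mask)
{
  mask m{};
  for (std::size_t i = 0; i < kLanes; ++i)
    m[i] = cmp[i] && loop_mask[i];
  return m;
}

// Branch condition of an "any" form: at least one lane is true.
inline bool any_set(const mask &m)
{
  for (bool lane : m)
    if (lane)
      return true;
  return false;
}

// Branch condition of an "all" form: every lane is true.
inline bool all_set(const mask &m)
{
  for (bool lane : m)
    if (!lane)
      return false;
  return true;
}
```

An unmasked any/all branch corresponds to any_set(compare_gt(a, b)) or all_set(compare_gt(a, b)); the masked variant applies apply_mask first, which is the step that has no natural place without a dedicated optab.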

Because of this we end up requiring patterns such as

;; Predicated integer comparisons, formed by combining a PTRUE-predicated
;; comparison with an AND in which only the flags result is interesting.
(define_insn_and_rewrite "*cmp_and_ptest"
  [(set (reg:CC_Z CC_REGNUM)
   (unspec:CC_Z
 [(match_operand:VNx16BI 1 "register_operand")
  (match_operand 4)
  (const_int SVE_KNOWN_PTRUE)
  (and:
(unspec:

[PATCH v5 8/11] openmp: Fix struct handling for OpenMP iterators

2025-07-09 Thread Kwok Cheung Yeung

This patch fixes some issues with the struct handling introduced in the 
patch for Fortran support. The problem is that 
build_omp_struct_comp_nodes and omp_accumulate_sibling_list can add 
extra clauses with iterators onto the target construct, but this occurs 
after the iterator loop has already been built, so no iterator element 
array has been allocated for the new clause and the iterator vector will 
refer to whatever the new clause was derived from, effectively sharing 
data between the two clauses.


A new function add_new_omp_iterators_clause has been added which 
allocates a new element array after the loop has already been built, and 
updates the clause iterator to point to the new array instead. This is 
called each time a new clause is created from an existing one.

From c3444e54de724a7787701f39bd1f42c43e93bd3e Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 11 Apr 2025 18:27:00 +0100
Subject: [PATCH 08/11] openmp: Fix struct handling for OpenMP iterators

New clauses can be created for structs, and these will also need to have
iterators applied to them if the base clause is using iterators.  As this
occurs after the initial iterator expansion, a new mechanism for allocating
new entries in the iterator loop is required.
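
The sharing problem can be pictured with a small standalone model. This is a sketch under stated assumptions, not the gimplifier's data structures: iterator_info, clause, derive_sharing and derive_with_new_array are hypothetical names, and a shared_ptr stands in for the element array that the iterator loop fills.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of an expanded iterator: count iterations whose
// per-iteration data is written into elems by the iterator loop.
struct iterator_info
{
  std::size_t count;
  std::shared_ptr<std::vector<const void *>> elems;
};

struct clause
{
  iterator_info iter;
};

// Deriving a clause by plain copy keeps pointing at the same element array,
// so both clauses would read and write the same data.
inline clause derive_sharing(const clause &base)
{
  return base;
}

// Deriving with a fresh array, in the spirit of the fix described above:
// copy the iterator, then give the copy its own storage.
inline clause derive_with_new_array(const clause &base)
{
  clause c = base;
  c.iter.elems
    = std::make_shared<std::vector<const void *>>(base.iter.count);
  return c;
}
```

Only the second helper gives the derived clause storage that is independent of the clause it came from.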

gcc/

* gimplify.cc (add_new_omp_iterators_clause): New.
(build_omp_struct_comp_nodes): Add extra argument for loops sequence.
Call add_new_omp_iterators_clause on newly generated clauses.
(omp_accumulate_sibling_list): Add extra argument for loops sequence.
Pass to calls to build_omp_struct_comp_nodes.  Add iterators to newly
generated clauses for struct accesses.
(omp_build_struct_sibling_lists): Add extra argument for loops
sequence. Pass to call to omp_accumulate_sibling_list.
(gimplify_adjust_omp_clauses): Pass loops sequence to
omp_build_struct_sibling_lists.
---
 gcc/gimplify.cc | 81 -
 1 file changed, 73 insertions(+), 8 deletions(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7c2f565d102..2b4d0ddafb1 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -10167,6 +10167,60 @@ exit_omp_iterator_loop_context (tree c)
   pop_gimplify_context (NULL);
 }
 
+/* Insert new OpenMP clause C into pre-existing iterator loop LOOPS_SEQ_P.
+   If the clause has an iterator, then that iterator is assumed to be in
+   the expanded form (i.e. it has info regarding the loop, expanded elements
+   etc.).  */
+
+void
+add_new_omp_iterators_clause (tree c, gimple_seq *loops_seq_p)
+{
+  gimple_stmt_iterator gsi;
+  tree iters = OMP_CLAUSE_ITERATORS (c);
+  if (!iters)
+return;
+  gcc_assert (OMP_ITERATORS_EXPANDED_P (iters));
+
+  /* Search for <index> = -1.  */
+  tree index = OMP_ITERATORS_INDEX (iters);
+  for (gsi = gsi_start (*loops_seq_p); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) == GIMPLE_ASSIGN
+ && gimple_assign_lhs (stmt) == index
+ && gimple_assign_rhs1 (stmt) == size_int (-1))
+   break;
+}
+  gcc_assert (!gsi_end_p (gsi));
+
+  /* Create array for this clause.  */
+  tree arr_length = omp_iterator_elems_length (OMP_ITERATORS_COUNT (iters));
+  tree elems_type = TREE_CONSTANT (arr_length)
+   ? build_array_type (ptr_type_node,
+   build_index_type (arr_length))
+   : build_pointer_type (ptr_type_node);
+  tree elems = create_tmp_var_raw (elems_type, "omp_iter_data");
+  TREE_ADDRESSABLE (elems) = 1;
+  gimple_add_tmp_var (elems);
+
+  /* BEFORE LOOP:  */
+  /* elems[0] = count;  */
+  tree lhs = TREE_CODE (TREE_TYPE (elems)) == ARRAY_TYPE
+   ? build4 (ARRAY_REF, ptr_type_node, elems, size_int (0), NULL_TREE,
+ NULL_TREE)
+   : build1 (INDIRECT_REF, ptr_type_node, elems);
+
+  gimple_seq assign_seq = NULL;
+  gimplify_assign (lhs, OMP_ITERATORS_COUNT (iters), &assign_seq);
+  gsi_insert_seq_after (&gsi, assign_seq, GSI_SAME_STMT);
+
+  /* Update iterator information.  */
+  tree new_iterator = copy_omp_iterator (OMP_CLAUSE_ITERATORS (c));
+  OMP_ITERATORS_ELEMS (new_iterator) = elems;
+  TREE_CHAIN (new_iterator) = TREE_CHAIN (OMP_CLAUSE_ITERATORS (c));
+  OMP_CLAUSE_ITERATORS (c) = new_iterator;
+}
+
 /* If *LIST_P contains any OpenMP depend clauses with iterators,
lower all the depend clauses by populating corresponding depend
array.  Returns 0 if there are no such depend clauses, or
@@ -10565,7 +10619,7 @@ omp_map_clause_descriptor_p (tree c)
 
 static tree
 build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
-tree *extra_node)
+tree *extra_node, gimple_seq *loops_seq_p)
 {
   enum gomp_map_kind mkind
 = (code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA)
@@ -10575,6 +10629,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree 
grp_s

[PATCH v5 10/11] openmp, fortran: Add iterator support for Fortran, deep-mapping of allocatables

2025-07-09 Thread Kwok Cheung Yeung


This patch adds iterator support for Fortran deep mapping of allocatables.

When a new map is generated in gfc_omp_deep_mapping_map, a new elements 
array is allocated in the iterator loop, and the data and size that 
would have gone into the map are now written into the array from inside 
the iterator loop. The data entry is then set to point to the elements 
array and the size is set to indicate the map as an iterator map.


Since the map data entry may refer to other computed expressions, any 
statements generated during evaluation must also go inside the iterator 
loop.
From 8c6a4e6536125d2c18a0a5016764b31fbdd23fd2 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Sat, 3 May 2025 21:03:33 +
Subject: [PATCH 10/11] openmp, fortran: Add iterator support for Fortran
 deep-mapping of allocatables

gcc/fortran/

* trans-openmp.cc (gfc_omp_deep_mapping_map): Remove const from ctx
argument.  Add arguments for iterators and the statement sequence to
go into the iterator loop.  Add statement sequence to iterator loop
body.  Generate iterator loop entries for generated maps, insert
the map decls and sizes into the iterator element arrays, replace
original decls with the address of the element arrays, and
sizes/biases with SIZE_INT.
(gfc_omp_deep_mapping_comps): Remove const from ctx. Add argument for
iterators.  Pass iterators to calls to gfc_omp_deep_mapping_item and
gfc_omp_deep_mapping_comps.
(gfc_omp_deep_mapping_item): Remove const from ctx. Add argument for
iterators.  Collect generated side-effect statements and pass to
gfc_omp_deep_mapping_map along with the iterators.  Pass iterators
to gfc_omp_deep_mapping_comps.
(gfc_omp_deep_mapping_do): Remove const from ctx.  Pass iterators to
gfc_omp_deep_mapping_item.
(gfc_omp_deep_mapping_cnt): Remove const from ctx.
(gfc_omp_deep_mapping): Likewise.
* trans.h (gfc_omp_deep_mapping_cnt): Likewise.
(gfc_omp_deep_mapping): Likewise.

gcc/

* gimplify.cc (enter_omp_iterator_loop_context): New function variant.
(enter_omp_iterator_loop_context): Delegate to new variant.
(exit_omp_iterator_loop_context): New function variant.
(exit_omp_iterator_loop_context): Delegate to new variant.
(assign_to_iterator_elems_array): New.
(add_new_omp_iterators_entry): New.
(add_new_omp_iterators_clause): Delegate to
add_new_omp_iterators_entry.
* gimplify.h (enter_omp_iterator_loop_context): New prototype.
(enter_omp_iterator_loop_context): Remove default argument.
(exit_omp_iterator_loop_context): Remove argument.
(assign_to_iterator_elems_array): New prototype.
(add_new_omp_iterators_entry): New prototype.
(add_new_omp_iterators_clause): New prototype.
* langhooks-def.h (lhd_omp_deep_mapping_cnt): Remove const from
argument.
(lhd_omp_deep_mapping): Likewise.
* langhooks.cc (lhd_omp_deep_mapping_cnt): Likewise.
(lhd_omp_deep_mapping): Likewise.
* langhooks.h (omp_deep_mapping_cnt): Likewise.
(omp_deep_mapping): Likewise.
* omp-low.cc (lower_omp_map_iterator_expr): Delegate to
assign_to_iterator_elems_array.
(lower_omp_map_iterator_size): Likewise.
(lower_omp_target): Remove sorry for deep mapping.

libgomp/

* testsuite/libgomp.fortran/allocatable-comp-iterators.f90: New.
---
 gcc/fortran/trans-openmp.cc   | 94 ++-
 gcc/fortran/trans.h   |  4 +-
 gcc/gimplify.cc   | 87 -
 gcc/gimplify.h|  8 +-
 gcc/langhooks-def.h   |  4 +-
 gcc/langhooks.cc  |  4 +-
 gcc/langhooks.h   |  4 +-
 gcc/omp-low.cc| 50 +-
 .../allocatable-comp-iterators.f90| 60 
 9 files changed, 216 insertions(+), 99 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.fortran/allocatable-comp-iterators.f90

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index b272ad769ae..96a9c63665b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1853,7 +1853,8 @@ static void
 gfc_omp_deep_mapping_map (tree data, tree size, unsigned HOST_WIDE_INT tkind,
  location_t loc, tree data_array, tree sizes_array,
  tree kinds_array, tree offset_data, tree offset,
- gimple_seq *seq, const gimple *ctx)
+ gimple_seq *seq, gimple *ctx,
+ tree iterators, gimple_seq loops_pre_seq)
 {
   tree one = build_int_cst (size_type_node, 1);
 
@@ -1864,26 +1865,63 @@ gfc_omp_deep_mapping_map (tree data, tree size, 
unsigned HOST_WIDE_INT

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Jeff Law





On 7/9/25 6:53 AM, Richard Biener wrote:
> On Wed, Jul 9, 2025 at 2:16 PM Jeff Law  wrote:
>>
>> On 7/9/25 12:27 AM, Richard Biener wrote:
>>> The following changes the percentage that determines how many
>>> stmts are allowed for backwards jump threading from 50 to 54,
>>> enabling the missed jump threading observed in PR109893.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
>>> at least backward threading is prone to profile mismatches, I've
>>> altered two testcases to deal with new ones to pop up (definitely
>>> latent issues).
>>>
>>> OK?
>>>
>>>    PR tree-optimization/109893
>>>    * params.opt (fsm-scale-path-stmts): Change from 50 to 54.
>>>
>>>    * gcc.dg/tree-ssa/pr109893.c: New testcase.
>>>    * gcc.dg/tree-prof/cmpsf-1.c: XFAIL.
>>>    * gcc.dg/tree-ssa/pr109893.c: Remove scan on no profile
>>>    mismatches.
>>
>> My recollection is the scaling factor was set once based on some old PR
>> where code size exploded and wasn't really tuned further after that.  If
>> the new value is working better, then that's obviously fine with me.
>> Ideally we'd just get rid of the magic ratio
>
> Yes, there's some hand-waving about the forward threader having ways
> to re-use copied blocks but the backward threader does not.
That's absolutely correct.  Essentially the forward threader creates one 
jump thread path for each final destination.  Then all the incoming 
edges are redirected to the appropriate copy.  The backwards threader 
makes a copy for each jump thread path discovered.
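
To make the difference in copying concrete, here is a toy model only; it
is not GCC code.  The ThreadPath type and both counting functions are
hypothetical names, and the model ignores path cancellation and every
other detail of the real threaders.  It just counts block copies under
the two strategies described above.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a discovered jump thread path: the incoming
// edge it starts from and the final destination block it reaches.
struct ThreadPath
{
  int incoming_edge;
  int final_dest;
};

// Forward-threader style (toy model): one copy per distinct final
// destination; every incoming edge with that destination is redirected
// to the same copy.
std::size_t
copies_shared_by_destination (const std::vector<ThreadPath> &paths)
{
  std::set<int> dests;
  for (const ThreadPath &p : paths)
    dests.insert (p.final_dest);
  return dests.size ();
}

// Backward-threader style (toy model): one copy per discovered path.
std::size_t
copies_per_path (const std::vector<ThreadPath> &paths)
{
  return paths.size ();
}
```

With four paths of which three share a final destination, the first
function counts two copies and the second counts four, which is the kind
of growth a statement-count limit is meant to bound.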





> Unless somebody is going to spend time cleaning things up further
> (we've talked about getting rid of the forward threader), this is what I
> have to offer.
I'd still like to see the forward threader go away.  I haven't done any 
evaluation on what needs to happen to realize that goal in years though.


jeff

Re: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Richard Biener

On Wed, 9 Jul 2025, Tamar Christina wrote:

> (on mobile so doing a top reply)
> 
> > So it isn't as efficient to use cbranch_any (g != 0) here?  I  think it 
> > should be practically equivalent...
> 
> Ah yeah, it can expand what we currently expand vector boolean to.
> 
> I was initially confused because for SVE what we want here is an ORRS (flag 
> setting inclusive ORR)
> 
> Using this optab we can get to that an easier way too.
> 
> So yeah I agree, cbranch for vectors can be deprecated.
> 
> Note that in my patch I named the new one vec_cbranch_any/all to implicitly 
> say they are only vectors.
> 
> Do you want to fully deprecate cbranch for vectors?

Yes, I think that would remove confusion.

> This would mean though that all target checks need to be updated unless we 
> update the supports checks with a helper?
> 
> Thanks,
> Tamar
> 
> 
> From: Richard Biener 
> Sent: Wednesday, July 9, 2025 1:24 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org ; nd 
> Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and 
> vec_cbranch_all [PR118974]
> 
> On Wed, 9 Jul 2025, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, July 9, 2025 12:36 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > > vec_cbranch_all [PR118974]
> > >
> > > On Wed, 9 Jul 2025, Tamar Christina wrote:
> > >
> > > > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are 
> > > > > > the
> > > > > > +first and second operands of the comparison, respectively.  
> > > > > > Operand 3
> > > > > > +is the @code{code_label} to jump to.
> > > > > > +
> > > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > > > +@item @samp{cbranch_all@var{mode}4}
> > > > > > +Conditional branch instruction combined with a compare instruction 
> > > > > > on
> > > vectors
> > > > > > +where it is required that at all of the elementwise comparisons of 
> > > > > > the
> > > > > > +two input vectors are true.
> > > > >
> > > > > See above.
> > > > >
> > > > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > > > can still invert a jump (for BB reorder, for example)?  Without
> > > > > a cbranch_none expander we have to invert during RTL expansion?
> > > > >
> > > >
> > > > Isn't cbranch_none just cbranch_all x 0? i.e. all value must be zero.
> > > > I think all states are expressible with any and all and flipping the 
> > > > branches
> > > > so it shouldn't be any more restrictive than cbranch itself is today.
> > > >
> > > > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> > > >
> > > > and FTR the RTL generated for AArch64 (Both SVE And Adv.SIMD) will be
> > > simplified to:
> > > >
> > > > (insn 23 22 24 5 (parallel [
> > > > (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > > > (unspec:VNx4BI [
> > > > (reg:VNx4BI 129)
> > > > (const_int 0 [0x0])
> > > > (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > > > (const_vector:VNx4SI repeat [
> > > > (const_int 0 [0])
> > > > ]))
> > > > ] UNSPEC_PRED_Z))
> > > > (clobber (reg:CC_NZC 66 cc))
> > > > ]) "cbranch.c":25:10 -1
> > > >
> > > > (jump_insn 27 26 28 5 (set (pc)
> > > > (if_then_else (eq (reg:CC_Z 66 cc)
> > > > (const_int 0 [0]))
> > > > (label_ref 33)
> > > > (pc))) "cbranch.c":25:10 -1
> > > >  (int_list:REG_BR_PROB 1014686025 (nil))
> > > >
> > > > > The thing is we can't get rid of the unspecs as there's no concept of
> > > > > masking in RTL compares.
> > > > We could technically do an AND (and do in some cases) but then you lose 
> > > > the
> > > predicate
> > > > Hint constant in the RTL which tells you whether the mask is known to 
> > > > be all true
> > > or not.
> > > > This hint is crucial to allow for further optimizations.
> > > >
> > > > That said the condition code, branch and compares are fully exposed.
> > > >
> > > > We expand to a larger sequence than I'd like mostly because there's no 
> > > > support
> > > > for conditional cbranch optabs, or even conditional vector comparisons. 
> > > > So the
> > > comparisons
> > > > must be generated unpredicated by generating an all true mask, and later
> > > patterns
> > > > merge in the AND.
> > > >
> > > > The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a 
> > > > single
> > > loop)
> > > > But not pure SVE.  For which I take a different approach to try to 
> > > > avoid requiring
> > > > a predicated version of these optabs.
> > > >
> > > > I don't want to push my luck, but would you be ok with a conditional 
> > > > version of
> > > these
> > > > optabs too? i.e.

[wwwdocs] Add some C++23 and C++26 library features to GCC 16 release notes

2025-07-09 Thread Jonathan Wakely

Also thank Tomasz for std::format range support in GCC 15
---

Pushed to wwwdocs.

 htdocs/gcc-15/changes.html |  2 +-
 htdocs/gcc-16/changes.html | 21 -
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index bf980491..23632d41 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -679,7 +679,7 @@ asm (".text; %cc0: mov %cc2, %%r0; .previous;"
 
 
   Formatting of ranges and tuples with std::format,
-  as well as string escaping for debug formats.
+  as well as string escaping for debug formats, thanks to Tomasz Kamiński.
 
 
   Clarify handling of encodings in localized formatting of chrono types.
diff --git a/htdocs/gcc-16/changes.html b/htdocs/gcc-16/changes.html
index 99644758..cc6fe204 100644
--- a/htdocs/gcc-16/changes.html
+++ b/htdocs/gcc-16/changes.html
@@ -75,7 +75,26 @@ for general information.
 
 C++
 
-
+Runtime Library (libstdc++)
+
+
+  Improved experimental support for C++26, including:
+
+
+  std::copyable_function and std::function_ref.
+
+std::indirect and std::polymorphic.
+
+  std::owner_equal for shared pointers, thanks to Paul Keir.
+
+
+  
+  Improved experimental support for C++23, including:
+
+std::mdspan, thanks to Luc Grosheintz.
+
+  
+
 
 
 
-- 
2.50.0

[PATCH v5 7/11] openmp: Add macros for iterator element access

2025-07-09 Thread Kwok Cheung Yeung

This patch adds macros to refer to the fields of OpenMP iterators by 
name rather than by index, as the number of items has increased to 10 
and referring to them by index has become error-prone.

From e09f6cba88e321e9da50e002b3e74ff36cf19865 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Sat, 3 May 2025 20:38:10 +
Subject: [PATCH 07/11] openmp: Add macros for iterator element access

gcc/c/

* c-parser.cc (c_parser_omp_iterators): Use macros for accessing
iterator elements.
(c_parser_omp_clause_affinity): Likewise.
(c_parser_omp_clause_depend): Likewise.
(c_parser_omp_clause_map): Likewise.
(c_parser_omp_clause_from_to): Likewise.
* c-typeck.cc (c_omp_finish_iterators): Likewise.

gcc/cp/

* parser.cc (cp_parser_omp_iterators): Use macros for accessing
iterator elements.
(cp_parser_omp_clause_affinity): Likewise.
(cp_parser_omp_clause_depend): Likewise.
(cp_parser_omp_clause_from_to): Likewise.
(cp_parser_omp_clause_map): Likewise.
* semantics.cc (cp_omp_finish_iterators): Likewise.

gcc/fortran/

* trans-openmp.cc (gfc_trans_omp_array_section): Use macros for
accessing iterator elements.
(handle_iterator): Likewise.
(gfc_trans_omp_clauses): Likewise.

gcc/

* gimplify.cc (gimplify_omp_affinity): Use macros for accessing
iterator elements.
(compute_omp_iterator_count): Likewise.
(build_omp_iterator_loop): Likewise.
(remove_unused_omp_iterator_vars): Likewise.
(build_omp_iterators_loops): Likewise.
(enter_omp_iterator_loop_context_1): Likewise.
(extract_base_bit_offset): Likewise.
* omp-low.cc (lower_omp_map_iterator_expr): Likewise.
(lower_omp_map_iterator_size): Likewise.
(allocate_omp_iterator_elems): Likewise.
(free_omp_iterator_elems): Likewise.
* tree-inline.cc (copy_tree_body_r): Likewise.
* tree-pretty-print.cc (dump_omp_iterators): Likewise.
* tree.h (OMP_ITERATORS_VAR, OMP_ITERATORS_BEGIN, OMP_ITERATORS_END,
OMP_ITERATORS_STEP, OMP_ITERATORS_ORIG_STEP, OMP_ITERATORS_BLOCK,
OMP_ITERATORS_LABEL, OMP_ITERATORS_INDEX, OMP_ITERATORS_ELEMS,
OMP_ITERATORS_COUNT, OMP_ITERATORS_EXPANDED_P): New macros.
---
 gcc/c/c-parser.cc   | 16 
 gcc/c/c-typeck.cc   | 24 ++--
 gcc/cp/parser.cc| 16 
 gcc/cp/semantics.cc | 26 ++---
 gcc/fortran/trans-openmp.cc | 38 +-
 gcc/gimplify.cc | 78 ++---
 gcc/omp-low.cc  | 17 
 gcc/tree-inline.cc  |  4 +-
 gcc/tree-pretty-print.cc| 20 +-
 gcc/tree.h  | 13 +++
 10 files changed, 133 insertions(+), 119 deletions(-)
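
As a small illustration of the idea (a sketch only, not the tree.h
definitions): the IterVec type, the ITER_* macro names and make_iterator
below are hypothetical stand-ins.  The slot numbers 0 to 3 and 5 are the
ones visible in the TREE_VEC_ELT calls replaced in the c-parser.cc hunks
of this patch; everything else is assumed.

```cpp
#include <array>

// Hypothetical stand-in for a 6-element TREE_VEC holding one iterator.
using IterVec = std::array<int, 6>;

// Named accessors over fixed slots.  Each macro expands to an lvalue, so
// it can be used on either side of an assignment, just like the indexed
// form it replaces.
#define ITER_VAR(v)   ((v)[0])
#define ITER_BEGIN(v) ((v)[1])
#define ITER_END(v)   ((v)[2])
#define ITER_STEP(v)  ((v)[3])
#define ITER_BLOCK(v) ((v)[5])

// Build an iterator record through the named accessors only, so a later
// change in slot layout needs to touch the macros and nothing else.
IterVec
make_iterator (int var, int begin, int end, int step)
{
  IterVec it{};   // all slots start at zero
  ITER_VAR (it) = var;
  ITER_BEGIN (it) = begin;
  ITER_END (it) = end;
  ITER_STEP (it) = step;
  return it;
}
```

A caller that later writes ITER_BLOCK (it) = block no longer needs to
remember that the block lives in slot 5.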

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index b426d0b9f9f..0ecc3e88be5 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -19751,10 +19751,10 @@ c_parser_omp_iterators (c_parser *parser)
   pushdecl (iter_var);
 
   *last = make_tree_vec (6);
-  TREE_VEC_ELT (*last, 0) = iter_var;
-  TREE_VEC_ELT (*last, 1) = begin;
-  TREE_VEC_ELT (*last, 2) = end;
-  TREE_VEC_ELT (*last, 3) = step;
+  OMP_ITERATORS_VAR (*last) = iter_var;
+  OMP_ITERATORS_BEGIN (*last) = begin;
+  OMP_ITERATORS_END (*last) = end;
+  OMP_ITERATORS_STEP (*last) = step;
   last = &TREE_CHAIN (*last);
 
   if (c_parser_next_token_is (parser, CPP_COMMA))
@@ -19819,7 +19819,7 @@ c_parser_omp_clause_affinity (c_parser *parser, tree 
list)
   tree block = pop_scope ();
   if (iterators != error_mark_node)
{
- TREE_VEC_ELT (iterators, 5) = block;
+ OMP_ITERATORS_BLOCK (iterators) = block;
  for (tree c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
OMP_CLAUSE_DECL (c) = build_tree_list (iterators,
   OMP_CLAUSE_DECL (c));
@@ -19936,7 +19936,7 @@ c_parser_omp_clause_depend (c_parser *parser, tree list)
  if (iterators == error_mark_node)
iterators = NULL_TREE;
  else
-   TREE_VEC_ELT (iterators, 5) = block;
+   OMP_ITERATORS_BLOCK (iterators) = block;
}
 
   for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
@@ -20275,7 +20275,7 @@ c_parser_omp_clause_map (c_parser *parser, tree list, 
bool declare_mapper_p)
   if (iterators == error_mark_node)
iterators = NULL_TREE;
   else
-   TREE_VEC_ELT (iterators, 5) = block;
+   OMP_ITERATORS_BLOCK (iterators) = block;
 }
 
   for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
@@ -20645,7 +20645,7 @@ c_parser_omp_clause_from_to (c_parser *parser, enum 
omp_clause_code kind,
   if (iterators == error_mark_node)
iterators = NULL_TREE;
   else
-   TREE_VEC_ELT (iterators, 5) = block;
+   OMP_ITERATORS_BLOCK (iterators) = block;

[PATCH v5 11/11] openmp, fortran: Add support for non-constant iterator bounds in Fortran deep-mapping iterator support

2025-07-09 Thread Kwok Cheung Yeung

This patch adds support for non-constant iterator bounds to the Fortran 
deep-mapping iterator support.


To do this, we need to keep track of the new iterator entries generated 
during the deep mapping. lower_omp_target generates code to allocate 
memory for each of these entries one by one, and to free it after the 
target code. allocate_omp_iterator_elems and free_omp_iterator_elems 
are modified so that they work on a per-iterator basis rather than 
per-clause.

From a9236e9350dbe1c51d92c8301118fe7f36a371db Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Sat, 3 May 2025 21:10:47 +
Subject: [PATCH 11/11] openmp, fortran: Add support for non-constant iterator
 bounds in Fortran deep-mapping iterator support

gcc/fortran/

* trans-openmp.cc (gfc_omp_deep_mapping_map): Add new argument for
vector of newly created iterators.  Push new iterators onto the
vector.
(gfc_omp_deep_mapping_comps): Add new argument for vector of new
iterators.  Pass argument in calls to gfc_omp_deep_mapping_item and
gfc_omp_deep_mapping_comps.
(gfc_omp_deep_mapping_item): Add new argument for vector of new
iterators.  Pass argument in calls to gfc_omp_deep_mapping_map and
gfc_omp_deep_mapping_comps.
(gfc_omp_deep_mapping_do): Add new argument for vector of new
iterators.  Pass argument in calls to gfc_omp_deep_mapping_item.
(gfc_omp_deep_mapping_cnt): Pass NULL to new argument for
gfc_omp_deep_mapping_do.
(gfc_omp_deep_mapping): Add new argument for vector of new
iterators.  Pass argument in calls to gfc_omp_deep_mapping_do.
* trans.h (gfc_omp_deep_mapping): Add new argument.

gcc/

* langhooks-def.h (lhd_omp_deep_mapping): Add new argument.
* langhooks.cc (lhd_omp_deep_mapping): Likewise.
* langhooks.h (omp_deep_mapping): Likewise.
* omp-low.cc (allocate_omp_iterator_elems): Work on the supplied
iterator set instead of the iterators in a supplied set of clauses.
(free_omp_iterator_elems): Likewise.
(lower_omp_target): Maintain vector of new iterators generated by
deep-mapping.  Allocate and free iterator element arrays using
iterators found in clauses and in the new iterator vector.

libgomp/

* testsuite/libgomp.fortran/allocatable-comp-iterators.f90: Add test
for non-const iterator boundaries.
---
 gcc/fortran/trans-openmp.cc   |  38 ---
 gcc/fortran/trans.h   |   3 +-
 gcc/langhooks-def.h   |   3 +-
 gcc/langhooks.cc  |   2 +-
 gcc/langhooks.h   |   3 +-
 gcc/omp-low.cc| 103 +-
 .../allocatable-comp-iterators.f90|   3 +-
 7 files changed, 83 insertions(+), 72 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 96a9c63665b..7b0996b03e6 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1854,7 +1854,8 @@ gfc_omp_deep_mapping_map (tree data, tree size, unsigned 
HOST_WIDE_INT tkind,
  location_t loc, tree data_array, tree sizes_array,
  tree kinds_array, tree offset_data, tree offset,
  gimple_seq *seq, gimple *ctx,
- tree iterators, gimple_seq loops_pre_seq)
+ tree iterators, gimple_seq loops_pre_seq,
+ vec *new_iterators)
 {
   tree one = build_int_cst (size_type_node, 1);
 
@@ -1880,6 +1881,7 @@ gfc_omp_deep_mapping_map (tree data, tree size, unsigned 
HOST_WIDE_INT tkind,
   if (iterators)
 {
   data_iter = add_new_omp_iterators_entry (iterators, loops_seq_p);
+  new_iterators->safe_push (data_iter);
   assign_to_iterator_elems_array (data_expr, data_iter, ctx);
   data_expr = OMP_ITERATORS_ELEMS (data_iter);
   if (TREE_CODE (TREE_TYPE (data_expr)) == ARRAY_TYPE)
@@ -1900,6 +1902,7 @@ gfc_omp_deep_mapping_map (tree data, tree size, unsigned 
HOST_WIDE_INT tkind,
   if (iterators)
 {
   data_addr_iter = add_new_omp_iterators_entry (iterators, loops_seq_p);
+  new_iterators->safe_push (data_addr_iter);
   assign_to_iterator_elems_array (data_addr_expr, data_addr_iter, ctx);
   data_addr_expr = OMP_ITERATORS_ELEMS (data_addr_iter);
   if (TREE_CODE (TREE_TYPE (data_addr_expr)) == ARRAY_TYPE)
@@ -1990,7 +1993,7 @@ static void gfc_omp_deep_mapping_item (bool, bool, bool, 
location_t, tree,
   tree *, unsigned HOST_WIDE_INT, tree,
   tree, tree, tree, tree, tree,
   gimple_seq *, gimple *, bool *,
-  tree);
+  tree, vec  *);
 
 /* Map allocatable components.  */
 static void
@@ -1999,7 +2002,8

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Richard Biener

On Wed, Jul 9, 2025 at 3:06 PM Jeff Law  wrote:
>
>
>
> On 7/9/25 6:53 AM, Richard Biener wrote:
> > On Wed, Jul 9, 2025 at 2:16 PM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 7/9/25 12:27 AM, Richard Biener wrote:
> >>> The following changes the percentage that determines how many
> >>> stmts are allowed for backwards jump threading from 50 to 54,
> >>> enabling the missed jump threading observed in PR109893.
> >>>
> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
> >>> at least backward threading is prone to profile mismatches, I've
> >>> altered two testcases to deal with new ones to pop up (definitely
> >>> latent issues).
> >>>
> >>> OK?
> >>>
> >>>PR tree-optimization/109893
> >>>* params.opt (fsm-scale-path-stmts): Change from 50 to 54.
> >>>
> >>>* gcc.dg/tree-ssa/pr109893.c: New testcase.
> >>>* gcc.dg/tree-prof/cmpsf-1.c: XFAIL.
> >>>* gcc.dg/tree-ssa/pr109893.c: Remove scan on no profile
> >>>mismatches.
> >> My recollection is the scaling factor was set once based on some old PR
> >> where code size exploded and wasn't really tuned further after that.  If
> >> the new value is working better, then that's obviously fine with me.
> >> Ideally we'd just get rid of the magic ratio
> >
> > Yes, there's some hand-waving about the forward threader having ways
> > to re-use copied blocks but the backward threader does not.
> That's absolutely correct.  Essentially the forward threader creates one
> jump thread path for each final destination.  Then all the incoming
> edges are redirected to the appropriate copy.  The backwards threader
> makes a copy for each jump thread path discovered.

ISTR the backwards threader simply cancels paths that had blocks in
common with another jump thread (that happened to be materialized
first).  But maybe it's less strict than that.  It cancels things in too
many places and while it collects all opportunities upfront it doesn't
have any global limit on the amount of copying it does (and do pruning
based on such limit in any sensible order, of course).

Richard.

>
>
> >
> > Unless somebody is going to spend time cleaning things up further
> > (we've talked about getting rid of the forward threader), this is what I
> > have to offer.
> I'd still like to see the forward threader go away.  I haven't done any
> evaluation on what needs to happen to realize that goal in years though.
>
> jeff
>

[PATCH 3/3] Remove vect_dissolve_slp_only_groups

2025-07-09 Thread Richard Biener

This function dissolves DR groups that are not subject to SLP, which
means it is no longer necessary.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vect_dissolve_slp_only_groups): Remove.
(vect_analyze_loop_2): Do not call it.
---
 gcc/tree-vect-loop.cc | 75 ---
 1 file changed, 75 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d57d34dfad2..2d5ea414559 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2260,78 +2260,6 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
   return opt_result::success ();
 }
 
-/* Look for SLP-only access groups and turn each individual access into its own
-   group.  */
-static void
-vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
-{
-  unsigned int i;
-  struct data_reference *dr;
-
-  DUMP_VECT_SCOPE ("vect_dissolve_slp_only_groups");
-
-  vec datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-  FOR_EACH_VEC_ELT (datarefs, i, dr)
-{
-  gcc_assert (DR_REF (dr));
-  stmt_vec_info stmt_info
-   = vect_stmt_to_vectorize (loop_vinfo->lookup_stmt (DR_STMT (dr)));
-
-  /* Check if the load is a part of an interleaving chain.  */
-  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
-   {
- stmt_vec_info first_element = DR_GROUP_FIRST_ELEMENT (stmt_info);
- dr_vec_info *dr_info = STMT_VINFO_DR_INFO (first_element);
- unsigned int group_size = DR_GROUP_SIZE (first_element);
-
- /* Check if SLP-only groups.  */
- if (!STMT_SLP_TYPE (stmt_info)
- && STMT_VINFO_SLP_VECT_ONLY (first_element))
-   {
- /* Dissolve the group.  */
- STMT_VINFO_SLP_VECT_ONLY (first_element) = false;
-
- stmt_vec_info vinfo = first_element;
- while (vinfo)
-   {
- stmt_vec_info next = DR_GROUP_NEXT_ELEMENT (vinfo);
- DR_GROUP_FIRST_ELEMENT (vinfo) = vinfo;
- DR_GROUP_NEXT_ELEMENT (vinfo) = NULL;
- DR_GROUP_SIZE (vinfo) = 1;
- if (STMT_VINFO_STRIDED_P (first_element)
- /* We cannot handle stores with gaps.  */
- || DR_IS_WRITE (dr_info->dr))
-   {
- STMT_VINFO_STRIDED_P (vinfo) = true;
- DR_GROUP_GAP (vinfo) = 0;
-   }
- else
-   DR_GROUP_GAP (vinfo) = group_size - 1;
- /* Duplicate and adjust alignment info, it needs to
-be present on each group leader, see dr_misalignment.  */
- if (vinfo != first_element)
-   {
- dr_vec_info *dr_info2 = STMT_VINFO_DR_INFO (vinfo);
- dr_info2->target_alignment = dr_info->target_alignment;
- int misalignment = dr_info->misalignment;
- if (misalignment != DR_MISALIGNMENT_UNKNOWN)
-   {
- HOST_WIDE_INT diff
-   = (TREE_INT_CST_LOW (DR_INIT (dr_info2->dr))
-  - TREE_INT_CST_LOW (DR_INIT (dr_info->dr)));
- unsigned HOST_WIDE_INT align_c
-   = dr_info->target_alignment.to_constant ();
- misalignment = (misalignment + diff) % align_c;
-   }
- dr_info2->misalignment = misalignment;
-   }
- vinfo = next;
-   }
-   }
-   }
-}
-}
-
 /* Determine if operating on full vectors for LOOP_VINFO might leave
some scalar iterations still to do.  If so, decide how we should
handle those scalar iterations.  The possibilities are:
@@ -2687,9 +2615,6 @@ start_over:
   goto again;
 }
 
-  /* Dissolve SLP-only groups.  */
-  vect_dissolve_slp_only_groups (loop_vinfo);
-
   /* For now, we don't expect to mix both masking and length approaches for one
  loop, disable it if both are recorded.  */
   if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-- 
2.43.0

[PATCH 2/3] Remove vect_analyze_loop_operations

2025-07-09 Thread Richard Biener

This removes the remains of vect_analyze_loop_operations.  All the
checks it still does on LC PHIs of inner loops in outer loop
vectorization should be handled by vectorizable_lc_phi.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vect_active_double_reduction_p): Remove.
(vect_analyze_loop_operations): Remove.
(vect_analyze_loop_2): Do not call it.
---
 gcc/tree-vect-loop.cc | 137 --
 1 file changed, 137 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 98ac528e3a9..d57d34dfad2 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1960,133 +1960,6 @@ vect_create_loop_vinfo (class loop *loop, 
vec_info_shared *shared,
 
 
 
-/* Return true if STMT_INFO describes a double reduction phi and if
-   the other phi in the reduction is also relevant for vectorization.
-   This rejects cases such as:
-
-  outer1:
-   x_1 = PHI ;
-   ...
-
-  inner:
-   x_2 = ...;
-   ...
-
-  outer2:
-   x_3 = PHI ;
-
-   if nothing in x_2 or elsewhere makes x_1 relevant.  */
-
-static bool
-vect_active_double_reduction_p (stmt_vec_info stmt_info)
-{
-  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def)
-return false;
-
-  return STMT_VINFO_RELEVANT_P (STMT_VINFO_REDUC_DEF (stmt_info));
-}
-
-/* Function vect_analyze_loop_operations.
-
-   Scan the loop stmts and make sure they are all vectorizable.  */
-
-static opt_result
-vect_analyze_loop_operations (loop_vec_info loop_vinfo)
-{
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
-  int i;
-  stmt_vec_info stmt_info;
-
-  DUMP_VECT_SCOPE ("vect_analyze_loop_operations");
-
-  for (i = 0; i < nbbs; i++)
-{
-  basic_block bb = bbs[i];
-
-  for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si);
-  gsi_next (&si))
-{
-  gphi *phi = si.phi ();
-
- stmt_info = loop_vinfo->lookup_stmt (phi);
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location, "examining phi: %G",
-(gimple *) phi);
- if (virtual_operand_p (gimple_phi_result (phi)))
-   continue;
-
- /* ???  All of the below unconditional FAILs should be in
-done earlier after analyzing cycles, possibly when
-determining stmt relevancy?  */
-
-  /* Inner-loop loop-closed exit phi in outer-loop vectorization
- (i.e., a phi in the tail of the outer-loop).  */
-  if (! is_loop_header_bb_p (bb))
-{
-  /* FORNOW: we currently don't support the case that these phis
- are not used in the outerloop (unless it is double reduction,
- i.e., this phi is vect_reduction_def), cause this case
- requires to actually do something here.  */
-  if (STMT_VINFO_LIVE_P (stmt_info)
- && !vect_active_double_reduction_p (stmt_info))
-   return opt_result::failure_at (phi,
-  "Unsupported loop-closed phi"
-  " in outer-loop.\n");
-
-  /* If PHI is used in the outer loop, we check that its operand
- is defined in the inner loop.  */
-  if (STMT_VINFO_RELEVANT_P (stmt_info))
-{
-  tree phi_op;
-
-  if (gimple_phi_num_args (phi) != 1)
-return opt_result::failure_at (phi, "unsupported phi");
-
-  phi_op = PHI_ARG_DEF (phi, 0);
- stmt_vec_info op_def_info = loop_vinfo->lookup_def (phi_op);
- if (!op_def_info)
-   return opt_result::failure_at (phi, "unsupported phi\n");
-
- if (STMT_VINFO_RELEVANT (op_def_info) != vect_used_in_outer
- && (STMT_VINFO_RELEVANT (op_def_info)
- != vect_used_in_outer_by_reduction))
-   return opt_result::failure_at (phi, "unsupported phi\n");
-
- if ((STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
-  || (STMT_VINFO_DEF_TYPE (stmt_info)
-  == vect_double_reduction_def))
- && ! PURE_SLP_STMT (stmt_info))
-   return opt_result::failure_at (phi, "unsupported phi\n");
-}
-
-  continue;
-}
-
-  gcc_assert (stmt_info);
-
-  if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope
-   || STMT_VINFO_LIVE_P (stmt_info))
- && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def
- && STMT_VINFO_DEF_TYPE (stmt_info) != vect_first_order_recurrence)
-   /* A scalar-dependence cycle that we don't support.  */
-   return opt_result::failure_at (phi,
-

[PATCH 1/3] Remove non-SLP vectorization factor determining

2025-07-09 Thread Richard Biener

The following removes the VF determining step from non-SLP stmts.
For now we keep setting STMT_VINFO_VECTYPE for all stmts; there are
too many places to fix, including some more complicated ones, so
this is deferred to a followup.

Along with this, vect_update_vf_for_slp is removed, merging the check for
present hybrid SLP stmts into vect_detect_hybrid_slp and failing analysis
early.  This also removes the need to essentially duplicate this check in
the stmt walk of vect_analyze_loop_operations.  Getting rid of that,
and performing some other checks earlier, is also deferred to a followup.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vect_determine_vf_for_stmt_1): Rename
to ...
(vect_determine_vectype_for_stmt_1): ... this and only set
STMT_VINFO_VECTYPE.  Fail for single-element vector types.
(vect_determine_vf_for_stmt): Rename to ...
(vect_determine_vectype_for_stmt): ... this and only set
STMT_VINFO_VECTYPE. Fail for single-element vector types.
(vect_determine_vectorization_factor): Rename to ...
(vect_set_stmts_vectype): ... this and only set STMT_VINFO_VECTYPE.
(vect_update_vf_for_slp): Remove.
(vect_analyze_loop_operations): Remove walk over stmts.
(vect_analyze_loop_2): Call vect_set_stmts_vectype instead of
vect_determine_vectorization_factor.  Set vectorization factor
from LOOP_VINFO_SLP_UNROLLING_FACTOR.  Fail if vect_detect_hybrid_slp
detects hybrid stmts or when vect_make_slp_decision finds
nothing to SLP.
* tree-vect-slp.cc (vect_detect_hybrid_slp): Move check
whether we have any hybrid stmts here from vect_update_vf_for_slp.
* tree-vect-stmts.cc (vect_analyze_stmt): Remove loop over
stmts.
* tree-vectorizer.h (vect_detect_hybrid_slp): Update.
---
 gcc/tree-vect-loop.cc  | 220 -
 gcc/tree-vect-slp.cc   |  48 -
 gcc/tree-vect-stmts.cc |  12 ++-
 gcc/tree-vectorizer.h  |   2 +-
 4 files changed, 100 insertions(+), 182 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 42e00159ff8..98ac528e3a9 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -168,9 +168,8 @@ static stmt_vec_info vect_is_simple_reduction 
(loop_vec_info, stmt_vec_info,
may already be set for general statements (not just data refs).  */
 
 static opt_result
-vect_determine_vf_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
- bool vectype_maybe_set_p,
- poly_uint64 *vf)
+vect_determine_vectype_for_stmt_1 (vec_info *vinfo, stmt_vec_info stmt_info,
+  bool vectype_maybe_set_p)
 {
   gimple *stmt = stmt_info->stmt;
 
@@ -192,6 +191,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (stmt_vectype)
 {
+  if (known_le (TYPE_VECTOR_SUBPARTS (stmt_vectype), 1U))
+   return opt_result::failure_at (STMT_VINFO_STMT (stmt_info),
+  "not vectorized: unsupported "
+  "data-type in %G",
+  STMT_VINFO_STMT (stmt_info));
+
   if (STMT_VINFO_VECTYPE (stmt_info))
/* The only case when a vectype had been already set is for stmts
   that contain a data ref, or for "pattern-stmts" (stmts generated
@@ -203,9 +208,6 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, 
stmt_vec_info stmt_info,
STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
 }
 
-  if (nunits_vectype)
-vect_update_max_nunits (vf, nunits_vectype);
-
   return opt_result::success ();
 }
 
@@ -215,13 +217,12 @@ vect_determine_vf_for_stmt_1 (vec_info *vinfo, 
stmt_vec_info stmt_info,
or false if something prevented vectorization.  */
 
 static opt_result
-vect_determine_vf_for_stmt (vec_info *vinfo,
-   stmt_vec_info stmt_info, poly_uint64 *vf)
+vect_determine_vectype_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info)
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G",
 stmt_info->stmt);
-  opt_result res = vect_determine_vf_for_stmt_1 (vinfo, stmt_info, false, vf);
+  opt_result res = vect_determine_vectype_for_stmt_1 (vinfo, stmt_info, false);
   if (!res)
 return res;
 
@@ -240,7 +241,7 @@ vect_determine_vf_for_stmt (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location,
 "==> examining pattern def stmt: %G",
 def_stmt_info->stmt);
- res = vect_determine_vf_for_stmt_1 (vinfo, def_stmt_info, true, vf);
+ res = vect_determine_vectype_for_stmt_1 (vinfo, def_stmt_info, true);
  if (!res)
return res;
}
@@ -249,7 +250,7 @@ vect_determine_vf_for_stmt (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location,

Re: [PATCH] libstdc++: Ensure pool resources meet alignment requirements [PR118681]

2025-07-09 Thread Andreas Schwab

This breaks several cross compilers:

../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member function 
'virtual void* std::pmr::unsynchronized_pool_resource::do_allocate(std::size_t, 
std::size_t)':
../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1474:29: error: 
'choose_block_size' was not declared in this scope
 1474 | const auto block_size = choose_block_size(bytes, alignment);
  | ^
../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member function 
'virtual void std::pmr::unsynchronized_pool_resource::do_deallocate(void*, 
std::size_t, std::size_t)':
../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1491:25: error: 
'choose_block_size' was not declared in this scope
 1491 | size_t block_size = choose_block_size(bytes, alignment);
  | ^
make[5]: *** [Makefile:587: memory_resource.lo] Error 1
make[5]: Leaving directory 
'/home/abuild/rpmbuild/BUILD/cross-arm-none-gcc16-16.0.0+git2118-build/gcc-16.0.0+git2118/obj-x86_64-suse-linux/arm-none-eabi/libstdc++-v3/src/c++17'

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[committed] libstdc++: Fix memory_resource.cc bootstrap failure for non-gthreads targets

2025-07-09 Thread Jonathan Wakely

The new choose_block_size function added in r16-2112-gac2fb60a67d6d1 was
defined inside an #ifdef _GLIBCXX_HAS_GTHREADS group, which means that
it's not available for single-threaded targets, and so can't be used by
unsynchronized_pool_resource. Move it before that preprocessor group so
it's always defined.
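
For readers following along, the rounding that choose_block_size performs can be sketched on its own. This is a minimal standalone sketch, not the libstdc++ source: bit_ceil_sketch is a hypothetical stand-in for the library-internal std::__bit_ceil, and the function is made constexpr here only so the examples can be checked at compile time.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in for std::__bit_ceil: smallest power of two >= n.
// Returns 0 if that power of two is not representable in size_t.
constexpr std::size_t
bit_ceil_sketch(std::size_t n)
{
  std::size_t p = 1;
  while (p != 0 && p < n)
    p <<= 1;
  return p;
}

// Sketch of the helper from the patch: round BYTES up to a multiple of
// ALIGNMENT, treating an invalid alignment as the next power of two.
constexpr std::size_t
choose_block_size(std::size_t bytes, std::size_t alignment)
{
  if (bytes == 0)
    return alignment;

  // For a power of two 2^k the mask has the low k bits set.
  const std::size_t mask = bit_ceil_sketch(alignment) - 1;
  // Round up to a multiple of the alignment.
  const std::size_t block_size = (bytes + mask) & ~mask;

  if (block_size >= bytes)
    return block_size;

  // The addition wrapped around, so BYTES was impossibly large.
  return std::numeric_limits<std::size_t>::max();
}

// A 32-byte request with 16-byte alignment stays at 32 bytes,
// while a 33-byte request is rounded up to 48.
static_assert(choose_block_size(32, 16) == 32, "already a multiple");
static_assert(choose_block_size(33, 16) == 48, "rounded up");
```

Because the helper depends only on its two arguments, nothing ties it to gthreads, which is why it can live outside the _GLIBCXX_HAS_GTHREADS group.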

libstdc++-v3/ChangeLog:

* src/c++17/memory_resource.cc: Adjust indentation of unnamed
namespaces.
(pool_sizes): Add comment.
(choose_block_size): Move outside preprocessor group for
gthreads targets.
* testsuite/20_util/synchronized_pool_resource/118681.cc:
Require gthreads.
---

Tested x86_64-linux. Pushed to trunk.

I'll include this fix in the backports of r16-2112-gac2fb60a67d6d1.

 libstdc++-v3/src/c++17/memory_resource.cc | 66 +++
 .../synchronized_pool_resource/118681.cc  |  1 +
 2 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index fddfe2c7dd98..c61569f249ad 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -182,8 +182,8 @@ namespace pmr
   // versions will not use this symbol.
   monotonic_buffer_resource::~monotonic_buffer_resource() { release(); }
 
-  namespace {
-
+namespace
+{
   // aligned_size stores the size and alignment of a memory allocation.
   // The size must be a multiple of N, leaving the low log2(N) bits free
   // to store the base-2 logarithm of the alignment.
@@ -221,7 +221,7 @@ namespace pmr
 return (n + alignment - 1) & ~(alignment - 1);
   }
 
-  } // namespace
+} // namespace
 
   // Memory allocated by the upstream resource is managed in a linked list
   // of _Chunk objects. A _Chunk object recording the size and alignment of
@@ -307,8 +307,8 @@ namespace pmr
 
   // Helper types for synchronized_pool_resource & unsynchronized_pool_resource
 
-  namespace {
-
+namespace
+{
   // Simple bitset with runtime size.
   // Tracks which blocks in a pool chunk are used/unused.
   struct bitset
@@ -636,7 +636,7 @@ namespace pmr
 
   static_assert(sizeof(big_block) == (2 * sizeof(void*)));
 
-  } // namespace
+} // namespace
 
   // A pool that serves blocks of a particular size.
   // Each pool manages a number of chunks.
@@ -868,7 +868,16 @@ namespace pmr
 using big_block::big_block;
   };
 
-  namespace {
+namespace
+{
+  // N.B. it is important that we don't skip any power of two sizes if there
+  // is a non-power of two size between them, e.g. must not have pool sizes
+  // of 24 and 40 without having a pool size of 32. Otherwise an allocation
+  // of 32 bytes with alignment 16 would choose the 40-byte pool which is not
+  // correctly aligned for 16-byte alignment. It would be OK (but suboptimal)
+  // to have no pool of size 32 if we have pool sizes of 16 and 64 and no
+  // non-power of two sizes between those, because the example of (32, 16)
+  // would choose the 64-byte pool, which would be correctly aligned.
 
   constexpr size_t pool_sizes[] = {
   8, 16, 24,
@@ -983,7 +992,7 @@ namespace pmr
   using exclusive_lock = lock_guard;
 #endif
 
-  } // namespace
+} // namespace
 
   __pool_resource::
   __pool_resource(const pool_options& opts, memory_resource* upstream)
@@ -1075,12 +1084,33 @@ namespace pmr
 return p;
   }
 
+  // Determine the appropriate allocation size, rounding up to a multiple
+  // of the alignment if needed.
+  static inline size_t
+  choose_block_size(size_t bytes, size_t alignment)
+  {
+if (bytes == 0) [[unlikely]]
+  return alignment;
+
+// Use bit_ceil in case alignment is invalid (i.e. not a power of two).
+size_t mask = std::__bit_ceil(alignment) - 1;
+// Round up to a multiple of alignment.
+size_t block_size = (bytes + mask) & ~mask;
+
+if (block_size >= bytes) [[likely]]
+  return block_size;
+
+// Wrapped around to zero, bytes must have been impossibly large.
+return numeric_limits::max();
+  }
+
+
 #ifdef _GLIBCXX_HAS_GTHREADS
   // synchronized_pool_resource members.
 
   /* Notes on implementation and thread safety:
*
-   * Each synchronized_pool_resource manages an linked list of N+1 _TPools
+   * Each synchronized_pool_resource manages a linked list of N+1 _TPools
* objects, where N is the number of threads using the pool resource.
* Each _TPools object has its own set of pools, with their own chunks.
* The first element of the list, _M_tpools[0], can be used by any thread.
@@ -1242,24 +1272,6 @@ namespace pmr
 return pools;
   }
 
-  static inline size_t
-  choose_block_size(size_t bytes, size_t alignment)
-  {
-if (bytes == 0) [[unlikely]]
-  return alignment;
-
-// Use bit_ceil in case alignment is invalid (i.e. not a power of two).
-size_t mask = std::__bit_ceil(alignment) - 1;
-// Round up to a multiple of alignment.
-size_t block_size = (bytes + mask) & ~mask;
-
-if (block_size >= by

Re: [PATCH] libstdc++: Ensure pool resources meet alignment requirements [PR118681]

2025-07-09 Thread Jonathan Wakely

On Wed, 9 Jul 2025 at 10:10, Jonathan Wakely  wrote:
>
> On Wed, 9 Jul 2025 at 09:51, Andreas Schwab  wrote:
> >
> > This breaks several cross compilers:
> >
> > ../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member 
> > function 'virtual void* 
> > std::pmr::unsynchronized_pool_resource::do_allocate(std::size_t, 
> > std::size_t)':
> > ../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1474:29: error: 
> > 'choose_block_size' was not declared in this scope
> >  1474 | const auto block_size = choose_block_size(bytes, alignment);
> >   | ^
> > ../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member 
> > function 'virtual void 
> > std::pmr::unsynchronized_pool_resource::do_deallocate(void*, std::size_t, 
> > std::size_t)':
> > ../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1491:25: error: 
> > 'choose_block_size' was not declared in this scope
> >  1491 | size_t block_size = choose_block_size(bytes, alignment);
> >   | ^
> > make[5]: *** [Makefile:587: memory_resource.lo] Error 1
> > make[5]: Leaving directory 
> > '/home/abuild/rpmbuild/BUILD/cross-arm-none-gcc16-16.0.0+git2118-build/gcc-16.0.0+git2118/obj-x86_64-suse-linux/arm-none-eabi/libstdc++-v3/src/c++17'
>
>
> Yeah I got a CI email from Linaro, the choose_block_size function
> needs to be moved outside the #ifdef _GLIBCXX_HAS_GTHREADS group so
> that it's usable for --disable-threads targets.

It should be fixed at r16-2123-g7a878ba615c2c5

[PATCH 1/2] libstdc++: Treat __int128 as a real integral type [PR96710]

2025-07-09 Thread Jonathan Wakely

Since LWG 3828 (included in C++23) implementations are allowed to have
extended integer types that are wider than intmax_t. This means we no
longer have to make is_integral_v<__int128> false for strict -std=c++23
mode, removing the confusing inconsistency with -std=gnu++23 (where
is_integral_v<__int128> is true).

This change makes __int128 a true integral type for all modes, treating
LWG 3828 as a DR for previous standards. Most of the change just
involves removing special cases where we wanted to treat __int128 and
unsigned __int128 as integral types even when is_integral_v was false.

There are still some preprocessor conditionals needed, because on some
targets the compiler defines the macro __GLIBCXX_TYPE_INT_N_0 as
__int128 in non-strict modes. Because we define explicit specializations
of templates such as is_integral for all the INT_N types, we already
have a specialization of is_integral<__int128> in non-strict modes, and
so to avoid a redefinition we must only define
is_integral<__int128> for strict modes.
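
The redefinition problem described above can be illustrated with a small sketch. The trait is_integer_sketch and the macro SKETCH_INT_N are hypothetical names invented for this example; only __GLIBCXX_TYPE_INT_N_0, __STRICT_ANSI__ and __SIZEOF_INT128__ are real compiler-provided macros, and the sketch assumes a GCC-compatible compiler.

```cpp
#include <type_traits>

// Hypothetical trait standing in for the library-internal __is_integer.
template<typename T>
  struct is_integer_sketch : std::false_type { };

// One invocation defines the signed and unsigned specializations,
// similar in shape to the __INT_N macro in cpp_type_traits.h.
#define SKETCH_INT_N(TYPE)                                           \
  template<> struct is_integer_sketch<TYPE> : std::true_type { };    \
  template<> struct is_integer_sketch<unsigned TYPE> : std::true_type { };

SKETCH_INT_N(int)
SKETCH_INT_N(long)

// In non-strict modes the compiler may define this macro as __int128,
// in which case this line already specializes the trait for __int128.
#ifdef __GLIBCXX_TYPE_INT_N_0
SKETCH_INT_N(__GLIBCXX_TYPE_INT_N_0)
#endif

// In strict modes the macro above is not defined for __int128, so the
// specialization is added here instead.  Doing this unconditionally
// would redefine the specialization in non-strict modes.
#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
SKETCH_INT_N(__int128)
#endif

#undef SKETCH_INT_N

static_assert(is_integer_sketch<unsigned long>::value, "built-in type");
static_assert(!is_integer_sketch<float>::value, "not an integer type");
```

With either guard active, the trait gives the same answer for __int128 in strict and non-strict modes, which is the consistency the patch is after.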

libstdc++-v3/ChangeLog:

PR libstdc++/96710
* include/bits/cpp_type_traits.h (__is_integer): Define explicit
specializations for __int128.
(__memcpyable_integer): Remove explicit specializations for
__int128.
* include/bits/iterator_concepts.h (incrementable_traits):
Likewise.
(__is_signed_int128, __is_unsigned_int128, __is_int128): Remove.
(__is_integer_like, __is_signed_integer_like): Remove check for
__int128.
* include/bits/max_size_type.h: Remove all uses of __is_int128
in constraints.
* include/bits/ranges_base.h (__to_unsigned_like): Remove
overloads for __int128.
(ranges::ssize): Remove special case for __int128.
* include/bits/stl_algobase.h (__size_to_integer): Define
__int128 overloads for strict modes.
* include/ext/numeric_traits.h (__is_integer_nonstrict): Remove
explicit specializations for __int128.
* include/std/charconv (to_chars): Define overloads for
__int128.
* include/std/format (__format::make_unsigned_t): Remove.
(__format::to_chars): Remove.
* include/std/limits (numeric_limits): Define explicit
specializations for __int128.
* include/std/type_traits (__is_integral_helper): Likewise.
(__make_unsigned, __make_signed): Likewise.
---

Tested x86_64-linux and powerpc64le-linux.

 libstdc++-v3/include/bits/cpp_type_traits.h   | 17 +++
 libstdc++-v3/include/bits/iterator_concepts.h | 34 -
 libstdc++-v3/include/bits/max_size_type.h | 48 +--
 libstdc++-v3/include/bits/ranges_base.h   | 15 --
 libstdc++-v3/include/bits/stl_algobase.h  |  7 +++
 libstdc++-v3/include/ext/numeric_traits.h |  6 ---
 libstdc++-v3/include/std/charconv |  4 ++
 libstdc++-v3/include/std/format   | 14 --
 libstdc++-v3/include/std/limits   |  3 +-
 libstdc++-v3/include/std/type_traits  | 25 ++
 10 files changed, 67 insertions(+), 106 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index b1a6206ce1eb..770ad94b3b4d 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -273,6 +273,12 @@ __INT_N(__GLIBCXX_TYPE_INT_N_2)
 __INT_N(__GLIBCXX_TYPE_INT_N_3)
 #endif
 
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+// In strict modes __GLIBCXX_TYPE_INT_N_0 is not defined for __int128,
+// but we want to always treat signed/unsigned __int128 as integral types.
+__INT_N(__int128)
+#endif
+
 #undef __INT_N
 
   //
@@ -545,17 +551,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
 { enum { __width = __GLIBCXX_BITSIZE_INT_N_3 }; };
 #endif
 
-#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
-  // In strict modes __is_integer<__int128> is false,
-  // but we want to allow memcpy between signed/unsigned __int128.
-  __extension__
-  template<>
-struct __memcpyable_integer<__int128> { enum { __width = 128 }; };
-  __extension__
-  template<>
-struct __memcpyable_integer { enum { __width = 128 }; };
-#endif
-
 #if _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 && _GLIBCXX_LDOUBLE_IS_IEEE_BINARY64
   template<>
 struct __memcpyable { enum { __value = true }; };
diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index d31e4f145107..979039e7da53 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -214,17 +214,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
= make_signed_t() - std::declval<_Tp>())>;
 };
 
-#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
-  // __int128 is incrementable even if !integral<__int128>
-  template<>
-struct incrementable_traits<__int128>
-{ using difference_type = __int128; };
-
-  template<>
-struct increme

[PATCH] testsuite/120093 - fix gcc.dg/vect/pr101145.c

2025-07-09 Thread Richard Biener

The following changes noinline to noipa to avoid having IPA-CP clones
confusing the vectorized loop counting.

Tested on x86_64-unknown-linux-gnu, pushed.

PR testsuite/120093
* gcc.dg/vect/pr101145.c: Use noipa instead of noinline
attribute.
---
 gcc/testsuite/gcc.dg/vect/pr101145.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr101145.c 
b/gcc/testsuite/gcc.dg/vect/pr101145.c
index cd11c030d57..c055ae6359f 100644
--- a/gcc/testsuite/gcc.dg/vect/pr101145.c
+++ b/gcc/testsuite/gcc.dg/vect/pr101145.c
@@ -2,7 +2,7 @@
 /* { dg-additional-options "-O3" } */
 #include 
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   while (n < ++l)
@@ -10,7 +10,7 @@ foo (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
 {
   while (UINT_MAX - 64 < ++l)
@@ -18,7 +18,7 @@ foo_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   l = UINT_MAX - 32;
@@ -27,7 +27,7 @@ foo_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   while (n <= ++l)
@@ -35,7 +35,7 @@ foo_3 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {  // infininate 
   while (0 <= ++l)
@@ -43,7 +43,7 @@ foo_4 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   //no loop
@@ -53,7 +53,7 @@ foo_5 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 bar (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   while (--l < n)
@@ -61,7 +61,7 @@ bar (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned n)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 bar_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned)
 {
   while (--l < 64)
@@ -69,7 +69,7 @@ bar_1 (int *__restrict__ a, int *__restrict__ b, unsigned l, 
unsigned)
   return l;
 }
 
-unsigned __attribute__ ((noinline))
+unsigned __attribute__ ((noipa))
 bar_2 (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
 {
   l = 32;
-- 
2.43.0

[PATCH 2/2] libstdc++: Always treat __float128 as a floating-point type

2025-07-09 Thread Jonathan Wakely

Similar to the previous commit that made is_integral_v<__int128>
unconditionally true, this makes is_floating_point_v<__float128>
unconditionally true. With the new extended floating-point types in
C++23 (std::float64_t etc.) it seems unhelpful for is_floating_point_v
to be true for them, but not for __float128. Especially as it is true on
some targets, because __float128 is just a typedef for long double.

This change makes is_floating_point_v<__float128> true whenever the type
is defined, giving less surprising and more portable behaviour.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_floating<__float128>):
Do not depend on __STRICT_ANSI__.
* include/bits/stl_algobase.h (__size_to_integer(__float128)):
Likewise.
* include/std/type_traits (__is_floating_point_helper<__float128>):
Likewise.
---

Tested x86_64-linux and powerpc64le-linux.

I don't _think_ this affects how <format> handles __float128, because
there are std::formatter specializations for each floating-point type,
rather than a partial specialization that uses std::floating_point.

 libstdc++-v3/include/bits/cpp_type_traits.h | 9 +
 libstdc++-v3/include/bits/stl_algobase.h| 2 +-
 libstdc++-v3/include/std/type_traits| 2 +-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 770ad94b3b4d..38cea4c67b76 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -313,6 +313,15 @@ __INT_N(__int128)
   typedef __true_type __type;
 };
 
+#ifdef _GLIBCXX_USE_FLOAT128
+  template<>
+struct __is_floating<__float128>
+{
+  enum { __value = 1 };
+  typedef __true_type __type;
+};
+#endif
+
 #ifdef __STDCPP_FLOAT16_T__
   template<>
 struct __is_floating<_Float16>
diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index 71ef2335a311..b104ec2536a0 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1065,7 +1065,7 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
   __size_to_integer(double __n) { return (long long)__n; }
   inline _GLIBCXX_CONSTEXPR long long
   __size_to_integer(long double __n) { return (long long)__n; }
-#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
+#ifdef _GLIBCXX_USE_FLOAT128
   __extension__ inline _GLIBCXX_CONSTEXPR long long
   __size_to_integer(__float128 __n) { return (long long)__n; }
 #endif
diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index e88d04e44d76..78a5ee8c0eb4 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -532,7 +532,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
 #endif
 
-#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
+#ifdef _GLIBCXX_USE_FLOAT128
   template<>
 struct __is_floating_point_helper<__float128>
 : public true_type { };
-- 
2.50.0

Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-09 Thread Jonathan Wakely

On Fri, 4 Jul 2025 at 13:11, Mateusz Zych  wrote:
>
> Hello!
>
> I've updated the ChangeLog, since I forgot to do it before.

Thanks, I've pushed the patch to trunk now.  I used a simpler commit
message, without the large verbatim quotes from the standard.

Thanks again for noticing the problem and contributing the fix.


>
> Thanks, Mateusz Zych
>
> On Thu, Jul 3, 2025 at 9:49 PM Mateusz Zych  wrote:
>>
>> Hello!
>>
>> I've prepared a patch, which adds all members missing from
>> std::numeric_limits<> specializations for integer-class types.
>>
>> Jonathan, please let me know whether you like these changes
>> and do not see any bugs or issues with them. From my side, I just want to 
>> say that:
>>
>> Since all std::numeric_limits<> specializations for integral types,
>> defined in //libstdc++-v3/include/std/limits don't inherit from a base class
>> providing common data members and member functions,
>> I also didn't introduce such a base class in 
>> //libstdc++-v3/include/bits/max_size_type.h.
>> Such implementation has quite a bit of code duplication, but it's like that 
>> on purpose, right?
>>
>> I didn't test the traps static data member, because I don't know how to
>> accurately predict when this compile-time constant should be true and when 
>> it should be false.
>> Moreover, I saw that the unit-test verifying correctness of the traps 
>> constant
>> from std::numeric_limits<> specializations for integral types
>> (//libstdc++-v3/testsuite/18_support/numeric_limits/traps.cc) also doesn't 
>> verify its value.
>>
>> In the unit-tests for integer-class types I've defined variable template
>> verify_numeric_limits_values_not_meaningful_for<> to avoid code duplication
>> and have clear and readable code. I hope this is OK.
>>
>> Thanks, Mateusz Zych
>>
>> On Wed, Jul 2, 2025 at 7:30 PM Jonathan Wakely  wrote:
>>>
>>> On Wed, 2 Jul 2025 at 17:15, Mateusz Zych wrote:
>>> >
>>> > OK, then I’ll prepare appropriate patch with tests and send it when I’m 
>>> > done implementing it.
>>>
>>> That would be great, thanks. I won't push the initial patch, we can
>>> wait for you to prepare the complete fix.
>>>
>>> Please note that for a more significant change, we have some legal
>>> prerequisites for contributions, as documented at:
>>> https://gcc.gnu.org/contribute.html#legal
>>>
>>> If you want to contribute under the DCO terms, please read
>>> https://gcc.gnu.org/dco.html so that you understand exactly what the
>>> Signed-off-by: trailer means.
>>>
>>> Thanks!
>>>

Re: [PATCH] libstdc++: Add smart ptr owner_equals and owner_hash structs and members for P1901R2

2025-07-09 Thread Jonathan Wakely

Pushed to trunk now - thanks for contributing this!

On Tue, 8 Jul 2025 at 18:32, Paul Keir  wrote:
>
> Thanks Jonathan.
>
>
> 
> From: Jonathan Wakely 
> Sent: 08 July 2025 1:37 PM
> To: Paul Keir
> Cc: gcc-patches@gcc.gnu.org; libstd...@gcc.gnu.org
> Subject: Re: [PATCH] libstdc++: Add smart ptr owner_equals and owner_hash 
> structs and members for P1901R2
>
>
>
> Warning: Do not open attachments or click on links unless you trust the sender
>
>
>
> On Tue, 8 Jul 2025 at 13:24, Jonathan Wakely  wrote:
> >
> > On Tue, 8 Jul 2025 at 12:54, Paul Keir  wrote:
> > >
> > > Let me know if this needs a refresh.
> >
> > The patch fails to apply:
> >
> > error: patch failed: libstdc++-v3/include/bits/shared_ptr_base.h:1715
> > error: libstdc++-v3/include/bits/shared_ptr_base.h: patch does not apply
> >
> > but I think it's your mail client munging whitespace, not something
> > that can be fixed by rebasing on trunk.
> > I'll figure it out and apply it by hand.
>
> OK, I added your github fork and did a merge --squash from there
>
>
> >
> >
> > >
> > > 
> > > From: Paul Keir 
> > > Sent: 06 June 2025 5:32 PM
> > > To: Jonathan Wakely
> > > Cc: gcc-patches@gcc.gnu.org; libstd...@gcc.gnu.org
> > > Subject: Re: [PATCH] libstdc++: Add smart ptr owner_equals and owner_hash 
> > > structs and members for P1901R2
> > >
> > > No problem. That should be it included below. Github diff for 
> > > convenience: 
> > > https://github.com/gcc-mirror/gcc/compare/e37eb85...pkeir:gcc:1b7c7c1a
> > >
> > > Signed-off-by: Paul Keir 
> > >
> > > Tested on x86_64-linux.
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/bits/shared_ptr.h: Added owner_equal and owner_hash 
> > > members to shared_ptr and weak_ptr.
> > > * include/bits/shared_ptr_base.h: Added owner_equal and 
> > > owner_hash structs.
> > > * include/bits/version.def: Added 
> > > __cpp_lib_smart_ptr_owner_equality feature macro.
> > > * include/bits/version.h: Update generated for 
> > > __cpp_lib_smart_ptr_owner_equality feature macro.
> > > * include/std/memory: Added define for 
> > > __glibcxx_want_smart_ptr_owner_equality.
> > > * testsuite/20_util/owner_equal/version.cc: New test.
> > > * testsuite/20_util/owner_equal/cmp.cc: New test.
> > > * testsuite/20_util/owner_equal/noexcept.cc: New test.
> > > * testsuite/20_util/owner_hash/cmp.cc: New test.
> > > * testsuite/20_util/owner_hash/noexcept.cc: New test.
> > > * testsuite/20_util/shared_ptr/observers/owner_equal.cc: New test.
> > > * testsuite/20_util/shared_ptr/observers/owner_hash.cc: New test.
> > > * testsuite/20_util/weak_ptr/observers/owner_equal.cc: New test.
> > > * testsuite/20_util/weak_ptr/observers/owner_hash.cc: New test.
> > >
> > > ---
> > >
> > >  include/bits/shared_ptr.h  |  57 +++
> > >  include/bits/shared_ptr_base.h |  40 
> > >  include/bits/version.def   |   9 ++
> > >  include/bits/version.h |  10 ++
> > >  include/std/memory |   1 +
> > >  testsuite/20_util/owner_equal/cmp.cc   | 105 
> > > +
> > >  testsuite/20_util/owner_equal/noexcept.cc  |  30 ++
> > >  testsuite/20_util/owner_equal/version.cc   |  13 +++
> > >  testsuite/20_util/owner_hash/cmp.cc|  87 
> > > +
> > >  testsuite/20_util/owner_hash/noexcept.cc   |  16 
> > >  .../20_util/shared_ptr/observers/owner_equal.cc|  74 +++
> > >  .../20_util/shared_ptr/observers/owner_hash.cc |  71 ++
> > >  .../20_util/weak_ptr/observers/owner_equal.cc  |  52 ++
> > >  testsuite/20_util/weak_ptr/observers/owner_hash.cc |  50 ++
> > >  14 files changed, 615 insertions(+)
> > >
> > > diff --git a/libstdc++-v3/include/bits/shared_ptr.h 
> > > b/libstdc++-v3/include/bits/shared_ptr.h
> > > index a196a0f1212..dd02ab16e59 100644
> > > --- a/libstdc++-v3/include/bits/shared_ptr.h
> > > +++ b/libstdc++-v3/include/bits/shared_ptr.h
> > > @@ -909,6 +909,63 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> > >  : public _Sp_owner_less, shared_ptr<_Tp>>
> > >  { };
> > >
> > > +#ifdef __glibcxx_smart_ptr_owner_equality // >= C++26
> > > +
> > > +  /**
> > > +   * @brief Provides ownership-based hashing.
> > > +   * @headerfile memory
> > > +   * @since C++26
> > > +   */
> > > +  struct owner_hash
> > > +  {
> > > +template
> > > +size_t operator()(const shared_ptr<_Tp>& __s) const noexcept
> > > +{ return __s.owner_hash(); }
> > > +
> > > +template
> > > +size_t operator()(const weak_ptr<_Tp>& __s) const noexcept
> > > +{ return __s.owner_hash(); }
> > > +
> > > +using is_transparent = void;
> > > +  };
> > > +
> > > +  /**
> > > +   * @brief Pro

RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Tamar Christina

> > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > +first and second operands of the comparison, respectively.  Operand 3
> > +is the @code{code_label} to jump to.
> > +
> > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > +@item @samp{cbranch_all@var{mode}4}
> > +Conditional branch instruction combined with a compare instruction on 
> > vectors
> > +where it is required that at all of the elementwise comparisons of the
> > +two input vectors are true.
> 
> See above.
> 
> When I look at the RTL for aarch64 I wonder whether the middle-end
> can still invert a jump (for BB reorder, for example)?  Without
> a cbranch_none expander we have to invert during RTL expansion?
> 

Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
I think all states are expressible with any and all and flipping the branches
so it shouldn't be any more restrictive than cbranch itself is today.

cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
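
To make the any/all/none relationship concrete, here is a small scalar model of the element-wise semantics. It is only a sketch: vec4, mask4 and the helper functions are hypothetical names for illustration and say nothing about how the optabs are implemented in RTL.

```cpp
// Four lanes model a vector and the result of an element-wise compare.
struct vec4  { int  lane[4]; };
struct mask4 { bool lane[4]; };

// Element-wise x == 0 and x != 0.
constexpr mask4 cmp_eq_zero(const vec4& v)
{
  mask4 m = {};
  for (int i = 0; i < 4; ++i)
    m.lane[i] = (v.lane[i] == 0);
  return m;
}

constexpr mask4 cmp_ne_zero(const vec4& v)
{
  mask4 m = {};
  for (int i = 0; i < 4; ++i)
    m.lane[i] = (v.lane[i] != 0);
  return m;
}

// "any": at least one lane is true.  "all": every lane is true.
constexpr bool any_lane(const mask4& m)
{
  for (int i = 0; i < 4; ++i)
    if (m.lane[i])
      return true;
  return false;
}

constexpr bool all_lanes(const mask4& m)
{
  for (int i = 0; i < 4; ++i)
    if (!m.lane[i])
      return false;
  return true;
}

// "none" needs no primitive of its own: no lane is non-zero exactly
// when all lanes compare equal to zero.
constexpr bool none_nonzero(const vec4& v)
{ return all_lanes(cmp_eq_zero(v)); }

constexpr vec4 zeros = {{0, 0, 0, 0}};
constexpr vec4 mixed = {{0, 3, 0, 0}};

static_assert(none_nonzero(zeros) == !any_lane(cmp_ne_zero(zeros)), "");
static_assert(none_nonzero(mixed) == !any_lane(cmp_ne_zero(mixed)), "");
```

In this model "none" is also the negation of "any" on the inverted comparison, which matches the point about flipping the branches.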

And FTR, the RTL generated for AArch64 (both SVE and Adv.SIMD) will be
simplified to:

(insn 23 22 24 5 (parallel [
(set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
(unspec:VNx4BI [
(reg:VNx4BI 129)
(const_int 0 [0x0])
(gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
(const_vector:VNx4SI repeat [
(const_int 0 [0])
]))
] UNSPEC_PRED_Z))
(clobber (reg:CC_NZC 66 cc))
]) "cbranch.c":25:10 -1

(jump_insn 27 26 28 5 (set (pc)
(if_then_else (eq (reg:CC_Z 66 cc)
(const_int 0 [0]))
(label_ref 33)
(pc))) "cbranch.c":25:10 -1
 (int_list:REG_BR_PROB 1014686025 (nil))

The thing is we can't get rid of the unspecs, as there's no concept of masking
in RTL compares.  We could technically do an AND (and do in some cases), but
then you lose the predicate hint constant in the RTL, which tells you whether
the mask is known to be all true or not.  This hint is crucial to allow for
further optimizations.

That said the condition code, branch and compares are fully exposed.

We expand to a larger sequence than I'd like, mostly because there's no support
for conditional cbranch optabs, or even conditional vector comparisons.  So the
comparisons must be generated unpredicated by generating an all true mask, and
later patterns merge in the AND.

The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a single
loop), but not for pure SVE, for which I take a different approach to try to
avoid requiring a predicated version of these optabs.

I don't want to push my luck, but would you be ok with a conditional version of
these optabs too? i.e. cond_cbranch_any and cond_cbranch_all?  This would allow
us to immediately expand to the correct representation for both SVE and
Adv.SIMD without having to rely on various combine patterns and cc-fusion to
optimize the sequences later on (which has historically been a bit hit or miss
if someone adds a new CC pattern).

And the reason for both is that for Adv.SIMD there's no mask at GIMPLE level
and we have to make it during expand.

Thanks,
Tamar

[PATCH v5 1/3] Hard register constraints

2025-07-09 Thread Stefan Schulze Frielinghaus

Implement hard register constraints of the form {regname} where regname
must be a valid register name for the target.  Such constraints may be
used in asm statements as a replacement for register asm and in machine
descriptions.  A more verbose description is given in extend.texi.

It is expected and desired that optimizations coalesce multiple pseudos
into one whenever possible.  However, in case of hard register
constraints we may have to undo this and introduce copies since
otherwise we would constrain a single pseudo to multiple hard
registers.  This is done prior to RA during asmcons in
match_asm_constraints_2().  While IRA tries to reduce live ranges, it
also replaces some register-register moves.  That in turn might undo
those copies of a pseudo which we just introduced during asmcons.  Thus,
check in decrease_live_ranges_number() via
valid_replacement_for_asm_input_p() whether it is valid to perform a
replacement.

The remainder of the patch mostly deals with parsing and decoding hard
register constraints.  The actual work is done by LRA in
process_alt_operands() where a register filter, according to the
constraint, is installed.

For the sake of "reviewability" and in order to show the beauty of LRA,
error handling (which gets pretty involved) is spread out into a
subsequent patch.

Limitation
--

Currently, a fixed register cannot be used as hard register constraint.
For example, loading the stack pointer on x86_64 via

void *
foo (void)
{
  void *y;
  __asm__ ("" : "={rsp}" (y));
  return y;
}

leads to an error.

Asm Adjust Hook
---

The following targets implement TARGET_MD_ASM_ADJUST:

- aarch64
- arm
- avr
- cris
- i386
- mn10300
- nds32
- pdp11
- rs6000
- s390
- vax

Most of them only add the CC register to the list of clobbered registers.
However, cris, i386, and s390 need some minor adjustments.

gcc/ChangeLog:

* config/cris/cris.cc (cris_md_asm_adjust): Deal with hard
register constraint.
* config/i386/i386.cc (map_egpr_constraints): Ditto.
* config/s390/s390.cc (f_constraint_p): Ditto.
* doc/extend.texi: Document hard register constraints.
* doc/md.texi: Ditto.
* function.cc (match_asm_constraints_2): Have a unique pseudo
for each operand with a hard register constraint.
(pass_match_asm_constraints::execute): Calling into new helper
match_asm_constraints_2().
* genoutput.cc (mdep_constraint_len): Return the length of a
hard register constraint.
* genpreds.cc (write_insn_constraint_len): Support hard register
constraints for insn_constraint_len().
* ira.cc (valid_replacement_for_asm_input_p_1): New helper.
(valid_replacement_for_asm_input_p): New helper.
(decrease_live_ranges_number): Similar to
match_asm_constraints_2() ensure that each operand has a unique
pseudo if constrained by a hard register.
* lra-constraints.cc (process_alt_operands): Install hard
register filter according to constraint.
* recog.cc (asm_operand_ok): Accept register type for hard
register constrained asm operands.
(constrain_operands): Validate hard register constraints.
* stmt.cc (decode_hard_reg_constraint): Parse a hard register
constraint into the corresponding register number or bail out.
(parse_output_constraint): Parse hard register constraint and
set *ALLOWS_REG.
(parse_input_constraint): Ditto.
* stmt.h (decode_hard_reg_constraint): Declaration of new
function.

gcc/testsuite/ChangeLog:

* gcc.dg/asm-hard-reg-1.c: New test.
* gcc.dg/asm-hard-reg-2.c: New test.
* gcc.dg/asm-hard-reg-3.c: New test.
* gcc.dg/asm-hard-reg-4.c: New test.
* gcc.dg/asm-hard-reg-5.c: New test.
* gcc.dg/asm-hard-reg-6.c: New test.
* gcc.dg/asm-hard-reg-7.c: New test.
* gcc.dg/asm-hard-reg-8.c: New test.
* gcc.target/aarch64/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-1.c: New test.
* gcc.target/i386/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-1.c: New test.
* gcc.target/s390/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-3.c: New test.
* gcc.target/s390/asm-hard-reg-4.c: New test.
* gcc.target/s390/asm-hard-reg-5.c: New test.
* gcc.target/s390/asm-hard-reg-6.c: New test.
* gcc.target/s390/asm-hard-reg-longdouble.h: New test.
---
 gcc/config/cris/cris.cc   |   6 +-
 gcc/config/i386/i386.cc   |   6 +
 gcc/config/s390/s390.cc   |   6 +-
 gcc/doc/extend.texi   | 162 ++
 gcc/doc/md.texi   |   6 +
 gcc/function.cc   | 116 +
 gcc/genoutput.cc  |  14 ++
 gcc/genpreds.cc   |

[PATCH v5 2/3] Error handling for hard register constraints

2025-07-09 Thread Stefan Schulze Frielinghaus

This implements error handling for hard register constraints including
potential conflicts with register asm operands.

In contrast to register asm operands, hard register constraints allow
more than just one register per operand.  Even more than just one
register per alternative.  For example, a valid constraint for an
operand is "{r0}{r1}m,{r2}".  However, this also means that we have to
make sure that each register is used at most once in each alternative
over all outputs and likewise over all inputs.  For asm statements this
is done by this patch during gimplification.  For hard register
constraints used in machine descriptions, error handling is still a
TODO; I haven't investigated it so far and consider it rather low
priority.
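
The "at most once per alternative" rule can be illustrated with a small
standalone checker.  This is only a sketch under stated assumptions and
not the code in gimplify.cc: the names find_alternative() and
alternative_has_duplicate_reg() are hypothetical, register names are
compared as plain strings (so aliases of one register are not detected),
and the caller passes either all output constraints or all input
constraints.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REGS 32   /* Arbitrary limit for this sketch.  */
#define MAX_NAME 31

/* Return a pointer to the start of alternative ALT (0-based) within
   CONSTRAINT, or NULL if the constraint has fewer alternatives.  */
static const char *
find_alternative (const char *constraint, int alt)
{
  const char *p = constraint;
  while (alt > 0)
    {
      p = strchr (p, ',');
      if (p == NULL)
        return NULL;
      ++p;
      --alt;
    }
  return p;
}

/* Return 1 if some hard register name occurs more than once in
   alternative ALT across the N constraint strings, otherwise 0.  */
int
alternative_has_duplicate_reg (const char *const *constraints, int n, int alt)
{
  char seen[MAX_REGS][MAX_NAME + 1];
  int nseen = 0;

  for (int i = 0; i < n; ++i)
    {
      const char *p = find_alternative (constraints[i], alt);
      if (p == NULL)
        continue;
      /* Scan this alternative only, i.e. up to the next comma.  */
      for (; *p != '\0' && *p != ','; ++p)
        {
          if (*p != '{')
            continue;
          const char *end = strchr (p, '}');
          if (end == NULL)
            break;
          size_t len = (size_t) (end - p - 1);
          if (len > 0 && len <= MAX_NAME)
            {
              for (int j = 0; j < nseen; ++j)
                if (strlen (seen[j]) == len
                    && memcmp (seen[j], p + 1, len) == 0)
                  return 1;
              if (nseen < MAX_REGS)
                {
                  memcpy (seen[nseen], p + 1, len);
                  seen[nseen][len] = '\0';
                  ++nseen;
                }
            }
          p = end;
        }
    }
  return 0;
}
```

For the two output constraints "={r0}{r1}m,{r2}" and "={r1},{r3}" the
first alternative names r1 twice and would be rejected by such a check,
whereas the second alternative (r2 and r3) would be accepted.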

gcc/ada/ChangeLog:

* gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to
parse_{input,output}_constraint().

gcc/analyzer/ChangeLog:

* region-model-asm.cc (region_model::on_asm_stmt): Pass null
pointer to parse_{input,output}_constraint().

gcc/c/ChangeLog:

* c-typeck.cc (build_asm_expr): Pass null pointer to
parse_{input,output}_constraint().

gcc/ChangeLog:

* cfgexpand.cc (n_occurrences): Move this ...
(check_operand_nalternatives): and this ...
(expand_asm_stmt): and the call to gimplify.cc.
* config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to
parse_{input,output}_constraint().
* gimple-walk.cc (walk_gimple_asm): Pass null pointer to
parse_{input,output}_constraint().
(walk_stmt_load_store_addr_ops): Ditto.
* gimplify-me.cc (gimple_regimplify_operands): Ditto.
* gimplify.cc (num_occurrences): Moved from cfgexpand.cc.
(num_alternatives): Ditto.
(gimplify_asm_expr): Deal with hard register constraints.
* stmt.cc (eliminable_regno_p): New helper.
(hardreg_ok_p): Perform a similar check as done in
make_decl_rtl().
(parse_output_constraint): Add parameter for gimplify_reg_info
and validate hard register constrained operands.
(parse_input_constraint): Ditto.
* stmt.h (class gimplify_reg_info): Forward declaration.
(parse_output_constraint): Add parameter.
(parse_input_constraint): Ditto.
* tree-ssa-operands.cc
(operands_scanner::get_asm_stmt_operands): Pass null pointer
to parse_{input,output}_constraint().
* tree-ssa-structalias.cc (find_func_aliases): Pass null pointer
to parse_{input,output}_constraint().
* varasm.cc (assemble_asm): Pass null pointer to
parse_{input,output}_constraint().
* gimplify_reg_info.h: New file.

gcc/cp/ChangeLog:

* semantics.cc (finish_asm_stmt): Pass null pointer to
parse_{input,output}_constraint().

gcc/d/ChangeLog:

* toir.cc: Pass null pointer to
parse_{input,output}_constraint().

gcc/testsuite/ChangeLog:

* gcc.dg/pr87600-2.c: Split test into two files since errors for
functions test{0,1} are thrown during expand, and for
test{2,3} during gimplification.
* lib/scanasm.exp: On s390, skip lines beginning with #.
* gcc.dg/asm-hard-reg-error-1.c: New test.
* gcc.dg/asm-hard-reg-error-2.c: New test.
* gcc.dg/asm-hard-reg-error-3.c: New test.
* gcc.dg/asm-hard-reg-error-4.c: New test.
* gcc.dg/asm-hard-reg-error-5.c: New test.
* gcc.dg/pr87600-3.c: New test.
* gcc.target/aarch64/asm-hard-reg-2.c: New test.
* gcc.target/s390/asm-hard-reg-7.c: New test.
---
 gcc/ada/gcc-interface/trans.cc|   9 +-
 gcc/analyzer/region-model-asm.cc  |   7 +-
 gcc/c/c-typeck.cc |   6 +-
 gcc/cfgexpand.cc  |  53 +---
 gcc/config/s390/s390.cc   |   5 +-
 gcc/cp/semantics.cc   |   6 +-
 gcc/d/toir.cc |   6 +-
 gcc/gimple-walk.cc|  11 +-
 gcc/gimplify-me.cc|   5 +-
 gcc/gimplify.cc   | 142 +--
 gcc/gimplify_reg_info.h   | 182 ++
 gcc/stmt.cc   | 238 +-
 gcc/stmt.h|   8 +-
 gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c   |  83 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c   |  26 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-3.c   |  27 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-4.c   |  21 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-5.c   |  13 +
 gcc/testsuite/gcc.dg/pr87600-2.c  |  19 --
 gcc/testsuite/gcc.dg/pr87600-3.c  |  26 ++
 .../gcc.target/aarch64/asm-hard-reg-2.c   |  17 ++
 .../gcc.target/s390/asm-hard-reg-7.c  |  34 +++
 gcc/testsuite/lib/scanasm.exp |   4 +
 gcc/tree-ssa-operands.cc  |   4 +-
 gcc

[PATCH v5 0/3] Hard Register Constraints

2025-07-09 Thread Stefan Schulze Frielinghaus

This is a follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/684181.html

I added the last missing pieces namely changelogs, and bootstrapped and
regtested on

aarch64-unknown-linux-gnu
powerpc64le-unknown-linux-gnu
s390x-ibm-linux-gnu
x86_64-pc-linux-gnu

Via cross compilers I verified the new tests for

arm-linux-gnueabi
i686-linux-gnueabi
powerpc-linux-gnu
riscv32-linux-gnu
riscv64-linux-gnu

Apart from that, I removed the overloads of
parse_{input,output}_constraint(); callers now pass a null pointer
explicitly.  Furthermore, in the case of register pairs, if the
constraints of two operands overlap, we now error out and report the
overlapping register.  For example, on aarch64

svuint32x2_t x, y;
asm ("" : "={z5}" (x), "={z6}" (y));

Previously, the error message named the register of the first constraint
as is, which is imprecise and misleading.  Now, I error out reporting
multiple outputs to register z6/v6, i.e., the register that actually
overlaps, and not z5/v5 as before.
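
The overlap in the example can be worked out with plain interval
arithmetic.  The following is a minimal sketch, assuming that a
multi-register operand occupies consecutive register numbers starting at
the one named in the constraint; first_overlapping_reg() is a
hypothetical helper and plain integers stand in for hard register
numbers.

```c
/* Return the lowest register number shared by the ranges
   [start_a, start_a + n_a) and [start_b, start_b + n_b),
   or -1 if the ranges do not overlap.  */
int
first_overlapping_reg (int start_a, int n_a, int start_b, int n_b)
{
  int end_a = start_a + n_a;
  int end_b = start_b + n_b;
  int lo = start_a > start_b ? start_a : start_b;
  int hi = end_a < end_b ? end_a : end_b;
  return lo < hi ? lo : -1;
}
```

With x occupying z5 and z6 and y occupying z6 and z7, the call
first_overlapping_reg (5, 2, 6, 2) yields 6, which matches the register
now named in the error message.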

Although I found a lot of corner cases during development via
-fdemote-register-asm I removed it from this patch series.  I
compiled and used the Linux kernel and glibc successfully with it for
s390x.  For x86_64, the Linux kernel compiles fine, too, except for one
small manual change.  For powerpc64le, I ran into an odd case compiling
glibc which I would like to understand in more detail.  Since register
asm is not as strict as hard register constraints, for a full fledged
implementation I need to consider more corner cases.  Therefore, I would
like to spend some more time on this before I push this new feature.

In total no huge changes.  Still ok for mainline?

Stefan Schulze Frielinghaus (3):
  Hard register constraints
  Error handling for hard register constraints
  genoutput: Verify hard register constraints

 gcc/ada/gcc-interface/trans.cc|   9 +-
 gcc/analyzer/region-model-asm.cc  |   7 +-
 gcc/c/c-typeck.cc |   6 +-
 gcc/cfgexpand.cc  |  53 +---
 gcc/config/cris/cris.cc   |   6 +-
 gcc/config/i386/i386.cc   |   6 +
 gcc/config/s390/s390.cc   |  11 +-
 gcc/cp/semantics.cc   |   6 +-
 gcc/d/toir.cc |   6 +-
 gcc/doc/extend.texi   | 162 ++
 gcc/doc/md.texi   |   6 +
 gcc/function.cc   | 116 
 gcc/genoutput.cc  |  60 
 gcc/genpreds.cc   |   4 +-
 gcc/gimple-walk.cc|  11 +-
 gcc/gimplify-me.cc|   5 +-
 gcc/gimplify.cc   | 142 -
 gcc/gimplify_reg_info.h   | 182 
 gcc/ira.cc|  84 +-
 gcc/lra-constraints.cc|  13 +
 gcc/output.h  |   2 +
 gcc/recog.cc  |  11 +-
 gcc/stmt.cc   | 277 +-
 gcc/stmt.h|   9 +-
 gcc/testsuite/gcc.dg/asm-hard-reg-1.c |  85 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-2.c |  33 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-3.c |  25 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-4.c |  50 
 gcc/testsuite/gcc.dg/asm-hard-reg-5.c |  36 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-6.c |  60 
 gcc/testsuite/gcc.dg/asm-hard-reg-7.c |  41 +++
 gcc/testsuite/gcc.dg/asm-hard-reg-8.c |  49 
 gcc/testsuite/gcc.dg/asm-hard-reg-error-1.c   |  83 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-2.c   |  26 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-3.c   |  27 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-4.c   |  21 ++
 gcc/testsuite/gcc.dg/asm-hard-reg-error-5.c   |  13 +
 gcc/testsuite/gcc.dg/pr87600-2.c  |  19 --
 gcc/testsuite/gcc.dg/pr87600-3.c  |  26 ++
 .../gcc.target/aarch64/asm-hard-reg-1.c   |  55 
 .../gcc.target/aarch64/asm-hard-reg-2.c   |  17 ++
 .../gcc.target/i386/asm-hard-reg-1.c  |  80 +
 .../gcc.target/i386/asm-hard-reg-2.c  |  43 +++
 .../gcc.target/s390/asm-hard-reg-1.c  | 103 +++
 .../gcc.target/s390/asm-hard-reg-2.c  |  43 +++
 .../gcc.target/s390/asm-hard-reg-3.c  |  42 +++
 .../gcc.target/s390/asm-hard-reg-4.c  |   6 +
 .../gcc.target/s390/asm-hard-reg-5.c  |   6 +
 .../gcc.target/s390/asm-hard-reg-6.c  | 152 ++
 .../gcc.target/s390/asm-hard-reg-7.c  |  34 +++
 .../gcc.target/s390/asm-hard-reg-longdouble.h |  18 ++
 gcc/testsuite/lib/scanasm.exp |   4 +
 gcc/toplev.cc |   4 +
 gcc/tree-ssa-operands.cc  |   4 +-
 gcc/tree-ssa-structalias.cc   |   4 +-
 gcc/varasm.cc

[PATCH v5 3/3] genoutput: Verify hard register constraints

2025-07-09 Thread Stefan Schulze Frielinghaus

Since genoutput has no information about hard register names we cannot
statically verify those names in constraints of the machine description.
Therefore, we have to do it at runtime.  Although verification shouldn't
be too expensive, restrict it to checking builds.  This should be
sufficient since hard register constraints in machine descriptions
probably change rarely, and each commit should be tested with checking
anyway, or at the very least before a release is taken.
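
The idea of the run-time check can be sketched outside of GCC as
follows.  This is not the generated function itself: known_regs and
lookup_reg_name() are hypothetical stand-ins for the target's register
table and for decode_reg_name(), and the sketch counts unknown names
instead of raising an internal error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the target's register name table.  */
static const char *const known_regs[] = { "r0", "r1", "r2", "sp" };

/* Hypothetical stand-in for decode_reg_name (): return the index of
   NAME in the table, or -1 if NAME is not a known register.  */
static int
lookup_reg_name (const char *name)
{
  for (size_t i = 0; i < sizeof known_regs / sizeof known_regs[0]; ++i)
    if (strcmp (known_regs[i], name) == 0)
      return (int) i;
  return -1;
}

/* Check all N register names collected from constraints and return
   how many of them are unknown.  */
size_t
count_invalid_reg_names (const char *const *names, size_t n)
{
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i)
    if (lookup_reg_name (names[i]) < 0)
      ++bad;
  return bad;
}
```

Since the table of names is only known once the backend is built, a
check of this kind has to run when the compiler starts up rather than
when genoutput runs.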

gcc/ChangeLog:

* genoutput.cc (main): Emit function
verify_reg_names_in_constraints() for run-time validation.
(mdep_constraint_len): Deal with hard register constraints.
* output.h (verify_reg_names_in_constraints): New function
declaration.
* toplev.cc (backend_init): If checking is enabled, call into
verify_reg_names_in_constraints().
---
 gcc/genoutput.cc | 46 ++
 gcc/output.h |  2 ++
 gcc/toplev.cc|  4 
 3 files changed, 52 insertions(+)

diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
index 908c70efddd..17751e5bfd9 100644
--- a/gcc/genoutput.cc
+++ b/gcc/genoutput.cc
@@ -200,6 +200,8 @@ static const char indep_constraints[] = ",=+%*?!^$#&g";
 static class constraint_data *
 constraints_by_letter_table[1 << CHAR_BIT];
 
+static hash_set used_reg_names;
+
 static int mdep_constraint_len (const char *, file_location, int);
 static void note_constraint (md_rtx_info *);
 
@@ -1156,6 +1158,45 @@ main (int argc, const char **argv)
   output_insn_data ();
   output_get_insn_name ();
 
+  /* Since genoutput has no information about hard register names we cannot
+ statically verify hard register names in constraints of the machine
+ description.  Therefore, we have to do it at runtime.  Although
+ verification shouldn't be too expensive, restrict it to checking builds.
+   */
+  printf ("\n\n#if CHECKING_P\n");
+  if (used_reg_names.is_empty ())
+printf ("void verify_reg_names_in_constraints () { }\n");
+  else
+{
+  size_t max_len = 0;
+  for (auto it = used_reg_names.begin (); it != used_reg_names.end (); 
++it)
+   {
+ size_t len = strlen (*it);
+ if (len > max_len)
+   max_len = len;
+   }
+  printf ("void\nverify_reg_names_in_constraints ()\n{\n");
+  printf ("  static const char hregnames[%zu][%zu] = {\n",
+ used_reg_names.elements (), max_len + 1);
+  auto it = used_reg_names.begin ();
+  while (it != used_reg_names.end ())
+   {
+ printf ("\"%s\"", *it);
+ ++it;
+ if (it != used_reg_names.end ())
+   printf (",");
+ printf ("\n");
+   }
+  printf ("  };\n");
+  printf ("  for (size_t i = 0; i < %zu; ++i)\n",
+ used_reg_names.elements ());
+  printf ("if (decode_reg_name (hregnames[i]) < 0)\n");
+  printf ("  internal_error (\"invalid register %%qs used in "
+ "constraint of machine description\", hregnames[i]);\n");
+  printf ("}\n");
+}
+  printf ("#endif\n");
+
   fflush (stdout);
   return (ferror (stdout) != 0 || have_error
? FATAL_EXIT_CODE : SUCCESS_EXIT_CODE);
@@ -1294,6 +1335,11 @@ mdep_constraint_len (const char *s, file_location loc, 
int opno)
   ptrdiff_t len = end - s;
   if (*end == '}' && len > 1 && len < 31)
{
+ char *regname = new char[len];
+ memcpy (regname, s + 1, len - 1);
+ regname[len - 1] = '\0';
+ if (used_reg_names.add (regname))
+   delete[] regname;
  return len + 1;
}
 }
diff --git a/gcc/output.h b/gcc/output.h
index 372d63c5f5c..0c329ffac3c 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -639,4 +639,6 @@ extern int default_address_cost (rtx, machine_mode, 
addr_space_t, bool);
 /* Stack usage.  */
 extern void output_stack_usage (void);
 
+extern void verify_reg_names_in_constraints ();
+
 #endif /* ! GCC_OUTPUT_H */
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 7e457b5168b..753fd1a18eb 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1815,6 +1815,10 @@ backend_init_target (void)
 static void
 backend_init (void)
 {
+#if CHECKING_P
+  verify_reg_names_in_constraints ();
+#endif
+
   init_emit_once ();
 
   init_rtlanal ();
-- 
2.49.0

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Jeff Law





On 7/9/25 7:17 AM, Richard Biener wrote:



> ISTR the backwards threader simply cancels paths that had blocks in
> common with another jump thread (that happened to be materialized
> first).  But maybe it's less strict than that.  It cancels things in too
> many places and while it collects all opportunities upfront it doesn't
> have any global limit on the amount of copying it does (and do pruning
> based on such limit in any sensible order, of course).

My recollection is it cancels paths that are subpaths of some larger 
path.  But if you have distinct incoming edges that target a particular 
outgoing edge, each of those incoming edges can be threaded (and each 
creates a copy).


jeff

[to-be-committed][RISC-V] Detect new fusions for RISC-V

2025-07-09 Thread Jeff Law

This is primarily Daniel's work...  He's chasing things in QEMU & LLVM 
right now so I'm doing a bit of clean-up and shepherding this patch forward.


--

Instruction fusion is a reasonably common way to improve the performance 
of code on many architectures/designs.  A few years ago we submitted 
(via VRULL I suspect) fusion support for a number of cases in the RISC-V 
space.


We made each type of fusion selectable independently in the tuning 
structure so that designs which implemented some particular set of 
fusions could select just the ones their design implemented.  This patch 
adds to that generic infrastructure.
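
As a rough illustration of the "selectable independently" part (a
standalone sketch, not the riscv.cc code): each fusion type is one bit
in a mask and a tuning entry simply ORs together the bits it supports.
The struct tune_sketch and the unprefixed FUSE_* names are made up for
this example; the bit positions mirror the ones added by the patch.

```c
/* One bit per fusion type.  */
enum fusion_pairs
{
  FUSE_NOTHING = 0,
  FUSE_ALIGNED_STD = (1 << 9),
  FUSE_CACHE_ALIGNED_STD = (1 << 10),
  FUSE_BFEXT = (1 << 11),
  FUSE_EXPANDED_LD = (1 << 12),
  FUSE_B_ALUI = (1 << 13)
};

/* Hypothetical tuning structure holding the enabled fusions.  */
struct tune_sketch
{
  unsigned int fusible_ops;
};

/* Return nonzero if fusion type OP is enabled in TUNE.  */
int
fusion_enabled_p (const struct tune_sketch *tune, enum fusion_pairs op)
{
  return (tune->fusible_ops & op) != 0;
}
```

A design that implements only bitfield extraction and the expanded load
fusions would set fusible_ops to FUSE_BFEXT | FUSE_EXPANDED_LD and leave
the remaining bits clear.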


In particular we're introducing additional load fusions, store pair 
fusions, bitfield extractions and a few B extension related fusions.


Conceptually for the new load fusions we're adding the ability to fuse 
most add/shNadd instructions with a subsequent load.  There's a couple 
of exceptions, but in general the expectation is that if we have 
add/shNadd for address computation, then they can potentially fuse with 
the load where the address gets used.


We've had limited forms of store pair fusion for a while.  Essentially 
we required both stores to be 64 bits wide and land on opposite sides of 
a 128 bit cache line.  That was enough to help prologues and a few other 
things, but was fairly restrictive.  The new cases capture store pairs 
where the two stores have the same size and hit consecutive memory 
locations.  For example, storing consecutive bytes with sb+sb is fusible.
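
The generalized condition can be sketched as a simple predicate.  This
is only an illustration and not the test in
riscv_macro_fusion_pair_p(): struct store_sketch is hypothetical, both
stores are assumed to use the same base register, and accepting the
descending order as well is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical description of a store: base register number,
   byte offset from the base, and access size in bytes.  */
struct store_sketch
{
  int base_reg;
  long offset;
  unsigned size;
};

/* Return true if A and B have the same size and write to adjacent
   memory locations relative to the same base register.  */
bool
stores_are_consecutive (const struct store_sketch *a,
                        const struct store_sketch *b)
{
  if (a->base_reg != b->base_reg || a->size != b->size)
    return false;
  return (b->offset == a->offset + (long) a->size
          || a->offset == b->offset + (long) b->size);
}
```

Under these assumptions sb at offset 0 followed by sb at offset 1 from
the same base qualifies, while sb followed by sh, or two stores with a
gap between them, do not.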


For bitfield extractions we can fuse together a shift left followed by a 
shift right for arbitrary shift counts whereas previously we restricted 
the shift counts to those implementing sign/zero extensions of 8 and 16 
bit objects.
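
For reference, the shift pair being fused computes a bitfield extraction
like the one below.  This is a plain C sketch for 64-bit values; the
function names are made up, WIDTH must be at least 1 and LSB + WIDTH at
most 64, and the signed variant relies on GCC's arithmetic right shift
of negative values.

```c
#include <stdint.h>

/* Extract WIDTH bits starting at bit LSB, zero-extended:
   shift left to drop the upper bits, then logical shift right.  */
uint64_t
extract_unsigned (uint64_t x, unsigned lsb, unsigned width)
{
  unsigned left = 64 - (lsb + width);
  unsigned right = 64 - width;
  return (x << left) >> right;
}

/* Same, but sign-extended: the right shift is arithmetic.  */
int64_t
extract_signed (uint64_t x, unsigned lsb, unsigned width)
{
  unsigned left = 64 - (lsb + width);
  unsigned right = 64 - width;
  return (int64_t) (x << left) >> right;
}
```

With LSB equal to 0 and WIDTH equal to 8 or 16 this degenerates into the
zero/sign extension cases that were already covered; any other
combination is what the new fusion adds.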


Finally some B extension fusions.  orc.b+not which shows up in string 
comparisons, ctz+andi (deepsjeng?), neg+max (synthesized abs).
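
To make two of these pairs concrete, here is what they compute,
written as portable C rather than as the instructions themselves.  The
function names are made up for this sketch; orc_b_sketch() emulates the
Zbb orc.b instruction on a 64-bit value.

```c
#include <stdint.h>

/* neg+max: absolute value as max (x, -x).  The negation is done in
   unsigned arithmetic so that INT64_MIN wraps instead of overflowing.  */
int64_t
abs_via_neg_max (int64_t x)
{
  int64_t n = (int64_t) (0 - (uint64_t) x);
  return x > n ? x : n;
}

/* orc.b: every byte becomes 0xff if any of its bits is set,
   and stays 0x00 otherwise.  */
uint64_t
orc_b_sketch (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; ++i)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* orc.b+not: 0xff in exactly those bytes of X that are zero, which
   is how a string routine can locate a NUL terminator.  */
uint64_t
zero_byte_mask (uint64_t x)
{
  return ~orc_b_sketch (x);
}
```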


I hope these prove to be useful to other RISC-V designs.  I wouldn't be 
surprised if we have to break down the new load fusions further for some 
designs.  If we need to do that it wouldn't be hard.


FWIW, our data indicates the generalized store fusions followed by the 
expanded load fusions are the most important cases for the new code.


These have been tested with crosses and bootstrapped on the BPI.

Waiting on pre-commit CI before moving forward (though it has been 
failing to pick up some patches recently...)



Jeff

gcc/
* config/riscv/riscv.cc (riscv_fusion_pairs): Add new cases.
(riscv_set_is_add): New function.
(riscv_set_is_addi, riscv_set_is_adduw, riscv_set_is_shNadd): Likewise.
(riscv_set_is_shNadduw): Likewise.
(riscv_macro_fusion_pair_p): Add new fusion cases.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e09c189add92..7965bbc20db7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -283,6 +283,10 @@ enum riscv_fusion_pairs
   RISCV_FUSE_AUIPC_LD = (1 << 7),
   RISCV_FUSE_LDPREINCREMENT = (1 << 8),
   RISCV_FUSE_ALIGNED_STD = (1 << 9),
+  RISCV_FUSE_CACHE_ALIGNED_STD = (1 << 10),
+  RISCV_FUSE_BFEXT = (1 << 11),
+  RISCV_FUSE_EXPANDED_LD = (1 << 12),
+  RISCV_FUSE_B_ALUI = (1 << 13),
 };
 
 /* Costs of various operations on the different architectures.  */
@@ -10204,6 +10208,81 @@ riscv_fusion_enabled_p(enum riscv_fusion_pairs op)
   return tune_param->fusible_ops & op;
 }
 
+/* Matches an add:
+   (set (reg:DI rd) (plus:SI (reg:SI rs1) (reg:SI rs2))) */
+
+static bool
+riscv_set_is_add (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && REG_P (XEXP (SET_SRC (set), 0))
+ && REG_P (XEXP (SET_SRC (set), 1))
+ && REG_P (SET_DEST (set)));
+}
+
+/* Matches an addi:
+   (set (reg:DI rd) (plus:SI (reg:SI rs1) (const_int imm))) */
+
+static bool
+riscv_set_is_addi (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && REG_P (XEXP (SET_SRC (set), 0))
+ && CONST_INT_P (XEXP (SET_SRC (set), 1))
+ && REG_P (SET_DEST (set)));
+}
+
+/* Matches an add.uw:
+  (set (reg:DI rd)
+(plus:DI (zero_extend:DI (reg:SI rs1)) (reg:DI rs2))) */
+
+static bool
+riscv_set_is_adduw (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && GET_CODE (XEXP (SET_SRC (set), 0)) == ZERO_EXTEND
+ && REG_P (XEXP (XEXP (SET_SRC (set), 0), 0))
+ && REG_P (XEXP (SET_SRC (set), 1))
+ && REG_P (SET_DEST (set)));
+}
+
+/* Matches a shNadd:
+  (set (reg:DI rd)
+   (plus:DI (ashift:DI (reg:DI rs1) (const_int N)) (reg:DI rS2)) */
+
+static bool
+riscv_set_is_shNadd (rtx set)
+{
+  return (GET_CODE (SET_SRC (set)) == PLUS
+ && GET_CODE (XEXP (SET_SRC (set), 0)) == ASHIFT
+ && REG_P (XEXP (XEXP (SET_SRC (set), 0), 0))
+ && CONST_INT_P (XEXP (XEXP (SET_SRC (set), 0), 1))
+ && (INTVAL (XEXP (XEXP (SET_SRC (set), 0), 1)) == 1
+ || INTVAL (XEXP (XEXP (SET_SRC (set), 0), 1)) == 2
+ || INTVAL (XEXP (XEXP (SET_SRC (set),

Re: [PATCH] tree-optimization/120929: Limit MEM_REF handling to .ACCESS_WITH_SIZE

2025-07-09 Thread Siddhesh Poyarekar


On 2025-07-08 18:18, Qing Zhao wrote:

On Jul 8, 2025, at 17:46, Siddhesh Poyarekar  wrote:

On 2025-07-08 17:17, Qing Zhao wrote:

Are the above the correct and efficient updates to the .ACCESS_WITH_SIZE to 
resolve both PR121000 and the issue
we have with counted_by for pointers?


I don't know about PR121000, but for counted_by with pointers, I think the REF_TO_OBJ 
(and the result_type) would also have to be a->fam and not &a->fam, i.e. don't 
generate an INDIRECT_REF to the .ACCESS_WITH_SIZE.


Yes. That’s right. I already changed this in my local workspace. And worked 
well.
With the current .ACCESS_WITH_SIZE, in order to get the correct TYPE_SIZE_UNIT 
of the element TYPE for the pointer array, we have to distinguish whether the 
TYPE passed by the 6th argument is for FAM or for pointer, therefore, I have to 
add one more argument to the .ACCESS_WITH_SIZE:

+   the 7th argument of the call is 1 when for FAM, 0 for pointers.

With the new .ACCESS_WITH_SIZE I proposed in the previous email as:

ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
TYPE_OF_SIZE, ACCESS_MODE)
  which returns the REF_TO_OBJ same as the 1st argument;

  1st argument REF_TO_OBJ: The reference to the object;
  2nd argument REF_TO_SIZE: The reference to the size of the object,
  3rd argument CLASS_OF_SIZE + TYPE_OF_SIZE: An integer constant with a TYPE:

  The integer constant value of this argument represents:
 0: means that the size referenced by the REF_TO_SIZE is the 
number of bytes.
 1: means that the size referenced by the REF_TO_SIZE is the 
number of the elements of the object type;
  The TYPE is the same as the TYPE of the object referenced by REF_TO_SIZE.
  4th argument ACCESS_MODE:
   -1: Unknown access semantics
0: none
1: read_only
2: write_only
3: read_write
  5th argument: The TYPE_SIZE_UNIT of the element TYPE of the FAM or the 
pointer array.

Since we pass the TYPE_SIZE_UNIT of the element TYPE directly to the call to 
.ACCESS_WITH_SIZE, no need to distinguish whether the TYPE is for FAM or 
pointer anymore.
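
To illustrate why the extra flag becomes unnecessary, here is a small
sketch of how a consumer could derive the object size in bytes from the
proposed arguments.  It is plain C and not GCC code: the enum and the
function name are hypothetical, and the parameters only model the value
loaded through REF_TO_SIZE, the CLASS_OF_SIZE constant and the 5th
argument.

```c
#include <stddef.h>

/* Models the integer constant of the 3rd argument.  */
enum class_of_size
{
  SIZE_IN_BYTES = 0,
  SIZE_IN_ELEMENTS = 1
};

/* SIZE_VALUE models the value read through REF_TO_SIZE and
   ELEMENT_SIZE_UNIT models the 5th argument.  Return the size of the
   object in bytes.  */
size_t
object_size_in_bytes (size_t size_value, enum class_of_size cls,
                      size_t element_size_unit)
{
  if (cls == SIZE_IN_ELEMENTS)
    return size_value * element_size_unit;
  return size_value;
}
```

The computation is the same for a FAM and for a pointer because the
element size arrives as a number and no type has to be inspected.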

Hope this is clear.


Sounds reasonable to me.

Thanks,
Sid

Re: [PUSHED] Fix 'main' function in 'gcc.dg/builtin-dynamic-object-size-pr120780.c'

2025-07-09 Thread Siddhesh Poyarekar


On 2025-07-09 04:21, Thomas Schwinge wrote:

Fix-up for commit 72e85d46472716e670cbe6e967109473b8d12d38
"tree-optimization/120780: Support object size for containing objects".
'size_t sz' is unused here, and GCC/nvptx doesn't accept this:

 spawn -ignore SIGHUP [...]/nvptx-none-run 
./builtin-dynamic-object-size-pr120780.exe
 error   : Prototype doesn't match for 'main' in 'input file 1 at offset 
1924', first defined in 'input file 1 at offset 1924'
 nvptx-run: cuLinkAddData failed: unknown error (CUDA_ERROR_UNKNOWN, 999)
 FAIL: gcc.dg/builtin-dynamic-object-size-pr120780.c execution test

gcc/testsuite/
* gcc.dg/builtin-dynamic-object-size-pr120780.c: Fix 'main' function.
---
  gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Thanks, I'll backport this to gcc-15 too tomorrow along with my fix.

Sid



diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
index 0d6593ec828..12e6c29569c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-pr120780.c
@@ -207,7 +207,7 @@ test5 (size_t sz)
  }
  
  int

-main (size_t sz)
+main (void)
  {
test1 (sizeof (struct container));
test1 (sizeof (struct container) - sizeof (int));

[pushed] testsuite: Add a couple of fstack_protector guards

2025-07-09 Thread Richard Sandiford

These tests required runtime support for -fstack-protector,
but didn't test for it.

Tested on aarch64-linux-gnu and aarch64_be-elf & pushed as obvious.

Richard


gcc/testsuite/
* gcc.target/aarch64/pr118348_1.c: Require fstack_protector.
* gcc.target/aarch64/pr118348_2.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/pr118348_1.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/pr118348_2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr118348_1.c 
b/gcc/testsuite/gcc.target/aarch64/pr118348_1.c
index 75f6dada63a..2715dcb8b12 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr118348_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr118348_1.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target aarch64_sve128_hw } } */
+/* { dg-do run { target { aarch64_sve128_hw && fstack_protector } } } */
 /* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=128 
--param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */
 
 #pragma GCC target "+sve"
diff --git a/gcc/testsuite/gcc.target/aarch64/pr118348_2.c 
b/gcc/testsuite/gcc.target/aarch64/pr118348_2.c
index 2e200044637..4ce8d20236c 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr118348_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr118348_2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-do run { target { aarch64_sve256_hw && fstack_protector } } } */
 /* { dg-options "-O2 -fopenmp-simd -fno-trapping-math -msve-vector-bits=256 
--param aarch64-autovec-preference=sve-only -fstack-protector-strong" } */
 
 #include "pr118348_1.c"
-- 
2.43.0

Re: [PATCH] ext-dce: Fix subreg_lsb is_constant assumption

2025-07-09 Thread Richard Sandiford

Jeff Law  writes:
> On 7/4/25 10:21 AM, Richard Sandiford wrote:
>> ext-dce had:
>> 
>>if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
>>  {
>>bit = subreg_lsb (dst).to_constant ();
>>if (bit >= HOST_BITS_PER_WIDE_INT)
>>  bit = HOST_BITS_PER_WIDE_INT - 1;
>>dst = SUBREG_REG (dst);
>> 
>> But a constant SUBREG_BYTE doesn't guarantee a constant subreg_lsb.
>> If the SUBREG_REG is a pair of N-bit registers on a big-endian target,
>> the most significant end has a SUBREG_BYTE of 0 but a subreg_lsb of N.
>> This N would then be non-constant for variable-length registers.
>> 
>> The patch fixes gcc.dg/torture/pr120276.c and other failures on
>> aarch64_be-elf.
>> 
>> Tested on aarch64-linux-gnu & aarch64_be-elf.  OK to install?
>> 
>> Richard
>> 
>> 
>> gcc/
>>  * ext-dce.cc (ext_dce_process_uses): Apply is_constant directly
>>  to the subreg_lsb.
> OK, of course.

Thanks.

> Makes me wonder if I should resurrect my aarch64_be RFS.  I changed how 
> those systems worked in the system a few years back to make it work 
> better with container based testing rather than direct chroots.  I never 
> converted aarch64_be to that setup.  It shouldn't be hard if you think 
> it's valuable.

I'm not sure TBH.  The only reason I started looking at aarch64_be
recently was to test a patch for Konstantinos.  And it turns out that
the "before" results are really, really poor.  I think that suggests
that no-one on the AArch64 side is testing big-endian regularly.
(And LLVM have got away without ever implementing big-endian arm_sve.h
support.)  So there's a danger that you'd spend a lot of your time
triaging AArch64-specific bugs.  There again, like you say...

> I can't think of another system where we'd these kinds of issues.

...it probably does have some "unique" features. :)

I later came across another instance of the subreg_lsb thing, which was
causing other ICEs.  I went ahead and installed this as obvious, given
the approval for the earlier one.

Tested on aarch64-linux-gnu and aarch64_be-elf.

Richard


This patch fixes another instance of the problem described in the
cover note for g:bf3037e923e9f91d93ab64bdf73a37f64f659fb9.
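
For readers without that cover note at hand, the arithmetic behind the
problem can be sketched as follows.  This is a simplified model and not
subreg_lsb() itself: it assumes a non-paradoxical subreg, byte and word
endianness that agree, and sizes that are known integers, which is
exactly what does not hold for variable-length registers.

```c
/* Return the bit position, within an INNER_BYTES wide value, of the
   least significant bit of an OUTER_BYTES wide piece located at
   BYTE_OFFSET.  */
unsigned
subreg_lsb_sketch (unsigned inner_bytes, unsigned outer_bytes,
                   unsigned byte_offset, int big_endian)
{
  if (big_endian)
    /* Byte 0 is the most significant end, so the piece at offset 0
       sits at the top of the inner value.  */
    return (inner_bytes - outer_bytes - byte_offset) * 8;
  return byte_offset * 8;
}
```

For a pair of 128-bit registers on a big-endian target, offset 0 gives
an lsb of 128, i.e. the N from the cover note.  If N depends on the
run-time vector length, the offset is still the constant 0 but the lsb
is not a compile-time constant.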

gcc/
* ext-dce.cc (ext_dce_process_uses): Apply is_constant directly
to the subreg_lsb.
---
 gcc/ext-dce.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index e7635fb7a39..67ec92a4287 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -757,7 +757,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
 and process the inner object.  */
  if (paradoxical_subreg_p (y))
y = XEXP (y, 0);
- else if (SUBREG_P (y) && SUBREG_BYTE (y).is_constant ())
+ else if (SUBREG_P (y) && subreg_lsb (y).is_constant (&bit))
{
  /* If !TRULY_NOOP_TRUNCATION_MODES_P, the mode
 change performed by Y would normally need to be a
@@ -774,8 +774,6 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
GET_MODE (SUBREG_REG (y))
break;
 
- bit = subreg_lsb (y).to_constant ();
-
  /* If this is a wide object (more bits than we can fit
 in a HOST_WIDE_INT), then just break from the SET
 context.   That will cause the iterator to walk down
-- 
2.43.0

Re: [PATCH] libstdc++: Ensure pool resources meet alignment requirements [PR118681]

2025-07-09 Thread Jonathan Wakely

On Wed, 9 Jul 2025 at 09:51, Andreas Schwab  wrote:
>
> This breaks several cross compilers:
>
> ../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member function 
> 'virtual void* 
> std::pmr::unsynchronized_pool_resource::do_allocate(std::size_t, 
> std::size_t)':
> ../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1474:29: error: 
> 'choose_block_size' was not declared in this scope
>  1474 | const auto block_size = choose_block_size(bytes, alignment);
>   | ^
> ../../../../../libstdc++-v3/src/c++17/memory_resource.cc: In member function 
> 'virtual void std::pmr::unsynchronized_pool_resource::do_deallocate(void*, 
> std::size_t, std::size_t)':
> ../../../../../libstdc++-v3/src/c++17/memory_resource.cc:1491:25: error: 
> 'choose_block_size' was not declared in this scope
>  1491 | size_t block_size = choose_block_size(bytes, alignment);
>   | ^
> make[5]: *** [Makefile:587: memory_resource.lo] Error 1
> make[5]: Leaving directory 
> '/home/abuild/rpmbuild/BUILD/cross-arm-none-gcc16-16.0.0+git2118-build/gcc-16.0.0+git2118/obj-x86_64-suse-linux/arm-none-eabi/libstdc++-v3/src/c++17'


Yeah I got a CI email from Linaro, the choose_block_size function
needs to be moved outside the #ifdef _GLIBCXX_HAS_GTHREADS group so
that it's usable for --disable-threads targets.
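
The shape of the fix, reduced to a standalone C sketch: the helper is
defined unconditionally and only the thread-specific user stays inside
the conditional block.  All names here are hypothetical, the macro
SKETCH_HAS_THREADS stands in for _GLIBCXX_HAS_GTHREADS, and the helper
body is a placeholder rather than the logic in memory_resource.cc.

```c
#include <stddef.h>

/* Defined outside of any #ifdef so that every configuration sees it.
   Placeholder body: use at least ALIGNMENT bytes per block.  */
static size_t
choose_block_size_sketch (size_t bytes, size_t alignment)
{
  return bytes > alignment ? bytes : alignment;
}

#ifdef SKETCH_HAS_THREADS
/* Only built when thread support is available.  */
size_t
synchronized_block_size (size_t bytes, size_t alignment)
{
  return choose_block_size_sketch (bytes, alignment);
}
#endif

/* Built unconditionally, so it must not depend on anything that is
   declared only inside the #ifdef above.  */
size_t
unsynchronized_block_size (size_t bytes, size_t alignment)
{
  return choose_block_size_sketch (bytes, alignment);
}
```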

Re: [PATCH] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-09 Thread Robin Dapp


Hi Paul-Antoine,


+;; Intermediate pattern for vfwmacc.vf and vfwmsac.vf used by combine
+(define_insn_and_split "*extend_vf_"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+(vec_duplicate:VWEXTF
+  (float_extend:
+(match_operand: 1 "register_operand"]
+  "TARGET_VECTOR"


Looks like that needs a can_create_pseudo_p () as well.


diff --git gcc/config/riscv/vector.md gcc/config/riscv/vector.md
index 6753b01db59..ddaa16cda1a 100644
--- gcc/config/riscv/vector.md
+++ gcc/config/riscv/vector.md
@@ -7267,10 +7267,10 @@ (define_insn "@pred_widen_mul__scalar"
  (plus_minus:VWEXTF
(mult:VWEXTF
  (float_extend:VWEXTF
-   (vec_duplicate:
- (match_operand: 3 "register_operand"   "f")))
- (float_extend:VWEXTF
-   (match_operand: 4 "register_operand" "   vr")))
+   (match_operand: 4 "register_operand" "   vr"))
+ (vec_duplicate:VWEXTF
+   (float_extend:
+ (match_operand: 3 "register_operand"   "f"


Hmm, this is not just a reordering but changes from (float_extend (vec_dup ...)) 
to (vec_dup (float_extend ...)).


Is the original pattern not used anywhere?  I don't think one is more canonical 
than the other.  Do we fold a sequence like


vfmv.v.f
vfcvt
vfmadd

differently?  Or are we just missing a test for it?

diff --git 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h

new file mode 100644
index 000..36d7f281576
--- /dev/null
+++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_widen_run.h
@@ -0,0 +1,32 @@
+#ifndef HAVE_DEFINED_VF_MULOP_WIDEN_RUN_H
+#define HAVE_DEFINED_VF_MULOP_WIDEN_RUN_H
+
+#include 
+
+#define N 512
+
+__attribute__((optimize("-fno-tree-vectorize")))


I would rather use an asm volatile inside the respective loop.
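
Something along these lines, as a sketch only: the function name and
the float/double types are placeholders for the T1/T2 types of the
test, and the empty asm with a "memory" clobber is the usual testsuite
idiom for keeping the vectorizer away from a reference loop.

```c
/* Scalar reference for a widening multiply-accumulate:
   out[i] += (double) a[i] * (double) f.  */
void
reference_widen_macc (const float *a, float f, double *out, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[i] += (double) a[i] * (double) f;
      /* Empty asm with a memory clobber inside the loop body.  */
      __asm__ volatile ("" ::: "memory");
    }
}
```

That keeps the rest of the file at the test's normal options instead of
changing optimization settings for all of main.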


+int main ()
+{
+  T1 f[N];   
+  T1 in[N];   
+  T2 out[N]; 
+  T2 out2[N];


Trailing whitespaces.

Also, seems like the CI picked up the patch but didn't run it?

--
Regards
Robin

Re: [PATCH] RISC-V: Adjust testdata for unsigned vector SAT_SUB

2025-07-09 Thread Kito Cheng

OK if Pan says OK.

On Wed, Jul 9, 2025 at 4:36 PM Ciyan Pan  wrote:
>
> From: panciyan 
>
> This patch adjust test data for unsigned vector SAT_SUB to vec_sat_data.h
>
> Passed the rv64gcv regression test.
>
> Signed-off-by: Ciyan Pan 
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: Add 
> vec_sat_u_sub_fmt wrap define.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: Add vec_sat_u_sub 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-2-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-3-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-4-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-5-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-6-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-7-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-8-u8.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u16.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u32.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u64.c: Remove 
> test data.
> * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub-run-9-u8.c: Remove 
> test data.
>
> ---
>  .../riscv/rvv/autovec/sat/vec_sat_arith.h |  40 +++
>  .../riscv/rvv/autovec/sat/vec_sat_data.h  | 252 ++
>  .../rvv/autovec/sat/vec_sat_u_sub-run-1-u16.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-1-u32.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-1-u64.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-1-u8.c  |  71 +
>  .../autovec/sat/vec_sat_u_sub-run-10-u16.c|  70 +
>  .../autovec/sat/vec_sat_u_sub-run-10-u32.c|  70 +
>  .../autovec/sat/vec_sat_u_sub-run-10-u64.c|  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-10-u8.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-2-u16.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-2-u32.c |  70 +
>  .../rvv/autovec/sat/vec_sat_u_sub-run-2-u64.c |  70 +
>  .../rvv/autov

[PATCH] Avoid accessing STMT_VINFO_VECTYPE

2025-07-09 Thread Richard Biener

The following fixes up two places where we access STMT_VINFO_VECTYPE that
are not covered by the fixup in vect_analyze/transform_stmt, which sets it
from SLP_TREE_VECTYPE.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-loop.cc (vectorizable_reduction): Get the
output vector type from slp_for_stmt_info.
* tree-vect-stmts.cc (vect_analyze_stmt): Bail out earlier
for PURE_SLP_STMT when doing loop stmt analysis.
---
 gcc/tree-vect-loop.cc  |  2 +-
 gcc/tree-vect-stmts.cc | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9cee5195077..5a0736280ad 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7823,7 +7823,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 inside the loop body. The last operand is the reduction variable,
 which is defined by the loop-header-phi.  */
 
-  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  tree vectype_out = SLP_TREE_VECTYPE (slp_for_stmt_info);
   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b9609488292..717f3e02629 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -13424,6 +13424,14 @@ vect_analyze_stmt (vec_info *vinfo,
 gcc_unreachable ();
 }
 
+  if (PURE_SLP_STMT (stmt_info) && !node)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"handled only by SLP analysis\n");
+  return opt_result::success ();
+}
+
   tree saved_vectype = STMT_VINFO_VECTYPE (stmt_info);
   if (node)
 STMT_VINFO_VECTYPE (stmt_info) = SLP_TREE_VECTYPE (node);
@@ -13437,14 +13445,6 @@ vect_analyze_stmt (vec_info *vinfo,
   *need_to_vectorize = true;
 }
 
-  if (PURE_SLP_STMT (stmt_info) && !node)
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"handled only by SLP analysis\n");
-  return opt_result::success ();
-}
-
   /* When we arrive here with a non-SLP statement and we are supposed
  to use SLP for everything fail vectorization.  */
   if (!node)
-- 
2.43.0

Basic fusions in RISC-V generic tuning model

2025-07-09 Thread Jeff Law


One thing I forgot to bring up in the patchwork meeting yesterday.

Philip or Craig asked if we should add the most basic fusions to the 
generic tuning models for the two toolchains.


I'm generally in favor of making that kind of change.  I don't think 
anyone believes it'd be a major performance driver, but it does slightly 
reduce the search space when we do need to chase things down.


lui/auipc+addi would fall into that set.  It's unclear if any others would.

Thoughts?

Jeff

[PATCH v2] testsuite: arm: Update function body for scheduler

2025-07-09 Thread Torbjörn SVENSSON

Ok for trunk and releases/gcc-15?

Changes since v1:
- Removed the acceptance of LDR as it's only generated without
  r15-7373-g5163cf2ae14.  Since I'm currently looking into the gcc-14
  release and made the patch in that scope, I ran it on trunk to ensure
  no new failures, but it's not actually needed.

--

The scheduler allows the `and` instruction to be placed at 3 different
locations. Update the function body to contain all 3 locations.

gcc/testsuite/ChangeLog:

* gcc.target/arm/unsigned-extend-2.c: Add missing potential
locations for `and` instruction.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/gcc.target/arm/unsigned-extend-2.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c 
b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
index d9f95a14277..15bc5a4c14d 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
@@ -7,15 +7,19 @@
 ** foo:
 ** movs(r[0-9]+), #8
 ** (
+** (
+** and r0, r1, r0, lsr #1
 ** subs\1, \1, #1
 ** ands\1, \1, #255
-** and r0, r1, r0, lsr #1
-** bne .L[0-9]+
-** bx  lr
 ** |
 ** subs\1, \1, #1
 ** and r0, r1, r0, lsr #1
 ** ands\1, \1, #255
+** |
+** subs\1, \1, #1
+** ands\1, \1, #255
+** and r0, r1, r0, lsr #1
+** )
 ** bne .L[0-9]+
 ** bx  lr
 ** |
-- 
2.25.1

[PATCH v5 1/11] openmp: Refactor handling of iterators

2025-07-09 Thread Kwok Cheung Yeung


V1: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652681.html
V2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662139.html
V3: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664542.html
V4: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670334.html

This patch is unchanged from the previous version.

From dea003f55ae44822bf4db945966d105cb926f34d Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:49:12 +
Subject: [PATCH 01/11] openmp: Refactor handling of iterators

Move code to calculate the iteration size and to generate the iterator
expansion loop into separate functions.

Use OMP_ITERATOR_DECL_P to check for iterators in clause declarations.
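
As a rough illustration of what the predicate replaces, the three-part test
removed at each call site in the hunks below can be modelled as follows.
This is a sketch with stand-in types: the mock_* names are hypothetical and
this is not the definition added to tree.h.

```cpp
#include <cassert>

// Stand-in node kinds and layout; GCC's real 'tree' is far richer.
enum mock_tree_code { MOCK_TREE_LIST, MOCK_TREE_VEC, MOCK_VAR_DECL };

struct mock_tree
{
  mock_tree_code code;
  const mock_tree *purpose;  // models TREE_PURPOSE
  const mock_tree *value;    // models TREE_VALUE
};

// Models the open-coded check: a TREE_LIST whose TREE_PURPOSE is a
// non-null TREE_VEC holding the iterator definitions.
inline bool
mock_omp_iterator_decl_p (const mock_tree *t)
{
  assert (t != nullptr);
  return t->code == MOCK_TREE_LIST
         && t->purpose != nullptr
         && t->purpose->code == MOCK_TREE_VEC;
}
```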

gcc/c-family/

* c-omp.cc (c_finish_omp_depobj): Use OMP_ITERATOR_DECL_P.

gcc/c/

* c-typeck.cc (handle_omp_array_sections): Use OMP_ITERATOR_DECL_P.
(c_finish_omp_clauses): Likewise.

gcc/cp/

* pt.cc (tsubst_omp_clause_decl): Use OMP_ITERATOR_DECL_P.
* semantics.cc (handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise.

gcc/

* gimplify.cc (gimplify_omp_affinity): Use OMP_ITERATOR_DECL_P.
(compute_omp_iterator_count): New.
(build_omp_iterator_loop): New.
(gimplify_omp_depend): Use OMP_ITERATOR_DECL_P,
compute_omp_iterator_count and build_omp_iterator_loop.
* tree-inline.cc (copy_tree_body_r): Use OMP_ITERATOR_DECL_P.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.h (OMP_ITERATOR_DECL_P): New macro.
---
 gcc/c-family/c-omp.cc|   4 +-
 gcc/c/c-typeck.cc|  13 +-
 gcc/cp/pt.cc |   4 +-
 gcc/cp/semantics.cc  |   8 +-
 gcc/gimplify.cc  | 321 +++
 gcc/tree-inline.cc   |   5 +-
 gcc/tree-pretty-print.cc |   8 +-
 gcc/tree.h   |   6 +
 8 files changed, 173 insertions(+), 196 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 4352214df3b..fe272888c51 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -769,9 +769,7 @@ c_finish_omp_depobj (location_t loc, tree depobj,
  kind = OMP_CLAUSE_DEPEND_KIND (clause);
  t = OMP_CLAUSE_DECL (clause);
  gcc_assert (t);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  error_at (OMP_CLAUSE_LOCATION (clause),
"% modifier may not be specified on "
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 794810640e8..d2a857de9a6 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -15628,9 +15628,7 @@ handle_omp_array_sections (tree &c, enum 
c_omp_region_type ort)
   tree *tp = &OMP_CLAUSE_DECL (c);
   if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AFFINITY)
-  && TREE_CODE (*tp) == TREE_LIST
-  && TREE_PURPOSE (*tp)
-  && TREE_CODE (TREE_PURPOSE (*tp)) == TREE_VEC)
+  && OMP_ITERATOR_DECL_P (*tp))
 tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
maybe_zero_len, first_non_one,
@@ -16827,9 +16825,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  /* FALLTHRU */
case OMP_CLAUSE_AFFINITY:
  t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  if (TREE_PURPOSE (t) != last_iterators)
last_iterators_remove
@@ -16929,10 +16925,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  break;
}
}
- if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST
- && TREE_PURPOSE (OMP_CLAUSE_DECL (c))
- && (TREE_CODE (TREE_PURPOSE (OMP_CLAUSE_DECL (c)))
- == TREE_VEC))
+ if (OMP_ITERATOR_DECL_P (OMP_CLAUSE_DECL (c)))
TREE_VALUE (OMP_CLAUSE_DECL (c)) = t;
  else
OMP_CLAUSE_DECL (c) = t;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3362a6f8f9c..be9735f8d08 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17967,9 +17967,7 @@ tsubst_omp_clause_decl (tree decl, tree args, 
tsubst_flags_t complain,
 return decl;
 
   /* Handle OpenMP iterators.  */
-  if (TREE_CODE (decl) == TREE_LIST
-  && TREE_PURPOSE (decl)
-  && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC)
+  if (OMP_ITERATOR_DECL_P (decl))
 {
   tree ret;
   if (iterator_cache[0] == TREE_PURPOSE (decl))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 28baf7b3172..07a601eb172 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -6295,9 +6295,7 @@ handle_omp_array_sections (tree &c, e

Re: [PATCH v1 1/3] libstdc++: Implement is_sufficiently_aligned.

2025-07-09 Thread Jonathan Wakely

On Thu, 3 Jul 2025 at 11:35, Luc Grosheintz  wrote:
>
> This commit implements and tests the function is_sufficiently_aligned
> from P2897R7.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/align.h (is_sufficiently_aligned): New function.
> * include/bits/version.def (is_sufficiently_aligned): Add.
> * include/bits/version.h: Regenerate.
> * include/std/memory: Add __glibcxx_want_is_sufficiently_aligned.
> * src/c++23/std.cc.in (is_sufficiently_aligned): Add.
> * testsuite/20_util/is_sufficiently_aligned/1.cc: New test.
> * testsuite/20_util/is_sufficiently_aligned/2.cc: New test.

The Signed-off-by: trailer is missing.
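
For reference, the check added to bits/align.h in the quoted patch amounts
to a modulo test on the pointer's integer value.  A stand-alone sketch of
the same idea, using a hypothetical name and std::uintptr_t instead of the
library-internal __UINTPTR_TYPE__ (the power-of-two static_assert is an
addition of this sketch, not something in the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-alone sketch, not the libstdc++ implementation: true if ptr lies
// on an Align-byte boundary.  Align is assumed to be a power of two.
template<std::size_t Align, typename T>
bool
is_sufficiently_aligned_sketch (T* ptr)
{
  static_assert (Align != 0 && (Align & (Align - 1)) == 0,
                 "Align must be a power of two");
  return reinterpret_cast<std::uintptr_t> (ptr) % Align == 0;
}
```

The alignment is given explicitly and the pointee type is deduced, for
example is_sufficiently_aligned_sketch<16> (buf).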


> ---
>  libstdc++-v3/include/bits/align.h | 16 ++
>  libstdc++-v3/include/bits/version.def |  8 +
>  libstdc++-v3/include/bits/version.h   | 10 ++
>  libstdc++-v3/include/std/memory   |  1 +
>  libstdc++-v3/src/c++23/std.cc.in  |  1 +
>  .../20_util/is_sufficiently_aligned/1.cc  | 31 +++
>  .../20_util/is_sufficiently_aligned/2.cc  |  7 +
>  7 files changed, 74 insertions(+)
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
>  create mode 100644 
> libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/2.cc
>
> diff --git a/libstdc++-v3/include/bits/align.h 
> b/libstdc++-v3/include/bits/align.h
> index 2b40c37e033..fbbe9cb1f9c 100644
> --- a/libstdc++-v3/include/bits/align.h
> +++ b/libstdc++-v3/include/bits/align.h
> @@ -102,6 +102,22 @@ align(size_t __align, size_t __size, void*& __ptr, 
> size_t& __space) noexcept
>  }
>  #endif // __glibcxx_assume_aligned
>
> +#ifdef __glibcxx_is_sufficiently_aligned // C++ >= 26
> +  /** @brief Is @a __ptr aligned to an _Align byte boundary?
> +   *
> +   *  @tparam _Align An alignment value
> +   *  @tparam _TpAn object type
> +   *
> +   *  C++26 20.2.5 [ptr.align]
> +   *
> +   *  @ingroup memory
> +   */
> +  template
> +bool
> +is_sufficiently_aligned(_Tp* __ptr)
> +{ return reinterpret_cast<__UINTPTR_TYPE__>(__ptr) % _Align == 0; }
> +#endif // __glibcxx_is_sufficiently_aligned
> +
>  _GLIBCXX_END_NAMESPACE_VERSION
>  } // namespace
>
> diff --git a/libstdc++-v3/include/bits/version.def 
> b/libstdc++-v3/include/bits/version.def
> index f4ba501c403..a2695e67716 100644
> --- a/libstdc++-v3/include/bits/version.def
> +++ b/libstdc++-v3/include/bits/version.def
> @@ -732,6 +732,14 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = is_sufficiently_aligned;
> +  values = {
> +v = 202411;
> +cxxmin = 26;
> +  };
> +};
> +
>  ftms = {
>name = atomic_flag_test;
>values = {
> diff --git a/libstdc++-v3/include/bits/version.h 
> b/libstdc++-v3/include/bits/version.h
> index dc8ac07be16..1b17a965239 100644
> --- a/libstdc++-v3/include/bits/version.h
> +++ b/libstdc++-v3/include/bits/version.h
> @@ -815,6 +815,16 @@
>  #endif /* !defined(__cpp_lib_assume_aligned) && 
> defined(__glibcxx_want_assume_aligned) */
>  #undef __glibcxx_want_assume_aligned
>
> +#if !defined(__cpp_lib_is_sufficiently_aligned)
> +# if (__cplusplus >  202302L)
> +#  define __glibcxx_is_sufficiently_aligned 202411L
> +#  if defined(__glibcxx_want_all) || 
> defined(__glibcxx_want_is_sufficiently_aligned)
> +#   define __cpp_lib_is_sufficiently_aligned 202411L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_is_sufficiently_aligned) && 
> defined(__glibcxx_want_is_sufficiently_aligned) */
> +#undef __glibcxx_want_is_sufficiently_aligned
> +
>  #if !defined(__cpp_lib_atomic_flag_test)
>  # if (__cplusplus >= 202002L)
>  #  define __glibcxx_atomic_flag_test 201907L
> diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
> index 1da03b3ea6a..ff342ff35f3 100644
> --- a/libstdc++-v3/include/std/memory
> +++ b/libstdc++-v3/include/std/memory
> @@ -110,6 +110,7 @@
>  #define __glibcxx_want_constexpr_memory
>  #define __glibcxx_want_enable_shared_from_this
>  #define __glibcxx_want_indirect
> +#define __glibcxx_want_is_sufficiently_aligned
>  #define __glibcxx_want_make_unique
>  #define __glibcxx_want_out_ptr
>  #define __glibcxx_want_parallel_algorithm
> diff --git a/libstdc++-v3/src/c++23/std.cc.in 
> b/libstdc++-v3/src/c++23/std.cc.in
> index e692caaa5f9..6f4214ed3a7 100644
> --- a/libstdc++-v3/src/c++23/std.cc.in
> +++ b/libstdc++-v3/src/c++23/std.cc.in
> @@ -1864,6 +1864,7 @@ export namespace std
>using std::allocator_arg_t;
>using std::allocator_traits;
>using std::assume_aligned;
> +  using std::is_sufficiently_aligned;
>using std::make_obj_using_allocator;
>using std::pointer_traits;
>using std::to_address;
> diff --git a/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc 
> b/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
> new file mode 100644
> index 000..4c2738b57db
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
> @@ -0,0 +1,31 @@
> +// { d

Re: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Tamar Christina

(on mobile so doing a top reply)

> So it isn't as efficient to use cbranch_any (g != 0) here?  I  think it 
> should be practically equivalent...

Ah yeah, it can expand what we currently expand vector boolean to.

I was initially confused because for SVE what we want here is an ORRS (flag 
setting inclusive ORR)

Using this optab we can get there more easily too.

So yeah I agree, cbranch for vectors can be deprecated.

Note that in my patch I named the new one vec_cbranch_any/all to implicitly say 
they are only vectors.
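
A scalar model of the any/all semantics discussed in this thread, as a
sketch only: each array element stands for one vector lane, and the
function names are hypothetical.  It says nothing about how the optabs are
expanded.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// "any": at least one lane satisfies the comparison (here: x != 0).
template<std::size_t N>
bool
any_lane_ne_zero (const std::array<int, N> &v)
{
  return std::any_of (v.begin (), v.end (), [] (int x) { return x != 0; });
}

// "all": every lane satisfies the comparison (here: x == 0).
template<std::size_t N>
bool
all_lanes_eq_zero (const std::array<int, N> &v)
{
  return std::all_of (v.begin (), v.end (), [] (int x) { return x == 0; });
}
```

In this model "none set" is all_lanes_eq_zero, which is exactly the
negation of any_lane_ne_zero, so inverting a branch swaps one form for the
other.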

Do you want to fully deprecate cbranch for vectors?

This would mean though that all target checks need to be updated unless we
update the supports checks with a helper?

Thanks,
Tamar


From: Richard Biener 
Sent: Wednesday, July 9, 2025 1:24 PM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org ; nd 
Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all 
[PR118974]

On Wed, 9 Jul 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, July 9, 2025 12:36 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > vec_cbranch_all [PR118974]
> >
> > On Wed, 9 Jul 2025, Tamar Christina wrote:
> >
> > > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > > > +first and second operands of the comparison, respectively.  Operand 3
> > > > > +is the @code{code_label} to jump to.
> > > > > +
> > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > > +@item @samp{cbranch_all@var{mode}4}
> > > > > +Conditional branch instruction combined with a compare instruction on
> > vectors
> > > > > +where it is required that at all of the elementwise comparisons of 
> > > > > the
> > > > > +two input vectors are true.
> > > >
> > > > See above.
> > > >
> > > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > > can still invert a jump (for BB reorder, for example)?  Without
> > > > a cbranch_none expander we have to invert during RTL expansion?
> > > >
> > >
> > > Isn't cbranch_none just cbranch_all x 0? i.e. all value must be zero.
> > > I think all states are expressible with any and all and flipping the 
> > > branches
> > > so it shouldn't be any more restrictive than cbranch itself is today.
> > >
> > > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> > >
> > > and FTR the RTL generated for AArch64 (Both SVE And Adv.SIMD) will be
> > simplified to:
> > >
> > > (insn 23 22 24 5 (parallel [
> > > (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > > (unspec:VNx4BI [
> > > (reg:VNx4BI 129)
> > > (const_int 0 [0x0])
> > > (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > > (const_vector:VNx4SI repeat [
> > > (const_int 0 [0])
> > > ]))
> > > ] UNSPEC_PRED_Z))
> > > (clobber (reg:CC_NZC 66 cc))
> > > ]) "cbranch.c":25:10 -1
> > >
> > > (jump_insn 27 26 28 5 (set (pc)
> > > (if_then_else (eq (reg:CC_Z 66 cc)
> > > (const_int 0 [0]))
> > > (label_ref 33)
> > > (pc))) "cbranch.c":25:10 -1
> > >  (int_list:REG_BR_PROB 1014686025 (nil))
> > >
> > > The thing is we can't rid of the unspecs as there's concept of masking in 
> > > RTL
> > compares.
> > > We could technically do an AND (and do in some cases) but then you lose 
> > > the
> > predicate
> > > Hint constant in the RTL which tells you whether the mask is known to be 
> > > all true
> > or not.
> > > This hint is crucial to allow for further optimizations.
> > >
> > > That said the condition code, branch and compares are fully exposed.
> > >
> > > We expand to a larger sequence than I'd like mostly because there's no 
> > > support
> > > for conditional cbranch optabs, or even conditional vector comparisons. 
> > > So the
> > comparisons
> > > must be generated unpredicated by generating an all true mask, and later
> > patterns
> > > merge in the AND.
> > >
> > > The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a 
> > > single
> > loop)
> > > But not pure SVE.  For which I take a different approach to try to avoid 
> > > requiring
> > > a predicated version of these optabs.
> > >
> > > I don't want to push my luck, but would you be ok with a conditional 
> > > version of
> > these
> > > optabs too? i.e. cond_cbranch_all and cond_cbranch_all?  This would allow 
> > > us to
> > > immediately expand to the correct representation for both SVE and Adv.SIMD
> > > without having to rely on various combine patterns and cc-fusion to 
> > > optimize the
> > sequences
> > > later on (which has historically been a bit hit or miss if someone adds a 
> > > new CC
> > pattern).
> >
> > Can

[PATCH v5 2/11] openmp: Add support for iterators in map clauses (C/C++)

2025-07-09 Thread Kwok Cheung Yeung


v1: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652682.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662140.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664543.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670335.html

This patch is largely unchanged from v4.
gimple_omp_target_iterator_loops_ptr now takes a gimple* rather than a
gomp_target* to avoid a couple of as_a casts in various places.

From d3ad9940694b33a5428dbd5e52f46ec4da799419 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Sat, 3 May 2025 20:24:26 +
Subject: [PATCH 02/11] openmp: Add support for iterators in map clauses
 (C/C++)

This adds preliminary support for iterators in map clauses within OpenMP
'target' constructs (which includes constructs such as 'target enter data').

Iterators with non-constant loop bounds are not currently supported.
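
A sketch of the kind of code this is meant to enable, assuming the OpenMP
'iterator' map-type modifier syntax.  The function and array names are made
up for illustration.  Without -fopenmp the pragma is ignored and the loop
runs on the host; with -fopenmp it needs a compiler that accepts iterators
in map clauses.

```cpp
#include <cassert>

constexpr int DIM1 = 4;
constexpr int DIM2 = 8;

int
offload_row_sum ()
{
  static int storage[DIM1][DIM2];
  int *x[DIM1];
  for (int i = 0; i < DIM1; i++)
    {
      x[i] = storage[i];
      for (int j = 0; j < DIM2; j++)
        x[i][j] = i + j;
    }

  int sum = 0;
  // One map clause with an iterator stands in for DIM1 separate clauses
  // (x[0][:DIM2], x[1][:DIM2], ...).  The bounds are compile-time
  // constants, matching the restriction stated above.
#pragma omp target map(to: x) map(iterator(it = 0:DIM1), to: x[it][:DIM2]) \
                   map(tofrom: sum)
  for (int i = 0; i < DIM1; i++)
    for (int j = 0; j < DIM2; j++)
      sum += x[i][j];

  return sum;
}
```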

gcc/c/

* c-parser.cc (c_parser_omp_clause_map): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/cp/

* parser.cc (cp_parser_omp_clause_map): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators.  Apply
iterators to generated clauses.

gcc/

* gimple-pretty-print.cc (dump_gimple_omp_target): Print expanded
iterator loops.
* gimple.cc (gimple_build_omp_target): Add argument for iterator
loops sequence.  Initialize iterator loops field.
* gimple.def (GIMPLE_OMP_TARGET): Set GSS symbol to GSS_OMP_TARGET.
* gimple.h (gomp_target): Set GSS symbol to GSS_OMP_TARGET.  Add extra
field for iterator loops.
(gimple_build_omp_target): Add argument for iterator loops sequence.
(gimple_omp_target_iterator_loops): New.
(gimple_omp_target_iterator_loops_ptr): New.
(gimple_omp_target_set_iterator_loops): New.
* gimplify.cc (find_var_decl): New.
(copy_omp_iterator): New.
(remap_omp_iterator_var_1): New.
(remap_omp_iterator_var): New.
(remove_unused_omp_iterator_vars): New.
(struct iterator_loop_info_t): New type.
(iterator_loop_info_map_t): New type.
(build_omp_iterators_loops): New.
(enter_omp_iterator_loop_context_1): New.
(enter_omp_iterator_loop_context): New.
(enter_omp_iterator_loop_context): New.
(exit_omp_iterator_loop_context): New.
(gimplify_adjust_omp_clauses): Add argument for iterator loop
sequence.  Gimplify the clause decl and size into the iterator
loop if iterators are used.
(gimplify_omp_workshare): Call remove_unused_omp_iterator_vars and
build_omp_iterators_loops for OpenMP target expressions.  Add
loop sequence as argument when calling gimplify_adjust_omp_clauses
and building the Gimple statement.
* gimplify.h (enter_omp_iterator_loop_context): New prototype.
(exit_omp_iterator_loop_context): New prototype.
* gsstruct.def (GSS_OMP_TARGET): New.
* omp-low.cc (lower_omp_map_iterator_expr): New.
(lower_omp_map_iterator_size): New.
(finish_omp_map_iterators): New.
(lower_omp_target): Add sorry if iterators used with deep mapping.
Call lower_omp_map_iterator_expr before assigning to sender ref.
Call lower_omp_map_iterator_size before setting the size.  Insert
iterator loop sequence before the statements for the target clause.
* tree-nested.cc (convert_nonlocal_reference_stmt): Walk the iterator
loop sequence of OpenMP target statements.
(convert_local_reference_stmt): Likewise.
(convert_tramp_reference_stmt): Likewise.
* tree-pretty-print.cc (dump_omp_iterators): Dump extra iterator
information if present.
(dump_omp_clause): Call dump_omp_iterators for iterators in map
clauses.
* tree.cc (omp_clause_num_ops): Add operand for OMP_CLAUSE_MAP.
(walk_tree_1): Do not walk last operand of OMP_CLAUSE_MAP.
* tree.h (OMP_CLAUSE_HAS_ITERATORS): New.
(OMP_CLAUSE_ITERATORS): New.

gcc/testsuite/

* c-c++-common/gomp/map-6.c (foo): Amend expected error message.
* c-c++-common/gomp/target-map-iterators-1.c: New.
* c-c++-common/gomp/target-map-iterators-2.c: New.
* c-c++-common/gomp/target-map-iterators-3.c: New.
* c-c++-common/gomp/target-map-iterators-4.c: New.

libgomp/

* target.c (kind_to_name): New.
(gomp_merge_iterator_maps): New.
(gomp_map_vars_internal): Call gomp_merge_iterator_maps.  Copy
address of only the first iteration to target vars.  Free allocated
variables.
* testsuite/libgomp.c-c++-common/target-map-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-3.c: New.

Co-authored-by

[PATCH v5 3/11] openmp: Add support for iterators in 'target update' clauses (C/C++)

2025-07-09 Thread Kwok Cheung Yeung


V1: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652683.html
V2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662141.html
V3: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664545.html
V4: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670336.html

This patch is basically unchanged from v4.

From 19dc39bbacb5d630df6dae78beb6871cbab7fcf8 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:51:34 +
Subject: [PATCH 03/11] openmp: Add support for iterators in 'target update'
 clauses (C/C++)

This adds support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.
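
A sketch of the intended usage, following the motion-modifier grammar
quoted in the c-parser.cc hunk below (iterator modifier, colon, then the
variable list).  All names are hypothetical, and unmapping is left out for
brevity.  Without -fopenmp the pragmas are ignored and everything runs on
the host.

```cpp
#include <cassert>

constexpr int ROWS = 4;
constexpr int COLS = 8;

int
refresh_and_sum ()
{
  static int storage[ROWS][COLS];
  int *x[ROWS];
  for (int i = 0; i < ROWS; i++)
    x[i] = storage[i];

  // Map the pointer array and every row before the new values are written.
#pragma omp target enter data map(to: x) \
                              map(iterator(it = 0:ROWS), to: x[it][:COLS])

  // Change the host copies, then push all rows with a single 'to' clause.
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      x[i][j] = i + j;
#pragma omp target update to(iterator(it = 0:ROWS): x[it][:COLS])

  int sum = 0;
#pragma omp target map(tofrom: sum)
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      sum += x[i][j];

  return sum;
}
```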

gcc/c/

* c-parser.cc (c_parser_omp_clause_from_to): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/cp/

* parser.cc (cp_parser_omp_clause_from_to): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Finish iterators for to/from
clauses.

gcc/

* gimplify.cc (gimplify_scan_omp_clauses): Add argument for iterator
loop sequence.   Gimplify the clause decl and size into the iterator
loop if iterators are used.
(gimplify_omp_workshare): Add argument for iterator loops sequence
in call to gimplify_scan_omp_clauses.
(gimplify_omp_target_update): Call remove_unused_omp_iterator_vars and
build_omp_iterators_loops.  Add loop sequence as argument when calling
gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses and building
the Gimple statement.
* tree-pretty-print.cc (dump_omp_clause): Call dump_omp_iterators
for to/from clauses with iterators.
* tree.cc (omp_clause_num_ops): Add extra operand for OMP_CLAUSE_FROM
and OMP_CLAUSE_TO.
* tree.h (OMP_CLAUSE_HAS_ITERATORS): Add check for OMP_CLAUSE_TO and
OMP_CLAUSE_FROM.
(OMP_CLAUSE_ITERATORS): Likewise.

gcc/testsuite/

* c-c++-common/gomp/target-update-iterators-1.c: New.
* c-c++-common/gomp/target-update-iterators-2.c: New.
* c-c++-common/gomp/target-update-iterators-3.c: New.

libgomp/

* target.c (gomp_update): Call gomp_merge_iterator_maps.  Free
allocated variables.
* testsuite/libgomp.c-c++-common/target-update-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-3.c: New.
---
 gcc/c/c-parser.cc |  99 ++--
 gcc/c/c-typeck.cc |   5 +-
 gcc/cp/parser.cc  | 112 --
 gcc/cp/semantics.cc   |   5 +-
 gcc/gimplify.cc   |  37 +++---
 .../gomp/target-update-iterators-1.c  |  20 
 .../gomp/target-update-iterators-2.c  |  23 
 .../gomp/target-update-iterators-3.c  |  17 +++
 gcc/tree-pretty-print.cc  |  10 ++
 gcc/tree.cc   |   4 +-
 gcc/tree.h|   6 +-
 libgomp/target.c  |  14 +++
 .../target-update-iterators-1.c   |  65 ++
 .../target-update-iterators-2.c   |  58 +
 .../target-update-iterators-3.c   |  67 +++
 15 files changed, 501 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterators-3.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-1.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-2.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-3.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index e60b3821481..b426d0b9f9f 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20540,8 +20540,11 @@ c_parser_omp_clause_device_type (c_parser *parser, 
tree list)
to ( variable-list )
 
OpenMP 5.1:
-   from ( [present :] variable-list )
-   to ( [present :] variable-list ) */
+   from ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+   to ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+
+   motion-modifier:
+ present | iterator (iterators-definition)  */
 
 static tree
 c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind,
@@ -20552,18 +20555,83 @@ c_parser_omp_clause_from_to (c_parser *parser, enum 
omp_clause_code kind,
   if (!parens.require_open (parser))
 return list;
 
+  int pos = 1, colon_pos = 0;
+  int iterator_length = 0;
+
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
+{
+  const char *identifier =
+   IDENTIFIER_POINTER (c_parser_peek_

[PATCH v5 5/11] openmp, fortran: Add support for iterators in OpenMP 'target update' constructs (Fortran)

2025-07-09 Thread Kwok Cheung Yeung


v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662143.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664548.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670338.html

This is largely identical to v4 of the patch, with some slightly 
improved error reporting when dealing with extra modifiers.

From 2211e8cebe99d693c87ef80979ff0426131f44d6 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Wed, 27 Nov 2024 21:56:08 +
Subject: [PATCH 05/11] openmp, fortran: Add support for iterators in OpenMP
 'target update' constructs (Fortran)

This adds Fortran support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

gcc/fortran/

* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_TO and OMP_LIST_FROM.
* openmp.cc (gfc_free_omp_clauses): Free namespace for OMP_LIST_TO
and OMP_LIST_FROM.
(gfc_match_motion_var_list): Parse 'iterator' modifier.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_TO and
OMP_LIST_FROM.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle iterators in
OMP_LIST_TO and OMP_LIST_FROM clauses.  Add expressions to
iter_block rather than block.

gcc/testsuite/

* gfortran.dg/gomp/target-update-iterators-1.f90: New.
* gfortran.dg/gomp/target-update-iterators-2.f90: New.
* gfortran.dg/gomp/target-update-iterators-3.f90: New.

libgomp/

* testsuite/libgomp.fortran/target-update-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-3.f90: New.

Co-authored-by: Andrew Stubbs  
---
 gcc/fortran/dump-parse-tree.cc|  7 +-
 gcc/fortran/openmp.cc | 65 ++--
 gcc/fortran/trans-openmp.cc   | 49 ++--
 .../gomp/target-update-iterators-1.f90| 25 ++
 .../gomp/target-update-iterators-2.f90| 28 +++
 .../gomp/target-update-iterators-3.f90| 23 ++
 .../target-update-iterators-1.f90 | 68 
 .../target-update-iterators-2.f90 | 63 +++
 .../target-update-iterators-3.f90 | 78 +++
 9 files changed, 394 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-update-iterators-3.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index bf4ee446940..5477a33cf59 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1462,7 +1462,8 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 {
   gfc_current_ns = ns_curr;
   if (list_type == OMP_LIST_AFFINITY || list_type == OMP_LIST_DEPEND
- || list_type == OMP_LIST_MAP)
+ || list_type == OMP_LIST_MAP
+ || list_type == OMP_LIST_TO || list_type == OMP_LIST_FROM)
{
  gfc_current_ns = n->u2.ns ? n->u2.ns : ns_curr;
  if (n->u2.ns != ns_iter)
@@ -1478,6 +1479,10 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
fputs ("DEPEND (", dumpfile);
  else if (list_type == OMP_LIST_MAP)
fputs ("MAP (", dumpfile);
+ else if (list_type == OMP_LIST_TO)
+   fputs ("TO (", dumpfile);
+ else if (list_type == OMP_LIST_FROM)
+   fputs ("FROM (", dumpfile);
  else
gcc_unreachable ();
}
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index e8563278a5d..56f008e2a38 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -205,7 +205,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   for (i = 0; i < OMP_LIST_NUM; i++)
 gfc_free_omp_namelist (c->lists[i],
   i == OMP_LIST_AFFINITY || i == OMP_LIST_DEPEND
-  || i == OMP_LIST_MAP,
+  || i == OMP_LIST_MAP
+  || i == OMP_LIST_TO || i == OMP_LIST_FROM,
   i == OMP_LIST_ALLOCATE,
   i == OMP_LIST_USES_ALLOCATORS,
   i == OMP_LIST_INIT);
@@ -1418,16 +1419,67 @@ gfc_match_motion_var_list (const char *str, gfc_omp_namelist **list,
   if (m != MATCH_YES)
 return m;
 
-  match m_present = gfc_match (" present : ");
+  gfc_namespace *ns_iter = NULL, *ns_curr = gfc_current_ns;
+  locus old_loc = gfc_current_locus;
+  int present_modifie

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Richard Biener

On Wed, 9 Jul 2025, Jan Hubicka wrote:

> > The following changes the percentage that determines how many
> > stmts are allowed for backwards jump threading from 50 to 54,
> > enabling the missed jump threading observed in PR109893.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
> > at least backward threading is prone to profile mismatches, I've
> > altered two testcases to deal with new ones to pop up (definitely
> > latent issues).
> > 
> > OK?
> I wonder if the duplication limit should not be controlled by statement
> size rather than statement count?

It is, it uses estimate_num_insns (stmt, &eni_size_weights) to
"count" stmts, the name (and variables used) are just "off".

Richard.

> Honza
>

[PATCH v5 6/11] openmp: Add support for non-constant iterator parameters in map, to and from clauses

2025-07-09 Thread Kwok Cheung Yeung

The previous version was posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671630.html

This patch is largely the same as the previous version, but calculation 
of the size of the iterator elements array is separated into another 
function for reuse later.

From 607ceefb93136fd14471d0a9383464776228f074 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 12 Dec 2024 21:22:20 +
Subject: [PATCH 06/11] openmp: Add support for non-constant iterator
 parameters in map, to and from clauses

This patch enables support for using non-constant expressions when specifying
iterators in the map clause of target constructs and to/from clauses of
target update constructs.
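
As a rough illustration of the size computation that this patch factors out: the helper added to gimplify.cc (omp_iterator_elems_length) builds the expression 2 * count + 1 with size_binop. The sketch below restates that arithmetic on plain integers. The function name iterator_elems_length is hypothetical, and reading the extra slot as the iteration count mentioned in the ChangeLog is an assumption, not something the patch text confirms.

```c
#include <stddef.h>

/* Plain-integer restatement of the arithmetic in
   omp_iterator_elems_length: two slots per iteration plus one extra
   slot (assumed here to be the one holding the iteration count).  */
static size_t
iterator_elems_length (size_t count)
{
  return 2 * count + 1;
}
```

With a non-constant count this length is only known at run time, which would be consistent with the ChangeLog switching the elements array to a pointer that is allocated and freed around the target region.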

gcc/

* gimplify.cc (omp_iterator_elems_length): New.
(build_omp_iterators_loops): Change type of elements
array to pointer of pointers if array length is non-constant, and
assign size with indirect reference.  Reorder elements added to
iterator vector and add element containing the iteration count.  Use
omp_iterator_elems_length to compute element array size required.
* gimplify.h (omp_iterator_elems_length): New prototype.
* omp-low.cc (lower_omp_map_iterator_expr): Reorder elements read
from iterator vector.  If elements field is a pointer type, assign
using pointer arithmetic followed by indirect reference, and return
the field directly.
(lower_omp_map_iterator_size): Reorder elements read from iterator
vector.  If elements field is a pointer type, assign using pointer
arithmetic followed by indirect reference.
(allocate_omp_iterator_elems): New.
(free_omp_iterator_elems): New.
(lower_omp_target): Call allocate_omp_iterator_elems before inserting
loops sequence, and call free_omp_iterator_elems afterwards.
* tree-pretty-print.cc (dump_omp_iterators): Print extra elements in
iterator vector.

gcc/testsuite/

* c-c++-common/gomp/target-map-iterators-3.c: Update expected Gimple
output.
* c-c++-common/gomp/target-map-iterators-5.c: New.
* c-c++-common/gomp/target-update-iterators-3.c: Update expected
Gimple output.
* gfortran.dg/gomp/target-map-iterators-3.f90: Likewise.
* gfortran.dg/gomp/target-map-iterators-5.f90: New.
* gfortran.dg/gomp/target-update-iterators-3.f90: Update expected
Gimple output.

libgomp/

* testsuite/libgomp.c-c++-common/target-map-iterators-4.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-5.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-4.c: New.
* testsuite/libgomp.fortran/target-map-iterators-4.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-5.f90: New.
* testsuite/libgomp.fortran/target-update-iterators-4.f90: New.
---
 gcc/gimplify.cc   |  44 
 gcc/gimplify.h|   1 +
 gcc/omp-low.cc| 100 --
 .../gomp/target-map-iterators-3.c |   8 +-
 .../gomp/target-map-iterators-5.c |  14 +++
 .../gomp/target-update-iterators-3.c  |   4 +-
 .../gomp/target-map-iterators-3.f90   |   8 +-
 .../gomp/target-map-iterators-5.f90   |  21 
 .../gomp/target-update-iterators-3.f90|   6 +-
 gcc/tree-pretty-print.cc  |   6 +-
 .../target-map-iterators-4.c  |  48 +
 .../target-map-iterators-5.c  |  59 +++
 .../target-update-iterators-4.c   |  66 
 .../target-map-iterators-4.f90|  48 +
 .../target-map-iterators-5.f90|  61 +++
 .../target-update-iterators-4.f90 |  70 
 16 files changed, 518 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-map-iterators-5.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-5.f90
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-iterators-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-iterators-5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-4.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-update-iterators-4.f90

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 52d0e11de8b..f8f649c7154 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -9973,6 +9973,13 @@ struct iterator_loop_info_t

 typedef hash_map iterator_loop_info_map_t;

+tree
+omp_iterator_elems_length (tree count)
+{
+  tree count_2 = size_binop (MULT_EXPR, count, size_int (2));
+  return size_binop (PLUS_EXPR, count_2, size_int (1));
+}
+
 /* Builds a

Re: [PATCH v1 1/1] libiberty: add common methods for type-sensitive doubly linked lists

2025-07-09 Thread Matthieu Longo


On 2025-07-08 16:25, Richard Sandiford wrote:

Matthieu Longo  writes:

The implementation of these methods relies on duck typing at compile
time.
The structure corresponding to a node of the doubly linked list needs
to define attributes 'prev' and 'next', which are pointers to the node
type.
The structure wrapping the nodes and other metadata (first, last, size)
needs to define pointers 'first_' and 'last_' of the node's type, and
an integer type for 'size'.

Mutative methods can be bundled together and declared at once via a
single macro, or can be declared separately. The merge sort is bundled
separately.
There are 3 types of macros:
1. for the declaration of prototypes: to use in a header file for a
public declaration, or as a forward declaration in the source file
for private declaration.
2. for the declaration of the implementation: to use always in a
source file.
3. for the invocation of the functions.

The methods can be declared either public or private via the last
argument ('scope') of the declaration macros.

List of currently implemented methods:
- LINKED_LIST_*:
 - APPEND: insert a node at the end of the list.
 - PREPEND: insert a node at the beginning of the list.
 - INSERT_BEFORE: insert a node before the given node.
 - POP_FRONT: remove the first node of the list.
 - POP_BACK: remove the last node of the list.
 - REMOVE: remove the given node from the list.
 - SWAP: swap the two given nodes in the list.
- LINKED_LIST_MERGE_SORT: a merge sort implementation.


This mostly looks good, but some comments below:


---
  include/doubly-linked-list.h  | 440 ++
  libiberty/Makefile.in |   1 +
  libiberty/testsuite/Makefile.in   |  12 +-
  libiberty/testsuite/test-doubly-linked-list.c | 253 ++
  4 files changed, 705 insertions(+), 1 deletion(-)
  create mode 100644 include/doubly-linked-list.h
  create mode 100644 libiberty/testsuite/test-doubly-linked-list.c

diff --git a/include/doubly-linked-list.h b/include/doubly-linked-list.h
new file mode 100644
index 000..3b3ce1ee6b9
--- /dev/null
+++ b/include/doubly-linked-list.h
@@ -0,0 +1,440 @@
+/* Copyright (C) 2025 Free Software Foundation, Inc.


There should be a one-line summary of the file before the copyright.



Fixed in the next revision.


+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+
+#ifndef _DOUBLY_LINKED_LIST_H
+#define _DOUBLY_LINKED_LIST_H
+
+#include 
+
+/* Doubly linked list implementation enforcing typing.
+
+   This implementation of doubly linked list tries to achieve the enforcement of
+   typing similarly to C++ templates, but without encapsulation.
+
+   All the functions are prefixed with the type of the value: "AType_xxx".
+   Some functions are prefixed with "_AType_xxx" and are not part of the public
+   API, so should not be used, except for _##LTYPE##_merge_sort with a caveat
+   (see note above its definition).
+
+   Each function (### is a placeholder for method name) has a macro for:
+   (1) its invocation LINKED_LIST_###(LTYPE).
+   (2) its prototype LINKED_LIST_DECL_###(A, A2, scope). To add in a header
+   file, or a source file for forward declaration. 'scope' should be set
+   respectively to 'extern', or 'static'.
+   (3) its definition LINKED_LIST_DEFN_###(A, A2, scope). To add in a source
+   file with the 'scope' set respectively to nothing, or 'static' depending
+   on (2).
+
+   Data structures requirements:
+   - LTYPE corresponds to the node of a doubly linked list. It needs to define
+ attributes 'prev' and 'next' which are pointers on the type of a node.
+ For instance:
+   struct my_list_node
+   {
+T value;
+struct my_list_node *prev;
+struct my_list_node *next;
+   };
+   - LWRAPPERTYPE is a structure wrapping the nodes and others metadata (first_,
+ last_, size).


Was there a reason for adding underscores to "first" and "last" but not
to "prev", "next" and "size"?



No, no good reason. I changed them to "first" and "last" in the next 
revision.



+ */
+
+
+/* Mutative operations:
+- append
+- prepend
+- insert_before
+- pop_front
+- pop_back
+- remove
+- swap
+   The header and body of each of those operation can be declared individually,
+   or as a whole via LINKED_LIST_MUTATIV

[PATCH v5 4/11] openmp, fortran: Add support for map iterators in OpenMP target construct (Fortran)

2025-07-09 Thread Kwok Cheung Yeung


v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662142.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664546.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670337.html

Again, largely the same as v4, with a couple of extra bug-fixes:

- When computing the bias of an array section, substitute iterator 
variables with the value of the lower bound before computing the difference.


- Fixed a boundary condition for Fortran when computing the iterator 
count, where the start equals the end value.


- Fix computation of base bit offset with iterators.

From 14469e4793899babf038afaac90b9f7721922573 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Sat, 3 May 2025 20:36:21 +
Subject: [PATCH 04/11] openmp, fortran: Add support for map iterators in
 OpenMP target construct (Fortran)

This adds support for iterators in map clauses within OpenMP
'target' constructs in Fortran.

Some special handling for struct field maps has been added to libgomp in
order to handle arrays of derived types.

gcc/fortran/

* dump-parse-tree.cc (show_omp_namelist): Add iterator support for
OMP_LIST_MAP.
* openmp.cc (gfc_free_omp_clauses): Free namespace in namelist for
OMP_LIST_MAP.
(gfc_match_omp_clauses): Parse 'iterator' modifier for 'map' clause.
(resolve_omp_clauses): Resolve iterators for OMP_LIST_MAP.
* trans-openmp.cc: Include tree-ssa-loop-niter.h.
(gfc_trans_omp_array_section): Add iterator argument.  Replace
instances of iterator variables with the initial value when
computing biases.
(gfc_trans_omp_clauses): Handle iterators in OMP_LIST_MAP clauses.
Add expressions to iter_block rather than block.  Do not apply
iterators to firstprivate maps.  Pass iterator to
gfc_trans_omp_array_section.

gcc/

* gimplify.cc (compute_omp_iterator_count): Account for difference
in loop boundaries in Fortran.
(build_omp_iterator_loop): Change upper boundary condition for
Fortran.  Insert block statements into innermost loop.
(remove_unused_omp_iterator_vars): Copy block subblocks of old
iterator to new iterator and remove original.
(contains_vars_1): New.
(contains_vars): New.
(extract_base_bit_offset): Add iterator argument.  Remove iterator
variables from base.  Do not set variable_offset if the offset
does not contain any remaining variables.
(omp_accumulate_sibling_list): Add iterator argument to
extract_base_bit_offset.
* tree-pretty-print.cc (dump_block_node): Ignore BLOCK_SUBBLOCKS
containing iterator block statements.

gcc/testsuite/

* gfortran.dg/gomp/target-map-iterators-1.f90: New.
* gfortran.dg/gomp/target-map-iterators-2.f90: New.
* gfortran.dg/gomp/target-map-iterators-3.f90: New.
* gfortran.dg/gomp/target-map-iterators-4.f90: New.

libgomp/

* target.c (kind_to_name): Handle GOMP_MAP_STRUCT and
GOMP_MAP_STRUCT_UNORD.
(gomp_add_map): New.
(gomp_merge_iterator_maps): Expand fields of a struct mapping
breadth-first.
* testsuite/libgomp.fortran/target-map-iterators-1.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-2.f90: New.
* testsuite/libgomp.fortran/target-map-iterators-3.f90: New.

Co-authored-by: Andrew Stubbs 
---
 gcc/fortran/dump-parse-tree.cc|   9 +-
 gcc/fortran/openmp.cc |  36 +-
 gcc/fortran/trans-openmp.cc   | 106 ++
 gcc/gimplify.cc   |  94 +---
 .../gomp/target-map-iterators-1.f90   |  26 +
 .../gomp/target-map-iterators-2.f90   |  33 ++
 .../gomp/target-map-iterators-3.f90   |  24 
 .../gomp/target-map-iterators-4.f90   |  31 +
 gcc/tree-pretty-print.cc  |   4 +-
 libgomp/target.c  |  84 ++
 .../target-map-iterators-1.f90|  45 
 .../target-map-iterators-2.f90|  45 
 .../target-map-iterators-3.f90|  56 +
 13 files changed, 526 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/target-map-iterators-4.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-1.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-map-iterators-3.f90

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 3cd2eeef11a..bf4ee446940 100644
--- a/gcc/fortran/dump

[PATCH v2 1/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-07-09 Thread Matthieu Longo

The implementation of these methods relies on duck typing at compile
time.
The structure corresponding to a node of the doubly linked list needs
to define attributes 'prev' and 'next', which are pointers to the node
type.
The structure wrapping the nodes and other metadata (first, last, size)
needs to define pointers 'first' and 'last' of the node's type, and
an integer type for 'size'.

Mutative methods can be bundled together and declared at once via a
single macro, or can be declared separately. The merge sort is bundled
separately.
There are 3 types of macros:
1. for the declaration of prototypes: to use in a header file for a
   public declaration, or as a forward declaration in the source file
   for private declaration.
2. for the declaration of the implementation: to use always in a
   source file.
3. for the invocation of the functions.

The methods can be declared either public or private via the last
argument of the declaration macros.

List of currently implemented methods:
- LINKED_LIST_*:
- APPEND: insert a node at the end of the list.
- PREPEND: insert a node at the beginning of the list.
- INSERT_BEFORE: insert a node before the given node.
- POP_FRONT: remove the first node of the list.
- POP_BACK: remove the last node of the list.
- REMOVE: remove the given node from the list.
- SWAP: swap the two given nodes in the list.
- LINKED_LIST_MERGE_SORT: a merge sort implementation.
---
 include/doubly-linked-list.h  | 447 ++
 libiberty/Makefile.in |   1 +
 libiberty/testsuite/Makefile.in   |  12 +-
 libiberty/testsuite/test-doubly-linked-list.c | 269 +++
 4 files changed, 728 insertions(+), 1 deletion(-)
 create mode 100644 include/doubly-linked-list.h
 create mode 100644 libiberty/testsuite/test-doubly-linked-list.c

diff --git a/include/doubly-linked-list.h b/include/doubly-linked-list.h
new file mode 100644
index 000..3f5ea2808f9
--- /dev/null
+++ b/include/doubly-linked-list.h
@@ -0,0 +1,447 @@
+/* Manipulate doubly linked lists.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+
+#ifndef _DOUBLY_LINKED_LIST_H
+#define _DOUBLY_LINKED_LIST_H
+
+/* Doubly linked list implementation enforcing typing.
+
+   This implementation of doubly linked list tries to achieve the enforcement of
+   typing similarly to C++ templates, but without encapsulation.
+
+   All the functions are prefixed with the type of the value: "AType_xxx".
+   Some functions are prefixed with "_AType_xxx" and are not part of the public
+   API, so should not be used, except for _##LTYPE##_merge_sort with a caveat
+   (see note above its definition).
+
+   Each function (### is a placeholder for method name) has a macro for:
+   (1) its invocation LINKED_LIST_###(LTYPE).
+   (2) its prototype LINKED_LIST_DECL_###(A, A2, scope). To add in a header
+   file, or a source file for forward declaration. 'scope' should be set
+   respectively to 'extern', or 'static'.
+   (3) its definition LINKED_LIST_DEFN_###(A, A2, scope). To add in a source
+   file with the 'scope' set respectively to nothing, or 'static' depending
+   on (2).
+
+   Data structures requirements:
+   - LTYPE corresponds to the node of a doubly linked list. It needs to define
+ attributes 'prev' and 'next' which are pointers on the type of a node.
+ For instance:
+   struct my_list_node
+   {
+T value;
+struct my_list_node *prev;
+struct my_list_node *next;
+   };
+   - LWRAPPERTYPE is a structure wrapping the nodes and others metadata (first,
+ last, size).
+ */
+
+
+/* Mutative operations:
+- append
+- prepend
+- insert_before
+- pop_front
+- pop_back
+- remove
+- swap
+   The header and body of each of those operation can be declared individually,
+   or as a whole via LINKED_LIST_MUTATIVE_OPS_PROTOTYPE for the prototypes, and
+   LINKED_LIST_MUTATIVE_OPS_DECL for the implementations.  */
+
+/* Append the given node new_ to the existing list.
+   Precondition: prev and next of new_ must be NULL.  */
+#define LINKED_LIST_APPEND(LTYPE)  LTYPE##_append
+
+#define LINKED_LIST_DECL_APPEND(LWRAPPERTYPE, LTYPE, EXPORT)   \
+  EXPORT void

[PATCH v5 9/11] openmp: Add support for using custom mappers with iterators (C, C++)

2025-07-09 Thread Kwok Cheung Yeung

This patch adds support for custom mappers with iterators (C and C++ 
only, as the Fortran custom mapper support has not been committed yet, 
nor has support for nested mappers).


It works by propagating clause iterators onto new clauses generated by 
mappers. As this occurs early in the front-end, the middle-end will 
treat it the same as if the iterators were specified explicitly.
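
The propagation described above can be pictured with a small stand-alone sketch. The struct clause type and the propagate_iterators function are hypothetical stand-ins invented for illustration; GCC itself works on tree nodes through OMP_CLAUSE_ITERATORS and OMP_CLAUSE_CHAIN, as the diff in this message shows.

```c
#include <stddef.h>

/* Hypothetical stand-in for an OpenMP clause node; not GCC's tree type.  */
struct clause
{
  const char *kind;
  const void *iterators;   /* NULL when the clause has no iterator modifier.  */
  struct clause *chain;    /* Next clause in the list, or NULL.  */
};

/* Copy the iterator of the original clause onto every clause that the
   mapper expansion generated, so later passes see the iterator as if it
   had been written explicitly on each of them.  */
static void
propagate_iterators (const struct clause *orig, struct clause *generated)
{
  for (struct clause *c = generated; c != NULL; c = c->chain)
    c->iterators = orig->iterators;
}
```

Doing this at clause-creation time keeps the later passes unchanged, which matches the stated intent that the middle-end need not know the clauses came from a mapper.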
From 46a70c58505f6c0ee6af57997de81da94f0faa11 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 13 Jan 2025 13:08:07 +
Subject: [PATCH 09/11] openmp: Add support for using custom mappers with
 iterators (C, C++)

gcc/c-family/

* c-omp.cc (omp_instantiate_mapper): Apply iterator to new clauses
generated from mapper.

gcc/c/

* c-parser.cc (c_parser_omp_clause_map): Apply iterator to push and
pop mapper clauses.

gcc/cp/

* parser.cc (cp_parser_omp_clause_map): Apply iterator to push and
pop mapper clauses.

libgomp/

* testsuite/libgomp.c-c++-common/mapper-iterators-1.c: New test.
* testsuite/libgomp.c-c++-common/mapper-iterators-2.c: New test.

Co-authored-by: Andrew Stubbs 
---
 gcc/c-family/c-omp.cc |  2 +
 gcc/c/c-parser.cc |  4 +
 gcc/cp/parser.cc  |  4 +
 .../libgomp.c-c++-common/mapper-iterators-1.c | 83 +++
 .../libgomp.c-c++-common/mapper-iterators-2.c | 81 ++
 5 files changed, 174 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/mapper-iterators-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/mapper-iterators-2.c

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index fe272888c51..1b68179db88 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4392,6 +4392,7 @@ omp_instantiate_mapper (tree *outlist, tree mapper, tree expr,
   tree clauses = OMP_DECLARE_MAPPER_CLAUSES (mapper);
   tree dummy_var = OMP_DECLARE_MAPPER_DECL (mapper);
   tree mapper_name = NULL_TREE;
+  tree iterator = *outlist ? OMP_CLAUSE_ITERATORS (*outlist) : NULL_TREE;
 
   remap_mapper_decl_info map_info;
   map_info.dummy_var = dummy_var;
@@ -4476,6 +4477,7 @@ omp_instantiate_mapper (tree *outlist, tree mapper, tree expr,
  continue;
}
 
+  OMP_CLAUSE_ITERATORS (unshared) = iterator;
   *outlist = unshared;
   outlist = &OMP_CLAUSE_CHAIN (unshared);
 }
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 0ecc3e88be5..1cd14b814f3 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20290,6 +20290,8 @@ c_parser_omp_clause_map (c_parser *parser, tree list, bool declare_mapper_p)
   tree name = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (name, GOMP_MAP_PUSH_MAPPER_NAME);
   OMP_CLAUSE_DECL (name) = mapper_name;
+  if (iterators)
+   OMP_CLAUSE_ITERATORS (name) = iterators;
   OMP_CLAUSE_CHAIN (name) = nl;
   nl = name;
 
@@ -20298,6 +20300,8 @@ c_parser_omp_clause_map (c_parser *parser, tree list, bool declare_mapper_p)
   name = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (name, GOMP_MAP_POP_MAPPER_NAME);
   OMP_CLAUSE_DECL (name) = null_pointer_node;
+  if (iterators)
+   OMP_CLAUSE_ITERATORS (name) = iterators;
   OMP_CLAUSE_CHAIN (name) = OMP_CLAUSE_CHAIN (last_new);
   OMP_CLAUSE_CHAIN (last_new) = name;
 }
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4f496433135..e819dad03ae 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -42838,6 +42838,8 @@ cp_parser_omp_clause_map (cp_parser *parser, tree list, bool declare_mapper_p)
   tree name = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (name, GOMP_MAP_PUSH_MAPPER_NAME);
   OMP_CLAUSE_DECL (name) = mapper_name;
+  if (iterators)
+   OMP_CLAUSE_ITERATORS (name) = iterators;
   OMP_CLAUSE_CHAIN (name) = nlist;
   nlist = name;
 
@@ -42846,6 +42848,8 @@ cp_parser_omp_clause_map (cp_parser *parser, tree list, bool declare_mapper_p)
   name = build_omp_clause (input_location, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (name, GOMP_MAP_POP_MAPPER_NAME);
   OMP_CLAUSE_DECL (name) = null_pointer_node;
+  if (iterators)
+   OMP_CLAUSE_ITERATORS (name) = iterators;
   OMP_CLAUSE_CHAIN (name) = OMP_CLAUSE_CHAIN (last_new);
   OMP_CLAUSE_CHAIN (last_new) = name;
 }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/mapper-iterators-1.c b/libgomp/testsuite/libgomp.c-c++-common/mapper-iterators-1.c
new file mode 100644
index 000..193823744bd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/mapper-iterators-1.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+
+#include 
+#include 
+#include 
+#include 
+
+#define DIM1 4
+#define DIM2 16
+
+struct S {
+  int *arr1;
+  float *arr2;
+  size_t len;
+};
+
+size_t
+mkarray (struct S arr[])
+{
+  size_t sum = 0;
+
+  for (int i = 0;

[PATCH v2 0/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-07-09 Thread Matthieu Longo

This patch was originally part of [1]. Merging it in GCC is a prerequisite of 
merging it inside binutils.

The implementation of these methods relies on duck typing at compile time. The 
structure corresponding to a node of the doubly linked list needs to define 
attributes 'prev' and 'next', which are pointers to the node type.
The structure wrapping the nodes and other metadata (first, last, size) needs 
to define pointers 'first' and 'last' of the node's type, and an integer type 
for 'size'.

Mutative methods can be bundled together and declared at once via a single 
macro, or can be declared separately. The merge sort is bundled separately.
There are 3 types of macros:
1. for the declaration of prototypes: to use in a header file for a public 
declaration, or as a forward declaration in the source file for private 
declaration.
2. for the declaration of the implementation: to use always in a source file.
3. for the invocation of the functions.

The methods can be declared either public or private via the last argument of 
the declaration macros.

List of currently implemented methods:
- LINKED_LIST_*:
- APPEND: insert a node at the end of the list.
- PREPEND: insert a node at the beginning of the list.
- INSERT_BEFORE: insert a node before the given node.
- POP_FRONT: remove the first node of the list.
- POP_BACK: remove the last node of the list.
- REMOVE: remove the given node from the list.
- SWAP: swap the two given nodes in the list.
- LINKED_LIST_MERGE_SORT: a merge sort implementation.
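
To make the duck-typing idea concrete, here is a minimal sketch of how a definition macro can generate a typed append function for any node/wrapper pair that provides the expected fields. The SKETCH_* macro names and the my_node/my_list types are hypothetical; this is not the code from doubly-linked-list.h, only an illustration of the same technique under the field-name contract described above.

```c
#include <stddef.h>

/* Hypothetical node type: only 'prev' and 'next' are required.  */
typedef struct my_node
{
  int value;
  struct my_node *prev;
  struct my_node *next;
} my_node;

/* Hypothetical wrapper type: 'first', 'last' and 'size' are required.  */
typedef struct my_list
{
  my_node *first;
  my_node *last;
  size_t size;
} my_list;

/* Illustrative definition macro: emits LTYPE##_append.  Any mismatch in
   field names or types is reported by the compiler when the macro is
   expanded, which is the compile-time duck typing.  */
#define SKETCH_DEFN_APPEND(LWRAPPERTYPE, LTYPE, SCOPE)          \
  SCOPE void                                                    \
  LTYPE##_append (LWRAPPERTYPE *wrapper, LTYPE *new_)           \
  {                                                             \
    new_->prev = wrapper->last;                                 \
    new_->next = NULL;                                          \
    if (wrapper->last != NULL)                                  \
      wrapper->last->next = new_;                               \
    else                                                        \
      wrapper->first = new_;                                    \
    wrapper->last = new_;                                       \
    ++wrapper->size;                                            \
  }

/* Illustrative invocation macro: resolves to the generated name.  */
#define SKETCH_APPEND(LTYPE) LTYPE##_append

SKETCH_DEFN_APPEND (my_list, my_node, static)
```

A caller would then write SKETCH_APPEND (my_node) (&list, &node); passing a node of a different type is rejected at compile time because the generated function takes a my_node pointer.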

Regression tested on aarch64-unknown-linux-gnu. No failure found.

[1]: https://inbox.sourceware.org/binutils/20250509151319.88725-10-matthieu.lo...@arm.com/

## Diff against previous revisions

### Revision 1 -> 2:

Patch series v1: https://inbox.sourceware.org/gcc-patches/20250703105942.732907-1-matthieu.lo...@arm.com/
Diff:
- Address Richard Sandiford's comments from revision 1.
- Add 2 new test cases for swap.

Regards,
Matthieu


Matthieu Longo (1):
  libiberty: add routines to handle type-sensitive doubly linked lists

 include/doubly-linked-list.h  | 447 ++
 libiberty/Makefile.in |   1 +
 libiberty/testsuite/Makefile.in   |  12 +-
 libiberty/testsuite/test-doubly-linked-list.c | 269 +++
 4 files changed, 728 insertions(+), 1 deletion(-)
 create mode 100644 include/doubly-linked-list.h
 create mode 100644 libiberty/testsuite/test-doubly-linked-list.c

-- 
2.50.1

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Richard Biener

On Wed, Jul 9, 2025 at 2:16 PM Jeff Law  wrote:
>
>
>
> On 7/9/25 12:27 AM, Richard Biener wrote:
> > The following changes the percentage that determines how many
> > stmts are allowed for backwards jump threading from 50 to 54,
> > enabling the missed jump threading observed in PR109893.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
> > at least backward threading is prone to profile mismatches, I've
> > altered two testcases to deal with new ones to pop up (definitely
> > latent issues).
> >
> > OK?
> >
> >   PR tree-optimization/109893
> >   * params.opt (fsm-scale-path-stmts): Change from 50 to 54.
> >
> >   * gcc.dg/tree-ssa/pr109893.c: New testcase.
> >   * gcc.dg/tree-prof/cmpsf-1.c: XFAIL.
> >   * gcc.dg/tree-ssa/pr109893.c: Remove scan on no profile
> >   mismatches.
> My recollection is the scaling factor was set once based on some old PR
> where code size exploded and wasn't really tuned further after that.  If
> the new value is working better, then that's obviously fine with me.
> Ideally we'd just get rid of the magic ratio

Yes, there's some hand-waving about the forward threader having ways
to re-use copied blocks but the backward threader does not.  Thus the
scaling to offset for that.  OTOH the backward threader does not have
any means to count likely eliminated stmts, so it already accounts things
as more expensive than the forward threader.

Unless somebody is going to spend time cleaning things up further
(we've talked about getting rid of the forward threader), this is what I
have to offer.

Richard.

>
> Jeff
>
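
One way to see why a change as small as 50 to 54 can matter is integer
rounding of a percentage-scaled budget.  The sketch below is only an
illustration: it assumes the budget is an integer percentage of a fixed
duplication limit, and the limit of 15 is an assumed value, not taken from
the patch.  The helper name is hypothetical.

```c
/* Hypothetical model of a percentage-scaled statement budget.
   Both the formula and the truncating division are assumptions.  */
int scaled_stmt_budget(int max_dup_stmts, int percent)
{
    return max_dup_stmts * percent / 100;
}
```

Under these assumptions a limit of 15 yields a budget of 7 statements at 50
percent and 8 statements at 54 percent, so one more statement fits.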

[committed] s390: Fix vector pattern tests for -m31.

2025-07-09 Thread Juergen Christ

Vectorization of int patterns requires 64bit long type (at least the
way the tests are coded).  Fix this to only test for successful
vectorization on 64bit targets.

Signed-off-by: Juergen Christ 

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/pattern-avg-1.c: Fix on -m31.
* gcc.target/s390/vector/pattern-mulh-1.c: Fix on -m31.
* gcc.target/s390/vector/pattern-mulh-2.c: Fix on -m31.
---
 gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c  | 3 ++-
 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c | 3 +--
 gcc/testsuite/gcc.target/s390/vector/pattern-mulh-2.c | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c b/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c
index 30c6ed476846..285ebc9a3a56 100644
--- a/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/pattern-avg-1.c
@@ -22,4 +22,5 @@ TEST(char,short,16)
 TEST(short,int,8)
 TEST(int,long,4)
 
-/* { dg-final { scan-tree-dump-times "\.AVG_CEIL" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.AVG_CEIL" 6 "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump-times "\.AVG_CEIL" 4 "optimized" { target { ! lp64 } } } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c b/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c
index f71ef06c8252..f0b37d63847c 100644
--- a/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c
+++ b/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-1.c
@@ -23,6 +23,5 @@
 
 TEST(char,short,16,8)
 TEST(short,int,8,16)
-TEST(int,long,4,32)
 
-/* { dg-final { scan-tree-dump-times "\.MULH" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.MULH" 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-2.c b/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-2.c
index 6ac6855b1bdf..2ff66b7ffaad 100644
--- a/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-2.c
+++ b/gcc/testsuite/gcc.target/s390/vector/pattern-mulh-2.c
@@ -21,6 +21,7 @@
 (((unsigned T2)l[i] * (unsigned T2)r[i]) >> S); \
   }
 
+TEST(int,long,4,32)
 TEST(long,__int128,2,64)
 
-/* { dg-final { scan-tree-dump-times "\.MULH" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\.MULH" 4 "optimized" } } */
-- 
2.43.5
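
For context, a rounding-up average is normally written by widening both
operands first so the sum cannot overflow.  The sketch below assumes that is
the shape of the tested pattern; the function name is hypothetical and
int64_t stands in for the wide type so the example does not depend on the
width of long.  With a 32-bit long, as on -m31, the (int,long) instantiation
would not widen at all.

```c
#include <stdint.h>

/* Rounding-up average computed in a type twice as wide as the inputs.
   Non-negative inputs are assumed to keep the shift well defined.  */
int32_t avg_ceil_wide(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a + (int64_t)b + 1) >> 1);
}
```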

RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

2025-07-09 Thread Richard Biener

On Wed, 9 Jul 2025, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, July 9, 2025 12:36 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > vec_cbranch_all [PR118974]
> > 
> > On Wed, 9 Jul 2025, Tamar Christina wrote:
> > 
> > > > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > > > +first and second operands of the comparison, respectively.  Operand 3
> > > > > +is the @code{code_label} to jump to.
> > > > > +
> > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > > > +@item @samp{cbranch_all@var{mode}4}
> > > > > +Conditional branch instruction combined with a compare instruction on vectors
> > > > > +where it is required that all of the elementwise comparisons of the
> > > > > +two input vectors are true.
> > > >
> > > > See above.
> > > >
> > > > When I look at the RTL for aarch64 I wonder whether the middle-end
> > > > can still invert a jump (for BB reorder, for example)?  Without
> > > > a cbranch_none expander we have to invert during RTL expansion?
> > > >
> > >
> > > Isn't cbranch_none just cbranch_all x 0? i.e. all values must be zero.
> > > I think all states are expressible with any and all and flipping the branches
> > > so it shouldn't be any more restrictive than cbranch itself is today.
> > >
> > > cbranch also only supports eq and ne, so none would be cbranch (eq x 0)
> > >
> > > and FTR the RTL generated for AArch64 (both SVE and Adv.SIMD) will be
> > > simplified to:
> > >
> > > (insn 23 22 24 5 (parallel [
> > > (set (reg:VNx4BI 128 [ mask_patt_14.15_57 ])
> > > (unspec:VNx4BI [
> > > (reg:VNx4BI 129)
> > > (const_int 0 [0x0])
> > > (gt:VNx4BI (reg:VNx4SI 114 [ vect__2.11 ])
> > > (const_vector:VNx4SI repeat [
> > > (const_int 0 [0])
> > > ]))
> > > ] UNSPEC_PRED_Z))
> > > (clobber (reg:CC_NZC 66 cc))
> > > ]) "cbranch.c":25:10 -1
> > >
> > > (jump_insn 27 26 28 5 (set (pc)
> > > (if_then_else (eq (reg:CC_Z 66 cc)
> > > (const_int 0 [0]))
> > > (label_ref 33)
> > > (pc))) "cbranch.c":25:10 -1
> > >  (int_list:REG_BR_PROB 1014686025 (nil))
> > >
> > > The thing is we can't get rid of the unspecs as there's no concept of
> > > masking in RTL compares.
> > > We could technically do an AND (and do in some cases) but then you lose
> > > the predicate hint constant in the RTL which tells you whether the mask is
> > > known to be all true or not.
> > > This hint is crucial to allow for further optimizations.
> > >
> > > That said the condition code, branch and compares are fully exposed.
> > >
> > > We expand to a larger sequence than I'd like mostly because there's no
> > > support for conditional cbranch optabs, or even conditional vector
> > > comparisons.  So the comparisons must be generated unpredicated by
> > > generating an all true mask, and later patterns merge in the AND.
> > >
> > > The new patterns allow us to clean up codegen for Adv.SIMD + SVE (in a
> > > single loop) but not pure SVE, for which I take a different approach to
> > > try to avoid requiring a predicated version of these optabs.
> > >
> > > I don't want to push my luck, but would you be ok with a conditional
> > > version of these optabs too? i.e. cond_cbranch_any and cond_cbranch_all?
> > > This would allow us to immediately expand to the correct representation
> > > for both SVE and Adv.SIMD without having to rely on various combine
> > > patterns and cc-fusion to optimize the sequences later on (which has
> > > historically been a bit hit or miss if someone adds a new CC pattern).
> > 
> > Can you explain?  A conditional conditional branch makes my head hurt.
> > It's really a cbranch_{all,any} where the (vector) compare has an
> > additional mask applied?  So cbranch_cond_{all,any} would be a better
> > fit?
> 
> Yeah so cbranch is itself in GIMPLE
> 
> c = vec_a `cmp` vec_b
> if (c {any,all} 0)
>   ...
> 
> where cbranch_{all, any} represents the gimple
> 
> If (vec_a `cmp` vec_b)
>   ...
> 
> cbranch_cond_{all, any} would represent
> 
> if ((vec_a `cmp` vec_b) & loop_mask)
>   ...
> 
> In GIMPLE we mask most operations by ANDing the mask with the result
> of the operation.  But cbranch doesn't have an LHS, so we can't wrap
> the & around anything.  And because of this we rely on backend patterns
> to later push the mask from the & into the operation such that we can
> generate the predicated compare.
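
The masked form described above can be modelled lane by lane in scalar C.
This is only a sketch of the literal reading of
"(vec_a `cmp` vec_b) & loop_mask" with `>` as the comparison; the function
names are hypothetical and are not optab or GCC API names.  Under this
reading an inactive lane makes the "all" form false.

```c
#include <stdbool.h>
#include <stddef.h>

/* c[i] = (a[i] > b[i]) & mask[i]; true if any lane of c is set.  */
bool masked_any_gt(const int *a, const int *b, const bool *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] > b[i] && mask[i])
            return true;
    return false;
}

/* Same c[i]; true only if every lane of c is set.  */
bool masked_all_gt(const int *a, const int *b, const bool *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (!(a[i] > b[i] && mask[i]))
            return false;
    return true;
}
```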

OK, so we could already do

 c = .COND_`cmp` (vec_a, vec_b, loop_mask, { 0, 0... });
 if (c {any,all} 0)

but I can see how cond_cbranch is

Re: [AutoFDO] Fix get_original_name to strip only names that are generated after auto-profile

2025-07-09 Thread Jan Hubicka

> 
> I am seeing an ICE in the offline pass.
> 
> 
> during IPA pass: afdo_offline
> gmsh/src/mesh/meshGEdge.cpp:979:1: internal compiler error: in 
> set_call_location, at auto-profile.cc:433

I added location and call_location to the function instance.  They are
initially set to UNKNOWN_LOCATION and are later overwritten with the
actual location of the function and of the call to it (for inline instances).
The ICE means that we have two gimple call statements that we think
call the given location, which is likely a problem with a discriminator.

create_gcov often outputs discriminator 0 for calls with a non-zero
discriminator.  This is a common situation in C++, where a single statement
implies a lot of calls.  It is a bug on the create_gcov side in parsing DWARF 5
abbreviations, fixed by

diff --git a/util/symbolize/addr2line_inlinestack.cc 
b/util/symbolize/addr2line_inlinestack.cc
index f68f6e1..8eeb8bd 100644
--- a/util/symbolize/addr2line_inlinestack.cc
+++ b/util/symbolize/addr2line_inlinestack.cc
@@ -493,6 +493,12 @@ void InlineStackHandler::ProcessAttributeSigned(
 subprogram_stack_.back()->set_callsite_line(data);
 break;
 
+  // In case discriminator is implicit const, it is processed as signed
+  // rather then unsigned value.
+  case DW_AT_GNU_discriminator:
+CHECK(form == DW_FORM_implicit_const);
+subprogram_stack_.back()->set_callsite_discr(data);
+break;
   default:
 break;
 }  

I tried to implement a workaround to match the lost discriminator in cases
where it is uniquely determined, but it is not so easy to do.
My plan is to figure out how to upstream the fix and then drop the lost
discriminator workaround from match.
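
To illustrate why a dropped discriminator makes two calls indistinguishable,
here is a minimal sketch with a hypothetical key encoding (source line in the
high half, discriminator in the low half).  It is not the representation used
in auto-profile.cc or create_gcov.

```c
#include <stdint.h>

/* Hypothetical call-site key: line in the high 32 bits,
   discriminator in the low 32 bits.  */
uint64_t call_site_key(uint32_t line, uint32_t discriminator)
{
    return ((uint64_t)line << 32) | discriminator;
}
```

Two calls on the same line with discriminators 1 and 2 get distinct keys, but
if both are reported with discriminator 0 they map to the same key.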

Do you see warnings with -Wauto-profile? 
Honza
>   979 | }
>   | ^
> 0x262582b internal_error(char const*, ...)
> ../../gcc/gcc/diagnostic-global-context.cc:517
> 0x864513 fancy_abort(char const*, int, char const*)
> ../../gcc/gcc/diagnostic.cc:1810
> 0x22da0e7 autofdo::function_instance::set_call_location(unsigned long)
> ../../gcc/gcc/auto-profile.cc:433
> 0x22da0e7 autofdo::function_instance::set_call_location(unsigned long)
> ../../gcc/gcc/auto-profile.cc:431
> 0x22da0e7 autofdo::function_instance::match(cgraph_node*, 
> vec&, hash_map -1, -2>, int, simple_hashmap_traits 
> >, int> >&)
> ../../gcc/gcc/auto-profile.cc:1498
> 0x22d8c8b autofdo::function_instance::match(cgraph_node*, 
> vec&, hash_map -1, -2>, int, simple_hashmap_traits 
> >, int> >&)
> ../../gcc/gcc/auto-profile.cc:1258
> 0x22d8c8b autofdo::function_instance::match(cgraph_node*, 
> vec&, hash_map -1, -2>, int, simple_hashmap_traits 
> >, int> >&)
> ../../gcc/gcc/auto-profile.cc:1638
> 0x22ddf6f autofdo::function_instance::match(cgraph_node*, 
> vec&, hash_map -1, -2>, int, simple_hashmap_traits 
> >, int> >&)
> ../../gcc/gcc/hash-table.h:994
> 0x22ddf6f autofdo::autofdo_source_profile::offline_external_functions()
> ../../gcc/gcc/auto-profile.cc:2032
> 0x22de0f3 execute
> ../../gcc/gcc/auto-profile.cc:4066
> 
> Here stmt is D.293641 = OBJ_TYPE_REF(_7;(const struct GEdge)from->57B) 
> (from); and set_call_location has call_location_ != UNKNOWN_LOCATION 
> 
> Thanks,
> Kugan

Re: [PATCH 2/2] tree-optimization/109893 - allow more backwards jump threading

2025-07-09 Thread Jan Hubicka

> The following changes the percentage that determines how many
> stmts are allowed for backwards jump threading from 50 to 54,
> enabling the missed jump threading observed in PR109893.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  It seems that
> at least backward threading is prone to profile mismatches, I've
> altered two testcases to deal with new ones to pop up (definitely
> latent issues).
> 
> OK?
I wonder if the duplication limit should not be controlled by statement
size rather than statement count?

Honza

[PATCH v5 0/11] openmp: Add support for iterators in OpenMP mapping clauses

2025-07-09 Thread Kwok Cheung Yeung


This is yet another revision of the patch series posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/670333.html
and incorporates the non-constant iterator bounds support posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671630.html

Compared to the previous patch set (v4), this set contains additional 
support for some recently committed OpenMP features (custom mapper and 
Fortran deep mapping), and bug-fixes, particularly for the handling of 
structs.


Tested on an x86-64 Linux host with offloading to nvptx.

Kwok

Re: [PATCH] cobol: Implement CXXFLAGS_FOR_COBOL.

2025-07-09 Thread Andreas Schwab

On Jul 09 2025, Robert Dubner wrote:

> An attempt to override that with
>
>   make CFLAGS-cobol/genapi.o=-DHARMLESS
>
> has no effect on the value of the CFLAGS-cobol/genapi.o

If you are doing this from the toplevel then it's because only known
variables are passed to subdirs.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH v1] rs6000: Restore opaque overload variant for correct diagnostics

2025-07-09 Thread Kishan Parmar

Ping!

Please review.

Thanks & Regards
Kishan

On 05/06/25 12:36 pm, Kishan Parmar wrote:
> Hi All,
>
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
>
> After r12-5752-gd08236359eb229, a new bif infrastructure was introduced
> which stopped using opaque vector types (e.g. opaque_V4SI_type_node)
> for overloaded built-in functions, which led to incorrect and
> misleading diagnostics when argument types didn’t exactly match.
>
> This patch reinstates the opaque overload variant for entries with
> multiple arguments where at least one is a vector, inserting it
> at the beginning of each stanza. This helps recover the intended
> fallback behavior and ensures clearer, type-generic error reporting.
>
> 2025-05-23  Kishan Parmar  
>
> gcc:
>   PR target/104930
>   * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
>   Skip the first overload entry during iteration if it uses opaque type
>   parameters.
>   * config/rs6000/rs6000-gen-builtins.cc
>   (maybe_generate_opaque_variant): New function.
>   (parse_first_ovld_entry): New function.
>   (parse_ovld_stanza): call parse_first_ovld_entry.
> ---
>  gcc/config/rs6000/rs6000-c.cc|   9 +-
>  gcc/config/rs6000/rs6000-gen-builtins.cc | 180 ++-
>  2 files changed, 187 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index d3b0a566821..6217d585b40 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1972,7 +1972,14 @@ altivec_resolve_overloaded_builtin (location_t loc, 
> tree fndecl,
>  arg_i++)
>   {
> tree parmtype = TREE_VALUE (nextparm);
> -   if (!rs6000_builtin_type_compatible (types[arg_i], parmtype))
> +   /* Since we only need opaque vector type for the default
> +  prototype which is the same as the first instance, we
> +  only expect to see it in the first instance.  */
> +   gcc_assert (instance == 
> rs6000_overload_info[adj_fcode].first_instance
> +   || parmtype != opaque_V4SI_type_node);
> +   if ((instance == rs6000_overload_info[adj_fcode].first_instance
> +&& parmtype == opaque_V4SI_type_node)
> +   || !rs6000_builtin_type_compatible (types[arg_i], parmtype))
>   {
> mismatch = true;
> break;
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.cc 
> b/gcc/config/rs6000/rs6000-gen-builtins.cc
> index f77087e0452..d442b93138e 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.cc
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.cc
> @@ -353,6 +353,7 @@ struct typeinfo
>char isunsigned;
>char isbool;
>char ispixel;
> +  char isopaque;
>char ispointer;
>basetype base;
>restriction restr;
> @@ -579,6 +580,7 @@ static typemap type_map[] =
>  { "v4sf","V4SF" },
>  { "v4si","V4SI" },
>  { "v8hi","V8HI" },
> +{ "vop4si",  "opaque_V4SI" },
>  { "vp8hi",   "pixel_V8HI" },
>};
>  
> @@ -1058,6 +1060,7 @@ match_type (typeinfo *typedata, int voidok)
> vdvector double
> v256  __vector_pair
> v512  __vector_quad
> +   vop   vector opaque
>  
>   For simplicity, We don't support "short int" and "long long int".
>   We don't currently support a  of "_Float16".  "signed"
> @@ -1496,6 +1499,12 @@ complete_vector_type (typeinfo *typeptr, char *buf, 
> int *bufi)
>*bufi += 4;
>return;
>  }
> +  else if (typeptr->isopaque)
> +{
> +  memcpy (&buf[*bufi], "op4si", 5);
> +  *bufi += 5;
> +  return;
> +}
>switch (typeptr->base)
>  {
>  case BT_CHAR:
> @@ -1661,7 +1670,8 @@ construct_fntype_id (prototype *protoptr)
> buf[bufi++] = '_';
> if (argptr->info.isconst
> && argptr->info.base == BT_INT
> -   && !argptr->info.ispointer)
> +   && !argptr->info.ispointer
> +   && !argptr->info.isopaque)
>   {
> buf[bufi++] = 'c';
> buf[bufi++] = 'i';
> @@ -1969,6 +1979,168 @@ create_bif_order (void)
>rbt_inorder_callback (&bifo_rbt, bifo_rbt.rbt_root, set_bif_order);
>  }
>  
> +/* Attempt to generate an opaque variant if needed and valid.  */
> +static void
> +maybe_generate_opaque_variant (ovlddata* entry)
> +{
> +  /* If no vector arg, no need to create opaque variant.  */
> +  bool has_vector_arg = false;
> +  for (typelist* arg = entry->proto.args; arg; arg = arg->next)
> +{
> +  if (arg->info.isvector)
> + {
> +   has_vector_arg = true;
> +   break;
> + }
> +}
> +
> +  if (!has_vector_arg || entry->proto.nargs <= 1)
> +return;
> +
> +  /* Construct the opaque variant.  */
> +  ovlddata* opaque_entry = &ovlds[curr_ovld];
> +  memcpy (opaque_entry, entry, sizeof (*entry));
> +
> +  /

Re: [PATCH] aarch64: Extend HVLA permutations to big-endian

2025-07-09 Thread Richard Sandiford

Richard Sandiford  writes:
> TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
> "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
> This matching was conditional on !BYTES_BIG_ENDIAN.
>
> The ACLE code also lowered the associated SVE2.1 intrinsics into
> suitable VEC_PERM_EXPRs.  This lowering was not conditional on
> !BYTES_BIG_ENDIAN.
>
> The mismatch led to lots of ICEs in the ACLE tests on big-endian
> targets: we lowered to VEC_PERM_EXPRs that are not supported.
>
> I think the !BYTES_BIG_ENDIAN restriction was unnecessary.
> SVE maps the first memory element to the least significant end of
> the register for both endiannesses, so no endian correction or lane
> number adjustment is necessary.
>
> This is in some ways a bit counterintuitive.  ZIPQ1 is conceptually
> "apply Advanced SIMD ZIP1 to each 128-bit block" and endianness does
> matter when choosing between Advanced SIMD ZIP1 and ZIP2.  For example,
> the V4SI permute selector { 0, 4, 1, 5 } corresponds to ZIP1 for little-
> endian and ZIP2 for big-endian.  But the difference between the hybrid
> VLA and Advanced SIMD permute selectors is a consequence of the
> difference between the SVE and Advanced SIMD element orders.
>
> The same thing applies to ACLE intrinsics.  The current lowering of
> svzipq1 etc. is correct for both endiannesses.  If ACLE code does:
>
>   2x svld1_s32 + svzipq1_s32 + svst1_s32
>
> then the byte-for-byte result is the same for both endiannesses.
> On big-endian targets, this is different from using the Advanced SIMD
> sequence below for each 128-bit block:
>
>   2x LDR + ZIP1 + STR
>
> In contrast, the byte-for-byte result of:
>
>   2x svld1q_gather_s32 + svzipq1_s32 + svst1q_scatter_s32
>
> depends on endianness, since the quadword gathers and scatters use
> Advanced SIMD byte ordering for each 128-bit block.  This gather/scatter
> sequence behaves in the same way as the Advanced SIMD LDR+ZIP1+STR
> sequence for both endiannesses.
>
> Programmers writing ACLE code have to be aware of this difference
> if they want to support both endiannesses.
>
> The patch includes some new execution tests to verify the expansion
> of the VEC_PERM_EXPRs.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?
>
> Richard
>
>
> gcc/
>   * doc/sourcebuild.texi (aarch64_sve2_hw, aarch64_sve2p1_hw): Document.
>   * config/aarch64/aarch64.cc (aarch64_evpc_hvla): Extend to
>   BYTES_BIG_ENDIAN.
>
> gcc/testsuite/
>   * lib/target-supports.exp (check_effective_target_aarch64_sve2p1_hw):
>   New proc.
>   * gcc.target/aarch64/sve2/dupq_1.c: Extend to big-endian.  Add
>   noipa attributes.
>   * gcc.target/aarch64/sve2/extq_1.c: Likewise.
>   * gcc.target/aarch64/sve2/uzpq_1.c: Likewise.
>   * gcc.target/aarch64/sve2/zipq_1.c: Likewise.

Just noticed that I failed to add noipa to the other files -- will fix.

>   * gcc.target/aarch64/sve2/dupq_1_run.c: New test.
>   * gcc.target/aarch64/sve2/extq_1_run.c: Likewise.
>   * gcc.target/aarch64/sve2/uzpq_1_run.c: Likewise.
>   * gcc.target/aarch64/sve2/zipq_1_run.c: Likewise.
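
As a reminder of what the permute selector in the description means, a
two-input permute indexes into the concatenation of its inputs.  The scalar
sketch below applies a four-element selector; the function name is
hypothetical and the model deliberately ignores register lane order, which is
the part that differs between SVE and Advanced SIMD on big-endian targets.

```c
#include <stddef.h>

/* out[i] = concat(a, b)[sel[i]] for two 4-element inputs.  */
void vec_perm4(const int *a, const int *b, const unsigned *sel, int *out)
{
    for (size_t i = 0; i < 4; ++i)
        out[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
}
```

With the selector { 0, 4, 1, 5 } the result interleaves the first two
elements of each input.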

Re: [PATCH v1] rs6000: Fix UBSAN runtime errors for powerpc64le-unknown-linux-gnu

2025-07-09 Thread Kishan Parmar

Ping!

Please review.

Thanks & Regards,
Kishan
On 26/06/25 1:26 pm, Kishan Parmar wrote:
> Hi All,
>
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
>
> While building GCC with --with-build-config=bootstrap-ubsan on
> powerpc64le-unknown-linux-gnu, multiple UBSAN runtime errors were
> encountered in rs6000.cc and rs6000.md due to undefined behavior
> involving left shifts on negative values and shift exponents equal to
> or exceeding the type width.
>
> The issue was in bit pattern recognition code
> (in can_be_rotated_to_negative_lis and can_be_built_by_li_and_rldic),
> where signed values were shifted without handling negative inputs or
> guarding against shift counts equal to the type width, causing UB.
> The fix ensures shifts and rotations are done in unsigned HOST_WIDE_INT,
> casting back only where needed (like for arithmetic right shifts),
> with proper guards to prevent shift-by-64.
>
> 2025-06-26  Kishan Parmar  
>
> gcc:
>   PR target/118890
>   * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis):
>   Avoid left shift of negative value and guard shift count.
>   (can_be_built_by_li_and_rldic): Likewise.
>   (rs6000_emit_set_long_const): Likewise.
>   * config/rs6000/rs6000.md : Avoid signed overflow.
> ---
>  gcc/config/rs6000/rs6000.cc | 24 ++--
>  gcc/config/rs6000/rs6000.md |  4 +++-
>  2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 7ee26e52b13..e7e30fa95ba 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10309,15 +10309,18 @@ can_be_rotated_to_negative_lis (HOST_WIDE_INT c, 
> int *rot)
>  
>/* case b. xx0..01..1xx: some of 15 x's (and some of 16 0's) are
>   rotated over the highest bit.  */
> -  int pos_one = clz_hwi ((c << 16) >> 16);
> -  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
> -  int middle_ones = clz_hwi (~(c << pos_one));
> -  if (middle_zeros >= 16 && middle_ones >= 33)
> +  unsigned HOST_WIDE_INT uc = (unsigned HOST_WIDE_INT)c;
> +  int pos_one = clz_hwi ((HOST_WIDE_INT)(uc << 16) >> 16);
> +  if (pos_one != 0)
>  {
> -  *rot = pos_one;
> -  return true;
> +  middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_one));
> +  int middle_ones = clz_hwi (~(uc << pos_one));
> +  if (middle_zeros >= 16 && middle_ones >= 33)
> + {
> +   *rot = pos_one;
> +   return true;
> + }
>  }
> -
>return false;
>  }
>  
> @@ -10434,7 +10437,7 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>if (lz >= HOST_BITS_PER_WIDE_INT)
>  return false;
>  
> -  int middle_ones = clz_hwi (~(c << lz));
> +  int middle_ones = clz_hwi (~(((unsigned HOST_WIDE_INT)c) << lz));
>if (tz + lz + middle_ones >= ones
>&& (tz - lz) < HOST_BITS_PER_WIDE_INT
>&& tz < HOST_BITS_PER_WIDE_INT)
> @@ -10468,7 +10471,7 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>if (!IN_RANGE (pos_first_1, 1, HOST_BITS_PER_WIDE_INT-1))
>  return false;
>  
> -  middle_ones = clz_hwi (~c << pos_first_1);
> +  middle_ones = clz_hwi ((~(unsigned HOST_WIDE_INT)c) << pos_first_1);
>middle_zeros = ctz_hwi (c >> (HOST_BITS_PER_WIDE_INT - pos_first_1));
>if (pos_first_1 < HOST_BITS_PER_WIDE_INT
>&& middle_ones + middle_zeros < HOST_BITS_PER_WIDE_INT
> @@ -10570,7 +10573,8 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c, int *num_insns)
>  {
>/* li/lis; rldicX */
>unsigned HOST_WIDE_INT imm = (c | ~mask);
> -  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +  if (shift != 0)
> + imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
>  
>count_or_emit_insn (temp, GEN_INT (imm));
>if (shift != 0)
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 9c718ca2a22..8fc079a4297 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -1971,7 +1971,9 @@
>  {
>HOST_WIDE_INT val = INTVAL (operands[2]);
>HOST_WIDE_INT low = sext_hwi (val, 16);
> -  HOST_WIDE_INT rest = trunc_int_for_mode (val - low, mode);
> +  /* Avoid signed overflow by computing difference in unsigned domain.  */
> +  unsigned HOST_WIDE_INT urest = (unsigned HOST_WIDE_INT)val - (unsigned 
> HOST_WIDE_INT)low;
> +  HOST_WIDE_INT rest = trunc_int_for_mode (urest, mode);
>  
>operands[4] = GEN_INT (low);
>if (mode == SImode || satisfies_constraint_L (GEN_INT (rest)))
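
The two idioms the patch relies on can be shown in isolation.  This is a
generic sketch with fixed 64-bit types and hypothetical helper names, not the
rs6000 code: the rotate skips the shift-by-64 case, and the left shift of a
possibly negative value is done in the unsigned domain.

```c
#include <stdint.h>

/* Rotate right; a count of 0 is returned as-is so that the
   complementary shift by 64 is never evaluated.  */
uint64_t rotr64(uint64_t x, unsigned shift)
{
    shift &= 63;
    if (shift == 0)
        return x;
    return (x >> shift) | (x << (64 - shift));
}

/* Left shift of a signed value via unsigned arithmetic; n must be
   in the range 0..63.  */
int64_t shl_signed(int64_t c, unsigned n)
{
    return (int64_t)((uint64_t)c << n);
}
```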

Re: [PATCH] aarch64: Some fixes for SVE INDEX constants

2025-07-09 Thread Andrew Pinski

On Wed, Jul 9, 2025 at 7:09 AM Richard Sandiford
 wrote:
>
> When using SVE INDEX to load an Advanced SIMD vector, we need to
> take account of the different element ordering for big-endian
> targets.  For example, when big-endian targets store the V4SI
> constant { 0, 1, 2, 3 } in registers, 0 becomes the most
> significant element, whereas INDEX always operates from the
> least significant element.  A big-endian target would therefore
> load V4SI { 0, 1, 2, 3 } using:
>
> INDEX Z0.S, #3, #-1
>
> rather than little-endian's:
>
> INDEX Z0.S, #0, #1
>
> While there, I noticed that we would only check the first vector
> in a multi-vector SVE constant, which would trigger an ICE if the
> other vectors turned out to be invalid.  This is pretty difficult to
> trigger at the moment, since we only allow single-register modes to be
> used as frontend & middle-end vector modes, but it can be seen using
> the RTL frontend.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

When I was reviewing the original index patch internally I was worried
about this but it looks like I didn't do a thorough enough job at it.
Anyways this is ok.

Thanks,
Andrew
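
The big-endian adjustment in the patch amounts to reading the series from its
last element backwards.  A minimal sketch of that arithmetic, with
hypothetical names and plain long instead of RTL constants:

```c
#include <stdbool.h>

struct index_series { long base; long step; };

/* Describe the same series starting from its last element.  */
struct index_series reverse_series(long base, long step, unsigned nunits)
{
    struct index_series r = { base + (long)(nunits - 1) * step, -step };
    return r;
}

/* Immediate range used for INDEX operands in the patch.  */
bool index_immediate_p(long v)
{
    return v >= -16 && v <= 15;
}
```

For { 0, 1, 2, 3 } this gives base 3 and step -1, matching the
INDEX Z0.S, #3, #-1 form quoted above.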

>
> Richard
>
>
> gcc/
> * config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
> function, split out from...
> (aarch64_simd_valid_imm): ...here.  Account for the different
> SVE and Advanced SIMD element orders on big-endian targets.
> Check each vector in a structure mode.
>
> gcc/testsuite/
> * gcc.dg/rtl/aarch64/vec-series-1.c: New test.
> * gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
> * gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
> output for this big-endian test.
> * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> * gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
> targets and add more tests.
> * gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
> of vec_init_3.c.
> ---
>  gcc/config/aarch64/aarch64.cc |  59 -
>  .../gcc.dg/rtl/aarch64/vec-series-1.c |  35 +++
>  .../gcc.dg/rtl/aarch64/vec-series-2.c |  35 +++
>  .../aarch64/sve/acle/general/dupq_2.c |   2 +-
>  .../aarch64/sve/acle/general/dupq_4.c |   2 +-
>  .../gcc.target/aarch64/sve/vec_init_3.c   | 114 +-
>  .../gcc.target/aarch64/sve/vec_init_4.c   | 209 ++
>  7 files changed, 446 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/vec-series-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/vec-series-2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index ce25f4f6f9f..6d5b2009b2a 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23074,6 +23074,58 @@ aarch64_sve_index_immediate_p (rtx base_or_step)
>   && IN_RANGE (INTVAL (base_or_step), -16, 15));
>  }
>
> +/* Return true if SERIES is a constant vector that can be loaded using
> +   an immediate SVE INDEX, considering both SVE and Advanced SIMD modes.
> +   When returning true, store the base in *BASE_OUT and the step
> +   in *STEP_OUT.  */
> +
> +static bool
> +aarch64_sve_index_series_p (rtx series, rtx *base_out, rtx *step_out)
> +{
> +  rtx base, step;
> +  if (!const_vec_series_p (series, &base, &step)
> +  || !CONST_INT_P (base)
> +  || !CONST_INT_P (step))
> +return false;
> +
> +  auto mode = GET_MODE (series);
> +  auto elt_mode = as_a (GET_MODE_INNER (mode));
> +  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
> +  if (BYTES_BIG_ENDIAN && (vec_flags & VEC_ADVSIMD))
> +{
> +  /* On big-endian targets, architectural lane 0 holds the last element
> +for Advanced SIMD and the first element for SVE; see the comment at
> +the head of aarch64-sve.md for details.  This means that, from an SVE
> +point of view, an Advanced SIMD series goes from the last element to
> +the first.  */
> +  auto i = GET_MODE_NUNITS (mode).to_constant () - 1;
> +  base = gen_int_mode (UINTVAL (base) + i * UINTVAL (step), elt_mode);
> +  step = gen_int_mode (-UINTVAL (step), elt_mode);
> +}
> +
> +  if (!aarch64_sve_index_immediate_p (base)
> +  || !aarch64_sve_index_immediate_p (step))
> +return false;
> +
> +  /* If the mode spans multiple registers, check that each subseries is
> + in range.  */
> +  unsigned int nvectors = aarch64_ldn_stn_vectors (mode);
> +  if (nvectors != 1)
> +{
> +  unsigned int nunits;
> +  if (!GET_MODE_NUNITS (mode).is_constant (&nunits))
> +   return false;
> +  nunits /= nvectors;
> +  for (unsigned int i = 1; i < nvectors; ++i)
> +   if (!IN_RANGE (INTVAL (base) + i * nunits * INTVAL (step), -16, 15))
> + return false;
> +}
> +
> +  *base_out =

[PATCH v2] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-09 Thread Björn Schäpers

From: Björn Schäpers 

On Windows there is no API to get the current time zone as an IANA name;
instead Windows has its own zones.  However, the Unicode Consortium
provides a mapping.  This patch adds a script to convert the XML
file with the mapping to a lookup table and adds a Windows code path to
use that mapping.
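
A lookup over such a table could look roughly like the sketch below.  It
assumes the table is sorted by Windows zone name and then territory and is
searched with bsearch; the struct, the function name and the three sample
rows are illustrative only and are not taken from the generated header.

```c
#include <stdlib.h>
#include <string.h>

struct zone_entry {
    const char *windows;
    const char *territory;
    const char *iana;
};

/* Illustrative rows, sorted by (windows, territory).  */
static const struct zone_entry zone_map[] = {
    { "Pacific Standard Time",   "001", "America/Los_Angeles" },
    { "Tokyo Standard Time",     "001", "Asia/Tokyo" },
    { "W. Europe Standard Time", "001", "Europe/Berlin" },
};

static int cmp_entry(const void *l, const void *r)
{
    const struct zone_entry *a = l, *b = r;
    int c = strcmp(a->windows, b->windows);
    return c ? c : strcmp(a->territory, b->territory);
}

/* Return the IANA name for a Windows zone and territory, or NULL.  */
const char *lookup_iana(const char *windows, const char *territory)
{
    struct zone_entry key = { windows, territory, NULL };
    const struct zone_entry *hit =
        bsearch(&key, zone_map, sizeof zone_map / sizeof zone_map[0],
                sizeof zone_map[0], cmp_entry);
    return hit ? hit->iana : NULL;
}
```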

libstdc++-v3/ChangeLog:

Implement std::chrono::current_zone() for Windows

* scripts/gen_windows_zones_map.py: New file, generates
windows_zones-map.h.
* src/c++20/windows_zones-map.h: New file, contains the lookup
table.
* src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.

Signed-off-by: Björn Schäpers 
---
 libstdc++-v3/scripts/gen_windows_zones_map.py | 127 ++
 libstdc++-v3/src/c++20/tzdb.cc| 103 -
 libstdc++-v3/src/c++20/windows_zones-map.h| 407 ++
 3 files changed, 635 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
 create mode 100644 libstdc++-v3/src/c++20/windows_zones-map.h

diff --git a/libstdc++-v3/scripts/gen_windows_zones_map.py 
b/libstdc++-v3/scripts/gen_windows_zones_map.py
new file mode 100644
index 000..9ac559209cc
--- /dev/null
+++ b/libstdc++-v3/scripts/gen_windows_zones_map.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+#
+# Script to generate the map for libstdc++ std::chrono::current_zone under Windows.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# To update the Libstdc++ static data in src/c++20/windows_zones-map.h download the latest:
+# https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/windowsZones.xml
+# Then run this script and save the output to
+# src/c++20/windows_zones-map.h
+
+import os
+import sys
+import xml.etree.ElementTree as et
+
+if len(sys.argv) != 2:
+    print("Usage: %s " % sys.argv[0], file=sys.stderr)
+    sys.exit(1)
+
+self = os.path.basename(__file__)
+print("// Generated by scripts/{}, do not edit.".format(self))
+print("""
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/windows_zones-map.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+""")
+
+print("#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP")
+print('# error "This is not a public header, do not include it directly"')
+print("#endif\n")
+
+class WindowsZoneMapEntry:
+    def __init__(self, windows, territory, iana):
+        self.windows = windows
+        self.territory = territory
+        self.iana = iana
+
+    def __lt__(self, other):
+        if self.windows < other.windows:
+            return True
+        if self.windows > other.windows:
+            return False
+        return self.territory < other.territory
+
+windows_zone_map = []
+
+tree = et.parse(sys.argv[1])
+xml_zone_map = tree.getroot().find("windowsZones").find("mapTimezones")
+
+for entry in xml_zone_map.iter("mapZone"):
+    iana = entry.attrib["type"]
+    space = iana.find(" ")
+    if space != -1:
+        iana = iana[0:space]
+    windows_zone_map.append(WindowsZoneMapEntry(entry.attrib["other"], entry.attrib["territory"], iana))
+
+# Sort so we can use binary search on the array.
+windows_zone_map.sort();
+
+# Skip territories which have the same IANA zone as 001, so we can reduce the data.
+last_windows_zone = ""
+for entry in

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-09 Thread Björn Schäpers


Am 09.07.2025 um 16:16 schrieb Jonathan Wakely:

On Wed, 9 Jul 2025 at 15:13, Jonathan Wakely  wrote:


On Tue, 8 Jul 2025 at 21:47, Björn Schäpers  wrote:


From: Björn Schäpers 

I have based this on my previous (not yet landed) patch, but it only
reuses the #ifdef to include . Since std::array isn't used
anywhere else I thought that was the right place to put it.

I hope the formatting is okay.

I've used wide strings for the Windows zone name and territory, since
the Windows API returns wide strings and thus they can be compared
directly. For the territory there exists a narrow string API, but
internally it calls the wide string version and narrows it down. If
desired I can switch to narrow strings, the conversion can be done by
static_cast per character since only ASCII chars are used.


Working with wide strings seems fine, that's the native format.

I think the generated header should be written to src/c++20/ directory
though, since it doesn't need to be installed alongside the public
headers and doesn't need to be included by anything except tzdb.cc.
That would mean you can just do #include "windows_zones-map.h" in tzdb.cc



Will do.



-- >8 --
On Windows there is no API to get the current time zone as an IANA name;
instead Windows has its own zones. But there exists a mapping provided
by the Unicode Consortium. This patch adds a script to convert the XML
file with the mapping to a lookup table and adds a Windows code path to
use that mapping.

libstdc++-v3/Changelog:

 Implement std::chrono::current_zone() for Windows

 * include/bits/windows_zones-map.h: New file, contains the look
 up table.
 * scripts/gen_windows_zones_map.py: New file, generates
 windows_zones-map.h.
 * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.

Signed-off-by: Björn Schäpers 
---
  libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
  libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
  libstdc++-v3/src/c++20/tzdb.cc| 102 -
  3 files changed, 635 insertions(+), 2 deletions(-)
  create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
  create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py

diff --git a/libstdc++-v3/include/bits/windows_zones-map.h b/libstdc++-v3/include/bits/windows_zones-map.h
new file mode 100644
index 000..7be736b063d
--- /dev/null
+++ b/libstdc++-v3/include/bits/windows_zones-map.h
@@ -0,0 +1,407 @@
+// Generated by scripts/gen_windows_zones_map.py, do not edit.
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/windows_zones-map.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+
+#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
+# error "This is not a public header, do not include it directly"
+#endif
+
+struct windows_zone_map_entry
+{
+  wstring_view windows_name;
+  wstring_view territory;
+  string_view iana_name;
+};
+
+static constexpr array windows_zone_map{
+  {
+{L"AUS Central Standard Time", L"001", "Australia/Darwin"},
+{L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
+{L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
+{L"Alaskan Standard Time", L"001", "America/Anchorage"},
+{L"Aleutian Standard Time", L"001", "America/Adak"},
+{L"Altai Standard Time", L"001", "Asia/Barnaul"},
+{L"Arab Standard Time", L"001", "Asia/Riyadh"},
+{L"Arab Standard Time", L"BH", "Asia/Bahrain"},
+{L"Arab Standard Time", L"KW", "Asia/Kuwait"},
+{L"Arab Standard Time", L"QA", "Asia/Qatar"},
+{L"Arab Standard Time", L"YE", "Asia/Aden"},
+{L"Arabian Standard Time", L"001", "Asia/Dubai"},
+{L"Arabian Standard Time", L"OM", "Asia/Muscat"},
+{L"Arabian Standard Time", L"ZZ", "Etc/GMT-4"},
+{L"Arabic Standard Time", L"001", "Asia/Baghdad"},
+{L"Argentina Standard Time", L"001", "America/Buenos_Aires"},
+{L"Astrakhan Standard Time"

Re: [PATCH] c++: optional template after :: causing error [PR119838]

2025-07-09 Thread Jason Merrill


On 7/8/25 4:22 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Found while working on Reflection where we currently reject:

   constexpr auto r = ^^::template C::type;

which should work, because "::template C::" should match the

   nested-name-specifier template(opt) simple-template-id ::

production where the template is optional.  This bug is not limited
to Reflection as demonstrated by the attached test case, so I'm
submitting it separately.


Part of the problem is that cp_parser_nested_name_specifier_opt doesn't 
reflect the C++14 change to make :: a nested-name-specifier by itself. 
But that's a much larger change, and this patch seems like a fine 
workaround.  But please add a comment about the C++14 change to the 
function comment; OK with that tweak.



The check_template_keyword_in_nested_name_spec call should ensure that
we're dealing with a template-id if we've seen "template".

PR c++/119838

gcc/cp/ChangeLog:

* parser.cc (cp_parser_nested_name_specifier_opt): New global_p
parameter.  Look for "template" when global_p is true.
(cp_parser_simple_type_specifier): Pass global_p to
cp_parser_nested_name_specifier_opt.

gcc/testsuite/ChangeLog:

* g++.dg/parse/template32.C: New test.
---
  gcc/cp/parser.cc| 32 +++--
  gcc/testsuite/g++.dg/parse/template32.C | 13 ++
  2 files changed, 33 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/template32.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 32c6a42b31d..70c670a6f1c 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2519,7 +2519,7 @@ static cp_expr cp_parser_id_expression
  static cp_expr cp_parser_unqualified_id
(cp_parser *, bool, bool, bool, bool);
  static tree cp_parser_nested_name_specifier_opt
-  (cp_parser *, bool, bool, bool, bool, bool = false);
+  (cp_parser *, bool, bool, bool, bool, bool = false, bool = false);
  static tree cp_parser_nested_name_specifier
(cp_parser *, bool, bool, bool, bool);
  static tree cp_parser_qualifying_entity
@@ -7242,18 +7242,19 @@ check_template_keyword_in_nested_name_spec (tree name)
   nested-name-specifier template [opt] simple-template-id ::
  
 PARSER->SCOPE should be set appropriately before this function is

-   called.  TYPENAME_KEYWORD_P is TRUE if the `typename' keyword is in
-   effect.  TYPE_P is TRUE if we non-type bindings should be ignored
-   in name lookups.
+   called.  TYPENAME_KEYWORD_P is true if the `typename' keyword is in
+   effect.  TYPE_P is true if we non-type bindings should be ignored
+   in name lookups.  TEMPLATE_KEYWORD_P is true if the `template' keyword
+   was seen.  GLOBAL_P is true if `::' has already been parsed.
  
 Sets PARSER->SCOPE to the class (TYPE) or namespace

 (NAMESPACE_DECL) specified by the nested-name-specifier, or leaves
 it unchanged if there is no nested-name-specifier.  Returns the new
 scope iff there is a nested-name-specifier, or NULL_TREE otherwise.
  
-   If CHECK_DEPENDENCY_P is FALSE, names are looked up in dependent scopes.

+   If CHECK_DEPENDENCY_P is false, names are looked up in dependent scopes.
  
-   If IS_DECLARATION is TRUE, the nested-name-specifier is known to be

+   If IS_DECLARATION is true, the nested-name-specifier is known to be
 part of a declaration and/or decl-specifier.  */
  
  static tree

@@ -7262,7 +7263,8 @@ cp_parser_nested_name_specifier_opt (cp_parser *parser,
 bool check_dependency_p,
 bool type_p,
 bool is_declaration,
-bool template_keyword_p /* = false */)
+bool template_keyword_p /* = false */,
+bool global_p /* = false */)
  {
bool success = false;
cp_token_position start = 0;
@@ -7310,8 +7312,9 @@ cp_parser_nested_name_specifier_opt (cp_parser *parser,
  
/* Spot cases that cannot be the beginning of a

 nested-name-specifier.  On the second and subsequent times
-through the loop, we look for the `template' keyword.  */
-  if (success && token->keyword == RID_TEMPLATE)
+(or the first, if '::' has already been parsed) through the
+loop, we look for the `template' keyword.  */
+  if ((success || global_p) && token->keyword == RID_TEMPLATE)
;
/* A template-id can start a nested-name-specifier.  */
else if (token->type == CPP_TEMPLATE_ID)
@@ -7359,8 +7362,11 @@ cp_parser_nested_name_specifier_opt (cp_parser *parser,
cp_parser_parse_tentatively (parser);
  
/* Look for the optional `template' keyword, if this isn't the

-first time through the loop.  */
-  if (success)
+first time through the loop, or if we've already parsed '::';
+this is then th

Re: [PATCH v5 16/24] c/c++: Add target_[version/clones] to decl diagnostics formatting.

2025-07-09 Thread Jason Merrill


On 5/29/25 8:52 AM, Alfie Richards wrote:

Adds the target_version and target_clones attributes to diagnostic messages
for target_version semantics.

This is because the target_version/target_clones attributes affect the identity
of the decls, so need to be represented in diagnostics for them.

After this change diagnostics look like:

```
test.c:5:7: error: redefinition of ‘[[target_version("sve")]] foo’
 5 | float foo () {return 2;}
   |   ^~~
test.c:2:7: note: previous definition of ‘[[target_version("sve")]] foo’ with type ‘float(void)’
 2 | float foo () {return 1;}
```


It would be better to print this information after the identifier, to 
match standard attribute syntax; attributes generally appertain to the 
thing immediately to their left.  Except attributes at the very 
beginning of the declaration, but that doesn't apply here because we 
only see the identifier.



This only affects targets which use target_version (aarch64 and riscv).

gcc/c-family/ChangeLog:

* c-pretty-print.cc (pp_c_function_target_version): New function.
(pp_c_function_target_clones): New function.
* c-pretty-print.h (pp_c_function_target_version): New function.
(pp_c_function_target_clones): New function.

gcc/c/ChangeLog:

* c-objc-common.cc (c_tree_printer): Add printing of target_clone and
target_version in decl diagnostics.

gcc/cp/ChangeLog:

* cxx-pretty-print.h (pp_cxx_function_target_version): New macro.
(pp_cxx_function_target_clones): Ditto.
* error.cc (dump_function_decl): Add printing of target_clone and
target_version in decl diagnostics.
---
  gcc/c-family/c-pretty-print.cc | 65 ++
  gcc/c-family/c-pretty-print.h  |  2 ++
  gcc/c/c-objc-common.cc |  6 
  gcc/cp/cxx-pretty-print.h  |  4 +++
  gcc/cp/error.cc|  3 ++
  5 files changed, 80 insertions(+)

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index fad6b5eb9b0..1cb71d99b16 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "function.h"
  #include "basic-block.h"
  #include "gimple.h"
+#include "tm.h"
  
  /* The pretty-printer code is primarily designed to closely follow

 (GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -3054,6 +3055,70 @@ pp_c_tree_decl_identifier (c_pretty_printer *pp, tree t)
pp_c_identifier (pp, name);
  }
  
+/* Prints "[version: VERSION]" for a versioned function decl.

+   This only works for target_version.  */
+void
+pp_c_function_target_version (c_pretty_printer *pp, tree t)
+{
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+return;
+
+  string_slice version = get_target_version (t);
+  if (!version.is_valid ())
+return;
+
+  pp_c_left_bracket (pp);
+  pp_c_left_bracket (pp);
+  pp_string (pp, "target_version");
+  pp_c_left_paren (pp);
+  pp_doublequote (pp);
+  pp_string_n (pp, version.begin (), version.size ());
+  pp_doublequote (pp);
+  pp_c_right_paren (pp);
+  pp_c_right_bracket (pp);
+  pp_c_right_bracket (pp);
+  pp_c_whitespace (pp);
+}
+
+/* Prints "[clones: VERSION, +]" for a versioned function decl.
+   This only works for target_version.  */
+void
+pp_c_function_target_clones (c_pretty_printer *pp, tree t)
+{
+  /* Only print for target_version semantics.
+ This is because for target FMV semantics a target_clone always defines
+ the entire FMV set.  target_version semantics can mix target_clone and
+ target_version decls in the definition of a FMV set and so the
+ target_clone becomes a part of the identity of the declaration.  */
+  if (TARGET_HAS_FMV_TARGET_ATTRIBUTE)
+return;
+
+  auto_vec versions = get_clone_versions (t, NULL, false);
+  if (versions.is_empty ())
+return;
+
+  string_slice final_version = versions.pop ();
+  pp_c_left_bracket (pp);
+  pp_c_left_bracket (pp);
+  pp_string (pp, "target_clones");
+  pp_c_left_paren (pp);
+  for (string_slice version : versions)
+{
+  pp_doublequote (pp);
+  pp_string_n (pp, version.begin (), version.size ());
+  pp_doublequote (pp);
+  pp_string (pp, ",");
+  pp_c_whitespace (pp);
+}
+  pp_doublequote (pp);
+  pp_string_n (pp, final_version.begin (), final_version.size ());
+  pp_doublequote (pp);
+  pp_c_right_paren (pp);
+  pp_c_right_bracket (pp);
+  pp_c_right_bracket (pp);
+  pp_c_whitespace (pp);
+}
+
  #if CHECKING_P
  
  namespace selftest {

diff --git a/gcc/c-family/c-pretty-print.h b/gcc/c-family/c-pretty-print.h
index c8fb6789991..5dc1cdff513 100644
--- a/gcc/c-family/c-pretty-print.h
+++ b/gcc/c-family/c-pretty-print.h
@@ -138,6 +138,8 @@ void pp_c_ws_string (c_pretty_printer *, const char *);
  void pp_c_identifier (c_pretty_printer *, const char *);
  void pp_c_string_literal (c_pretty_printer *, tree);
  void pp_c_integer_constant (c_pretty_printer *, tree);
+void pp_c_

[PATCH v3] libstdc++: Simplify __uninitialized_default and __uninitialized_default_n

2025-07-09 Thread Jonathan Wakely

With improved memset optimizations in std::uninitialized_fill and
std::uninitialized_fill_n (see r15-4473-g3abe751ea86e34), we can make
the non-standard internal helpers __uninitialized_default and
__uninitialized_default_n use those directly instead of using std::fill
and std::fill_n respectively. And if we do that, we no longer need to
check whether the type is assignable, because avoiding std::fill means
no assignment happens.

If the type being constructed is trivially default constructible and
trivially copy constructible, then it's unobservable if we construct one
object and copy it N-1 times, rather than constructing N objects. For
byte-sized integer types this allows the loop to be replaced with
memset.

Because these functions are not defined for C++98 at all, we can use
if-constexpr to simplify them and remove the dispatching to members of
class template specializations.

By removing the uses of std::fill and std::fill_n we no longer need to
include stl_algobase.h in stl_uninitialized.h which might improve
compilation time for some other parts of the library.

libstdc++-v3/ChangeLog:

* include/bits/stl_uninitialized.h: Do not include
bits/stl_algobase.h.
(__uninitialized_default_1, __uninitialized_default_n_1):
Remove.
(__uninitialized_default, __uninitialized_default_n): Use
'if constexpr' and only consider trivial constructibility,
not assignability, when deciding on the algorithm to use.

Reviewed-by: Tomasz Kamiński 
---

No changes since [PATCH v2], it's just been rebased because some of the
removed code was changed slightly on trunk so the PATCH v2 no longer
applies cleanly.

This still causes:
FAIL: g++.dg/pr104547.C -std=gnu++17  scan-tree-dump-not vrp2 "_M_default_append"
and I don't know why the assembly contains _M_default_append even though
it appears to be unused. This is beyond my ability to inspect optimized
tree dumps so I need some help here.

 libstdc++-v3/include/bits/stl_uninitialized.h | 133 +-
 1 file changed, 34 insertions(+), 99 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index 351c3a17457f..51974ad2b2a6 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -60,7 +60,6 @@
 # include 
 # include   // to_address
 # include // pair
-# include // fill, fill_n
 #endif
 
 #include  // __is_pointer
@@ -849,98 +848,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Extensions: __uninitialized_default, __uninitialized_default_n,
   // __uninitialized_default_a, __uninitialized_default_n_a.
 
-  template
-struct __uninitialized_default_1
-{
-  template
-_GLIBCXX26_CONSTEXPR
-static void
-__uninit_default(_ForwardIterator __first, _ForwardIterator __last)
-{
- _UninitDestroyGuard<_ForwardIterator> __guard(__first);
- for (; __first != __last; ++__first)
-   std::_Construct(std::addressof(*__first));
- __guard.release();
-   }
-};
-
-  template<>
-struct __uninitialized_default_1
-{
-  template
-_GLIBCXX26_CONSTEXPR
-static void
-__uninit_default(_ForwardIterator __first, _ForwardIterator __last)
-{
- if (__first == __last)
-   return;
-
- typename iterator_traits<_ForwardIterator>::value_type* __val
-   = std::addressof(*__first);
- std::_Construct(__val);
- if (++__first != __last)
-   std::fill(__first, __last, *__val);
-   }
-};
-
-  template
-struct __uninitialized_default_n_1
-{
-  template
-   _GLIBCXX20_CONSTEXPR
-static _ForwardIterator
-__uninit_default_n(_ForwardIterator __first, _Size __n)
-{
- _UninitDestroyGuard<_ForwardIterator> __guard(__first);
- for (; __n > 0; --__n, (void) ++__first)
-   std::_Construct(std::addressof(*__first));
- __guard.release();
- return __first;
-   }
-};
-
-  template<>
-struct __uninitialized_default_n_1
-{
-  template
-   _GLIBCXX20_CONSTEXPR
-static _ForwardIterator
-__uninit_default_n(_ForwardIterator __first, _Size __n)
-{
- if (__n > 0)
-   {
- typename iterator_traits<_ForwardIterator>::value_type* __val
-   = std::addressof(*__first);
- std::_Construct(__val);
- ++__first;
- __first = std::fill_n(__first, __n - 1, *__val);
-   }
- return __first;
-   }
-};
-
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wc++17-extensions"
   // __uninitialized_default
   // Fills [first, last) with value-initialized value_types.
   template
 _GLIBCXX20_CONSTEXPR
 inline void
-__uninitialized_default(_ForwardIterator __first,
-   _ForwardIterator __last)
+__uniniti

Re: [PATCH] tail-call: Allow tail recusion for classes with RVO (TREE_ADDRESSABLE set) [PR120871]

2025-07-09 Thread Jeff Law





On 7/1/25 10:13 PM, Andrew Pinski wrote:

With struct returns, we normally get a decl on the LHS of the call expression
that will be tail called and we can match things up there easily.
With TREE_ADDRESSABLE set on the type, things get more complex.
Instead we get:
```
   *_6(D) = get_s (1); [return slot optimization]
...
   return _6(D);
```

So we have to match _6 as being the SSA name for the result decl, make sure
RSO is set, and match MEM_REF with a zero offset if we want to do tail calls.

This is also the first step in allowing tail calls in this case too; I will
expand the patch for PR71761 to handle the tail call later on.

Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

PR tree-optimization/120871

gcc/ChangeLog:

* tree-tailcall.cc (find_tail_calls): Allow a MEM_REF
with a zero offset when RSO is set on the call and
the MEM_REF is of the result decl default definition.

gcc/testsuite/ChangeLog:

* g++.dg/opt/tail-call-1.C: New test.

Do you have to worry about statements between the call and return
statements?  ISTM that to get a tail call all those statements would
have to be eliminated.


Or is this just about getting things marked so that we have a fighting 
chance to realize a tail call optimization later?


Jeff

Re: Basic fusions in RISC-V generic tuning model

2025-07-09 Thread Andrew Waterman

For statically scheduled superscalars that don't perform fusion, which
is probably the common choice for statically scheduled designs, this
change will generally be a deoptimization.  For dynamically scheduled
designs that don't perform fusion, it's probably more or less neutral.
Not sure how these facts should drive the decision; just pointing them
out.

On Wed, Jul 9, 2025 at 6:28 AM Jeff Law  wrote:
>
> One thing I forgot to bring up in the patchwork meeting yesterday.
>
> Philip or Craig asked if we should add the most basic fusions to the
> generic tuning models for the two toolchains.
>
> I'm generally in favor of making that kind of change.  I don't think
> anyone believes it'd be a major performance driver, but it does slightly
> reduce the search space when we do need to chase things down.
>
> lui/auipc+addi would fall into that set.  It's unclear if any others would.
>
> Thoughts?
>
> Jeff
>
