[PATCH] rtl-ssa: Maintain clobber_group invariant [PR121757]

2025-09-05 Thread Richard Sandiford
In order to reduce time complexity, rtl-ssa groups consecutive clobbers together. Each group of clobbers has a splay tree for lookup and manipulation purposes. This arrangement means that we might need to split a group (when inserting a new non-clobber definition between two clobbers) or to join

Re: [PATCH 1/3]middle-end: clear the user unroll flag if the cost model has overriden it

2025-09-04 Thread Richard Sandiford
Richard Biener writes: > On Wed, 3 Sep 2025, Richard Sandiford wrote: > >> Tamar Christina writes: >> > We also don't ever force unrolling for predicated SVE because for >> > predicated SVE we have to balance predicate throughput limitations >> > of a

Re: [PATCH 1/3]middle-end: clear the user unroll flag if the cost model has overriden it

2025-09-03 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Biener >> Sent: Tuesday, September 2, 2025 1:44 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd >> Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the >> cost model has >> overriden it >> >> O

Re: [PATCH] MAINTAINERS: Update my email address and stand down as AArch64 maintainer

2025-08-22 Thread Richard Sandiford
Claudiu Zissulescu Ianculescu writes: > Hi Richard, > > On Thu, Aug 21, 2025 at 9:55 AM Richard Sandiford > wrote: >> >> Today is my last working day at Arm, so this patch switches my >> MAINTAINERS entries to my personal email address. (It turns out >> th

Re: [PATCH v4 5/9] targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG

2025-08-22 Thread Richard Sandiford
Claudiu Zissulescu-Ianculescu writes: > Hi, > >>> +DEFHOOK >>> +(compose_offset_tag, >>> + "Return an RTX that represnts the result of composing >>> @var{tag_offset} with\n\ >>> +the base tag @var{base_tag}.\n\ >>> +The default of this hook is to byte add @var{tag_offset} to >>> @var{base_tag}.",

Re: [PATCH v4 9/9] aarch64: Add memtag-stack tests

2025-08-22 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > diff --git a/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c > b/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c > new file mode 100644 > index 000..70b790c6c3e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c

Re: [PATCH v4 7/9] asan: memtag-stack add support for MTE instructions

2025-08-22 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Claudiu Zissulescu > > Memory tagging is used for detecting memory safety bugs. On AArch64, the > memory tagging extension (MTE) helps in reducing the overheads of memory > tagging: > - CPU: MTE instructions for efficiently tagging and unt

Re: [PATCH v4 5/9] targhooks: add TARGET_MEMTAG_COMPOSE_OFFSET_TAG

2025-08-22 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Claudiu Zissulescu > > Add a new target hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG to perform > addition between two tags. > > The default of this hook is to byte add the inputs. > > Hardware-assisted sanitizers on architectures providing instruc

Re: [PATCH v4 6/9] asan: add new memtag sanitizer

2025-08-22 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Indu Bhagat > > Add new command line option -fsanitize=memtag-stack with the following > new params: > --param memtag-instrument-alloca [0,1] (default 1) to use MTE insns > for enabling dynamic checking of stack allocas. > > Along with the n

Re: [PATCH v4 1/9] targhooks: i386: rename TAG_SIZE to TAG_BITSIZE

2025-08-22 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Indu Bhagat > > gcc/Changelog: > > * asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize. > * config/i386/i386.cc (ix86_memtag_tag_size): Rename to > ix86_memtag_tag_bitsize. > (TARGET_MEMTAG_TAG_SIZE): Renamed t

Re: [PATCH] rtl-ssa: Add missing live-out uses [PR121619]

2025-08-21 Thread Richard Sandiford
Richard Biener writes: >> Am 21.08.2025 um 16:56 schrieb Richard Sandiford : >> >> This PR is another bug in the rtl-ssa code to manage live-out uses. >> It seems that this didn't get much coverage until recently. >> >> In the testcase, late-combine fir

[PATCH] rtl-ssa: Add missing live-out uses [PR121619]

2025-08-21 Thread Richard Sandiford
This PR is another bug in the rtl-ssa code to manage live-out uses. It seems that this didn't get much coverage until recently. In the testcase, late-combine first removed a register-to-register move by substituting into all uses, some of which were in other EBBs. This was done after checking make

Re: [PATCH 2/2]AArch64: extend cost model to cost outer loop vect where the inner loop is invariant [PR121290]

2025-08-21 Thread Richard Sandiford
Tamar Christina writes: > Consider the example: > > void > f (int *restrict x, int *restrict y, int *restrict z, int n) > { > for (int i = 0; i < 4; ++i) > { > int res = 0; > for (int j = 0; j < 100; ++j) > res += y[j] * z[i]; > x[i] = res; > } > } > > we curren

Re: [PATCH 1/2]AArch64: Fix costing of scalar throughput based calculation for inner loops [PR121290]

2025-08-21 Thread Richard Sandiford
Richard Biener writes: > On Wed, 20 Aug 2025, Richard Sandiford wrote: > >> Tamar Christina writes: >> > To avoid double counting scalar instructions when doing inner-loop costing >> > the >> > vectorizer uses vect_prologue as the kind instead of vect_bod

[PATCH] MAINTAINERS: Update my email address and stand down as AArch64 maintainer

2025-08-20 Thread Richard Sandiford
ssner Jason Merrill David S. Miller Joseph Myers -Richard Sandiford +Richard Sandiford Bernd Sc

Re: [PATCH 1/2]AArch64: Fix costing of scalar throughput based calculation for inner loops [PR121290]

2025-08-20 Thread Richard Sandiford
Tamar Christina writes: > To avoid double counting scalar instructions when doing inner-loop costing the > vectorizer uses vect_prologue as the kind instead of vect_body. > > However doing this results in our throughput based costing to think the scalar > loop issues in 0 cycles (rounded up to 1.0

Re: [PATCH] rtl-ssa: Fix thinko when adding live-out uses

2025-08-20 Thread Richard Sandiford
Richard Biener writes: > On Wed, Aug 20, 2025 at 1:03 PM Richard Sandiford > wrote: >> >> While testing a later patch, I found that create_degenerate_phi >> had an inverted test for bitmap_set_bit. It was assuming that >> the return value was the previous bit valu

[PATCH] Merge aarch64-cc-fusion into late-combine

2025-08-20 Thread Richard Sandiford
I'd added the aarch64-specific CC fusion pass to fold a PTEST instruction into the instruction that feeds the PTEST, in cases where the latter instruction can set the appropriate flags as a side-effect. Combine does the same optimisation. However, as explained in the comments, the PTEST case ofte

[PATCH] rtl-ssa: Fix thinko when adding live-out uses

2025-08-20 Thread Richard Sandiford
While testing a later patch, I found that create_degenerate_phi had an inverted test for bitmap_set_bit. It was assuming that the return value was the previous bit value, rather than a "something changed" value. :( Also, the call to add_live_out_use shouldn't be conditional on the DF_LR_OUT opera
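
Editorial sketch of the thinko described above. It uses a toy 64-bit bit set rather than GCC's bitmap type, and toy_bitmap, set_bit and needs_first_time_work are hypothetical names chosen for illustration. The point is only that a "set bit" routine reporting "something changed" returns true the first time a bit is set, so a caller that treats the result as the previous bit value ends up with its condition inverted.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a bitmap; this is not GCC's bitmap type.
struct toy_bitmap
{
  std::uint64_t bits = 0;
};

// Set bit INDEX and return true if the bitmap changed, i.e. if the bit
// was previously clear.  This is a "something changed" result, not the
// previous value of the bit.
inline bool
set_bit (toy_bitmap &map, unsigned int index)
{
  assert (index < 64);
  std::uint64_t mask = std::uint64_t (1) << index;
  bool changed = (map.bits & mask) == 0;
  map.bits |= mask;
  return changed;
}

// Correct caller: first-time work is needed exactly when the bit was
// newly set.  The inverted form, !set_bit (map, index), would instead
// do the work only for bits that were already set.
inline bool
needs_first_time_work (toy_bitmap &map, unsigned int index)
{
  return set_bit (map, index);
}
```

With this convention, the first call for a given index reports true and every later call for the same index reports false.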

[PATCH] rtl-ssa: Add a find_uses function

2025-08-20 Thread Richard Sandiford
rtl-ssa already has a find_def function for finding the definition of a particular resource (register or memory) at a particular point in the program. This patch adds a similar function for looking up uses. Both functions have amortised logarithmic complexity. Tested on aarch64-linux-gnu, powerp

[GCC 15] aarch64: Fix mode mismatch when building a predicate [PR121118]

2025-08-18 Thread Richard Sandiford
This PR is about a case where we used aarch64_expand_sve_const_pred_trn to combine two predicates, one of which was constructed using aarch64_sve_move_pred_via_while. The former requires the inputs to have mode VNx16BI, but the latter returned VNx8BI for a .H WHILELO. The proper fix, used on tru

Re: [PATCH] gcse: Fix handling of partial clobbers [PR97497]

2025-08-18 Thread Richard Sandiford
Richard Biener writes: > On Mon, Aug 18, 2025 at 10:51 AM Richard Sandiford > wrote: >> >> This patch fixes an internal disagreement in gcse about how to >> handle partial clobbers. Like many passes, gcse doesn't track >> the modes of live values, so i

[PATCH] gcse: Fix handling of partial clobbers [PR97497]

2025-08-18 Thread Richard Sandiford
This patch fixes an internal disagreement in gcse about how to handle partial clobbers. Like many passes, gcse doesn't track the modes of live values, so if a call clobbers only part of a register, the pass has to make conservative assumptions. As the comment in the patch says, this means: (1) ig

[pushed] testsuite: Add a test for [PR119156]

2025-08-15 Thread Richard Sandiford
PR119156 was fixed by g:f702b593e7268ab161053bafd097f1b09933b783. This patch adds a test for it. Tested on aarch64-linux-gnu & pushed as (hopefully) obvious. Richard gcc/testsuite/ PR target/119156 * gcc.target/aarch64/sve/pr119156_1.c: New test. --- gcc/testsuite/gcc.target/aa

Re: [PATCH] RISC-V: Allow errors to be suppressed when parsing architectures

2025-08-15 Thread Richard Sandiford
please let me know if you see any fallout. Richard > Thanks :) > > On Thu, Aug 14, 2025 at 5:19 PM Richard Sandiford > wrote: >> >> One of Alfie's FMV patches adds a hook that, in some cases, >> is used to silently query a target_version (with no diagnostics >

Re: [PATCH v9 13/13] FMV: Redirect to specific target

2025-08-15 Thread Richard Sandiford
LGTM, but a minor suggestion below: writes: > @@ -423,61 +423,100 @@ expand_target_clones (struct cgraph_node *node, bool > definition) >return true; > } > > -/* When NODE is a target clone, consider all callees and redirect > - to a clone with equal target attributes. That prevents mu

Re: [PATCH v9 04/13] fmv: Add check_target_clone hook for filtering target_clone versions.

2025-08-15 Thread Richard Sandiford
writes: > From: Alfie Richards > > This patch introduces the TARGET_CHECK_TARGET_CLONE_VERSION hook > which is used to determine if a target_clones version string parses. > > The hook has a flag to enable emitting diagnostics. > > This is as specified in the Arm C Language Extension. The purpose

Re: [PATCH][13 BACKPORT] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-15 Thread Richard Sandiford
Pengfei Li writes: > Hi, > > This patch backports the fix of r16-3083 to gcc-13. > > Compared to the trunk version, this is slightly different because those RTL > patterns in gcc-13 do not yet use the compact syntax for multiple > alternatives. > But this patch is functionally identical to the tr

Re: [PATCH v2] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-15 Thread Richard Sandiford
Pengfei Li writes: >> I’d also ask for a slightly more descriptive sentence like “Use vg >> constraint for alternative so-and-so”. >> Ok to push whatever reword you come up with. > > This has been committed to trunk as r16-3083 for about one week. I wonder if I > could consider backporting it now

Re: [PATCH] Remove MODE_COMPOSITE_P test from simplify_gen_subreg [PR120718]

2025-08-14 Thread Richard Sandiford
Richard Biener writes: > On Thu, Aug 7, 2025 at 2:14 PM Richard Sandiford > wrote: >> >> simplify_gen_subreg rejected subregs of literal constants if >> MODE_COMPOSITE_P. This was added by the fix for PR96648 in >> g:c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f.

[pushed] powerpc: Add missing modes to P9 if_then_elses [PR121501]

2025-08-14 Thread Richard Sandiford
These patterns had one (if_then_else ...) nested within another. The outer if_then_else had SImode, which means that the "then" and "else" should also be SImode (unless they're const_ints). However, the inner if_then_else was modeless, which led to an assertion failure when trying to take a subreg

[PATCH] RISC-V: Allow errors to be suppressed when parsing architectures

2025-08-14 Thread Richard Sandiford
One of Alfie's FMV patches adds a hook that, in some cases, is used to silently query a target_version (with no diagnostics expected). In the review, I'd suggested handling this using a location_t *, with null meaning "suppress diagnostics": https://gcc.gnu.org/pipermail/gcc-patches/2025-Augus

Re: [PATCH] AArch64: Add isinf expander [PR 66462]

2025-08-13 Thread Richard Sandiford
Wilco Dijkstra writes: > Add an expander for isinf using integer arithmetic. This is > typically faster and avoids generating spurious exceptions on > signaling NaNs. > > int isinf1 (float x) { return __builtin_isinf (x); } > > Before: > fabss0, s0 > mov w0, 2139095039 >
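
Editorial sketch of an isinf test done with integer arithmetic, as the patch description above mentions. It assumes IEEE 754 binary32 floats, and isinf_via_integer is a hypothetical name; the instruction sequence that the patch actually emits is not shown in the snippet and may differ. Because the value is only inspected as an integer, no floating-point comparison is performed on it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert (std::numeric_limits<float>::is_iec559,
	       "sketch assumes IEEE 754 binary32 floats");
static_assert (sizeof (float) == sizeof (std::uint32_t),
	       "sketch assumes a 32-bit float");

// Return 1 if X is +Inf or -Inf and 0 otherwise, looking only at the
// bit pattern of X.
inline int
isinf_via_integer (float x)
{
  std::uint32_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  // Shifting left by one drops the sign bit.  An infinity then has all
  // eight exponent bits set and a zero mantissa; a NaN has a nonzero
  // mantissa and so compares unequal.
  return (bits << 1) == 0xff000000u;
}

// Small self-check covering the interesting encodings.
inline void
check_isinf_via_integer ()
{
  assert (isinf_via_integer (std::numeric_limits<float>::infinity ()));
  assert (!isinf_via_integer (std::numeric_limits<float>::quiet_NaN ()));
  assert (!isinf_via_integer (std::numeric_limits<float>::max ()));
}
```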

Re: [PATCH v4 8/9] aarch64: Add support for memetag-stack sanitizer using MTE insns

2025-08-13 Thread Richard Sandiford
Claudiu Zissulescu-Ianculescu writes: >>> + [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "=Umg") >>> + (unspec:TI >>> +[(match_operand:TI 1 "aarch64_granule16_memory_operand" "Umg") >>> + (match_operand:DI 2 "register_operand" "rk")] >>> +UNSPEC_TAG_SP

Re: [RFC PATCH v2 0/6] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-08-13 Thread Richard Sandiford
writes: > From: Soumya AR > > Hi, > > This RFC is a continuation of previous patches sent here: > https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682702.html > > As suggested in the earlier thread, I've now added a python script to generate > the printing and parsing routines for the JSON tuni

Re: [PATCH v4 8/9] aarch64: Add support for memetag-stack sanitizer using MTE insns

2025-08-13 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > [...] > /* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES. Here we tell the rest of the > compiler that we automatically ignore the top byte of our pointers, which > - allows using -fsanitize=hwaddress. */ > + allows using -fsanitize=hwaddres

Re: [PATCH v4 4/9] aarch64: add new constants for MTE insns

2025-08-13 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Indu Bhagat > > Define new constants to be used by the MTE pattern definitions. > > gcc/ > > * config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define > constant. > (MEMTAG_ADDR_MASK): Likewise. > (irg, subp, ldg): Us

Re: [PATCH v4 3/9] target-insns.def: (tag_memory) New pattern.

2025-08-13 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Claudiu Zissulescu > > Add a new target instruction. Hardware-assisted sanitizers on > architectures providing instructions to tag/untag memory can then > make use of this new instruction pattern. For example, the > memtag-stack sanitizer u

Re: [PATCH v4 2/9] opts: use sanitize_code_type for sanitizer flags

2025-08-13 Thread Richard Sandiford
claudiu.zissulescu-iancule...@oracle.com writes: > From: Indu Bhagat > > Currently, the data type of sanitizer flags is unsigned int, with > SANITIZE_SHADOW_CALL_STACK (1UL << 31) being highest individual > enumerator for enum sanitize_code. Use 'sanitize_code_type' data type > to allow for more

[PATCH] fwprop: Don't propagate asms [PR121253]

2025-08-12 Thread Richard Sandiford
For the reasons explained in the comment, fwprop shouldn't even try to propagate an asm definition. Tested on aarch64-linux-gnu. Bordering on obvious, but just in case: OK to install? Richard gcc/ PR rtl-optimization/121253 * fwprop.cc (forward_propagate_into): Don't propagate

Re: [RFC PATCH v2 4/6] aarch64: Enable parsing of user-provided AArch64 CPU tuning parameters

2025-08-11 Thread Richard Sandiford
writes: > +/* Extract string value from JSON, returning allocated C string. */ > +char * > +extract_string (const json::value *val) > +{ > + if (auto *string_val = dyn_cast (val)) > +{ > + char *result = new char[string_val->get_length () + 1]; > + strcpy (result, string_val->get_s

Re: [RFC PATCH v2 2/6] aarch64: Enable dumping of AArch64 CPU tuning parameters to JSON

2025-08-11 Thread Richard Sandiford
writes: > +/* Mapping structure for enum-to-string conversion. */ > +template struct enum_mapping > +{ > + const char *name; > + EnumType value; > +}; > + > +static const enum_mapping > + autoprefetcher_model_mappings[] > + = {{"AUTOPREFETCHER_OFF", tune_params::AUTOPREFETCHER_OFF}, > +

Re: [RFC PATCH v2 6/6] aarch64: Script to auto generate JSON tuning routines

2025-08-11 Thread Richard Sandiford
writes: > From: Soumya AR > > This commit introduces a Python maintenance script that generates C++ code > for parsing and serializing AArch64 JSON tuning parameters based on the > schema defined in aarch64-json-schema.h. > > The script generates two include files: > - aarch64-json-tunings-pars

Re: [RFC PATCH v2 3/6] json: Add get_map() method to JSON object class

2025-08-11 Thread Richard Sandiford
writes: > From: Soumya AR > > This patch adds a get_map () method to the JSON object class to provide access > to the underlying hash map that stores the JSON key-value pairs. > > It also reorganizes the private and public sections of the class to expose the > map_t typedef, which is the return t

Re: [PATCH v2 12/13] aarch64: CMPBR branches must be invertable

2025-08-11 Thread Richard Sandiford
Richard Henderson writes: > On 8/8/25 21:18, Richard Sandiford wrote: >>> +(define_insn "*aarch64_cb" >>> + [(set (pc) (if_then_else >>> + (INT_CMP >>> + (match_operand:GPI 0 "register_operand" "r"

Re: [PATCH v2 05/13] aarch64: Fix gcs save/restore_stack_nonlocal

2025-08-11 Thread Richard Sandiford
Richard Henderson writes: > On 8/8/25 20:39, Richard Sandiford wrote: >> Richard Henderson writes: >>> The save/restore_stack_nonlocal patterns passed a DImode rtx >>> to gen_tbranch_neqi3 for a QImode compare. The tbranch expander >>> did not do what it

Re: [RFC PATCH v2 0/6] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-08-11 Thread Richard Sandiford
writes: > From: Soumya AR > > Hi, > > This RFC is a continuation of previous patches sent here: > https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682702.html > > As suggested in the earlier thread, I've now added a python script to generate > the printing and parsing routines for the JSON tuni

Re: [PATCH v8 13/13] FMV: Redirect to specific target

2025-08-08 Thread Richard Sandiford
writes: > From: Alfie Richards > > Adds an optimisation in FMV to redirect to a specific target if possible. > > A call is redirected to a specific target if both: > - the caller can always call the callee version > - and, it is possible to rule out all higher priority versions of the callee >

Re: [PATCH v8 05/13] fmv: Change target_version semantics to follow ACLE specification.

2025-08-08 Thread Richard Sandiford
OK for: writes: > gcc/cgraph.cc | 4 +- > gcc/cgraph.h | 2 + > gcc/cgraphunit.cc | 9 + > gcc/config/aarch64/aarch64.cc | 43 ++-- > gcc/ipa.cc

Re: [PATCH] aarch64: libgcc: Honor disable-werror [PR117600]

2025-08-08 Thread Richard Sandiford
Christophe Lyon writes: > In commit r15-4417-g71c7b446b98aa5, I made -werror mandatory when > building libgcc for aarch64. > > While it achieved its goal (make us fix problems unnoticed so far), > there has been a lot of debate because it couldn't be disabled > easily. As discussed off-list: yo

Re: [PATCH v2 13/13] aarch64: Fix condition accepted by movcc

2025-08-08 Thread Richard Sandiford
Richard Henderson writes: > Reject QI/HImode conditions, which would require extension in > order to compare. Fixes > > z.c:10:1: error: unrecognizable insn: >10 | } > | ^ > (insn 23 22 24 2 (set (reg:CC 66 cc) > (compare:CC (reg:HI 128) > (reg:HI 127))) "z.c":6:6 -1

Re: [PATCH v2 00/13] aarch64: CMPBR fixes

2025-08-08 Thread Richard Sandiford
Richard Henderson writes: > Version 1 regressed the expansion of atomics, which means the addition > of CC clobber to all conditional branches is flawed. Version 2 goes > the other way: remove CC clobber from all conditional branches. > > This requires the out-of-range TBZ->TST+B.cond expansion b

Re: [PATCH v2 12/13] aarch64: CMPBR branches must be invertable

2025-08-08 Thread Richard Sandiford
Richard Henderson writes: > Restrict the immediate range to the intersection of LT/GE and GT/LE > so that cfglayout can invert the condition to redirect any branch. > > gcc: > * config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the > range of LT/GE and GT/LE to their intersections.

Re: [PATCH v2 10/13] aarch64: Fix gcc.target/aarch64/cmpbr.c

2025-08-08 Thread Richard Sandiford
Richard Henderson writes: > The enable for the test was wrong, so it never ran. > > gcc/testsuite: > * gcc.target/aarch64/cmpbr.c: Use dg-require-effective-target. > --- > gcc/testsuite/gcc.target/aarch64/cmpbr.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/t

Re: [PATCH v2 05/13] aarch64: Fix gcs save/restore_stack_nonlocal

2025-08-08 Thread Richard Sandiford
Richard Henderson writes: > The save/restore_stack_nonlocal patterns passed a DImode rtx > to gen_tbranch_neqi3 for a QImode compare. The tbranch expander > did not do what it said on the tin, that is: emit TBNZ. > It only made it as far as AND+CMP+B.cond. Yeah, that was done to allow ifcombine

[PATCH] simplify-rtx: Distribute some non-narrowing subregs [PR121306]

2025-08-08 Thread Richard Sandiford
In g:965564eafb721f813a3112f1bba8d8fae32b I'd added code to try distributing non-widening subregs through logic ops, in cases where that would eliminate a term of the logic op. For "reasons", this indirectly caused combine to generate: (set (zero_extract:SI (reg/v:SI 101 [ a ]) (c

Re: [PATCH v8 04/13] fmv: Add check_target_clone hook for filtering target_clone versions.

2025-08-07 Thread Richard Sandiford
writes: > From: Alfie Richards > > This patch introduces the TARGET_CHECK_TARGET_CLONE_VERSION hook > which is used to determine if a target_clones version string parses. > > The hook has a flag to enable emitting diagnostics. > > This is as specified in the Arm C Language Extension. The purpose

Re: [PATCH v8 02/13] fmv: Refactor FMV name mangling.

2025-08-07 Thread Richard Sandiford
writes: > From: Alfie Richards > > This patch is an overhaul of how FMV name mangling works. Previously > mangling logic was duplicated in several places across both target > specific and independent code. This patch changes this such that all > mangling is done in targetm.mangle_decl_assembler_n

Re: [PATCH v8 01/13] Add clone_identifier function.

2025-08-07 Thread Richard Sandiford
writes: > From: Alfie Richards > > This is similar to clone_function_name and its siblings but takes an > identifier tree node rather than a function declaration. > > This is to be used in conjunction with the identifier node stored in > cgraph_function_version_info::assembler_name to mangle FMV

[PATCH] aarch64: Mark SME functions as .variant_pcs [PR121414]

2025-08-07 Thread Richard Sandiford
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible functions allow/require PSTATE.SM to be 1 on entry, so they need to be treated as STO_AARCH64_VARIANT_PCS. Similarly, functions that share ZA or ZT0 with their callers require ZA to be active on entry, whereas the base PCS r

[PATCH] Remove MODE_COMPOSITE_P test from simplify_gen_subreg [PR120718]

2025-08-07 Thread Richard Sandiford
simplify_gen_subreg rejected subregs of literal constants if MODE_COMPOSITE_P. This was added by the fix for PR96648 in g:c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f. Jakub said: As for the simplify_gen_subreg change, I think it would be desirable to just avoid creating SUBREGs of constants on

Re: [PATCH] x86: Add *one_cmplqi_ext_2

2025-08-07 Thread Richard Sandiford
Uros Bizjak writes: > On Tue, Aug 5, 2025 at 1:32 PM Richard Sandiford > wrote: >> It's coming from: >> >> (define_split >> [(set (match_operand:SWI 0 "register_operand") >> (any_rotate:SWI >> (match_o

Re: [PATCH] x86: Add *one_cmplqi_ext_2

2025-08-05 Thread Richard Sandiford
Richard Sandiford writes: > "H.J. Lu" writes: >> On Mon, Aug 4, 2025 at 3:28 PM H.J. Lu wrote: >>> >>> On Mon, Aug 4, 2025 at 2:04 PM H.J. Lu wrote: >>> > >>> > On Mon, Aug 4, 2025 at 8:50 AM Richard Sandiford >>> >

[PATCH] i386: Extend recognition of high-reg rvalues [PR121306]

2025-08-05 Thread Richard Sandiford
The i386 high-register patterns used things like: (match_operator:SWI248 2 "extract_operator" [(match_operand 0 "int248_register_operand" "Q") (const_int 8) (const_int 8)]) to match an extraction of a high register such as AH from AX/EAX/RAX. This construct is used in con

Re: [PATCH 0/8] aarch64: CMPBR fixes

2025-08-05 Thread Richard Sandiford
Richard Henderson writes: > I have written patches for FEAT_CMPBR support in QEMU, and wanted to > test them out with gcc. The easiest way, seemed to be bootstrapping > gcc with cmpbr enabled. The attempt failed on stage1 libgcc. > > My bug report is target/121385. Pinski did some analysis, whic

Re: [PATCH 8/8] aarch64: Use cc when CB/CBB/CBH is out-of-range

2025-08-05 Thread Richard Sandiford
Richard Henderson writes: > Middle distance branches between 1KiB and 1MiB may be > implemented with cmp+branch instead of branch+branch. > > gcc: > * config/aarch64/aarch64.cc (*aarch64_cb): > Fall back to cmp/cmn + bcond if !far_branch. > Adjust far_branch to 1MiB. > (*aa

Re: [PATCH 6/8] aarch64: Add cc clobber to compare-and-branch patterns

2025-08-05 Thread Richard Sandiford
Richard Henderson writes: > Some of the compare-and-branch patterns rely on CC for scratch in some > of the alternative expansions. This is fine, because when the combined > compare-and-branch patterns are formed by combine, we will be eliminating > a write to CC, so CC is dead anyway. > > Standa

Re: [PATCH] x86: Add *one_cmplqi_ext_2

2025-08-05 Thread Richard Sandiford
"H.J. Lu" writes: > On Mon, Aug 4, 2025 at 3:28 PM H.J. Lu wrote: >> >> On Mon, Aug 4, 2025 at 2:04 PM H.J. Lu wrote: >> > >> > On Mon, Aug 4, 2025 at 8:50 AM Richard Sandiford >> > wrote: >> > > Sorry, I hadn't realised

Re: [PATCH] combine: Make extraction for ZERO_EXTRACT destination from LSHIFTRT

2025-08-04 Thread Richard Sandiford
"H.J. Lu" writes: > After > > commit 965564eafb721f813a3112f1bba8d8fae32b > Author: Richard Sandiford > Date: Tue Jul 29 15:58:34 2025 +0100 > > simplify-rtx: Simplify subregs of logic ops > > make_compound_operation_int gets >

Re: [PATCH] x86: Add *one_cmplqi_ext_2

2025-08-04 Thread Richard Sandiford
Uros Bizjak writes: > On Sat, Aug 2, 2025 at 8:56 PM H.J. Lu wrote: >> >> On Fri, Aug 1, 2025 at 10:32 PM Uros Bizjak wrote: >> > >> > On Sat, Aug 2, 2025 at 3:22 AM H.J. Lu wrote: >> > > >> > > After >> > > >>

Re: [PATCH] [aarch64] Make better use of overflowing operations in max/min(a, add/sub(a, b)) [PR116815]

2025-08-04 Thread Richard Sandiford
Dhruv Chawla writes: > On 01/08/25 22:10, Richard Sandiford wrote: >> External email: Use caution opening links or attachments >> >> >> Dhruv Chawla writes: >>> On 24/07/25 11:21, Andrew Pinski wrote: >>>> External email: Use caution opening li

Re: [PATCH v2 1/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-08-04 Thread Richard Sandiford
Matthieu Longo writes: > On 2025-08-04 11:33, Richard Sandiford wrote: >> Matthieu Longo writes: >>> On 2025-07-31 13:39, Jan Beulich wrote: >>>> On 09.07.2025 14:48, Matthieu Longo wrote: >>>>> Those methods' implementation is relying on duck

Re: [PATCH v2 1/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-08-04 Thread Richard Sandiford
Matthieu Longo writes: > On 2025-07-31 13:39, Jan Beulich wrote: >> On 09.07.2025 14:48, Matthieu Longo wrote: >>> Those methods' implementation is relying on duck-typing at compile >>> time. >>> The structure corresponding to the node of a doubly linked list needs >>> to define attributes 'prev'

Re: [PATCH v7 13/13] FMV: Redirect to specific target

2025-08-04 Thread Richard Sandiford
Alfie Richards writes: > Adds an optimisation in FMV to redirect to a specific target if possible. > > A call is redirected to a specific target if both: > - the caller can always call the callee version > - and, it is possible to rule out all higher priority versions of the callee > fmv set. Th

Re: [PATCH] [aarch64] Make better use of overflowing operations in max/min(a, add/sub(a, b)) [PR116815]

2025-08-01 Thread Richard Sandiford
Dhruv Chawla writes: > On 24/07/25 11:21, Andrew Pinski wrote: >> External email: Use caution opening links or attachments >> >> >> On Wed, Jul 23, 2025 at 10:16 PM wrote: >>> >>> From: Dhruv Chawla >>> >>> This patch folds the following patterns: >>> - max (a, add (a, b)) -> [sum, ovf] = adds
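
Editorial sketch of why a max of a value and a sum involving that value can be decided from the overflow result of the addition alone. It assumes unsigned 32-bit operands with wrapping addition, and max_of_a_and_sum is a hypothetical name; the snippet above is truncated, so the exact instruction sequence the patch produces is not reproduced here. If a + b wraps, the wrapped sum is smaller than a and the maximum is a; otherwise the sum is at least a and the maximum is the sum.

```cpp
#include <cassert>
#include <cstdint>

// Compute max (a, a + b) for unsigned operands using only the addition
// and its carry-out, with no separate comparison of a against the sum.
inline std::uint32_t
max_of_a_and_sum (std::uint32_t a, std::uint32_t b)
{
  std::uint32_t sum = a + b;	// wraps modulo 2^32
  bool overflowed = sum < a;	// true exactly when the addition carried
  std::uint32_t result = overflowed ? a : sum;
  assert (result >= a);		// the maximum is never below a
  return result;
}
```

For example, max_of_a_and_sum (5, 3) takes the non-overflowing path and yields the sum, while max_of_a_and_sum (0xffffffff, 2) overflows and yields a.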

Re: [PATCH v7 08/13] fmv: Support mixing of target_clones and target_version.

2025-08-01 Thread Richard Sandiford
Alfie Richards writes: > Add support for a FMV set defined by a combination of target_clones and > target_version definitions. > > Additionally, change is_function_default_version to consider a function > declaration annotated with target_clones containing default to be a > default version. > > La

Re: [PATCH v7 02/13] fmv: Refactor FMV name mangling.

2025-08-01 Thread Richard Sandiford
Alfie Richards writes: > On 01/08/2025 11:46, Richard Sandiford wrote: >> Sorry, I think I missed the multiple_targets.cc changes in my >> previous review. >> >> Alfie Richards writes: >>> + >>> + t

Re: [PATCH v7 05/13] fmv: Change target_version semantics to follow ACLE specification.

2025-08-01 Thread Richard Sandiford
The target-independent and aarch64 bits mostly look good to me, but a few comments/questions: Alfie Richards writes: > diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc > index a604511db71..4eb37f5818f 100644 > --- a/gcc/cp/typeck.cc > +++ b/gcc/cp/typeck.cc > @@ -4489,6 +4489,16 @@ cp_build_funct

Re: [PATCH v7 02/13] fmv: Refactor FMV name mangling.

2025-08-01 Thread Richard Sandiford
Sorry, I think I missed the multiple_targets.cc changes in my previous review. Alfie Richards writes: > diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc > index d25277c0a93..44340cbc6a4 100644 > --- a/gcc/multiple_target.cc > +++ b/gcc/multiple_target.cc > @@ -313,7 +216,6 @@ create_t

Re: [PATCH 09/12] aarch64: Use VNx16BI for svpnext*

2025-08-01 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 29 Jul 2025, at 18:41, Richard Sandiford >> wrote: >> >> This patch continues the work of making ACLE intrinsics use VNx16BI >> for svbool_t results. It deals with the svpnext* intrinsics. >> > > I wonder if the new patte

Re: [PATCH v7 04/13] fmv: Add reject_target_clone hook for filtering target_clone versions.

2025-07-31 Thread Richard Sandiford
Alfie Richards writes: > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index 5e305643b3a..253ea6dd77f 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -12268,6 +12268,11 @@ function version at run-time for a given set of > function versions. > body must be generated. > @end deftyp

Re: [PATCH v7 02/13] fmv: Refactor FMV name mangling.

2025-07-31 Thread Richard Sandiford
FWIW, I agree with Jeff's comment in the v6 series against the duplication of is_valid_asm_symbol and create_new_asm_name. On the aarch64 bits: Alfie Richards writes: > @@ -20549,18 +20540,6 @@ aarch64_mangle_decl_assembler_name (tree decl, tree > id) > This is computed by taking the defaul

Re: [PATCH v2] aarch64: testsuite: Fix do-assemble tests for SME

2025-07-31 Thread Richard Sandiford
Spencer Abson writes: > GCC doesn't support SME without SVE2, so the -march=armv8-a+ argument to > check_no_compiler_messages causes aarch64_asm__ok to return zero for SME > and any that implies it. This patch changes the baseline architecture to > armv9-a for these extensions. > > The tests for

Re: [PATCH] aarch64: Improve svdupq_lane expension for big-endian [PR121293]

2025-07-30 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, July 29, 2025 1:43 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Alex Coplan ; Alice Carlotti >> ; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha

Re: [PATCH] aarch64: Use VNx16BI for more SVE WHILE* results [PR121118]

2025-07-30 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, July 29, 2025 4:33 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Alex Coplan ; Alice Carlotti >> ; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha

Re: [PATCH 06/12] aarch64: Use VNx16BI for floating-point svcmp*

2025-07-30 Thread Richard Sandiford
Kyrylo Tkachov writes: >> +(define_insn "*aarch64_pred_fcmuo_acle" >> + [(set (match_operand:VNx16BI 0 "register_operand") > > Looks like a "=w" constraint is missing here. Argh! Thanks for catching that. I went through and checked for missing constraints in the other new patterns but it look

Re: [PATCH 2/2] aarch64: Use VNx16BI for svrev_b* [PR121294]

2025-07-30 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, July 29, 2025 5:20 PM >> To: Alex Coplan ; Alice Carlotti >> ; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw >> ; Tamar Christina ; >>

Re: [PATCH] simplify-rtx: Add `(subreg (not a))` simplification for word_mode [PR121308]

2025-07-30 Thread Richard Sandiford
Andrew Pinski writes: > Right now in simplify_subreg, there is code to try to simplify for word_mode > with the binary bitwise operators. The unary bitwise operator is not handled, > this causes an odd mismatch and the new self testing code that was added with > r16-2614-g965564eafb721f was not ex

[PATCH 12/12] aarch64: Check the mode of SVE ACLE function results

2025-07-29 Thread Richard Sandiford
After previous patches, we should always get a VNx16BI result for ACLE intrinsics that return svbool_t. This patch adds an assert that checks a more general condition than that. gcc/ * config/aarch64/aarch64-sve-builtins.cc (function_expander::expand): Assert that the return value

[PATCH 09/12] aarch64: Use VNx16BI for svpnext*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svpnext* intrinsics. gcc/ * config/aarch64/iterators.md (PNEXT_ONLY): New int iterator. * config/aarch64/aarch64-sve.md (@aarch64_sve_): Restrict SVE_PITER pattern

[PATCH 08/12] aarch64: Use VNx16BI for sv(n)match*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svmatch* and svnmatch* intrinsics. gcc/ * config/aarch64/aarch64-sve2.md (@aarch64_pred_): Split SVE2_MATCH pattern into a VNx16QI_ONLY define_insn and a VNx8HI_ONLY

[PATCH 11/12] aarch64: Use VNx16BI for svdupq_b*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the predicate forms of svdupq. The general predicate expansion builds an equivalent integer vector and then compares it with zero. This patch therefore relies on the earlier patches to the com

[PATCH 07/12] aarch64: Use VNx16BI for svac*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svac* intrinsics (floating-point compare absolute). gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_fac): Replace with... (@aarch64_pred_fac_acle): ...this new

[PATCH 06/12] aarch64: Use VNx16BI for floating-point svcmp*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the floating-point forms of svcmp*. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_pred_fcm_acle) (*aarch64_pred_fcm_acle, @aarch64_pred_fcmuo_acle) (*aarch64_pred_fcmuo

[PATCH 10/12] aarch64: Use VNx16BI for svdup_b*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the predicate forms of svdup. gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_sve_pred_vec_duplicate): Declare. * config/aarch64/aarch64.cc (aarch64_emit_sv

[PATCH 05/12] aarch64: Use VNx16BI for svcmp*_wide

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svcmp*_wide intrinsics. Since the only uses of these patterns are for ACLE intrinsics, there didn't seem much point adding an "_acle" suffix. gcc/ * config/aarch64/aarch64.cc (@aar

[PATCH 04/12] aarch64: Drop unnecessary GPs in svcmp_wide PTEST patterns

2025-07-29 Thread Richard Sandiford
Patterns that fuse a predicate operation P with a PTEST use aarch64_sve_same_pred_for_ptest_p to test whether the governing predicates of P and the PTEST are compatible. Most patterns were also written as define_insn_and_rewrites, with the rewrite replacing P's original governing predicate with PT

[PATCH 02/12] aarch64: Use VNx16BI for non-widening integer svcmp*

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the non-widening integer forms of svcmp*. The handling of the PTEST patterns is similar to that for the earlier svwhile* patch. Unfortunately, on its own, this triggers a failure in the pred_c

[PATCH 03/12] aarch64: Use the correct GP mode in the svcmp_wide patterns

2025-07-29 Thread Richard Sandiford
The patterns for the svcmp_wide intrinsics used a VNx16BI input predicate for all modes, instead of the usual . That unnecessarily made some input bits significant, but more importantly, it triggered an ICE in aarch64_sve_same_pred_for_ptest_p when testing whether a comparison pattern could be fuse

[PATCH 01/12] aarch64: Use VNx16BI for svunpklo/hi_b

2025-07-29 Thread Richard Sandiford
This patch continues the work of making ACLE intrinsics use VNx16BI for svbool_t results. It deals with the svunpk* intrinsics. gcc/ * config/aarch64/aarch64-sve.md (@aarch64_sve_punpk_acle) (*aarch64_sve_punpk_acle): New patterns. * config/aarch64/aarch64-sve-builtins-bas
