Re: [PATCHv2] gcse: Skip hardreg pre if the hardreg is never alive [PR121095]

2025-07-17 Thread Richard Sandiford
Andrew Pinski writes: > r15-6789-ge7f98d9603808b added a new RTL pass for hardreg PRE for the hard > register > of FPM_REGNUM, this pass could get expensive if you have a large number of > basic blocks > and the hard register was never alive so it does nothing in the end. > In the aarch64 case,

Re: [PATCH] aarch64: small compile time improvement, disable hardreg PRE if !TARGET_FP8 [PR121095]

2025-07-16 Thread Richard Sandiford
Andrew Pinski writes: > r15-6789-ge7f98d9603808b added a new RTL pass for hardreg PRE for the hard > register > of FPM_REGNUM, but this pass does nothing if there cannot be any FPM_REGNUM > register in it. > So let's set HARDREG_PRE_REGNOS to include all zeros if !TARGET_FP8. > Now the pass will on

Re: [PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 15 Jul 2025, at 15:50, Richard Sandiford >> wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes: >>> (x & z) | (~

Re: [PATCH] aarch64: Use SVE2 NBSL for vector NOR and NAND for Advanced SIMD modes

2025-07-15 Thread Richard Sandiford
Kyrylo Tkachov writes: > From 930789b3c366777c49d4eb2f4dc84b0374601504 Mon Sep 17 00:00:00 2001 > From: Kyrylo Tkachov > Date: Fri, 11 Jul 2025 02:50:32 -0700 > Subject: [PATCH 1/2] aarch64: Use SVE2 NBSL for vector NOR and NAND for > Advanced SIMD modes > > We already have patterns to use the N

Re: [PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Richard Sandiford
Kyrylo Tkachov writes: > Hi all, > > SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes: > (x & z) | (~x & ~z) which is ~(x ^ z). > Thus, we can use it to match RTL patterns (not (xor (...) (...))) for both > Advanced SIMD and SVE modes when TARGET_SVE2. > This patch does that.
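The identity in the quoted description can be checked with a scalar model. The following is a minimal C sketch, not code from the patch: `bsl2n_model`, `eon_model` and `count_mismatches` are hypothetical helper names, and the model applies the quoted formula to a single 8-bit lane rather than to a real vector register.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the quoted BSL2N formula on one 8-bit lane:
   (x & z) | (~y & ~z).  Hypothetical helper, not a GCC function.  */
static uint8_t bsl2n_model(uint8_t x, uint8_t y, uint8_t z)
{
    return (uint8_t)((x & z) | (~y & ~z));
}

/* Scalar model of EON, i.e. ~(x ^ z).  */
static uint8_t eon_model(uint8_t x, uint8_t z)
{
    return (uint8_t)~(x ^ z);
}

/* Compare both models for every pair of 8-bit inputs with x == y.
   Returns the number of mismatching pairs.  */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned z = 0; z < 256; z++)
            if (bsl2n_model((uint8_t)x, (uint8_t)x, (uint8_t)z)
                != eon_model((uint8_t)x, (uint8_t)z))
                mismatches++;
    return mismatches;
}

/* Convenience wrapper that aborts if the identity does not hold.  */
static void self_check(void)
{
    assert(count_mismatches() == 0);
}
```

Calling `self_check()` from a test driver exercises all 65536 input pairs. Because the operation is bitwise, agreement on one lane width carries over to wider lanes.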

Re: [PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-07-15 Thread Richard Sandiford
ackports with the subreg_size_lowpart_offset change mentioned below. It doesn't look like you have commit access yet. If you'd like it, please follow the instructions at https://gcc.gnu.org/gitwrite.html (I'll sponsor). Richard > Konstantinos > > On Fri, Jul 4, 2

Re: [PATCH] aarch64: fixup: Implement sme2+faminmax extension.

2025-07-15 Thread Richard Sandiford
Alfie Richards writes: > Hi all, > > This is a minor fixup to the previous patch I committed fixing Spencer's > comments. > > Bootstrapped and reg tested for AArch64. > > Thanks, > Alfie > > -- >8 -- > > Fixup to the SME2+FAMINMAX intrinsics commit. > > gcc/ChangeLog: > > * config/aarch64/aar

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-15 Thread Richard Sandiford
Soumya AR writes: > One additional change with this patch is that I had to update ldapr-sext.c > too. > > During the combine pass, cases of UNSPECV_LDAP (with an offset) + sign_extend > transform to LDAPUR + SEXT, and later to LDAPURS (with address folding). > > The aarch64 tests run with -moverr

Re: [PATCH 1/1] aarch64: Adapt unwinder to linux's SME signal behaviour

2025-07-15 Thread Richard Sandiford
Tamar Christina writes: > One question I did have not directly related to the unwinder changes, > But the ABI mentions that if any of the reserved bytes in TPIDR2_EL0 > Block are non-zero that TPIDR2_EL0 must be left unchanged [1]. The full requirement is: If TPIDR2_EL0 is nonnull and if any r

Re: [PATCH 1/1] aarch64: AND/BIC combines for unpacked SVE FP comparisons

2025-07-14 Thread Richard Sandiford
Spencer Abson writes: > This patch extends the splitting patterns for combining FP comparisons > with predicated logical operations such that they cover all of SVE_F. > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve.md (*fcm_and_combine): > Extend from SVE_FULL_F to SVE_F. > (

Re: [PATCH] tree-optimization/121059 - record loop mask when required

2025-07-14 Thread Richard Sandiford
Richard Biener writes: > On Mon, 14 Jul 2025, Richard Sandiford wrote: > >> Richard Biener writes: >> > For loop masking we need to mask a mask AND operation with the loop >> > mask. The following makes sure we have a corresponding mask >> > available.

Re: [PATCH v3 1/1] aarch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-07-14 Thread Richard Sandiford
Spencer Abson writes: > [...] > +/* If REF describes the high half of a 128-bit vector, return this > + vector. Otherwise, return NULL_TREE. */ > +static tree > +aarch64_v128_highpart_ref (const_tree ref) > +{ > + if (TREE_CODE (ref) != SSA_NAME) > +return NULL_TREE; > + > + gassign *stm

Re: [PATCH] tree-optimization/121059 - record loop mask when required

2025-07-14 Thread Richard Sandiford
Richard Biener writes: > For loop masking we need to mask a mask AND operation with the loop > mask. The following makes sure we have a corresponding mask > available. There's no good way to distinguish loop masking from > len masking here, so assume we have recorded a mask for the operands > ma

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-14 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 11 Jul 2025, at 16:48, Richard Sandiford >> wrote: >>> Shall I backport this for GCC 15.2 as well? >>> The test case uses C operators which were enabled in GCC 15, though I >>> suppose one could construct a pure ACLE intrins

Re: [PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]

2025-07-11 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, July 11, 2025 4:23 PM >> To: gcc-patches@gcc.gnu.org >> Cc: Alex Coplan ; Alice Carlotti >> ; >> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha

[PATCH] aarch64: Tweak handling of general SVE permutes [PR121027]

2025-07-11 Thread Richard Sandiford
This PR is partly about a code quality regression that was triggered by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional upon either (a) the original permutations not being "native" operations or (b) the combined p

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote: >> >> >> >>> On 10 Jul 2025, at 10:40, Richard Sandiford >>> wrote: >>> >>> Kyrylo Tkachov writes: >>>> Hi all, >>>> >>>

Re: [PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-11 Thread Richard Sandiford
Richard Biener writes: > On Thu, 10 Jul 2025, Richard Sandiford wrote: > >> Richard Biener writes: >> > The following removes an optimization that wrongly triggers right now >> > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be >> > co

[pushed] aarch64: Guard VF-based costing with !m_costing_for_scalar

2025-07-10 Thread Richard Sandiford
g:4b47acfe2b626d1276e229a0cf165e934813df6c caused a segfault in aarch64_vector_costs::analyze_loop_vinfo when costing scalar code, since we'd end up dividing by a zero VF. Much of the structure of the aarch64 costing code dates from a stage 4 patch, when we had to work within the bounds of what th

Re: [PATCH] tree-optimization/120939 - remove uninitialized use of LOOP_VINFO_COST_MODEL_THRESHOLD

2025-07-10 Thread Richard Sandiford
Richard Biener writes: > The following removes an optimization that wrongly triggers right now > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be > computed yet. > > Testing on x86_64 didn't reveal any testsuite coverage. > > Bootstrapped and tested on x86_64-unknown-linux-gn

[pushed] testsuite: Add -funwind-tables to sve*/pfalse* tests

2025-07-10 Thread Richard Sandiford
The SVE svpfalse folding tests use CFI directives to delimit the function bodies. That requires -funwind-tables to be enabled, which is true by default for *-linux-gnu targets, but not for *-elf. Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed as obvious. Richard gcc/testsuite/

[PATCH] aarch64: Fix LD1Q and ST1Q failures for big-endian

2025-07-10 Thread Richard Sandiford
LD1Q gathers and ST1Q scatters are unusual in that they operate on 128-bit blocks (effectively VNx1TI). However, we don't have modes or ACLE types for 128-bit integers, and 128-bit integers are not the intended use case. Instead, the instructions are intended to be used in "hybrid VLA" operations

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Richard Sandiford
Soumya AR writes: >> On 10 Jul 2025, at 3:15 PM, Richard Sandiford >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Soumya AR writes: >>>> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov wrote: >>>

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-10 Thread Richard Sandiford
Soumya AR writes: >> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov wrote: >> >> >> >>> On 1 Jul 2025, at 17:36, Richard Sandiford >>> wrote: >>> >>> Soumya AR writes: >>>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mo

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Richard Sandiford
Kyrylo Tkachov writes: > Hi all, > > While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility > due to its tied operands, the destination of the movprfx cannot be also > a source operand. But the offending pattern in aarch64-sve2.md tries > to do exactly that for the "=?&w,w,w" alte

Re: [PATCH] aarch64: Extend HVLA permutations to big-endian

2025-07-09 Thread Richard Sandiford
Richard Sandiford writes: > TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1 > "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions. > This matching was conditional on !BYTES_BIG_ENDIAN. > > The ACLE code also lowered the associated SVE2.1 i

Re: [PATCH v2 1/1] libiberty: add routines to handle type-sensitive doubly linked lists

2025-07-09 Thread Richard Sandiford
Matthieu Longo writes: > The implementation of those methods relies on duck typing at compile > time. > The structure corresponding to the node of a doubly linked list needs > to define attributes 'prev' and 'next' which are pointers to the type > of a node. > The structure wrapping the nodes and o

[PATCH] aarch64: Fix endianness of DFmode vector constants

2025-07-09 Thread Richard Sandiford
aarch64_simd_valid_imm tries to decompose a constant into a repeating series of 64 bits, since most Advanced SIMD and SVE immediate forms require that. (The exceptions are handled first.) It does this by building up a byte-level register image, lsb first. If the image does turn out to repeat eve
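The repetition test described in this snippet can be illustrated outside the compiler. The following is a minimal C sketch under stated assumptions: `repeats_every_64_bits` is a hypothetical helper, not the code in aarch64_simd_valid_imm, and the byte image is assumed to be stored least significant byte first, as the description says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the byte image IMAGE (NBYTES long, least significant
   byte first) repeats every 8 bytes.  On success, store the repeating
   64-bit value in *VALUE.  Hypothetical helper for illustration.  */
static bool repeats_every_64_bits(const uint8_t *image, size_t nbytes,
                                  uint64_t *value)
{
    if (nbytes == 0 || nbytes % 8 != 0)
        return false;

    /* Every byte must match the byte 64 bits earlier.  */
    for (size_t i = 8; i < nbytes; i++)
        if (image[i] != image[i - 8])
            return false;

    /* Assemble the first 64-bit chunk, lsb first.  */
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)image[i] << (8 * i);
    *value = v;
    return true;
}

/* Small self-check on a 128-bit image made of two identical halves.  */
static void self_check(void)
{
    const uint8_t image[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                                1, 2, 3, 4, 5, 6, 7, 8 };
    uint64_t v = 0;
    assert(repeats_every_64_bits(image, sizeof image, &v));
    assert(v == 0x0807060504030201ULL);
}
```

The sketch deliberately stops at the repetition check. How the resulting 64-bit value maps onto element order for a given mode and endianness is the subject of the patch and is not modelled here.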

[PATCH] aarch64: Some fixes for SVE INDEX constants

2025-07-09 Thread Richard Sandiford
When using SVE INDEX to load an Advanced SIMD vector, we need to take account of the different element ordering for big-endian targets. For example, when big-endian targets store the V4SI constant { 0, 1, 2, 3 } in registers, 0 becomes the most significant element, whereas INDEX always operates fr
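The element-order adjustment can be pictured with a small model. The following is a minimal C sketch, not code from the patch, and it rests on an assumption the truncated snippet does not fully state: that INDEX fills lanes starting from the least significant one. `index_params`, `flip_for_big_endian` and `index_fill` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Base and step of an INDEX-style linear series.  Hypothetical type.  */
struct index_params {
    int64_t base;
    int64_t step;
};

/* Given the series wanted in GCC element order on a big-endian target
   (element 0 most significant), return the base and step to use if the
   series is generated from the least significant lane upwards.
   Assumes NELTS >= 1.  */
static struct index_params flip_for_big_endian(int64_t base, int64_t step,
                                               unsigned nelts)
{
    struct index_params p;
    p.base = base + (int64_t)(nelts - 1) * step;
    p.step = -step;
    return p;
}

/* Model of generating the series: lane 0 is the least significant lane.  */
static void index_fill(int64_t *lanes, unsigned nelts, struct index_params p)
{
    for (unsigned i = 0; i < nelts; i++)
        lanes[i] = p.base + (int64_t)i * p.step;
}

/* Self-check for the V4SI constant { 0, 1, 2, 3 } from the snippet.  */
static void self_check(void)
{
    int64_t lanes[4];
    index_fill(lanes, 4, flip_for_big_endian(0, 1, 4));
    /* The most significant lane should hold element 0.  */
    assert(lanes[3] == 0 && lanes[0] == 3);
}
```

Under that assumption, the constant { 0, 1, 2, 3 } would be generated as base 3, step -1 in this model, so that 0 ends up in the most significant lane.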

[PATCH] aarch64: Extend HVLA permutations to big-endian

2025-07-09 Thread Richard Sandiford
TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1 "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions. This matching was conditional on !BYTES_BIG_ENDIAN. The ACLE code also lowered the associated SVE2.1 intrinsics into suitable VEC_PERM_EXPRs. This lowering was not conditio

[PATCH] Make the RTL frontend set REG_NREGS correctly

2025-07-09 Thread Richard Sandiford
While working on a new testcase that uses the RTL frontend, I hit a bug where a (reg ...) that spans multiple hard registers had REG_NREGS set to 1. This caused various things to misbehave. For example, if the (reg ...) in question was used as crtl->return_rtx, only the first register in the group

Re: [PATCH] ext-dce: Fix subreg_lsb is_constant assumption

2025-07-09 Thread Richard Sandiford
Jeff Law writes: > On 7/4/25 10:21 AM, Richard Sandiford wrote: >> ext-dce had: >> >>if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ()) >> { >>bit = subreg_lsb (dst).to_constant (); >>if (bit

[pushed] testsuite: Add a couple of fstack_protector guards

2025-07-09 Thread Richard Sandiford
These tests required runtime support for -fstack-protector, but didn't test for it. Tested on aarch64-linux-gnu and aarch64_be-elf & pushed as obvious. Richard gcc/testsuite/ * gcc.target/aarch64/pr118348_1.c: Require fstack_protector. * gcc.target/aarch64/pr118348_2.c: Likewise

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-08 Thread Richard Sandiford
Kyrylo Tkachov writes: > Thanks for your comments, do you mean something like the following? Yeah, the patch LGTM, thanks. Richard > Or do you mean to have separate alternatives with each one individually tying > one of operands 2 or 3 to r0? > > Kyrill > > >> >> Thanks, >> Tamar >> >>> Than

Re: [PATCH v1 1/1] libiberty: add common methods for type-sensitive doubly linked lists

2025-07-08 Thread Richard Sandiford
Matthieu Longo writes: > The implementation of those methods relies on duck typing at compile > time. > The structure corresponding to the node of a doubly linked list needs > to define attributes 'prev' and 'next' which are pointers to the type > of a node. > The structure wrapping the nodes and o

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-08 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Monday, July 7, 2025 12:55 PM >> To: Kyrylo Tkachov >> Cc: GCC Patches ; Richard Earnshaw >> ; Alex Coplan ; Andrew >> Pinski >> Subject: Re: [P

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-07 Thread Richard Sandiford
Richard Sandiford writes: > Kyrylo Tkachov writes: >> Hi all, >> >> To handle DImode BCAX operations we want to do them on the SIMD side only if >> the incoming arguments don't require a cross-bank move. >> This means we need to split back the combination

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-07 Thread Richard Sandiford
Kyrylo Tkachov writes: > Hi all, > > To handle DImode BCAX operations we want to do them on the SIMD side only if > the incoming arguments don't require a cross-bank move. > This means we need to split back the combination to separate GP BIC+EOR > instructions if the operands are expected to be in

Re: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Kyrylo Tkachov >> Sent: Monday, July 7, 2025 10:38 AM >> To: GCC Patches >> Cc: Richard Sandiford ; Richard Earnshaw >> ; Alex Coplan ; Andrew >> Pinski >> Subject: [PATCH] aar

[PATCH] vect: Fix VEC_WIDEN_PLUS_HI/LO choice for big-endian [PR118891]

2025-07-04 Thread Richard Sandiford
In the tree codes and optabs, the "hi" in a vector hi/lo pair means "most significant" and the "lo" means "least significant", with significance following GCC's normal endian expectations. Thus on big-endian targets, the hi part handles the first half of the elements in memory order and the lo part

[PATCH] ext-dce: Fix subreg_lsb is_constant assumption

2025-07-04 Thread Richard Sandiford
ext-dce had: if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ()) { bit = subreg_lsb (dst).to_constant (); if (bit >= HOST_BITS_PER_WIDE_INT) bit = HOST_BITS_PER_WIDE_INT - 1; dst = SUBREG_REG (dst); But a constant

[PATCH] aarch64: Fix neon-sve-bridge.c failures for big-endian

2025-07-04 Thread Richard Sandiford
Lowpart subregs are generally disallowed on big-endian SVE vector registers, since the first memory element is stored at the least significant end of the register, rather than the most significant end. (See the comment at the head of aarch64-sve.md for details, and aarch64_modes_compatible_p for th

[PATCH] aarch64: Fix ZIP1 order in aarch64_expand_vector_init

2025-07-04 Thread Richard Sandiford
aarch64_expand_vector_init contains some divide-and-conquer code that tries to load the odd and even elements into 64-bit registers and then ZIP them together. On big-endian targets, the even elements are more significant than the odd elements and so should come second in the ZIP. This fixes many

Re: [PATCH 1/2] Allow the target to request a masked vector epilogue

2025-07-04 Thread Richard Sandiford
Richard Biener writes: > @@ -1738,8 +1738,13 @@ protected: >unsigned int m_suggested_unroll_factor; > >/* The suggested mode to be used for a vectorized epilogue or VOIDmode, > - determined at finish_cost. */ > + determined at finish_cost. m_masked_epilogue is epilogue should u

Re: [PATCH v2 1/1] aarch64: Add support for unpacked SVE FP comparisons

2025-07-04 Thread Richard Sandiford
Spencer Abson writes: > This patch extends our vec_cmp expander to support partial FP modes. > > We use a predicate mode that is narrower than the operation's VPRED to govern > unpacked FP operations under flag_trapping_math, so the expansion must > handle cases where the comparison's target and govern

Re: [PATCH] Update alignment for argument on stack

2025-07-03 Thread Richard Sandiford
"H.J. Lu" writes: > On Thu, Jul 3, 2025 at 11:02 PM Richard Sandiford > wrote: >> >> "H.J. Lu" writes: >> > Since a backend may ignore user type alignment for arguments passed on >> > stack, update alignment for arguments passed on stack

Re: [PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-07-03 Thread Richard Sandiford
Konstantinos Eleftheriou writes: > On Wed, May 7, 2025 at 11:29 AM Richard Sandiford > wrote: >> But I thought the code was allowing multiple stores to be forwarded to >> a single (wider) load. E.g. 4 individual byte stores at address X, X+1, >> X+2 and X+3 could be fo

Re: [PATCH] Add string_slice class.

2025-07-03 Thread Richard Sandiford
Alfie Richards writes: > +/* string_slice inherits from array_slice, specifically to refer to a > substring > + of a character array. > + It includes some string like helpers. */ > +class string_slice : public array_slice > +{ > +public: > + string_slice () : array_slice () {} > + string_s

Re: [PATCH] Update alignment for argument on stack

2025-07-03 Thread Richard Sandiford
"H.J. Lu" writes: > Since a backend may ignore user type alignment for arguments passed on > stack, update alignment for arguments passed on stack when copying MEM's > memory attributes. > > gcc/ > > PR target/120839 > * emit-rtl.cc (set_mem_attrs): Update alignment for argument on > stack. > > gc

Re: [PATCH] tree-optimization/120927 - 510.parest_r segfault with masked epilog

2025-07-03 Thread Richard Sandiford
Richard Biener writes: > The following fixes bad alignment computation for epilog vectorization > when as in this case for 510.parest_r and masked epilog vectorization > with AVX512 we end up choosing AVX to vectorize the main loop and > masked AVX512 (sic!) to vectorize the epilog. In that case a

Re: [PATCH v9 0/9] AArch64: CMPBR support

2025-07-03 Thread Richard Sandiford
Karl Meakin writes: > This patch series adds support for the CMPBR extension. It includes the > new `+cmpbr` option and rules to generate the new instructions when > lowering conditional branches. Thanks for the update, LGTM. I've pushed the series to trunk. Richard > Changelog: > * v9: > -

Re: [PATCH v1 1/2] AArch64: precommit test for masked load vectorisation.

2025-07-03 Thread Richard Sandiford
Karl Meakin writes: > Commit the test file `mask_load_2.c` before the vectorisation analysis > is changed, so that the changes in codegen are more obvious in the next > commit. > > gcc/testsuite/ChangeLog: > * gcc.target/aarch64/sve/mask_load_2.c: New test. OK, thanks. Richard > --- > ..

Re: [PATCH v7 8/9] AArch64: rules for CMPBR instructions

2025-07-02 Thread Richard Sandiford
Karl Meakin writes: > On 01/07/2025 11:02, Richard Sandiford wrote: >> Karl Meakin writes: >>> @@ -763,6 +784,68 @@ (define_expand "cbranchcc4" >>> "" >>> ) >>> >>> +;; Emit a `CB (register)` or `CB (immediate

Re: [PATCH] AArch64: Use correct cost for shifted halfword load/stores

2025-07-02 Thread Richard Sandiford
Wilco Dijkstra writes: > Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero > for these. > > Passes regress, OK for commit? > > gcc: > * config/aarch64/tuning_models/generic_armv9_a.h > (generic_armv9_a_addrcost_table): Use zero cost for himode. OK if th

Re: [PATCH v7 7/9] AArch64: precommit test for CMPBR instructions

2025-07-01 Thread Richard Sandiford
Richard Sandiford writes: > Karl Meakin writes: >> +// If the branch destination is out of range (1KiB), we have to generate an >> +// extra B instruction (which can handle larger displacements) and branch >> around >> +// it >> +int far_branch(i32 x, i32 y) {

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-01 Thread Richard Sandiford
Soumya AR writes: > From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001 > From: Soumya AR > Date: Mon, 30 Jun 2025 12:17:30 -0700 > Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with > RCPC2 > > This patch adds the ability to fold the address computati

Re: [PATCH v3] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-07-01 Thread Richard Sandiford
Remi Machet writes: > Attached is the updated patch for the aarch64 conversion of some > mvn+shrn patterns into a mvni+subhn. Hopefully attachment fixes the tab > issues, the cover letter was updated to better explain what the patch > does, code was changed to use emit_move_insn, and testcase w

Re: [PATCH v7 4/9] AArch64: add constants for branch displacements

2025-07-01 Thread Richard Sandiford
Karl Meakin writes: > Extract the hardcoded values for the minimum PC-relative displacements > into named constants and document them. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. > (BRANCH_LEN_N_128MiB): Likewise. > (BRANCH_LEN_P_1MiB):

Re: [PATCH v7 7/9] AArch64: precommit test for CMPBR instructions

2025-07-01 Thread Richard Sandiford
Karl Meakin writes: > Commit the test file `cmpbr.c` before rules for generating the new > instructions are added, so that the changes in codegen are more obvious > in the next commit. > > gcc/testsuite/ChangeLog: > > * lib/target-supports.exp: Add `cmpbr` to the list of extensions. >

Re: [PATCH v7 8/9] AArch64: rules for CMPBR instructions

2025-07-01 Thread Richard Sandiford
Karl Meakin writes: > @@ -763,6 +784,68 @@ (define_expand "cbranchcc4" >"" > ) > > +;; Emit a `CB (register)` or `CB (immediate)` instruction. > +;; The immediate range depends on the comparison code. > +;; Comparisons against immediates outside this range fall back to > +;; CMP + B. > +(de

Re: [PATCH v7 2/9] AArch64: reformat branch instruction rules

2025-07-01 Thread Richard Sandiford
Karl Meakin writes: > @@ -729,30 +729,31 @@ (define_expand "cbranch4" > (match_operator 0 "aarch64_comparison_operator" >[(match_operand:GPF_F16 1 "register_operand") > (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")]) > - (label_ref

Re: [PATCH 1/1] ivopts: Fix scan-assembler-not regexes for aarch64/sve test

2025-06-30 Thread Richard Sandiford
Christopher Bazley writes: > The test added by r16-1671-ge7ff8e8d77df74 passed despite using > regular expressions that would never match real assembly language > output from the compiler. Because the regular expressions were not > expected to match, and didn't, this was not noticeable; however, >

Re: [PATCH] fold-mem-offsets: Convert from DF to RTL-SSA

2025-06-30 Thread Richard Sandiford
Christoph Müllner writes: > This patch converts the fold-mem-offsets pass from DF to RTL-SSA. > Along with this conversion, the way the pass collects information > was completely reworked. Instead of visiting each instruction multiple > times, this is now done only once. > > Most significant chan

Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-27 Thread Richard Sandiford
Richard Biener writes: > On Fri, 27 Jun 2025, Richard Biener wrote: > >> On Thu, 26 Jun 2025, Richard Sandiford wrote: >> >> > Richard Biener writes: >> > > The following fixes the computation of supports_partial_vectors which >> > > is used t

Re: [PATCH 1/2] Fixup partial_vectors_supported_p use

2025-06-26 Thread Richard Sandiford
Richard Biener writes: > The following fixes the computation of supports_partial_vectors which > is used to prune the set of modes to iterate over for epilog > vectorization. The used partial_vectors_supported_p predicate > only looks for while_ult while also support predication when > mask modes

Re: [PATCH 2/2] Fixup vector epilog analysis skipping when not using partial vectors

2025-06-26 Thread Richard Sandiford
Richard Biener writes: > The following avoids re-analyzing the loop as epilogue when not > using partial vectors and the mode is the same as the autodetected > vector mode and that has a too high VF for a non-predicated loop. > This situation occurs almost always on x86 and saves us one > re-analy

Re: [PATCH v6 9/9] AArch64: make rules for CBZ/TBZ higher priority

2025-06-25 Thread Richard Sandiford
Karl Meakin writes: > Move the rules for CBZ/TBZ to be above the rules for > CBB/CBH/CB. We want them to have higher priority > because they can express larger displacements. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (aarch64_cbz1): Move > above rules for CBB/CBH/CB. > (

Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

2025-06-25 Thread Richard Sandiford
Richard Biener writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> Tamar Christina writes: >> > store_bit_field_1 has an optimization where if a target is not a memory >> > operand >> > and the entire value is being set from something larger we

Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted

2025-06-25 Thread Richard Sandiford
Christoph Müllner writes: > insn_info::has_been_deleted () is documented to return true if an > instruction is deleted. Such instructions have their `volatile` bit set, > which can be tested via rtx_insn::deleted (). > > The current condition for insn_info::has_been_deleted () is: > * m_rtl is no

Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions

2025-06-25 Thread Richard Sandiford
Karl Meakin writes: > Add rules for lowering `cbranch4` to CBB/CBH/CB when > CMPBR extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function. > * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise. > * config/aarch64/aarch64.m

Re: [PATCH] rtl-ssa: Fix test condition for insn_info::has_been_deleted

2025-06-25 Thread Richard Sandiford
Christoph Müllner writes: > On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford > wrote: >> >> Christoph Müllner writes: >> > insn_info::has_been_deleted () is documented to return true if an >> > instruction is deleted. Such instructions have their `volatil

Re: [PATCH v6 8/9] AArch64: rules for CMPBR instructions

2025-06-25 Thread Richard Sandiford
Richard Sandiford writes: > Karl Meakin writes: >> + "r")) >> + (label_ref (match_operand 2)) >> + (pc)))] >> + "TARGET_CMPBR" >> + "cb\\t%0, %1, %l2"; Sorr

Re: [PATCH v6 3/9] AArch64: rename branch instruction rules

2025-06-25 Thread Richard Sandiford
Karl Meakin writes: > Give the `define_insn` rules used in lowering `cbranch4` to RTL > more descriptive and consistent names: from now on, each rule is named > after the AArch64 instruction that it generates. Also add comments to > document each rule. > > gcc/ChangeLog: > > * config/aarch64

Re: [PATCH v6 2/9] AArch64: reformat branch instruction rules

2025-06-25 Thread Richard Sandiford
Karl Meakin writes: > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): Reformat. > (cbranchcc4): Likewise. > (condjump): Likewise. > (*compare_cond

Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

2025-06-24 Thread Richard Sandiford
Richard Biener writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> > >> >> Tamar Christina writes: >> >> > store_bit_field_1 has an optimization where

Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

2025-06-24 Thread Richard Sandiford
Richard Biener writes: > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> Richard Biener writes: >> > On Tue, 24 Jun 2025, Richard Sandiford wrote: >> >> (from h8300). This is also why simplify_gen_subreg has: >> >> >> >> if (GET_CODE

[PATCH] rtl-ssa: Rewrite process_uses_of_deleted_def [PR120745]

2025-06-24 Thread Richard Sandiford
process_uses_of_deleted_def seems to have been written on the assumption that non-degenerate phis would be explicitly deleted by an insn_change, and that the function therefore only needed to delete degenerate phis. But that was inconsistent with the rest of the code, and wouldn't be very convenien

[PATCH] lra: Check for null lowpart_subregs [PR120733]

2025-06-24 Thread Richard Sandiford
lra-eliminations.cc:move_plus_up tries to: Transform (subreg (plus reg const)) to (plus (subreg reg) const) when it is possible. Most of it is heavily conditional: if (!paradoxical_subreg_p (x) && GET_CODE (subreg_reg) == PLUS && CONSTANT_P (XEXP (subreg_reg, 1)) && GET

Re: [PATCH]middle-end: Fix store_bit_field expansions of vector constructors [PR120718]

2025-06-24 Thread Richard Sandiford
Tamar Christina writes: > store_bit_field_1 has an optimization where if a target is not a memory > operand > and the entire value is being set from something larger we can just wrap a > subreg around the source and emit a move. > > For vector constructors this is however problematic because the

Re: [PATCH] AArch64: Disable TARGET_CONST_ANCHOR

2025-06-23 Thread Richard Sandiford
Andrew Pinski writes: > On Fri, Jun 20, 2025, 4:47 PM Wilco Dijkstra wrote: > >> >> TARGET_CONST_ANCHOR appears to trigger too often, even on simple >> immediates. >> It inserts extra ADD/SUB instructions even when a single MOV exists. >> Disable it to improve overall code quality: on SPEC2017 it

[PATCH] vregs: Use force_subreg when instantiating subregs [PR120721]

2025-06-20 Thread Richard Sandiford
In this PR, we started with: (subreg:V2DI (reg:DI virtual-reg) 0) and vregs instantiated the virtual register to the argument pointer. But: (subreg:V2DI (reg:DI ap) 0) is not a sensible subreg, since the argument pointer certainly can't be referenced in V2DImode. This is (IMO correctly

Re: [PATCH v4 08/10] AArch64: rules for CMPBR instructions

2025-06-19 Thread Richard Sandiford
Karl Meakin writes: > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index be5a97294dd..1d4ae73a963 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -944,16 +944,50 @@ static const char * > svpattern_token (enum aarch64_svpatter

Re: [PATCH v5 0/2] aarch64: add support of AEABI Build Attributes

2025-06-16 Thread Richard Sandiford
4.html > - [4]: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666178.html > - [5]: > https://inbox.sourceware.org/gcc-patches/20250604150612.1234394-1-matthieu.lo...@arm.com/ > > ## Previous revisions > > Diff with revision 4 [5]: > - address the comments from Richard Sandiford.

[PATCH] aarch64: Add vec_set/extract for tuple modes [PR113027]

2025-06-16 Thread Richard Sandiford
We generated inefficient code for bitfield references to Advanced SIMD structure modes. In RTL, these modes are just extra-long vectors, and so inserting and extracting an element is simply a vec_set or vec_extract operation. For the record, I don't think these modes should ever become fully fled

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-13 Thread Richard Sandiford
Alex Coplan writes: > Hi Remi, > > On 12/06/2025 17:02, Richard Sandiford wrote: >> Remi Machet writes: >> > +  "TARGET_SIMD" >> > +  "#" >> > +  "&& true" >> > +  [(const_int 0)] >> > +{ >

Re: [PATCH] expand: Add a helper function for edge splitting [PR120629]

2025-06-13 Thread Richard Sandiford
Jakub Jelinek writes: > On Fri, Jun 13, 2025 at 10:05:14AM +0200, Jakub Jelinek wrote: >> On Fri, Jun 13, 2025 at 08:52:55AM +0100, Richard Sandiford wrote: >> > > 2025-06-12 Jakub Jelinek >> > > >> > > * cfgexpand.cc (construct_init_block): If

Re: [PATCH v3] simplify-rtx.cc:Simplify XOR(AND(ROTATE(~1) A) ASHIFT(1 A)) to IOR.

2025-06-13 Thread Richard Sandiford
Jiawei writes: > This patch adds a new simplification rule to `simplify-rtx.cc` that > handles a common bit manipulation pattern involving a single-bit set > and clear followed by XOR. > > The transformation targets RTL of the form: > > (xor (and (rotate (~1) A) B) (ashift 1 A)) > > which is sem

Re: [PATCH] expand: Fix up edge splitting for ENTRY block during expansion if there are any PHIs [PR120629]

2025-06-13 Thread Richard Sandiford
Jakub Jelinek writes: > Hi! > > Andrew ran some extra ranger checking during bootstrap and found one more > case (though much rarer than the GIMPLE_COND case). > > Seems on fold-const.cc (native_encode_expr) we end up with bb 2, ENTRY > bb successor, having PHI nodes (usually there is some bb in b

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-12 Thread Richard Sandiford
Remi Machet writes: > Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn > which > allows for better optimization when the code is inside a loop by using a > constant. > > Bootstrapped and regtested on aarch64-linux-gnu. > > Signed-off-by: Remi Machet > > gcc/ChangeLog: > >

Re: [PATCH] expand: Fix up edge splitting for GIMPLE_COND expansion if there are any PHIs [PR120629]

2025-06-12 Thread Richard Sandiford
Jakub Jelinek writes: > Hi! > > My r16-1398 PR120434 ranger during expansion change broke profiled lto > bootstrap on x86_64-linux, the following testcase is reduced from that. > > The problem is during expand_gimple_cond, if we are unlucky that neither > of edge_true and edge_false point to the n

Re: [PATCH] emit-rtl: Use simplify_subreg_regno to validate hardware subregs [PR119966]

2025-06-12 Thread Richard Sandiford
Jeff Law writes: > On 6/9/25 12:40 PM, Dimitar Dimitrov wrote: >> On Sun, Jun 08, 2025 at 09:09:44AM -0600, Jeff Law wrote: >>> >>> >>> On 6/5/25 2:16 PM, Dimitar Dimitrov wrote: PR119966 showed that combine could generate unfoldable hardware subregs for pru-unknown-elf. To fix, strengt

Re: [v2 PATCH] simplify-rtx.cc:Simplify XOR(AND(ROTATE(~1) A) ASHIFT(1 A)) to IOR

2025-06-12 Thread Richard Sandiford
Jiawei writes: > This patch adds a new simplification rule to `simplify-rtx.cc` that > handles a common bit manipulation pattern involving a single-bit set > and clear followed by XOR. > > The transformation targets RTL of the form: > > (xor (and (rotate (~1) A) B) (ashift 1 A)) > > which is sem

Re: [PATCH] simplify-rtx.cc:Simplify XOR(AND(ROTATE(~1), A), ASHIFT(1, A)) to IOR

2025-06-12 Thread Richard Sandiford
Jeff Law writes: > On 6/11/25 9:53 AM, Richard Sandiford wrote: > >> >>> +into B | (1 << A). */ >>> + if (GET_CODE (op0) == AND >>> + && GET_CODE (XEXP (op0, 0)) == ROTATE >>> + && CONST_INT_P (XEXP (

Re: [PATCH] simplify-rtx.cc:Simplify XOR(AND(ROTATE(~1), A), ASHIFT(1, A)) to IOR

2025-06-11 Thread Richard Sandiford
Jiawei writes: > This patch adds a new simplification rule to `simplify-rtx.cc` that > handles a common bit manipulation pattern involving a single-bit set > and clear followed by XOR. > > The transformation targets RTL of the form: > > (xor (and (rotate (~1), A) B) (ashift 1, A)) > > which is s

Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

2025-06-11 Thread Richard Sandiford
Spencer Abson writes: > On Tue, Jun 10, 2025 at 08:04:06PM +0100, Richard Sandiford wrote: >> Spencer Abson writes: >> > On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote: >> >> Spencer Abson writes: >> >> > @@ -8165,20 +8169,25 @@

Re: AArch64 promote aarch64-autovec-peference to mautovec-preference

2025-06-11 Thread Richard Sandiford
Tamar Christina writes: > @@ -360,8 +367,8 @@ The number of Newton iterations for calculating the > reciprocal for double type. > > -param=aarch64-autovec-preference= > Target Joined Var(aarch64_autovec_preference) > Enum(aarch64_autovec_preference) Init(AARCH64_AUTOVEC_DEFAULT) Param > ---p

Re: [PATCH v4 3/3] aarch64: add support for AEABI Build Attributes

2025-06-11 Thread Richard Sandiford
Matthieu Longo writes: > GCS (Guarded Control Stack, an Armv9.4-a extension) requires some > caution at runtime. The runtime linker needs to reason about the > compatibility of a set of relocatable object files that might not > have been compiled with the same compiler. > Up until now, those metadat

Re: [PATCH v4 2/3] aarch64: encapsulate note.gnu.property emission into a class

2025-06-11 Thread Richard Sandiford
Matthieu Longo writes: > The code emitting the GNU properties was moved to a separate file to > improve modularity and "relieve" the 31000-line-long aarch64.cc file > of a few lines. > > It introduces a new namespace "aarch64::" for the AArch64 backend which > reduces the length of function names by

Re: [PATCH v4 1/3] aarch64: add debug comments to feature properties in .note.gnu.property

2025-06-11 Thread Richard Sandiford
Matthieu Longo writes: > GNU properties are emitted to provide some information about the features > used in the generated code like BTI, GCS, or PAC. However, no debug > comments are emitted in the generated assembly even if -dA is provided. > It makes understanding the information stored in the .

[PATCH] aarch64: Incorrect removal of ZA restore [PR120624]

2025-06-11 Thread Richard Sandiford
The PCS defines a lazy save scheme for managing ZA across normal "private-ZA" functions. GCC currently uses this scheme for calls to all private-ZA functions (rather than using caller-save). Therefore, before a sequence of calls to private-ZA functions, GCC emits code to set up a lazy save. Afte
