[pushed v2 16/25] aarch64: Add support for SME ZA attributes

2023-12-05 Thread Richard Sandiford
SME has an array called ZA that can be enabled and disabled separately from streaming mode. A status bit called PSTATE.ZA indicates whether ZA is currently enabled or not. In C and C++, the state of PSTATE.ZA is controlled using function attributes. There are four attributes that can be attached

[pushed v2 19/25] aarch64: Generalise unspec_based_function_base

2023-12-05 Thread Richard Sandiford
Until now, SVE intrinsics that map directly to unspecs have always used type suffix 0 to distinguish between signed integers, unsigned integers, and floating-point values. SME adds functions that need to use type suffix 1 instead. This patch generalises the classes accordingly. gcc/ * conf

[pushed v2 23/25] aarch64: Handle PSTATE.SM across abnormal edges

2023-12-05 Thread Richard Sandiford
PSTATE.SM is always off on entry to an exception handler, and on entry to a nonlocal goto receiver. Those entry points need to switch PSTATE.SM back to the appropriate state for the current function. In the case of streaming-compatible functions, they need to restore the mode that the caller was o

[pushed v2 24/25] aarch64: Enforce inlining restrictions for SME

2023-12-05 Thread Richard Sandiford
A function that has local ZA state cannot be inlined into its caller, since we only support managing ZA switches at function scope. A function whose body directly clobbers ZA state cannot be inlined into a function with ZA state. A function whose body requires a particular PSTATE.SM setting can o

[pushed v2 20/25] aarch64: Generalise _m rules for SVE intrinsics

2023-12-05 Thread Richard Sandiford
In SVE there was a simple rule that unary merging (_m) intrinsics had a separate initial argument to specify the values of inactive lanes, whereas other merging functions took inactive lanes from the first operand to the operation. That rule began to break down in SVE2, and it continues to do so i

[pushed v2 25/25] aarch64: Update sibcall handling for SME

2023-12-05 Thread Richard Sandiford
We only support tail calls between functions with the same PSTATE.ZA setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA"). Only a normal non-streaming function can tail-call another non-streaming function, and only a streaming function can tail-call another streaming function. An

[pushed v2 21/25] aarch64: Add support for

2023-12-05 Thread Richard Sandiford
This adds support for the SME parts of arm_sme.h. gcc/ * doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64. * config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers to install and aarch64-sve-builtins-sme.o to the list of objects to build. * con

[pushed v2 1/5] aarch64: Add +sme2

2023-12-05 Thread Richard Sandiford
gcc/ * doc/invoke.texi: Document +sme2. * doc/sourcebuild.texi: Document aarch64_sme2. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Add sme2. * config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros. gcc/testsuite/

[pushed v2 3/5] aarch64: Add svboolx2_t

2023-12-05 Thread Richard Sandiford
SME2 has some instructions that operate on pairs of predicates. The SME2 ACLE defines an svboolx2_t type for the associated intrinsics. The patch uses a double-width predicate mode, VNx32BI, to represent the contents, similarly to how data vector tuples work. At present there doesn't seem to be a

[pushed v2 2/5] aarch64: Add svcount_t

2023-12-05 Thread Richard Sandiford
Some SME2 instructions interpret predicates as counters, rather than as bit-per-byte masks. The SME2 ACLE defines an svcount_t type for this interpretation. I don't think we have a better way of representing counters than the VNx16BI that we use for masks. The patch therefore doesn't add a new m

[pushed v2 0/5] aarch64: Add support for SME2

2023-12-05 Thread Richard Sandiford
to do last time - it fixes the incoming and outgoing liveness state for ZA in functions that share ZT0 but not ZA (plus tests) - it has tests for all the new overloaded function "shapes", with some fixes & improvements to the error messages Retested on aarch64-linux-gnu. Ric

[pushed v2 4/5] aarch64: Add ZT0

2023-12-05 Thread Richard Sandiford
SME2 adds a 512-bit lookup table called ZT0. It is enabled and disabled by PSTATE.ZA, just like ZA itself. This patch adds support for the register, including saving and restoring contents. The code reuses the V8DI that was added for LS64, including the associated memory classification rules. (

Re: [PATCH v2 06/11] aarch64: Fix up aarch64_print_operand xzr/wzr case

2023-12-05 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > This is a v2 of: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637612.html > > v1 was approved as-is, but this version pulls out the test into a helper > function which is used by later patches in the series. > > Bootstrapped/regtested as a series on aar

Re: [PATCH v5] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-05 Thread Richard Sandiford
Manos Anagnostakis writes: > This is an RTL pass that detects store forwarding from stores to larger loads > (load pairs). > > This optimization is SPEC2017-driven and was found to be beneficial for some > benchmarks, > through testing on ampere1/ampere1a machines. > > For example, it can transf

Re: [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD

2023-12-06 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, November 28, 2023 5:56 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-07 Thread Richard Sandiford
res and >> > > enabled by a target-specific option. >> > > >> > > If deemed beneficial enough for a default, it will be enabled on >> > > ampere1/ampere1a, >> > > or other architectures as well, without needing to be turned on by this >

Re: [PATCH] aarch64: add -fno-stack-protector to tests

2023-12-07 Thread Richard Sandiford
Marek Polacek writes: > Bootstrapped/regtested on aarch64-pc-linux-gnu, ok for trunk/13? > > -- >8 -- > These tests fail when the testsuite is executed with -fstack-protector-strong. > To avoid this, this patch adds -fno-stack-protector to dg-options. > > The list of FAILs is appended. As you can

Re: [PATCH 3/4] libgcc: aarch64: Add SME runtime support

2023-12-07 Thread Richard Sandiford
Szabolcs Nagy writes: > The call ABI for SME (Scalable Matrix Extension) requires a number of > helper routines which are added to libgcc so they are tied to the > compiler version instead of the libc version. See > https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-r

Re: [PATCH v2] libgcc: aarch64: Add SME runtime support

2023-12-08 Thread Richard Sandiford
Szabolcs Nagy writes: > The call ABI for SME (Scalable Matrix Extension) requires a number of > helper routines which are added to libgcc so they are tied to the > compiler version instead of the libc version. See > https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-r

[PATCH] aarch64: Some tweaks to the early-ra pass

2023-12-08 Thread Richard Sandiford
early-ra's likely_operand_match_p didn't handle relaxed and special memory constraints, which meant that the pass wasn't able to match LD1RQ instructions to their constraints, and so backed out of trying to allocate. This patch fixes that by switching the sense of the match: does the rtx seem appr

Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging

2023-12-09 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > What do people think about having the ability to force only the latch > connected > exit as the exit as a param? I.e. what's in the patch but as a param. > > I found this useful when debugging large example failures as it tells me where > I should be looking.

Re: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-12-09 Thread Richard Sandiford
Sorry for the slow review. Stamatis Markianos-Wright writes: > [...] > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index > 44a04b86cb5806fcf50917826512fd203d42106c..c083f965fa9a40781bc86beb6e63654afd14eac4 > 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @

Re: [PATCH] aarch64: Add missing driver-aarch64 dependencies

2023-12-09 Thread Richard Sandiford
Andrew Carlotti writes: > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/x-aarch64: Add missing dependencies. > > > diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64 > index > 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32 >

Re: aarch64: Fix +nocrypto handling

2023-12-09 Thread Richard Sandiford
Andrew Carlotti writes: > Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with > checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead. The value of the > AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is > retained because removing it would make processing the data

Re: aarch64: Fix +nopredres, +nols64 and +nomops

2023-12-09 Thread Richard Sandiford
Andrew Carlotti writes: > For native cpu feature detection, certain features have no entry in > /proc/cpuinfo, so have to be assumed to be present whenever the detected > cpu is supposed to support that feature. > > However, the logic for this was mistakenly implemented by excluding > these featur

Re: [PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-10 Thread Richard Sandiford
Victor Do Nascimento writes: > In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed > long. This means that when the `arm_neon.h' header is used by the > kernel, any use of the `uint64_t' / `in64_t' types needs to be > correctly cast to the correct `__builtin_aarch64_simd_di' / >

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2023-12-10 Thread Richard Sandiford
Thanks for the patch and sorry for the slow review. I can only comment on the usage of SVE, rather than on the scaffolding around it. Hopefully Jonathan or others can comment on the rest. The main thing that worries me is: #if _GLIBCXX_SIMD_HAVE_SVE constexpr inline int __sve_vectorized_siz

[pushed] aarch64: Add -funwind-tables to some tests

2023-12-10 Thread Richard Sandiford
The .cfi scans in these tests failed for *-elf targets because those targets don't enable .eh_frame info by default. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk. Richard gcc/testsuite/ * gcc.target/aarch64/sme/call_sm_switch_1.c: Add -funwind-tables. * gcc.ta

[pushed] aarch64: Skip some SME register save tests on BE

2023-12-10 Thread Richard Sandiford
Big-endian targets need to save Z8-Z15 in the same order as the registers would appear for D8-D15, because the layout is mandated by the EH ABI. BE targets therefore use ST1D instead of the normal STR for those registers (but not for others). That difference is already tested elsewhere and isn't

[pushed] aarch64: XFAIL some SME tests for BE

2023-12-10 Thread Richard Sandiford
The z0_z23 tests rely on being able to propagate: (1) set of double-register z0-z1 (2) copy of z0 to z28 (3) use of z28 to a use of z0. On LE targets it's regcprop that does this. But regcprop punts on (2) because of: https://gcc.gnu.org/pipermail/gcc-patches/2002-July/081990.html This

[pushed] aarch64: Fix SMSTART/SMSTOP save/restore for BE

2023-12-10 Thread Richard Sandiford
VNx16QI (the SVE register byte mode) is the only SVE mode for which LD1 and LDR result in the same register layout for big-endian. It is therefore the only mode for which we allow LDR and STR to be used for big-endian SVE moves. The SME support sometimes needs to use LDR and STR to save and resto

[pushed] aarch64: Fix invalid subregs for BE svread/write_za

2023-12-10 Thread Richard Sandiford
Multi-register svread_za and svwrite_za are implemented using one pattern per register count, with the register contents being bitcast on entry (for writes) or return (for reads). Previously we relied on subregs for this, with the subreg for reads being handled by target-independent code. But usi

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2023-12-11 Thread Richard Sandiford
Richard Sandiford writes: > template > struct _SveMaskWrapper > { > ... > > _GLIBCXX_SIMD_INTRINSIC constexpr value_type > operator[](size_t __i) const > { > return _BuiltinSveMaskType::__sve_mask_active_count( >

Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Richard Sandiford
"juzhe.zh...@rivai.ai" writes: > I think it's reasonable refactor reduction instruction pattern work around > this issue. > > Going to send a patch to apply this solution. > > So drop this patch. Sorry for bothering Richard S. It wasn't a bother. On the patch: as things stand, we try to make t

Re: [PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-11 Thread Richard Sandiford
Victor Do Nascimento writes: > Key changes in v3: > * Implement the `require_const_argument' function to ensure the nth > argument in EXP represents a const-type argument in the valid range > given by [minval, maxval), forgoing expansion altogether when an > invalid argument is detected ea

Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Richard Sandiford
gt; > Is it reasonable to you ? Yeah, the above is OK for trunk, thanks. Richard > > Thanks. > > > juzhe.zh...@rivai.ai > > From: Richard Sandiford > Date: 2023-12-11 19:45 > To: juzhe.zhong\@rivai.ai > CC: Robin Dapp; gcc-patches > Subject: Re: [PATCH] RTL-SSA

Ping: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Richard Sandiford
Ping --- check_asm_operands was inconsistent about how it handled "p" after RA compared to before RA. Before RA it tested the address with a void (unknown) memory mode: case CT_ADDRESS: /* Every address operand can be reloaded to fit. */ result = result

Ping: [PATCH] Add a late-combine pass [PR106594]

2023-12-11 Thread Richard Sandiford
Ping --- This patch adds a combine pass that runs late in the pipeline. There are two instances: one between combine and split1, and one after postreload. The pass currently has a single objective: remove definitions by substituting into all uses. The pre-RA version tries to restrict itself to

Re: [PATCH] aarch64: Fix wrong code for bfloat when f16 is enabled [PR 111867]

2023-12-11 Thread Richard Sandiford
Andrew Pinski writes: > The problem here is when f16 is enabled, movbf_aarch64 accepts `Ufc` > as a constraint: > [ w, Ufc ; fconsts , fp16 ] fmov\t%h0, %1 > But that is for fmov values and in this case fmov represents f16 rather than > bfloat16 values. > This means we would get

Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Richard Sandiford
Jeff Law writes: > On 11/27/23 05:12, Richard Sandiford wrote: >> check_asm_operands was inconsistent about how it handled "p" after >> RA compared to before RA. Before RA it tested the address with a >> void (unknown) memory mode: >> >> case

Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-12 Thread Richard Sandiford
Andrew Pinski writes: > On Mon, Dec 11, 2023 at 11:46 AM Richard Sandiford > wrote: >> >> Jeff Law writes: >> > On 11/27/23 05:12, Richard Sandiford wrote: >> >> check_asm_operands was inconsistent about how it handled "p" after >> >&g

Re: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-12 Thread Richard Sandiford
Richard Biener writes: > On Mon, 11 Dec 2023, Tamar Christina wrote: >> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo) >>return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1); >> } >> >> +/* Function vect_recog_gcond_pattern >> + >> + Try to fi

Re: [PATCH] tree-optimization/112736 - avoid overread with non-grouped SLP load

2023-12-12 Thread Richard Sandiford
Richard Biener writes: > The following aovids over/under-read of storage when vectorizing > a non-grouped load with SLP. Instead of forcing peeling for gaps > use a smaller load for the last vector which might access excess > elements. This builds upon the existing optimization avoiding > peelin

Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-12 Thread Richard Sandiford
Robin Dapp writes: > What also works is something like: > > scalar_mode extract_mode = innermode; > if (GET_MODE_CLASS (outermode) == MODE_VECTOR_BOOL) > extract_mode = smallest_int_mode_for_size > (GET_MODE_PRECISION (innermode)); > > however > >> So yes,

Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-12 Thread Richard Sandiford
Robin Dapp writes: >> - Change the second mode to vec_extract_optab. This is only a name >> lookup, and it seems more natural to continue using the real element mode. > > Am I understanding correctly that this implies we should provide > a vec_extractbi expander? (with the innermode being BImo

Re: [PATCH v2 09/11] aarch64: Rewrite non-writeback ldp/stp patterns

2023-12-12 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > This is a v2 version which addresses feedback from Richard's review > here: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637648.html > > I'll reply inline to address specific comments. > > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk? > > T

Re: [PATCH v3 08/11] aarch64: Generalize writeback ldp/stp patterns

2023-12-12 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > This is a v3 patch which is rebased on top of the SME changes. > Otherwise it is the same as v2, posted here: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639367.html > > Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk? > > Thanks,

[pushed] aarch64: Add a %Z operand modifier for SVE registers

2023-11-07 Thread Richard Sandiford
This patch adds a %Z operand modifier that prints registers as SVE z registers. The SME patches need this, but so do Tamar's patches. I'm separating this out to unblock those. We should probably document the [wxbhsdqZ] modifiers as user-facing, but doing that for all of them is a separate patch.

Re: [PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento writes: > Extend existing unit tests for the ACLE system register manipulation > functions to include 128-bit tests. > > gcc/testsuite/ChangeLog: > > * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. > (set_wsr128): Likewise. > --- > gcc/testsuite/

Re: [PATCH 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento writes: > Implement the ACLE builtins for 128-bit system register manipulation: > > * __uint128_t __arm_rsr128(const char *special_register); > * void __arm_wsr128(const char *special_register, __uint128_t value); > > gcc/ChangeLog: > > * config/aarch64/aarch64-built

Re: [PATCH 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento writes: > This patch updates `aarch64-sys-regs.def', bringing it into sync with > the Binutils source. > > gcc/ChangeLog: > > * config/aarch64/aarch64-sys-regs.def (par_el1): New. > (rcwmask_el1): Likewise. > (rcwsmask_el1): Likewise. > (ttbr0_el1): Lik

Re: [PATCH 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento writes: > Given the introduction of optional 128-bit page table descriptor and > translation hardening extension support with the Arm9.4-a > architecture, this introduces the relevant flags to enable the reading > and writing of 128-bit system registers. > > The `+d128' -march

Re: [PATCH 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-07 Thread Richard Sandiford
Victor Do Nascimento writes: > Given the introduction of system registers associated with the Guarded > Control Stack extension to Armv9.4-a in Binutils and their reliance on > the `+gcs' modifier, we implement the necessary changes in GCC to > allow for them to be recognized by the compiler. > >

Re: [PATCH 07/12] mode-switching: Allow targets to set the mode for EH handlers

2023-11-07 Thread Richard Sandiford
Thanks for the reviews. Jeff Law writes: > On 11/5/23 11:48, Richard Sandiford wrote: >> The mode-switching pass already had hooks to say what mode >> an entity is in on entry to a function and what mode it must >> be in on return. For SME, we also want to say what

Re: [PATCH 10/12] mode-switching: Use 1-based edge aux fields

2023-11-07 Thread Richard Sandiford
Jeff Law writes: > On 11/5/23 11:49, Richard Sandiford wrote: >> The pass used the edge aux field to record which mode change >> should happen on the edge, with -1 meaning "none". It's more >> convenient for later patches to leave aux zero for "none&qu

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-08 Thread Richard Sandiford
Lehua Ding writes: > Hi, > > These patchs try to support subreg coalesce feature in > register allocation passes (ira and lra). Thanks a lot for the series. This is definitely something we've needed for a while. I probably won't be able to look at it in detail for a couple of weeks (and the rea

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-08 Thread Richard Sandiford
"Kewen.Lin" writes: > Hi, > > Gentle ping this: > > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html Sorry for the lack of review on this. Personally, I've never looked at this part of code base in detail, so I don't think I can do a proper review. I'll try to have a look in s

Re: [PATCH 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-08 Thread Richard Sandiford
Christophe Lyon writes: > On 11/7/23 23:51, Richard Sandiford wrote: >> Victor Do Nascimento writes: >>> Extend existing unit tests for the ACLE system register manipulation >>> functions to include 128-bit tests. >>> >>> gcc/testsuite/ChangeLog: &

Re: [PATCH]AArch64: Use SVE unpredicated LOGICAL expressions when Advanced SIMD inefficient [PR109154]

2023-11-09 Thread Richard Sandiford
Tamar Christina writes: >> >> > + "&& TARGET_SVE && rtx_equal_p (operands[0], operands[1]) >> >> > + && satisfies_constraint_ (operands[2]) >> >> > + && FP_REGNUM_P (REGNO (operands[0]))" >> >> > + [(const_int 0)] >> >> > + { >> >> > +rtx op1 = lowpart_subreg (mode, operands[1], >> mode

Re: [1/4] aarch64: Refactor check_required_extensions

2023-11-09 Thread Richard Sandiford
Andrew Carlotti writes: > Move SVE extension checking functionality to aarch64-builtins.cc, so > that it can be shared by non-SVE intrinsics. > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve-builtins.cc (check_builtin_call) > (expand_builtin): Update calls to the below. > (rep

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Richard Sandiford
Lehua Ding writes: > Hi Richard, > > On 2023/11/8 17:40, Richard Sandiford wrote: >> Tracking subreg liveness will sometimes expose dead code that >> wasn't obvious without it. PR89606 has an example of this. >> There the dead code was introduced by init-regs,

Re: [2/4] aarch64: Fix tme intrinsic availability

2023-11-10 Thread Richard Sandiford
Andrew Carlotti writes: > The availability of tme intrinsics was previously gated at both > initialisation time (using global target options) and usage time > (accounting for function-specific target options). This patch removes > the check at initialisation time, and also moves the intrinsics ou

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-10 Thread Richard Sandiford
Lehua Ding writes: > On 2023/11/10 18:16, Richard Sandiford wrote: >> Lehua Ding writes: >>> Hi Richard, >>> >>> On 2023/11/8 17:40, Richard Sandiford wrote: >>>> Tracking subreg liveness will sometimes expose dead code that >>>>

[pushed] Allow md iterators to include other iterators

2023-11-10 Thread Richard Sandiford
add_map_value (end_ptr, v->number, joined); + } + } + else + { + number = group->find_builtin (name.string); + end_ptr = add_map_value (end_ptr, number, string); + } c = read_skip_spaces (); } while (c != ']'); -- 2.25.1 Date: Fri, 10 Nov 2023 15:47:53 + From: Richard Sandiford

Re: [PATCH v2] DSE: Allow vector type for get_stored_val when read < store

2023-11-11 Thread Richard Sandiford
Jeff Law writes: > On 11/8/23 23:08, pan2...@intel.com wrote: >> From: Pan Li >> >> Update in v2: >> * Move vector type support to get_stored_val. >> >> Original log: >> >> This patch would like to allow the vector mode in the >> get_stored_val in the DSE. It is valid for the read >> rtx if an

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-11 Thread Richard Sandiford
Jeff Law writes: > On 11/8/23 02:40, Richard Sandiford wrote: >> Lehua Ding writes: >>> Hi, >>> >>> These patchs try to support subreg coalesce feature in >>> register allocation passes (ira and lra). >> >> Thanks a lot for the series.

Re: [PATCH 10/12] mode-switching: Use 1-based edge aux fields

2023-11-11 Thread Richard Sandiford
Jeff Law writes: > On 11/7/23 17:35, Richard Sandiford wrote: > >> I could have sworn that there was something that checked that passes >> left edge aux fields clear, but it looks like I misremembered. So I >> probably need to stick a clear_aux_for_edges () call above the

Re: [PATCH 11/12] mode-switching: Add a target-configurable confluence operator

2023-11-11 Thread Richard Sandiford
Jeff Law writes: > On 11/5/23 11:50, Richard Sandiford wrote: >> The mode-switching pass assumed that all of an entity's modes >> were mutually exclusive. However, the upcoming SME changes >> have an entity with some overlapping modes, so that there is >> so

Re: [PATCH] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-11-11 Thread Richard Sandiford
Thanks for the patch. Manos Anagnostakis writes: > This is an RTL pass that detects store forwarding from stores to larger loads > (load pairs). > > This optimization is SPEC2017-driven and was found to be beneficial for some > benchmarks, > through testing on ampere1/ampere1a machines. > > For

Re: [PATCH 11/12] mode-switching: Add a target-configurable confluence operator

2023-11-11 Thread Richard Sandiford
Jeff Law writes: > On 11/11/23 08:54, Richard Sandiford wrote: >> Jeff Law writes: >>> On 11/5/23 11:50, Richard Sandiford wrote: >>>> The mode-switching pass assumed that all of an entity's modes >>>> were mutually exclusive. However, the upc

Re: [PATCH 0/7] ira/lra: Support subreg coalesce

2023-11-12 Thread Richard Sandiford
钟居哲 writes: > Hi, Richard. > >>> Maybe dead lanes are better tracked at the gimple level though, not sure. >>> (But AArch64 might need to lower lane operations more than it does now if >>> we want gimple to handle it.) > > We were trying to address such issue at GIMPLE leve at the beginning. > Tra

[PATCH 0/5] Add support for operand-specific alignment requirements

2023-11-12 Thread Richard Sandiford
ad is only small for targets that use the new feature. Almost all of the new code gets optimised away on targets that don't use the feature. Richard Sandiford (5): Add register filter operand to define_register_constraint recog: Handle register filters lra: Handle register filters i

[PATCH 1/5] Add register filter operand to define_register_constraint

2023-11-12 Thread Richard Sandiford
The main way of enforcing registers to be aligned is through HARD_REGNO_MODE_OK. But this is a global property that applies to all operands. A given (regno, mode) pair is either globally valid or globally invalid. This patch instead adds a way of specifying that individual operands must be align

[PATCH 2/5] recog: Handle register filters

2023-11-12 Thread Richard Sandiford
The main (but simplest) part of this patch makes constrain_operands take register filters into account. The rest of the patch adds register filter information to operand_alternative. Generally, if two register constraints have different register filters, it's better if they're in separate alterna

[PATCH 3/5] lra: Handle register filters

2023-11-12 Thread Richard Sandiford
This patch makes LRA apply register filters. This plus the recog change is enough for correct code generation, but a follow-on IRA patch improves the allocation. All the new code should be optimised away on targets that don't use register filters. That's because get_register_filter just wraps "r

[PATCH 4/5] ira: Handle register filters

2023-11-12 Thread Richard Sandiford
This patch makes IRA apply register filters when picking hard registers. All the new code should be optimised away on targets that don't use register filters. On targets that do use them, the new register_filters bitfield is expected to be only a handful of bits. Information about register filter

[PATCH 5/5] Add an aligned_register_operand predicate

2023-11-12 Thread Richard Sandiford
This patch adds a target-independent aligned_register_operand predicate, for use with register constraints that use filters to impose an alignment. The definition deliberately jetisons some of the historical baggage in general_operand. gcc/ * common.md (aligned_register_operand): New pred

Re: [PATCH]AArch64: only discount MLA for vector and scalar statements

2023-11-16 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > In testcases gcc.dg/tree-ssa/slsr-19.c and gcc.dg/tree-ssa/slsr-20.c we have > a > fairly simple computation. On the current generic costing we generate: > > f: > add w0, w0, 2 > maddw1, w0, w1, w1 > lsl w0, w1, 1 >

[PATCH 00/21] aarch64: Add support for SME

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME. A follow-on series will add SME2 on top. All of the detail is in the individual patch summaries. The series can't go in yet, because it depends on: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629713.html and some reviewed-but-unpushed

[PATCH 01/21] aarch64: Generalise require_immediate_lane_index

2023-11-17 Thread Richard Sandiford
require_immediate_lane_index previously hard-coded the assumption that the group size is determined by the argument immediately before the index. However, for SME, there are cases where it should be determined by an earlier argument instead. gcc/ * config/aarch64/aarch64-sve-builtins.h:

[PATCH 02/21] aarch64: Add a result_mode helper function

2023-11-17 Thread Richard Sandiford
SME will add more intrinsics whose expansion code requires the mode of the function return value. This patch adds an associated helper routine. gcc/ * config/aarch64/aarch64-sve-builtins.h (function_expander::result_mode): New member function. * config/aarch64/aarch64-sve-

[PATCH 03/21] aarch64: Use SVE's RDVL instruction

2023-11-17 Thread Richard Sandiford
We didn't previously use SVE's RDVL instruction, since the CNT* forms are preferred and provide most of the range. However, there are some cases that RDVL can handle and CNT* can't, and using RDVL-like instructions becomes important for SME. gcc/ * config/aarch64/aarch64-protos.h (aarch64

[PATCH 04/21] aarch64: Make AARCH64_FL_SVE requirements explicit

2023-11-17 Thread Richard Sandiford
So far, all intrinsics covered by the aarch64-sve-builtins* framework have (naturally enough) required at least SVE. However, arm_sme.h defines a couple of intrinsics that can be called by any code. It's therefore necessary to make the implicit SVE requirement explicit. gcc/ * config/aarc

[PATCH 05/21] aarch64: Add group suffixes to SVE intrinsics

2023-11-17 Thread Richard Sandiford
The SME2 ACLE adds a new "group" suffix component to the naming convention for SVE intrinsics. This is also used in the new tuple forms of the svreinterpret intrinsics. This patch adds support for group suffixes and defines the x2, x3 and x4 suffixes that are needed for the svreinterprets. gcc/

[PATCH 06/21] aarch64: Add tuple forms of svreinterpret

2023-11-17 Thread Richard Sandiford
SME2 adds a number of intrinsics that operate on tuples of 2 and 4 vectors. The ACLE therefore extends the existing svreinterpret intrinsics to handle tuples as well. gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svreinterpret_impl::fold): Punt on tuple forms. (svrei

[PATCH 07/21] aarch64: Add arm_streaming(_compatible) attributes

2023-11-17 Thread Richard Sandiford
This patch adds support for recognising the SME arm::streaming and arm::streaming_compatible attributes. These attributes respectively describe whether the processor is definitely in "streaming mode" (PSTATE.SM==1), whether the processor is definitely not in streaming mode (PSTATE.SM==0), or wheth

[PATCH 08/21] aarch64: Add +sme

2023-11-17 Thread Richard Sandiford
This patch adds the +sme ISA feature and requires it to be present when compiling arm_streaming code. (arm_streaming_compatible code does not necessarily assume the presence of SME. It just has to work when SME is present and streaming mode is enabled.) gcc/ * doc/invoke.texi: Document S

[PATCH 09/21] aarch64: Distinguish streaming-compatible AdvSIMD insns

2023-11-17 Thread Richard Sandiford
The vast majority of Advanced SIMD instructions are not available in streaming mode, but some of the load/store/move instructions are. This patch adds a new target feature macro called TARGET_BASE_SIMD for this streaming-compatible subset. The vector-to-vector move instructions are not streaming-

[PATCH 12/21] aarch64: Add support for SME ZA attributes

2023-11-17 Thread Richard Sandiford
SME has an array called ZA that can be enabled and disabled separately from streaming mode. A status bit called PSTATE.ZA indicates whether ZA is currently enabled or not. In C and C++, the state of PSTATE.ZA is controlled using function attributes. There are four attributes that can be attached

[PATCH 15/21] aarch64: Generalise unspec_based_function_base

2023-11-17 Thread Richard Sandiford
Until now, SVE intrinsics that map directly to unspecs have always used type suffix 0 to distinguish between signed integers, unsigned integers, and floating-point values. SME adds functions that need to use type suffix 1 instead. This patch generalises the classes accordingly. gcc/ * conf

[PATCH 11/21] aarch64: Switch PSTATE.SM around calls

2023-11-17 Thread Richard Sandiford
This patch adds support for switching to the appropriate SME mode for each call. Switching to streaming mode requires an SMSTART SM instruction and switching to non-streaming mode requires an SMSTOP SM instruction. If the call is being made from streaming-compatible code, these switches are condi

[PATCH 16/21] aarch64: Generalise _m rules for SVE intrinsics

2023-11-17 Thread Richard Sandiford
In SVE there was a simple rule that unary merging (_m) intrinsics had a separate initial argument to specify the values of inactive lanes, whereas other merging functions took inactive lanes from the first operand to the operation. That rule began to break down in SVE2, and it continues to do so i

[PATCH 13/21] aarch64: Add a register class for w12-w15

2023-11-17 Thread Richard Sandiford
Some SME instructions use w12-w15 to index ZA. This patch adds a register class for that range. gcc/ * config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro. (W12_W15_REGS): New register class. (REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it. * config/aa

[PATCH 14/21] aarch64: Add a VNx1TI mode

2023-11-17 Thread Richard Sandiford
Although TI isn't really a native SVE element mode, it's convenient for SME if we define VNx1TI anyway, so that it can be used to distinguish .Q ZA operations from others. It's purely an RTL convenience and isn't (yet) a valid storage mode. gcc/ * config/aarch64/aarch64-modes.def: Add VNx

[PATCH 19/21] aarch64: Handle PSTATE.SM across abnormal edges

2023-11-17 Thread Richard Sandiford
PSTATE.SM is always off on entry to an exception handler, and on entry to a nonlocal goto receiver. Those entry points need to switch PSTATE.SM back to the appropriate state for the current function. In the case of streaming-compatible functions, they need to restore the mode that the caller was o

[PATCH 18/21] aarch64: Add support for __arm_locally_streaming

2023-11-17 Thread Richard Sandiford
This patch adds support for the __arm_locally_streaming attribute, which allows a function to use SME internally without changing the function's ABI. The attribute is valid but redundant for __arm_streaming functions. gcc/ * config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add

[PATCH 21/21] aarch64: Update sibcall handling for SME

2023-11-17 Thread Richard Sandiford
We only support tail calls between functions with the same PSTATE.ZA setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA"). Only a normal non-streaming function can tail-call another non-streaming function, and only a streaming function can tail-call another streaming function. An

[PATCH 20/21] aarch64: Enforce inlining restrictions for SME

2023-11-17 Thread Richard Sandiford
A function that has local ZA state cannot be inlined into its caller, since we only support managing ZA switches at function scope. A function whose body directly clobbers ZA state cannot be inlined into a function with ZA state. A function whose body requires a particular PSTATE.SM setting can o

aarch64: Add support for SME2

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME2. It is gated behind the earlier series for SME. All of the detail is in the individual patch summaries. Tested on aarch64-linux-gnu. Richard

[PATCH 1/5] aarch64: Add +sme2

2023-11-17 Thread Richard Sandiford
gcc/ * doc/invoke.texi: Document +sme2. * doc/sourcebuild.texi: Document aarch64_sme2. * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION): Add sme2. * config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros. gcc/testsuite/

<    5   6   7   8   9   10   11   12   13   14   >