SME has an array called ZA that can be enabled and disabled separately
from streaming mode. A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.
In C and C++, the state of PSTATE.ZA is controlled using function
attributes. There are four attributes that can be attached
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
gcc/
* conf
PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver. Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was o
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.
A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.
A function whose body requires a particular PSTATE.SM setting can o
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.
That rule began to break down in SVE2, and it continues to do
so i
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").
Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function. An
This adds support for the SME parts of arm_sme.h.
gcc/
* doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64.
* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
to install and aarch64-sve-builtins-sme.o to the list of objects
to build.
* con
gcc/
* doc/invoke.texi: Document +sme2.
* doc/sourcebuild.texi: Document aarch64_sme2.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
Add sme2.
* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.
gcc/testsuite/
SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.
The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work. At present
there doesn't seem to be a
Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks. The SME2 ACLE defines an svcount_t type for
this interpretation.
I don't think we have a better way of representing counters than
the VNx16BI that we use for masks. The patch therefore doesn't
add a new m
to do last time
- it fixes the incoming and outgoing liveness state for ZA in functions
that share ZT0 but not ZA (plus tests)
- it has tests for all the new overloaded function "shapes", with some
fixes & improvements to the error messages
Retested on aarch64-linux-gnu.
Ric
SME2 adds a 512-bit lookup table called ZT0. It is enabled
and disabled by PSTATE.ZA, just like ZA itself. This patch
adds support for the register, including saving and restoring
contents.
The code reuses the V8DI that was added for LS64, including
the associated memory classification rules. (
Alex Coplan writes:
> Hi,
>
> This is a v2 of:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637612.html
>
> v1 was approved as-is, but this version pulls out the test into a helper
> function which is used by later patches in the series.
>
> Bootstrapped/regtested as a series on aar
Manos Anagnostakis writes:
> This is an RTL pass that detects store forwarding from stores to larger loads
> (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some
> benchmarks,
> through testing on ampere1/ampere1a machines.
>
> For example, it can transf
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, November 28, 2023 5:56 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov
res and
>> > > enabled by a target-specific option.
>> > >
>> > > If deemed beneficial enough for a default, it will be enabled on
>> > > ampere1/ampere1a,
>> > > or other architectures as well, without needing to be turned on by this
>
Marek Polacek writes:
> Bootstrapped/regtested on aarch64-pc-linux-gnu, ok for trunk/13?
>
> -- >8 --
> These tests fail when the testsuite is executed with -fstack-protector-strong.
> To avoid this, this patch adds -fno-stack-protector to dg-options.
>
> The list of FAILs is appended. As you can
Szabolcs Nagy writes:
> The call ABI for SME (Scalable Matrix Extension) requires a number of
> helper routines which are added to libgcc so they are tied to the
> compiler version instead of the libc version. See
> https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-r
Szabolcs Nagy writes:
> The call ABI for SME (Scalable Matrix Extension) requires a number of
> helper routines which are added to libgcc so they are tied to the
> compiler version instead of the libc version. See
> https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#sme-support-r
early-ra's likely_operand_match_p didn't handle relaxed and special
memory constraints, which meant that the pass wasn't able to match
LD1RQ instructions to their constraints, and so backed out of
trying to allocate. This patch fixes that by switching the sense
of the match: does the rtx seem appr
Tamar Christina writes:
> Hi All,
>
> What do people think about having the ability to force only the latch
> connected
> exit as the exit as a param? I.e. what's in the patch but as a param.
>
> I found this useful when debugging large example failures as it tells me where
> I should be looking.
Sorry for the slow review.
Stamatis Markianos-Wright writes:
> [...]
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index
> 44a04b86cb5806fcf50917826512fd203d42106c..c083f965fa9a40781bc86beb6e63654afd14eac4
> 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @
Andrew Carlotti writes:
> Ok for master?
>
> gcc/ChangeLog:
>
> * config/aarch64/x-aarch64: Add missing dependencies.
>
>
> diff --git a/gcc/config/aarch64/x-aarch64 b/gcc/config/aarch64/x-aarch64
> index
> 3cf701a0a01ab00eaaafdfad14bd90ebbb1d498f..6fd638faaab7cb5bb2309d36d6dea2adf1fb8d32
>
Andrew Carlotti writes:
> Additionally, replace all checks for the AARCH64_FL_CRYPTO bit with
> checks for (AARCH64_FL_AES | AARCH64_FL_SHA2) instead. The value of the
> AARCH64_FL_CRYPTO bit within isa_flags is now ignored, but it is
> retained because removing it would make processing the data
Andrew Carlotti writes:
> For native cpu feature detection, certain features have no entry in
> /proc/cpuinfo, so have to be assumed to be present whenever the detected
> cpu is supposed to support that feature.
>
> However, the logic for this was mistakenly implemented by excluding
> these featur
Victor Do Nascimento writes:
> In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed
> long. This means that when the `arm_neon.h' header is used by the
> kernel, any use of the `uint64_t' / `in64_t' types needs to be
> correctly cast to the correct `__builtin_aarch64_simd_di' /
>
Thanks for the patch and sorry for the slow review.
I can only comment on the usage of SVE, rather than on the scaffolding
around it. Hopefully Jonathan or others can comment on the rest.
The main thing that worries me is:
#if _GLIBCXX_SIMD_HAVE_SVE
constexpr inline int __sve_vectorized_siz
The .cfi scans in these tests failed for *-elf targets because
those targets don't enable .eh_frame info by default.
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.
Richard
gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_1.c: Add -funwind-tables.
* gcc.ta
Big-endian targets need to save Z8-Z15 in the same order as
the registers would appear for D8-D15, because the layout is
mandated by the EH ABI. BE targets therefore use ST1D instead
of the normal STR for those registers (but not for others).
That difference is already tested elsewhere and isn't
The z0_z23 tests rely on being able to propagate:
(1) set of double-register z0-z1
(2) copy of z0 to z28
(3) use of z28
to a use of z0. On LE targets it's regcprop that does this.
But regcprop punts on (2) because of:
https://gcc.gnu.org/pipermail/gcc-patches/2002-July/081990.html
This
VNx16QI (the SVE register byte mode) is the only SVE mode for which
LD1 and LDR result in the same register layout for big-endian. It is
therefore the only mode for which we allow LDR and STR to be used for
big-endian SVE moves.
The SME support sometimes needs to use LDR and STR to save and resto
Multi-register svread_za and svwrite_za are implemented using one
pattern per register count, with the register contents being bitcast
on entry (for writes) or return (for reads). Previously we relied
on subregs for this, with the subreg for reads being handled by
target-independent code. But usi
Richard Sandiford writes:
> template
> struct _SveMaskWrapper
> {
> ...
>
> _GLIBCXX_SIMD_INTRINSIC constexpr value_type
> operator[](size_t __i) const
> {
> return _BuiltinSveMaskType::__sve_mask_active_count(
>
"juzhe.zh...@rivai.ai" writes:
> I think it's reasonable refactor reduction instruction pattern work around
> this issue.
>
> Going to send a patch to apply this solution.
>
> So drop this patch. Sorry for bothering Richard S.
It wasn't a bother.
On the patch: as things stand, we try to make t
Victor Do Nascimento writes:
> Key changes in v3:
> * Implement the `require_const_argument' function to ensure the nth
> argument in EXP represents a const-type argument in the valid range
> given by [minval, maxval), forgoing expansion altogether when an
> invalid argument is detected ea
gt;
> Is it reasonable to you ?
Yeah, the above is OK for trunk, thanks.
Richard
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Richard Sandiford
> Date: 2023-12-11 19:45
> To: juzhe.zhong\@rivai.ai
> CC: Robin Dapp; gcc-patches
> Subject: Re: [PATCH] RTL-SSA
Ping
---
check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA. Before RA it tested the address with a
void (unknown) memory mode:
case CT_ADDRESS:
/* Every address operand can be reloaded to fit. */
result = result
Ping
---
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.
The pass currently has a single objective: remove definitions by
substituting into all uses. The pre-RA version tries to restrict
itself to
Andrew Pinski writes:
> The problem here is when f16 is enabled, movbf_aarch64 accepts `Ufc`
> as a constraint:
> [ w, Ufc ; fconsts , fp16 ] fmov\t%h0, %1
> But that is for fmov values and in this case fmov represents f16 rather than
> bfloat16 values.
> This means we would get
Jeff Law writes:
> On 11/27/23 05:12, Richard Sandiford wrote:
>> check_asm_operands was inconsistent about how it handled "p" after
>> RA compared to before RA. Before RA it tested the address with a
>> void (unknown) memory mode:
>>
>> case
Andrew Pinski writes:
> On Mon, Dec 11, 2023 at 11:46 AM Richard Sandiford
> wrote:
>>
>> Jeff Law writes:
>> > On 11/27/23 05:12, Richard Sandiford wrote:
>> >> check_asm_operands was inconsistent about how it handled "p" after
>> >&g
Richard Biener writes:
> On Mon, 11 Dec 2023, Tamar Christina wrote:
>> @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
>>return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
>> }
>>
>> +/* Function vect_recog_gcond_pattern
>> +
>> + Try to fi
Richard Biener writes:
> The following aovids over/under-read of storage when vectorizing
> a non-grouped load with SLP. Instead of forcing peeling for gaps
> use a smaller load for the last vector which might access excess
> elements. This builds upon the existing optimization avoiding
> peelin
Robin Dapp writes:
> What also works is something like:
>
> scalar_mode extract_mode = innermode;
> if (GET_MODE_CLASS (outermode) == MODE_VECTOR_BOOL)
> extract_mode = smallest_int_mode_for_size
> (GET_MODE_PRECISION (innermode));
>
> however
>
>> So yes,
Robin Dapp writes:
>> - Change the second mode to vec_extract_optab. This is only a name
>> lookup, and it seems more natural to continue using the real element mode.
>
> Am I understanding correctly that this implies we should provide
> a vec_extractbi expander? (with the innermode being BImo
Alex Coplan writes:
> Hi,
>
> This is a v2 version which addresses feedback from Richard's review
> here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637648.html
>
> I'll reply inline to address specific comments.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> T
Alex Coplan writes:
> Hi,
>
> This is a v3 patch which is rebased on top of the SME changes.
> Otherwise it is the same as v2, posted here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639367.html
>
> Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
This patch adds a %Z operand modifier that prints registers as SVE z
registers. The SME patches need this, but so do Tamar's patches.
I'm separating this out to unblock those.
We should probably document the [wxbhsdqZ] modifiers as
user-facing, but doing that for all of them is a separate patch.
Victor Do Nascimento writes:
> Extend existing unit tests for the ACLE system register manipulation
> functions to include 128-bit tests.
>
> gcc/testsuite/ChangeLog:
>
> * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New.
> (set_wsr128): Likewise.
> ---
> gcc/testsuite/
Victor Do Nascimento writes:
> Implement the ACLE builtins for 128-bit system register manipulation:
>
> * __uint128_t __arm_rsr128(const char *special_register);
> * void __arm_wsr128(const char *special_register, __uint128_t value);
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-built
Victor Do Nascimento writes:
> This patch updates `aarch64-sys-regs.def', bringing it into sync with
> the Binutils source.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sys-regs.def (par_el1): New.
> (rcwmask_el1): Likewise.
> (rcwsmask_el1): Likewise.
> (ttbr0_el1): Lik
Victor Do Nascimento writes:
> Given the introduction of optional 128-bit page table descriptor and
> translation hardening extension support with the Arm9.4-a
> architecture, this introduces the relevant flags to enable the reading
> and writing of 128-bit system registers.
>
> The `+d128' -march
Victor Do Nascimento writes:
> Given the introduction of system registers associated with the Guarded
> Control Stack extension to Armv9.4-a in Binutils and their reliance on
> the `+gcs' modifier, we implement the necessary changes in GCC to
> allow for them to be recognized by the compiler.
>
>
Thanks for the reviews.
Jeff Law writes:
> On 11/5/23 11:48, Richard Sandiford wrote:
>> The mode-switching pass already had hooks to say what mode
>> an entity is in on entry to a function and what mode it must
>> be in on return. For SME, we also want to say what
Jeff Law writes:
> On 11/5/23 11:49, Richard Sandiford wrote:
>> The pass used the edge aux field to record which mode change
>> should happen on the edge, with -1 meaning "none". It's more
>> convenient for later patches to leave aux zero for "none&qu
Lehua Ding writes:
> Hi,
>
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).
Thanks a lot for the series. This is definitely something we've
needed for a while.
I probably won't be able to look at it in detail for a couple of weeks
(and the rea
"Kewen.Lin" writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html
Sorry for the lack of review on this. Personally, I've never looked
at this part of code base in detail, so I don't think I can do a proper
review. I'll try to have a look in s
Christophe Lyon writes:
> On 11/7/23 23:51, Richard Sandiford wrote:
>> Victor Do Nascimento writes:
>>> Extend existing unit tests for the ACLE system register manipulation
>>> functions to include 128-bit tests.
>>>
>>> gcc/testsuite/ChangeLog:
&
Tamar Christina writes:
>> >> > + "&& TARGET_SVE && rtx_equal_p (operands[0], operands[1])
>> >> > + && satisfies_constraint_ (operands[2])
>> >> > + && FP_REGNUM_P (REGNO (operands[0]))"
>> >> > + [(const_int 0)]
>> >> > + {
>> >> > +rtx op1 = lowpart_subreg (mode, operands[1],
>> mode
Andrew Carlotti writes:
> Move SVE extension checking functionality to aarch64-builtins.cc, so
> that it can be shared by non-SVE intrinsics.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
> (expand_builtin): Update calls to the below.
> (rep
Lehua Ding writes:
> Hi Richard,
>
> On 2023/11/8 17:40, Richard Sandiford wrote:
>> Tracking subreg liveness will sometimes expose dead code that
>> wasn't obvious without it. PR89606 has an example of this.
>> There the dead code was introduced by init-regs,
Andrew Carlotti writes:
> The availability of tme intrinsics was previously gated at both
> initialisation time (using global target options) and usage time
> (accounting for function-specific target options). This patch removes
> the check at initialisation time, and also moves the intrinsics ou
Lehua Ding writes:
> On 2023/11/10 18:16, Richard Sandiford wrote:
>> Lehua Ding writes:
>>> Hi Richard,
>>>
>>> On 2023/11/8 17:40, Richard Sandiford wrote:
>>>> Tracking subreg liveness will sometimes expose dead code that
>>>>
add_map_value (end_ptr, v->number, joined);
+ }
+ }
+ else
+ {
+ number = group->find_builtin (name.string);
+ end_ptr = add_map_value (end_ptr, number, string);
+ }
c = read_skip_spaces ();
}
while (c != ']');
--
2.25.1
Date: Fri, 10 Nov 2023 15:47:53 +
From: Richard Sandiford
Jeff Law writes:
> On 11/8/23 23:08, pan2...@intel.com wrote:
>> From: Pan Li
>>
>> Update in v2:
>> * Move vector type support to get_stored_val.
>>
>> Original log:
>>
>> This patch would like to allow the vector mode in the
>> get_stored_val in the DSE. It is valid for the read
>> rtx if an
Jeff Law writes:
> On 11/8/23 02:40, Richard Sandiford wrote:
>> Lehua Ding writes:
>>> Hi,
>>>
>>> These patchs try to support subreg coalesce feature in
>>> register allocation passes (ira and lra).
>>
>> Thanks a lot for the series.
Jeff Law writes:
> On 11/7/23 17:35, Richard Sandiford wrote:
>
>> I could have sworn that there was something that checked that passes
>> left edge aux fields clear, but it looks like I misremembered. So I
>> probably need to stick a clear_aux_for_edges () call above the
Jeff Law writes:
> On 11/5/23 11:50, Richard Sandiford wrote:
>> The mode-switching pass assumed that all of an entity's modes
>> were mutually exclusive. However, the upcoming SME changes
>> have an entity with some overlapping modes, so that there is
>> so
Thanks for the patch.
Manos Anagnostakis writes:
> This is an RTL pass that detects store forwarding from stores to larger loads
> (load pairs).
>
> This optimization is SPEC2017-driven and was found to be beneficial for some
> benchmarks,
> through testing on ampere1/ampere1a machines.
>
> For
Jeff Law writes:
> On 11/11/23 08:54, Richard Sandiford wrote:
>> Jeff Law writes:
>>> On 11/5/23 11:50, Richard Sandiford wrote:
>>>> The mode-switching pass assumed that all of an entity's modes
>>>> were mutually exclusive. However, the upc
钟居哲 writes:
> Hi, Richard.
>
>>> Maybe dead lanes are better tracked at the gimple level though, not sure.
>>> (But AArch64 might need to lower lane operations more than it does now if
>>> we want gimple to handle it.)
>
> We were trying to address such issue at GIMPLE leve at the beginning.
> Tra
ad is only small for targets that
use the new feature. Almost all of the new code gets optimised away
on targets that don't use the feature.
Richard Sandiford (5):
Add register filter operand to define_register_constraint
recog: Handle register filters
lra: Handle register filters
i
The main way of enforcing registers to be aligned is through
HARD_REGNO_MODE_OK. But this is a global property that applies
to all operands. A given (regno, mode) pair is either globally
valid or globally invalid.
This patch instead adds a way of specifying that individual operands
must be align
The main (but simplest) part of this patch makes constrain_operands
take register filters into account.
The rest of the patch adds register filter information to
operand_alternative. Generally, if two register constraints
have different register filters, it's better if they're in separate
alterna
This patch makes LRA apply register filters. This plus the recog
change is enough for correct code generation, but a follow-on IRA
patch improves the allocation.
All the new code should be optimised away on targets that don't
use register filters. That's because get_register_filter just
wraps "r
This patch makes IRA apply register filters when picking hard registers.
All the new code should be optimised away on targets that don't use
register filters. On targets that do use them, the new register_filters
bitfield is expected to be only a handful of bits.
Information about register filter
This patch adds a target-independent aligned_register_operand
predicate, for use with register constraints that use filters
to impose an alignment. The definition deliberately jetisons
some of the historical baggage in general_operand.
gcc/
* common.md (aligned_register_operand): New pred
Tamar Christina writes:
> Hi All,
>
> In testcases gcc.dg/tree-ssa/slsr-19.c and gcc.dg/tree-ssa/slsr-20.c we have
> a
> fairly simple computation. On the current generic costing we generate:
>
> f:
> add w0, w0, 2
> maddw1, w0, w1, w1
> lsl w0, w1, 1
>
This series of patches adds support for SME. A follow-on series
will add SME2 on top.
All of the detail is in the individual patch summaries.
The series can't go in yet, because it depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629713.html
and some reviewed-but-unpushed
require_immediate_lane_index previously hard-coded the assumption
that the group size is determined by the argument immediately before
the index. However, for SME, there are cases where it should be
determined by an earlier argument instead.
gcc/
* config/aarch64/aarch64-sve-builtins.h:
SME will add more intrinsics whose expansion code requires
the mode of the function return value. This patch adds an
associated helper routine.
gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_expander::result_mode): New member function.
* config/aarch64/aarch64-sve-
We didn't previously use SVE's RDVL instruction, since the CNT*
forms are preferred and provide most of the range. However,
there are some cases that RDVL can handle and CNT* can't,
and using RDVL-like instructions becomes important for SME.
gcc/
* config/aarch64/aarch64-protos.h (aarch64
So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code. It's therefore necessary to make
the implicit SVE requirement explicit.
gcc/
* config/aarc
The SME2 ACLE adds a new "group" suffix component to the naming
convention for SVE intrinsics. This is also used in the new tuple
forms of the svreinterpret intrinsics.
This patch adds support for group suffixes and defines the
x2, x3 and x4 suffixes that are needed for the svreinterprets.
gcc/
SME2 adds a number of intrinsics that operate on tuples of 2 and 4
vectors. The ACLE therefore extends the existing svreinterpret
intrinsics to handle tuples as well.
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Punt on tuple forms.
(svrei
This patch adds support for recognising the SME arm::streaming
and arm::streaming_compatible attributes. These attributes
respectively describe whether the processor is definitely in
"streaming mode" (PSTATE.SM==1), whether the processor is
definitely not in streaming mode (PSTATE.SM==0), or wheth
This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code. (arm_streaming_compatible code
does not necessarily assume the presence of SME. It just has to
work when SME is present and streaming mode is enabled.)
gcc/
* doc/invoke.texi: Document S
The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are. This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.
The vector-to-vector move instructions are not streaming-
SME has an array called ZA that can be enabled and disabled separately
from streaming mode. A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.
In C and C++, the state of PSTATE.ZA is controlled using function
attributes. There are four attributes that can be attached
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
gcc/
* conf
This patch adds support for switching to the appropriate SME mode
for each call. Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction. If the call is being made from streaming-compatible code,
these switches are condi
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.
That rule began to break down in SVE2, and it continues to do
so i
Some SME instructions use w12-w15 to index ZA. This patch
adds a register class for that range.
gcc/
* config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro.
(W12_W15_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
* config/aa
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others. It's purely an RTL
convenience and isn't (yet) a valid storage mode.
gcc/
* config/aarch64/aarch64-modes.def: Add VNx
PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver. Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was o
This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI. The attribute is valid but redundant for
__arm_streaming functions.
gcc/
* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").
Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function. An
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.
A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.
A function whose body requires a particular PSTATE.SM setting can o
This series of patches adds support for SME2. It is gated behind
the earlier series for SME.
All of the detail is in the individual patch summaries.
Tested on aarch64-linux-gnu.
Richard
gcc/
* doc/invoke.texi: Document +sme2.
* doc/sourcebuild.texi: Document aarch64_sme2.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
Add sme2.
* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.
gcc/testsuite/
901 - 1000 of 9556 matches
Mail list logo