In order to reduce time complexity, rtl-ssa groups consecutive
clobbers together. Each group of clobbers has a splay tree for
lookup and manipulation purposes.
This arrangement means that we might need to split a group (when
inserting a new non-clobber definition between two clobbers) or
to join
Richard Biener writes:
> On Wed, 3 Sep 2025, Richard Sandiford wrote:
>
>> Tamar Christina writes:
>> > We also don't ever force unrolling for predicated SVE because for
>> > predicated SVE we have to balance predicate throughput limitations
>> > of a
Tamar Christina writes:
>> -Original Message-
>> From: Richard Biener
>> Sent: Tuesday, September 2, 2025 1:44 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd
>> Subject: Re: [PATCH 1/3]middle-end: clear the user unroll flag if the
>> cost model has
>> overriden it
>>
>> O
Claudiu Zissulescu Ianculescu writes:
> Hi Richard,
>
> On Thu, Aug 21, 2025 at 9:55 AM Richard Sandiford
> wrote:
>>
>> Today is my last working day at Arm, so this patch switches my
>> MAINTAINERS entries to my personal email address. (It turns out
>> th
Claudiu Zissulescu-Ianculescu writes:
> Hi,
>
>>> +DEFHOOK
>>> +(compose_offset_tag,
>>> + "Return an RTX that represnts the result of composing
>>> @var{tag_offset} with\n\
>>> +the base tag @var{base_tag}.\n\
>>> +The default of this hook is to byte add @var{tag_offset} to
>>> @var{base_tag}.",
claudiu.zissulescu-iancule...@oracle.com writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
> b/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
> new file mode 100644
> index 000..70b790c6c3e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/memtag/basic-1.c
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Claudiu Zissulescu
>
> Memory tagging is used for detecting memory safety bugs. On AArch64, the
> memory tagging extension (MTE) helps in reducing the overheads of memory
> tagging:
> - CPU: MTE instructions for efficiently tagging and unt
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Claudiu Zissulescu
>
> Add a new target hook TARGET_MEMTAG_COMPOSE_OFFSET_TAG to perform
> addition between two tags.
>
> The default of this hook is to byte add the inputs.
>
> Hardware-assisted sanitizers on architectures providing instruc
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Indu Bhagat
>
> Add new command line option -fsanitize=memtag-stack with the following
> new params:
> --param memtag-instrument-alloca [0,1] (default 1) to use MTE insns
> for enabling dynamic checking of stack allocas.
>
> Along with the n
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Indu Bhagat
>
> gcc/Changelog:
>
> * asan.h (HWASAN_TAG_SIZE): Use targetm.memtag.tag_bitsize.
> * config/i386/i386.cc (ix86_memtag_tag_size): Rename to
> ix86_memtag_tag_bitsize.
> (TARGET_MEMTAG_TAG_SIZE): Renamed t
Richard Biener writes:
>> Am 21.08.2025 um 16:56 schrieb Richard Sandiford :
>>
>> This PR is another bug in the rtl-ssa code to manage live-out uses.
>> It seems that this didn't get much coverage until recently.
>>
>> In the testcase, late-combine fir
This PR is another bug in the rtl-ssa code to manage live-out uses.
It seems that this didn't get much coverage until recently.
In the testcase, late-combine first removed a register-to-register
move by substituting into all uses, some of which were in other EBBs.
This was done after checking make
Tamar Christina writes:
> Consider the example:
>
> void
> f (int *restrict x, int *restrict y, int *restrict z, int n)
> {
> for (int i = 0; i < 4; ++i)
> {
> int res = 0;
> for (int j = 0; j < 100; ++j)
> res += y[j] * z[i];
> x[i] = res;
> }
> }
>
> we curren
Richard Biener writes:
> On Wed, 20 Aug 2025, Richard Sandiford wrote:
>
>> Tamar Christina writes:
>> > To avoid double counting scalar instructions when doing inner-loop costing
>> > the
>> > vectorizer uses vect_prologue as the kind instead of vect_bod
ssner
Jason Merrill
David S. Miller
Joseph Myers
-Richard Sandiford
+Richard Sandiford
Bernd Sc
Tamar Christina writes:
> To avoid double counting scalar instructions when doing inner-loop costing the
> vectorizer uses vect_prologue as the kind instead of vect_body.
>
> However doing this results in our throughput based costing to think the scalar
> loop issues in 0 cycles (rounded up to 1.0
Richard Biener writes:
> On Wed, Aug 20, 2025 at 1:03 PM Richard Sandiford
> wrote:
>>
>> While testing a later patch, I found that create_degenerate_phi
>> had an inverted test for bitmap_set_bit. It was assuming that
>> the return value was the previous bit valu
I'd added the aarch64-specific CC fusion pass to fold a PTEST
instruction into the instruction that feeds the PTEST, in cases
where the latter instruction can set the appropriate flags as a
side-effect.
Combine does the same optimisation. However, as explained in the
comments, the PTEST case ofte
While testing a later patch, I found that create_degenerate_phi
had an inverted test for bitmap_set_bit. It was assuming that
the return value was the previous bit value, rather than a
"something changed" value. :(
Also, the call to add_live_out_use shouldn't be conditional
on the DF_LR_OUT opera
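
The return-value convention described above can be illustrated with a minimal sketch. This is not GCC's bitmap implementation: tiny_bitmap, tiny_bitmap_set_bit and record_first_visit are hypothetical names, and the bitmap is just a single 64-bit word. The only property borrowed from the description is that the "set bit" routine returns a "something changed" value rather than the previous bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit bitset, standing in for a real bitmap type.  */
typedef struct { uint64_t bits; } tiny_bitmap;

/* Set bit BIT in MAP.  Return true if the bitmap changed, i.e. if the
   bit was previously clear.  This is a "something changed" value,
   not the previous value of the bit.  */
bool
tiny_bitmap_set_bit (tiny_bitmap *map, unsigned int bit)
{
  assert (bit < 64);
  uint64_t mask = (uint64_t) 1 << bit;
  bool changed = (map->bits & mask) == 0;
  map->bits |= mask;
  return changed;
}

/* Record REGNO in SEEN.  Return true if this is the first time REGNO
   has been recorded.  Reading the return value of the setter as
   "the bit was already set" would invert this test.  */
bool
record_first_visit (tiny_bitmap *seen, unsigned int regno)
{
  return tiny_bitmap_set_bit (seen, regno);
}
```

With this convention, a caller that wants to do work only once per register tests for a true return, not a false one.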
rtl-ssa already has a find_def function for finding the definition
of a particular resource (register or memory) at a particular point
in the program. This patch adds a similar function for looking
up uses. Both functions have amortised logarithmic complexity.
Tested on aarch64-linux-gnu, powerp
This PR is about a case where we used aarch64_expand_sve_const_pred_trn
to combine two predicates, one of which was constructed using
aarch64_sve_move_pred_via_while. The former requires the inputs
to have mode VNx16BI, but the latter returned VNx8BI for a .H
WHILELO.
The proper fix, used on tru
Richard Biener writes:
> On Mon, Aug 18, 2025 at 10:51 AM Richard Sandiford
> wrote:
>>
>> This patch fixes an internal disagreement in gcse about how to
>> handle partial clobbers. Like many passes, gcse doesn't track
>> the modes of live values, so i
This patch fixes an internal disagreement in gcse about how to
handle partial clobbers. Like many passes, gcse doesn't track
the modes of live values, so if a call clobbers only part of
a register, the pass has to make conservative assumptions.
As the comment in the patch says, this means:
(1) ig
PR119156 was fixed by g:f702b593e7268ab161053bafd097f1b09933b783.
This patch adds a test for it.
Tested on aarch64-linux-gnu & pushed as (hopefully) obvious.
Richard
gcc/testsuite/
PR target/119156
* gcc.target/aarch64/sve/pr119156_1.c: New test.
---
gcc/testsuite/gcc.target/aa
please let me know if you see any fallout.
Richard
> Thanks :)
>
> On Thu, Aug 14, 2025 at 5:19 PM Richard Sandiford
> wrote:
>>
>> One of Alfie's FMV patches adds a hook that, in some cases,
>> is used to silently query a target_version (with no diagnostics
>
LGTM, but a minor suggestion below:
writes:
> @@ -423,61 +423,100 @@ expand_target_clones (struct cgraph_node *node, bool
> definition)
>return true;
> }
>
> -/* When NODE is a target clone, consider all callees and redirect
> - to a clone with equal target attributes. That prevents mu
writes:
> From: Alfie Richards
>
> This patch introduces the TARGET_CHECK_TARGET_CLONE_VERSION hook
> which is used to determine if a target_clones version string parses.
>
> The hook has a flag to enable emitting diagnostics.
>
> This is as specified in the Arm C Language Extension. The purpose
Pengfei Li writes:
> Hi,
>
> This patch backports the fix of r16-3083 to gcc-13.
>
> Compared to the trunk version, this is slightly different because those RTL
> patterns in gcc-13 do not yet use the compact syntax for multiple
> alternatives.
> But this patch is functionally identical to the tr
Pengfei Li writes:
>> I’d also ask for a slightly more descriptive sentence like “Use vg
>> constraint for alternative so-and-so”.
>> Ok to push whatever reword you come up with.
>
> This has been committed to trunk as r16-3083 for about one week. I wonder if I
> could consider backporting it now
Richard Biener writes:
> On Thu, Aug 7, 2025 at 2:14 PM Richard Sandiford
> wrote:
>>
>> simplify_gen_subreg rejected subregs of literal constants if
>> MODE_COMPOSITE_P. This was added by the fix for PR96648 in
>> g:c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f.
These patterns had one (if_then_else ...) nested within another.
The outer if_then_else had SImode, which means that the "then"
and "else" should also be SImode (unless they're const_ints).
However, the inner if_then_else was modeless, which led to an
assertion failure when trying to take a subreg
One of Alfie's FMV patches adds a hook that, in some cases,
is used to silently query a target_version (with no diagnostics
expected). In the review, I'd suggested handling this using
a location_t *, with null meaning "suppress diagnostics":
https://gcc.gnu.org/pipermail/gcc-patches/2025-Augus
Wilco Dijkstra writes:
> Add an expander for isinf using integer arithmetic. This is
> typically faster and avoids generating spurious exceptions on
> signaling NaNs.
>
> int isinf1 (float x) { return __builtin_isinf (x); }
>
> Before:
> fabs s0, s0
> mov w0, 2139095039
>
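
For illustration only, the general idea of testing for infinity with integer arithmetic can be sketched in C as follows. This is not the expander from the quoted patch, nor the code it generates: isinf_bits is a hypothetical helper, and the sketch assumes that float is a 32-bit IEEE 754 binary32 value. It clears the sign bit and compares the remaining bits with the encoding of +Inf (exponent all ones, mantissa zero), so no floating-point comparison is performed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32.  */
static_assert (sizeof (float) == sizeof (uint32_t),
	       "this sketch assumes a 32-bit float");

/* Return nonzero if X is +Inf or -Inf, using only integer operations
   on the bit pattern of X.  */
int
isinf_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);	/* Reinterpret without aliasing issues.  */
  /* Drop the sign bit, then compare with the encoding of +Inf.  */
  return (bits & UINT32_C (0x7fffffff)) == UINT32_C (0x7f800000);
}
```

NaNs have a nonzero mantissa, so they compare unequal to the infinity encoding and the function returns zero for them.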
Claudiu Zissulescu-Ianculescu writes:
>>> + [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "=Umg")
>>> + (unspec:TI
>>> +[(match_operand:TI 1 "aarch64_granule16_memory_operand" "Umg")
>>> + (match_operand:DI 2 "register_operand" "rk")]
>>> +UNSPEC_TAG_SP
writes:
> From: Soumya AR
>
> Hi,
>
> This RFC is a continuation of previous patches sent here:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682702.html
>
> As suggested in the earlier thread, I've now added a python script to generate
> the printing and parsing routines for the JSON tuni
claudiu.zissulescu-iancule...@oracle.com writes:
> [...]
> /* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES. Here we tell the rest of the
> compiler that we automatically ignore the top byte of our pointers, which
> - allows using -fsanitize=hwaddress. */
> + allows using -fsanitize=hwaddres
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Indu Bhagat
>
> Define new constants to be used by the MTE pattern definitions.
>
> gcc/
>
> * config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
> constant.
> (MEMTAG_ADDR_MASK): Likewise.
> (irg, subp, ldg): Us
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Claudiu Zissulescu
>
> Add a new target instruction. Hardware-assisted sanitizers on
> architectures providing instructions to tag/untag memory can then
> make use of this new instruction pattern. For example, the
> memtag-stack sanitizer u
claudiu.zissulescu-iancule...@oracle.com writes:
> From: Indu Bhagat
>
> Currently, the data type of sanitizer flags is unsigned int, with
> SANITIZE_SHADOW_CALL_STACK (1UL << 31) being highest individual
> enumerator for enum sanitize_code. Use 'sanitize_code_type' data type
> to allow for more
For the reasons explained in the comment, fwprop shouldn't even
try to propagate an asm definition.
Tested on aarch64-linux-gnu. Bordering on obvious, but just in case:
OK to install?
Richard
gcc/
PR rtl-optimization/121253
* fwprop.cc (forward_propagate_into): Don't propagate
writes:
> +/* Extract string value from JSON, returning allocated C string. */
> +char *
> +extract_string (const json::value *val)
> +{
> + if (auto *string_val = dyn_cast (val))
> +{
> + char *result = new char[string_val->get_length () + 1];
> + strcpy (result, string_val->get_s
writes:
> +/* Mapping structure for enum-to-string conversion. */
> +template <typename EnumType> struct enum_mapping
> +{
> + const char *name;
> + EnumType value;
> +};
> +
> +static const enum_mapping
> + autoprefetcher_model_mappings[]
> + = {{"AUTOPREFETCHER_OFF", tune_params::AUTOPREFETCHER_OFF},
> +
writes:
> From: Soumya AR
>
> This commit introduces a Python maintenance script that generates C++ code
> for parsing and serializing AArch64 JSON tuning parameters based on the
> schema defined in aarch64-json-schema.h.
>
> The script generates two include files:
> - aarch64-json-tunings-pars
writes:
> From: Soumya AR
>
> This patch adds a get_map () method to the JSON object class to provide access
> to the underlying hash map that stores the JSON key-value pairs.
>
> It also reorganizes the private and public sections of the class to expose the
> map_t typedef, which is the return t
Richard Henderson writes:
> On 8/8/25 21:18, Richard Sandiford wrote:
>>> +(define_insn "*aarch64_cb"
>>> + [(set (pc) (if_then_else
>>> + (INT_CMP
>>> + (match_operand:GPI 0 "register_operand" "r&qu
Richard Henderson writes:
> On 8/8/25 20:39, Richard Sandiford wrote:
>> Richard Henderson writes:
>>> The save/restore_stack_nonlocal patterns passed a DImode rtx
>>> to gen_tbranch_neqi3 for a QImode compare. The tbranch expander
>>> did not do what it
writes:
> From: Soumya AR
>
> Hi,
>
> This RFC is a continuation of previous patches sent here:
> https://gcc.gnu.org/pipermail/gcc-patches/2025-May/682702.html
>
> As suggested in the earlier thread, I've now added a python script to generate
> the printing and parsing routines for the JSON tuni
writes:
> From: Alfie Richards
>
> Adds an optimisation in FMV to redirect to a specific target if possible.
>
> A call is redirected to a specific target if both:
> - the caller can always call the callee version
> - and, it is possible to rule out all higher priority versions of the callee
>
OK for:
writes:
> gcc/cgraph.cc | 4 +-
> gcc/cgraph.h | 2 +
> gcc/cgraphunit.cc | 9 +
> gcc/config/aarch64/aarch64.cc | 43 ++--
> gcc/ipa.cc
Christophe Lyon writes:
> In commit r15-4417-g71c7b446b98aa5, I made -werror mandatory when
> building libgcc for aarch64.
>
> While it achieved its goal (make us fix problems unnoticed so far),
> there has been a lot of debate because it couldn't be disabled
> easily.
As discussed off-list: yo
Richard Henderson writes:
> Reject QI/HImode conditions, which would require extension in
> order to compare. Fixes
>
> z.c:10:1: error: unrecognizable insn:
>10 | }
> | ^
> (insn 23 22 24 2 (set (reg:CC 66 cc)
> (compare:CC (reg:HI 128)
> (reg:HI 127))) "z.c":6:6 -1
Richard Henderson writes:
> Version 1 regressed the expansion of atomics, which means the addition
> of CC clobber to all conditional branches is flawed. Version 2 goes
> the other way: remove CC clobber from all conditional branches.
>
> This requires the out-of-range TBZ->TST+B.cond expansion b
Richard Henderson writes:
> Restrict the immediate range to the intersection of LT/GE and GT/LE
> so that cfglayout can invert the condition to redirect any branch.
>
> gcc:
> * config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the
> range of LT/GE and GT/LE to their intersections.
Richard Henderson writes:
> The enable for the test was wrong, so it never ran.
>
> gcc/testsuite:
> * gcc.target/aarch64/cmpbr.c: Use dg-require-effective-target.
> ---
> gcc/testsuite/gcc.target/aarch64/cmpbr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/t
Richard Henderson writes:
> The save/restore_stack_nonlocal patterns passed a DImode rtx
> to gen_tbranch_neqi3 for a QImode compare. The tbranch expander
> did not do what it said on the tin, that is: emit TBNZ.
> It only made it as far as AND+CMP+B.cond.
Yeah, that was done to allow ifcombine
In g:965564eafb721f813a3112f1bba8d8fae32b I'd added code
to try distributing non-widening subregs through logic ops,
in cases where that would eliminate a term of the logic op.
For "reasons", this indirectly caused combine to generate:
(set (zero_extract:SI (reg/v:SI 101 [ a ])
(c
writes:
> From: Alfie Richards
>
> This patch introduces the TARGET_CHECK_TARGET_CLONE_VERSION hook
> which is used to determine if a target_clones version string parses.
>
> The hook has a flag to enable emitting diagnostics.
>
> This is as specified in the Arm C Language Extension. The purpose
writes:
> From: Alfie Richards
>
> This patch is an overhaul of how FMV name mangling works. Previously
> mangling logic was duplicated in several places across both target
> specific and independent code. This patch changes this such that all
> mangling is done in targetm.mangle_decl_assembler_n
writes:
> From: Alfie Richards
>
> This is similar to clone_function_name and its siblings but takes an
> identifier tree node rather than a function declaration.
>
> This is to be used in conjunction with the identifier node stored in
> cgraph_function_version_info::assembler_name to mangle FMV
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
functions allow/require PSTATE.SM to be 1 on entry, so they need to
be treated as STO_AARCH64_VARIANT_PCS.
Similarly, functions that share ZA or ZT0 with their callers require
ZA to be active on entry, whereas the base PCS r
simplify_gen_subreg rejected subregs of literal constants if
MODE_COMPOSITE_P. This was added by the fix for PR96648 in
g:c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f. Jakub said:
As for the simplify_gen_subreg change, I think it would be desirable
to just avoid creating SUBREGs of constants on
Uros Bizjak writes:
> On Tue, Aug 5, 2025 at 1:32 PM Richard Sandiford
> wrote:
>> It's coming from:
>>
>> (define_split
>> [(set (match_operand:SWI 0 "register_operand")
>> (any_rotate:SWI
>> (match_o
Richard Sandiford writes:
> "H.J. Lu" writes:
>> On Mon, Aug 4, 2025 at 3:28 PM H.J. Lu wrote:
>>>
>>> On Mon, Aug 4, 2025 at 2:04 PM H.J. Lu wrote:
>>> >
>>> > On Mon, Aug 4, 2025 at 8:50 AM Richard Sandiford
>>> &g
The i386 high-register patterns used things like:
(match_operator:SWI248 2 "extract_operator"
[(match_operand 0 "int248_register_operand" "Q")
(const_int 8)
(const_int 8)])
to match an extraction of a high register such as AH from AX/EAX/RAX.
This construct is used in con
Richard Henderson writes:
> I have written patches for FEAT_CMPBR support in QEMU, and wanted to
> test them out with gcc. The easiest way seemed to be bootstrapping
> gcc with cmpbr enabled. The attempt failed on stage1 libgcc.
>
> My bug report is target/121385. Pinski did some analysis, whic
Richard Henderson writes:
> Middle distance branches between 1KiB and 1MiB may be
> implemented with cmp+branch instead of branch+branch.
>
> gcc:
> * config/aarch64/aarch64.cc (*aarch64_cb):
> Fall back to cmp/cmn + bcond if !far_branch.
> Adjust far_branch to 1MiB.
> (*aa
Richard Henderson writes:
> Some of the compare-and-branch patterns rely on CC for scratch in some
> of the alternative expansions. This is fine, because when the combined
> compare-and-branch patterns are formed by combine, we will be eliminating
> a write to CC, so CC is dead anyway.
>
> Standa
"H.J. Lu" writes:
> On Mon, Aug 4, 2025 at 3:28 PM H.J. Lu wrote:
>>
>> On Mon, Aug 4, 2025 at 2:04 PM H.J. Lu wrote:
>> >
>> > On Mon, Aug 4, 2025 at 8:50 AM Richard Sandiford
>> > wrote:
>> > > Sorry, I hadn't realised
"H.J. Lu" writes:
> After
>
> commit 965564eafb721f813a3112f1bba8d8fae32b
> Author: Richard Sandiford
> Date: Tue Jul 29 15:58:34 2025 +0100
>
> simplify-rtx: Simplify subregs of logic ops
>
> make_compound_operation_int gets
>
Uros Bizjak writes:
> On Sat, Aug 2, 2025 at 8:56 PM H.J. Lu wrote:
>>
>> On Fri, Aug 1, 2025 at 10:32 PM Uros Bizjak wrote:
>> >
>> > On Sat, Aug 2, 2025 at 3:22 AM H.J. Lu wrote:
>> > >
>> > > After
>> > >
>>
Dhruv Chawla writes:
> On 01/08/25 22:10, Richard Sandiford wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Dhruv Chawla writes:
>>> On 24/07/25 11:21, Andrew Pinski wrote:
>>>> External email: Use caution opening li
Matthieu Longo writes:
> On 2025-08-04 11:33, Richard Sandiford wrote:
>> Matthieu Longo writes:
>>> On 2025-07-31 13:39, Jan Beulich wrote:
>>>> On 09.07.2025 14:48, Matthieu Longo wrote:
>>>>> Those methods' implementation is relying on duck
Matthieu Longo writes:
> On 2025-07-31 13:39, Jan Beulich wrote:
>> On 09.07.2025 14:48, Matthieu Longo wrote:
>>> Those methods' implementation is relying on duck-typing at compile
>>> time.
>>> The structure corresponding to the node of a doubly linked list needs
>>> to define attributes 'prev'
Alfie Richards writes:
> Adds an optimisation in FMV to redirect to a specific target if possible.
>
> A call is redirected to a specific target if both:
> - the caller can always call the callee version
> - and, it is possible to rule out all higher priority versions of the callee
> fmv set. Th
Dhruv Chawla writes:
> On 24/07/25 11:21, Andrew Pinski wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Wed, Jul 23, 2025 at 10:16 PM wrote:
>>>
>>> From: Dhruv Chawla
>>>
>>> This patch folds the following patterns:
>>> - max (a, add (a, b)) -> [sum, ovf] = adds
Alfie Richards writes:
> Add support for a FMV set defined by a combination of target_clones and
> target_version definitions.
>
> Additionally, change is_function_default_version to consider a function
> declaration annotated with target_clones containing default to be a
> default version.
>
> La
Alfie Richards writes:
> On 01/08/2025 11:46, Richard Sandiford wrote:
>> Sorry, I think I missed the multiple_targets.cc changes in my
>> previous review.
>>
>> Alfie Richards writes:
>>> +
>>> + t
The target-independent and aarch64 bits mostly look good to me, but
a few comments/questions:
Alfie Richards writes:
> diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> index a604511db71..4eb37f5818f 100644
> --- a/gcc/cp/typeck.cc
> +++ b/gcc/cp/typeck.cc
> @@ -4489,6 +4489,16 @@ cp_build_funct
Sorry, I think I missed the multiple_targets.cc changes in my
previous review.
Alfie Richards writes:
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index d25277c0a93..44340cbc6a4 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -313,7 +216,6 @@ create_t
Kyrylo Tkachov writes:
>> On 29 Jul 2025, at 18:41, Richard Sandiford
>> wrote:
>>
>> This patch continues the work of making ACLE intrinsics use VNx16BI
>> for svbool_t results. It deals with the svpnext* intrinsics.
>>
>
> I wonder if the new patte
Alfie Richards writes:
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 5e305643b3a..253ea6dd77f 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12268,6 +12268,11 @@ function version at run-time for a given set of
> function versions.
> body must be generated.
> @end deftyp
FWIW, I agree with Jeff's comment in the v6 series against the duplication
of is_valid_asm_symbol and create_new_asm_name. On the aarch64 bits:
Alfie Richards writes:
> @@ -20549,18 +20540,6 @@ aarch64_mangle_decl_assembler_name (tree decl, tree
> id)
> This is computed by taking the defaul
Spencer Abson writes:
> GCC doesn't support SME without SVE2, so the -march=armv8-a+ argument to
> check_no_compiler_messages causes aarch64_asm__ok to return zero for SME
> and any that implies it. This patch changes the baseline architecture to
> armv9-a for these extensions.
>
> The tests for
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, July 29, 2025 1:43 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan ; Alice Carlotti
>> ;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, July 29, 2025 4:33 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan ; Alice Carlotti
>> ;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha
Kyrylo Tkachov writes:
>> +(define_insn "*aarch64_pred_fcmuo_acle"
>> + [(set (match_operand:VNx16BI 0 "register_operand")
>
> Looks like a "=w" constraint is missing here.
Argh! Thanks for catching that. I went through and checked for
missing constraints in the other new patterns but it look
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Tuesday, July 29, 2025 5:20 PM
>> To: Alex Coplan ; Alice Carlotti
>> ;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnshaw
>> ; Tamar Christina ;
>>
Andrew Pinski writes:
> Right now in simplify_subreg, there is code to try to simplify for word_mode
> with the binary bitwise operators. The unary bitwise operator is not handled;
> this causes an odd mismatch, and the new self testing code that was added with
> r16-2614-g965564eafb721f was not ex
After previous patches, we should always get a VNx16BI result
for ACLE intrinsics that return svbool_t. This patch adds
an assert that checks a more general condition than that.
gcc/
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::expand): Assert that the return value
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svpnext* intrinsics.
gcc/
* config/aarch64/iterators.md (PNEXT_ONLY): New int iterator.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_): Restrict SVE_PITER pattern
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svmatch* and svnmatch*
intrinsics.
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_pred_):
Split SVE2_MATCH pattern into a VNx16QI_ONLY define_ins and a
VNx8HI_ONLY
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the predicate forms of svdupq.
The general predicate expansion builds an equivalent integer vector
and then compares it with zero. This patch therefore relies on
the earlier patches to the com
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svac* intrinsics (floating-
point compare absolute).
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_pred_fac):
Replace with...
(@aarch64_pred_fac_acle): ...this new
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the floating-point forms of svcmp*.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_pred_fcm_acle)
(*aarch64_pred_fcm_acle, @aarch64_pred_fcmuo_acle)
(*aarch64_pred_fcmuo
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the predicate forms of svdup.
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_sve_pred_vec_duplicate): Declare.
* config/aarch64/aarch64.cc
(aarch64_emit_sv
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svcmp*_wide intrinsics.
Since the only uses of these patterns are for ACLE intrinsics,
there didn't seem much point adding an "_acle" suffix.
gcc/
* config/aarch64/aarch64.cc (@aar
Patterns that fuse a predicate operation P with a PTEST use
aarch64_sve_same_pred_for_ptest_p to test whether the governing
predicates of P and the PTEST are compatible. Most patterns were also
written as define_insn_and_rewrites, with the rewrite replacing P's
original governing predicate with PT
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the non-widening integer forms
of svcmp*. The handling of the PTEST patterns is similar to that
for the earlier svwhile* patch.
Unfortunately, on its own, this triggers a failure in the
pred_c
The patterns for the svcmp_wide intrinsics used a VNx16BI
input predicate for all modes, instead of the usual .
That unnecessarily made some input bits significant, but more
importantly, it triggered an ICE in aarch64_sve_same_pred_for_ptest_p
when testing whether a comparison pattern could be fuse
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svunpk* intrinsics.
gcc/
* config/aarch64/aarch64-sve.md (@aarch64_sve_punpk_acle)
(*aarch64_sve_punpk_acle): New patterns.
* config/aarch64/aarch64-sve-builtins-bas