Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-14 Thread Richard Sandiford
Richard Biener writes: > The following avoids accessing out-of-bound vector elements when > native encoding a boolean vector with sub-BITS_PER_UNIT precision > elements. The error was basing the number of elements to extract > on the rounded up total byte size involved and the patch bases > every
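The bug class described in this snippet can be illustrated with a minimal sketch. It is not GCC's native-encode code: the helper name `encode_bool_vector`, the one-bit element size and the least-significant-bit-first packing are all assumptions made for illustration. The point is that the loop bound comes from the real element count, not from the rounded-up byte size multiplied by eight.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): pack NELTS one-bit boolean
   elements from ELTS into BUF, least significant bit first.  Returns the
   number of bytes written, or 0 if BUF is too small.  */
static size_t
encode_bool_vector (const unsigned char *elts, size_t nelts,
                    uint8_t *buf, size_t buf_len)
{
  /* The byte size is rounded up, so it can cover more bit positions
     than there are elements.  */
  size_t nbytes = (nelts + 7) / 8;
  if (buf_len < nbytes)
    return 0;
  memset (buf, 0, nbytes);

  /* Iterate over the real element count.  Using nbytes * 8 as the bound
     would read elts[nelts] and beyond whenever nelts is not a multiple
     of 8.  */
  for (size_t i = 0; i < nelts; ++i)
    if (elts[i])
      buf[i / 8] |= (uint8_t) (1u << (i % 8));

  return nbytes;
}
```

With three elements `{1, 0, 1}`, one byte is written and only bits 0 to 2 are derived from the input; the padding bits stay zero.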

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-14 Thread Richard Sandiford
Richard Biener writes: > On Wed, 14 Feb 2024, Richard Sandiford wrote: > >> Richard Biener writes: >> > The following avoids accessing out-of-bound vector elements when >> > native encoding a boolean vector with sub-BITS_PER_UNIT precision >> > elemen

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-02-14 Thread Richard Sandiford
Ajit Agarwal writes: >>> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc >>> index 1856fa4884f..ffc47a6eaa0 100644 >>> --- a/gcc/emit-rtl.cc >>> +++ b/gcc/emit-rtl.cc >>> @@ -921,7 +921,7 @@ validate_subreg (machine_mode omode, machine_mode imode, >>> return false; >>> >>>/* The subreg o

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Richard Sandiford
Ajit Agarwal writes: >>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>> index 88ee0dd67fc..a8d0ee7c4db 100644 >>> --- a/gcc/df-problems.cc >>> +++ b/gcc/df-problems.cc >>> @@ -3360,7 +3360,7 @@ df_set_unused_notes_for_mw (rtx_insn *insn, struct >>> df_mw_hardreg *mws, >>>if (df_whol

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Richard Sandiford
Ajit Agarwal writes: > On 14/02/24 10:56 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>>>> index 88ee0dd67fc..a8d0ee7c4db 100644 >>>>> --- a/gcc/df-problems.cc

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-02-14 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Richard: > > > On 14/02/24 10:45 pm, Richard Sandiford wrote: >> Ajit Agarwal writes: >>>>> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc >>>>> index 1856fa4884f..ffc47a6eaa0 100644 >>>>> --- a/gcc

Re: [PATCH] aarch64: Use vec_perm_indices::new_shrunk_vector in aarch64_evpc_reencode

2024-02-14 Thread Richard Sandiford
Andrew Pinski writes: > While working on PERM related stuff, I came across that aarch64_evpc_reencode > was manually figuring out if we shrink the perm indices instead of > using vec_perm_indices::new_shrunk_vector; shrunk was added after reencode > was added. > > Built and tested for aarch64-linux

Re: [PATCH] aarch64: Reword error message for mismatch guard size and probing interval [PR90155]

2024-02-14 Thread Richard Sandiford
Andrew Pinski writes: > The error message is not clear what options are being talked about when it > says the values > need to match; plus there is a wrong quotation dealing with the diagnostic. > So this changes the error message to be exactly talking about the param > options that > are being t

Re: [PATCH] middle-end/113576 - avoid out-of-bound vector element access

2024-02-15 Thread Richard Sandiford
Richard Biener writes: > On Wed, 14 Feb 2024, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Wed, 14 Feb 2024, Richard Sandiford wrote: >> > >> >> Richard Biener writes: >> >> > The following avoids accessing out-of-bou

Re: [PATCH] aarch64: Improve PERM<{0}, a, ...> (64bit) by adding whole vector shift right [PR113872]

2024-02-15 Thread Richard Sandiford
Andrew Pinski writes: > The backend currently defines a whole vector shift left for 64bit vectors, > adding the > shift right can also improve code for some PERMs too. So this adds that > pattern. Is this reversed? It looks like we have the shift right and the patch is adding the shift left (a

Re: [PATCH] tree-optimization/113910 - huge compile time during PTA

2024-02-15 Thread Richard Sandiford
Richard Biener writes: > On Wed, 14 Feb 2024, Richard Biener wrote: > >> For the testcase in PR113910 we spend a lot of time in PTA comparing >> bitmaps for looking up equivalence class members. This points to >> the very weak bitmap_hash function which effectively hashes set >> and a subset of n

Re: [PATCH] aarch64: Fix undefined code in vect_ctz_1.c

2024-02-15 Thread Richard Sandiford
Andrew Pinski writes: > The testcase gcc.target/aarch64/vect_ctz_1.c fails execution when running > with -march=armv9-a due to the testcase calls __builtin_ctz with a value of 0. > The testcase should not depend on undefined behavior of __builtin_ctz. So this > changes it to use the g form with th
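As a minimal sketch of the problem in this snippet: `__builtin_ctz` has an undefined result when its argument is 0, so portable test code has to handle zero explicitly. The wrapper below, `safe_ctz`, is a hypothetical name, and returning the bit width for a zero input is an assumed convention. It does not reproduce whichever builtin form the patch itself adopts.

```c
#include <limits.h>

/* Hypothetical wrapper: count trailing zero bits of X, returning the
   width of unsigned int when X is 0 instead of invoking the undefined
   case of __builtin_ctz.  */
static int
safe_ctz (unsigned int x)
{
  if (x == 0)
    return (int) (sizeof (unsigned int) * CHAR_BIT);
  return __builtin_ctz (x);
}
```

A test that calls `safe_ctz` in place of the raw builtin no longer depends on what a particular target happens to compute for a zero input.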

Re: [PATCH] aarch64, acle header: Cast uint64_t pointers to DIMode.

2024-02-15 Thread Richard Sandiford
Iain Sandoe writes: >> On 5 Feb 2024, at 14:56, Iain Sandoe wrote: >> >> Tested on aarch64-linux,darwin and a cross from aarch64-darwin to linux, >> OK for trunk, or some alternative is needed? > > Hmm.. apparently, this fails the linaro pre-commit CI for g++ with: > error: invalid conversion fr

Re: [PATCH] aarch64, acle header: Cast uint64_t pointers to DIMode.

2024-02-19 Thread Richard Sandiford
Iain Sandoe writes: >> On 15 Feb 2024, at 18:05, Richard Sandiford >> wrote: >> >> Iain Sandoe writes: >>>> On 5 Feb 2024, at 14:56, Iain Sandoe wrote: >>>> >>>> Tested on aarch64-linux,darwin and a cross from aarch64-darwi

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford
Richard Biener writes: > The following tries to address the PHI insertion compile-time hog in > RTL fwprop observed with the PR54052 testcase where the loop computing > the "unfiltered" set of variables possibly needing PHI nodes for each > block exhibits quadratic compile-time and memory-use. > >

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford
Richard Biener writes: > On Mon, 19 Feb 2024, Richard Sandiford wrote: > >> Richard Biener writes: >> > The following tries to address the PHI insertion compile-time hog in >> > RTL fwprop observed with the PR54052 testcase where the loop computing >> > the

Re: [PATCH] rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog

2024-02-19 Thread Richard Sandiford
Richard Biener writes: >> I suppose that's better than the first version when a block has a >> large number of dominance frontiers. But I can't remember whether >> that was the case in PR98863. I have a feeling that I tried the above >> as part of the PR, since it's the obvious way of applying l

Re: [PATCH] AArch64: Update system register database.

2024-02-20 Thread Richard Sandiford
Victor Do Nascimento writes: > [...] > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 157a0b9dfa5..45e901cda64 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -297,6 +297,26 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = > A

Re: [PATCH]AArch64: remove ls64 from being mandatory on armv8.7-a..

2024-02-20 Thread Richard Sandiford
Tamar Christina writes: > Hi, this is a new version of the patch updating some additional tests > because some of the LTO tests required a newer binutils than my distro had. > > --- > > The Arm Architectural Reference Manual (Version J.a, section A2.9 on > FEAT_LS64) > shows that ls64 is an optio

Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-02-20 Thread Richard Sandiford
Alex Coplan writes: > On 14/02/2024 11:18, Richard Sandiford wrote: >> Alex Coplan writes: >> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch. >> > The only part of the patch that isn't a straight cherry-pick is due to >> > the

Re: [PATCH]AArch64: update vget_set_lane_1.c test output

2024-02-20 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Thursday, February 1, 2024 4:42 PM >> To: Tamar Christina >> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; nd >> ; Richard Earnshaw ; Marcus >> Shawcroft ; Kyrylo

[pushed] aarch64: Fix streaming-compatible code with -mtrack-speculation [PR113805]

2024-02-20 Thread Richard Sandiford
This patch makes -mtrack-speculation work on streaming-compatible functions. There were two related issues. The first is that the streaming-compatible code was using TB(N)Z unconditionally, whereas those instructions are not allowed with speculation tracking. That part can be fixed in a similar w

Re: [PATCH] libgcc, aarch64: Allow for BE platforms in heap trampolines.

2024-02-20 Thread Richard Sandiford
Iain Sandoe writes: > Andrew Pinski pointed out on irc, that the current implementation of the > heap trampoline code fragment would make the instruction byte order follow > memory byte order for BE AArch64, which is not what is required. > > This patch revises the initializers so that instruction
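The byte-order issue here comes from the fact that AArch64 instructions are stored little-endian even when data accesses are big-endian. The sketch below shows one way to write an instruction word so that its in-memory layout does not depend on data endianness. The helper name `store_insn_le` is an assumption, and this is not the initializer scheme used in the patch, which is not visible in the snippet.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit instruction word at DST in
   little-endian byte order, regardless of the data endianness of the
   machine running this code.  */
static void
store_insn_le (uint8_t *dst, uint32_t insn)
{
  dst[0] = (uint8_t) (insn & 0xff);
  dst[1] = (uint8_t) ((insn >> 8) & 0xff);
  dst[2] = (uint8_t) ((insn >> 16) & 0xff);
  dst[3] = (uint8_t) ((insn >> 24) & 0xff);
}
```

Writing the word through a plain `uint32_t` store would instead follow memory byte order, which is the behaviour the snippet describes as incorrect on big-endian AArch64.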

[PATCH] Allow mode-switching to introduce internal loops [PR113220]

2024-02-21 Thread Richard Sandiford
In this PR, the SME mode-switching code needs to insert a stack-probe loop for an alloca. This patch allows the target to do that. There are two parts to it: allowing loops for insertions in blocks, and allowing them for insertions on edges. The former can be handled entirely within mode-switchi

[pushed] aarch64: Stack-clash prologues and VG saves [PR113995]

2024-02-21 Thread Richard Sandiford
This patch fixes an ICE for a combination of: - -fstack-clash-protection - a frame that has SVE save slots - a frame that has no GPR save slots - a frame that has a VG save slot The allocation code was folding the SVE save slot allocation into the initial frame allocation, so that we had one allo

[pushed] aarch64: Remove the aarch64_commit_lazy_save pattern

2024-02-21 Thread Richard Sandiford
The main purpose of the aarch64_commit_lazy_save pattern was to defer insertion of a half-diamond until splitting, since splitting knew how to create the associated basic blocks. However, the fix for PR113220 means that mode-switching also knows how to do that. This patch therefore removes the pa

[PATCH] aarch64: Ensure ZT0 is zeroed in a new-ZT0 function

2024-02-21 Thread Richard Sandiford
ACLE guarantees that a function like: __arm_new("zt0") foo() { ... } will start with ZT0 equal to zero. I'd forgotten to enforce that after committing a lazy save. After such a save, we should zero ZA iff the function has ZA state and zero ZT0 iff the function has ZT0 state. Tested on aarch64

[pushed] aarch64: Fix sibcalls involving shared-ZT0 functions

2024-02-21 Thread Richard Sandiford
In: void bar() __arm_inout("za"); void foo() __arm_inout("za", "zt0") { bar(); } foo cannot tail-call bar because foo needs to restore ZT0 after the call. I'd forgotten to update the ok_for_sibcall rules to handle this when adding SME2. Thanks to Sander de Smalen for the spot. Tested on aa

[pushed] aarch64: More SME vs -mtrack-speculation

2024-02-21 Thread Richard Sandiford
The sequence to commit a lazy save includes a branch based on whether TPIDR2_EL0 is zero. The code assumed that CBZ could be used for this, but that instruction is forbidden when -mtrack-speculation is being used. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarc

[pushed] aarch64: Remove duplicated call

2024-02-21 Thread Richard Sandiford
I noticed while working on another patch that we had a duplicated call to aarch64_process_target_attr. Tested on aarch64-linux-gnu & pushed. Richard gcc/ * config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p): Remove duplicated call. --- gcc/config/aarch64/aarch64.cc |

Re: [PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.

2024-02-22 Thread Richard Sandiford
Ajit Agarwal writes: > Hello Alex/Richard: > > I have placed target independent and target dependent code in > aarch64-ldp-fusion for load store fusion. > > Common infrastructure of load store pair fusion is divided into > target independent and target dependent code. > > Target independent code is

[pushed] aarch64: Spread out FPR usage between RA regions [PR113613]

2024-02-23 Thread Richard Sandiford
early-ra already had code to do regrename-style "broadening" of the allocation, to promote scheduling freedom. However, the pass divides the function into allocation regions and this broadening only worked within a single region. This meant that if a basic block contained one subblock of FPR use,

[pushed] aarch64: Tighten early-ra chain test for wide registers [PR113295]

2024-02-23 Thread Richard Sandiford
Most code in early-ra used is_chain_candidate to check whether we should chain two allocnos. This included both tests that matter for correctness and tests for certain heuristics. Once that test passes for one pair of allocnos, we test whether it's safe to chain the containing groups (which might

Re: [PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-23 Thread Richard Sandiford
"Richard Earnshaw (lists)" writes: > On 21/02/2024 18:30, Evgeny Karpov wrote: >> > +/* X18 reserved for the TEB on Windows. */ > +#ifdef TARGET_ARM64_MS_ABI > +# define FIXED_X18 1 > +# define CALL_USED_X18 0 > +#else > +# define FIXED_X18 0 > +# define CALL_USED_X18 1 > +#endif > > I'm not ove

Re: [PATCH v1 04/13] aarch64: Add aarch64-w64-mingw32 COFF

2024-02-23 Thread Richard Sandiford
Evgeny Karpov writes: > From 55fd2a63afa9abb3543d714b6f5925efd2682e08 Mon Sep 17 00:00:00 2001 > From: Zac Walker > Date: Wed, 21 Feb 2024 12:20:46 +0100 > Subject: [PATCH v1 04/13] aarch64: Add aarch64-w64-mingw32 COFF > > Define ASM specific for COFF format on AArch64. > > gcc/ChangeLog: > >

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-23 Thread Richard Sandiford
Evgeny Karpov writes: > From 1ea6efa6f88d131884ecef21c4b5d2ecbab14ea7 Mon Sep 17 00:00:00 2001 > From: Zac Walker > Date: Tue, 20 Feb 2024 18:06:36 +0100 > Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for > AArch64 > > Define Cygwin and MinGW environment such as types, SE

Re: [PATCH v1 02/13] aarch64: The aarch64-w64-mingw32 target implements

2024-02-23 Thread Richard Sandiford
Evgeny Karpov writes: > The calling ABI enum definition has been done following a similar convention > in > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/i386-opts.h;h=ef2825803b32001b9632769bdff196d1e43d27ba;hb=refs/heads/master#l41 > > MS_ABI is used in gcc/config/i386/mingw32.h

Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-23 Thread Richard Sandiford
"Richard Earnshaw (lists)" writes: > On 21/02/2024 17:47, Evgeny Karpov wrote: >> Hello, >> >> We would like to take your attention to the review of changes for the >> new GCC target, aarch64-w64-mingw32. The new target will be >> supported, tested, added to CI, and maintained by Linaro. This mar

[PATCH] vect: Tighten check for impossible SLP layouts [PR113205]

2024-02-24 Thread Richard Sandiford
During its forward pass, the SLP layout code tries to calculate the cost of a layout change on an incoming edge. This is taken as the minimum of two costs: one in which the source partition keeps its current layout (chosen earlier during the pass) and one in which the source partition switches to

[pushed] Restrict gcc.dg/rtl/aarch64/pr113295-1.c to aarch64

2024-02-24 Thread Richard Sandiford
I keep forgetting that gcc.dg/rtl is the one testsuite where tests in target-specific subdirectories aren't automatically restricted to that target. Pushed as obvious after testing on aarch64-linux-gnu & x86_64-linux-gnu. Richard gcc/testsuite/ * gcc.dg/rtl/aarch64/pr113295-1.c: Restric

Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-10 Thread Richard Sandiford
Just a note that, following discussion on IRC, I'll pull this for GCC 14 and resubmit for GCC 15. There was also pushback on IRC about making the pass opt-in. Enabling it for x86_64 would mean fixing RPAD to use a representation that is more robust against recombination, but as you can imagine, it

Re: [PATCH] aarch64: Make ldp/stp pass off by default

2024-01-10 Thread Richard Sandiford
Alex Coplan writes: > As discussed on IRC, this makes the aarch64 ldp/stp pass off by default. This > should stabilize the trunk and give some time to address the P1 regressions. > > Sorry for the breakage. > > Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk? > > Alex > > gcc/ChangeLog:

Re: [PATCH] AArch64: Reassociate CONST in address expressions [PR112573]

2024-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > GCC tends to optimistically create CONST of globals with an immediate offset. > However it is almost always better to CSE addresses of globals and add > immediate > offsets separately (the offset could be merged later in single-use cases). > Splitting CONST expressions wi

Re: [PATCH v4] AArch64: Cleanup memset expansion

2024-01-10 Thread Richard Sandiford
Wilco Dijkstra writes: > Hi Richard, > >>> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96) >> >> Since this isn't (AFAIK) a standard macro, there doesn't seem to be >> any need to put it in the header file. It could just go at the head >> of aarch64.cc instead. > > Sure, I've moved it in v4. > >>

Re: [PATCH v3] aarch64: Fix dwarf2cfi ICEs due to recent CFI note changes [PR113077]

2024-01-11 Thread Richard Sandiford
Alex Coplan writes: > This is a v3 which addresses shortcomings of the v2 patch. v2 was > posted here: > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642448.html > > The main issue in v2 is that we were using the final (transformed) > patterns in combine_reg_notes instead of the initial

Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-11 Thread Richard Sandiford
ype`, and `.size` pseudo-ops for > `aarch64-w64-mingw32` target > Cc: Andrew Pinski , > Richard Sandiford , > Jonathan Yong <10wa...@gmail.com> > > Recent change > (https://gcc.gnu.org/pipermail/gcc-cvs/2023-December/394915.html) added a > generic SME support us

[PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-12 Thread Richard Sandiford
/aarch64/ccmp_3.c: New test. * gcc.target/aarch64/ccmp_4.c: New test. * gcc.target/aarch64/ccmp_5.c: New test. Signed-off-by: Andrew Pinski Co-authored-by: Richard Sandiford --- gcc/ccmp.cc | 12 +-- gcc/cfgexpand.cc | 31 ++-

[PATCH 1/2] aarch64: Use a separate group for SME builtins [PR112989]

2024-01-12 Thread Richard Sandiford
The PR shows that we were registering the same overloaded SVE builtins twice. This was supposed to be prevented by function_builder::add_overloaded_function, which uses a map to detect whether a function of the same name has already been registered. add_overloaded_function then had some asserts t

[PATCH 2/2] aarch64: Use a global map to detect duplicated overloads [PR112989]

2024-01-12 Thread Richard Sandiford
As explained in the covering note to the previous patch, the fact that aarch64-sve-* is now used for multiple header files means that function_builder::add_overloaded_function now needs to use a global map to detect duplicated overload functions, instead of the member variable that it used previous

[pushed] aarch64: Rework uxtl->zip optimisation [PR113196]

2024-01-12 Thread Richard Sandiford
g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than UXTL{,2}, since the former has a higher throughput than the latter on many cores. The optimisation worked by lowering directly to ZIP during expand, so that the zero input could be hoisted and shared. However, changing to ZIP m
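The equivalence behind this optimisation can be modelled in scalar C: interleaving the bytes of a vector with the bytes of a zero vector gives the same memory image as zero-extending each byte to 16 bits, assuming a little-endian lane layout. The function names below are hypothetical and the code is only a model of the data movement, not of the instructions or of GCC's expander.

```c
#include <stdint.h>

/* Model of a ZIP-style interleave of eight input bytes with a zero
   vector: the output alternates input byte, zero byte.  */
static void
zip_with_zero (const uint8_t in[8], uint8_t out[16])
{
  for (int i = 0; i < 8; ++i)
    {
      out[2 * i] = in[i];
      out[2 * i + 1] = 0;
    }
}

/* Read a 16-bit element stored in little-endian byte order.  */
static uint16_t
read_u16_le (const uint8_t *p)
{
  return (uint16_t) (p[0] | (p[1] << 8));
}
```

Reading the interleaved output as eight little-endian 16-bit elements yields the zero-extended inputs, which is why the zero operand can be shared between several such extensions.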

Ping: [PATCHv3] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-22 Thread Richard Sandiford
Ping for the expr/cfgexpand bits Richard Sandiford writes: > Andrew Pinski writes: >> Ccmp is not used if the result of the and/ior is used by both >> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation >> here by using ccmp in this case. >> Two ch

Re: [PATCH 1/4] rtl-ssa: Run finalize_new_accesses forwards [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > The next patch in this series exposes an interface for creating new uses > in RTL-SSA. The intent is that new user-created uses can consume new > user-created defs in the same change group. This is so that we can > correctly update uses of memory when inserting a new store

Re: [PATCH 2/4] rtl-ssa: Support for creating new uses [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > This exposes an interface for users to create new uses in RTL-SSA. > This is needed for updating uses after inserting a new store pair insn > in the aarch64 load/store pair fusion pass. > > gcc/ChangeLog: > > PR target/113070 > * rtl-ssa/accesses.cc (function_info

Re: [PATCH 3/4] rtl-ssa: Ensure new defs get inserted [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to > RTL-SSA for inserting new insns, which included support for users > creating new defs. > > However, I missed that apply_changes_to_insn needed updating to ensure > that the new defs actually got insert

Re: [PATCH 4/4] aarch64: Fix up uses of mem following stp insert [PR113070]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > As the PR shows (specifically #c7) we are missing updating uses of mem > when inserting an stp in the aarch64 load/store pair fusion pass. This > patch fixes that. > > RTL-SSA has a simple view of memory and by default doesn't allow stores > to be re-ordered w.r.t. other sto

Re: [PATCH] aarch64: Don't record hazards against paired insns [PR113356]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > For the testcase in the PR, we try to pair insns where the first has > writeback and the second uses the updated base register. This causes us > to record a hazard against the second insn, thus narrowing the move > range away from the end of the BB. > > However, it i

Re: [PATCH 1/3] rtl-ssa: Provide easier access to debug uses [PR113089]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > This patch adds some accessors to set_info and use_info to make it > easier to get at and iterate through uses in debug insns. > > It is used by the aarch64 load/store pair fusion pass in a subsequent > patch to fix PR113089, i.e. to update debug uses in the pass. > > Bootstr

Re: [PATCH 2/3] aarch64: Re-parent trailing nondebug base reg uses [PR113089]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > While working on PR113089, I realised we were missing code to re-parent > trailing nondebug uses of the base register in the case of cancelling > writeback in the load/store pair pass. This patch fixes that. > > Bootstrapped/regtested as a series on aarch64-linux-gnu (with/

Re: [PATCH 3/3] aarch64: Fix up debug uses in ldp/stp pass [PR113089]

2024-01-22 Thread Richard Sandiford
Sorry for the earlier review comment about debug insns. I hadn't looked far enough into the queue to see this patch. Alex Coplan writes: > As the PR shows, we were missing code to update debug uses in the > load/store pair fusion pass. This patch fixes that. > > Note that this patch depends on

Re: [PATCH] aarch64: Don't assert recog success in ldp/stp pass [PR113114]

2024-01-22 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > The PR shows two different cases where try_promote_writeback produces an > RTL pattern which isn't recognized. Currently this leads to an ICE, as > we assert recog success, but I think it's better just to back out of the > changes gracefully if recog fails (as we do

[pushed] aarch64: Avoid registering duplicate C++ overloads [PR112989]

2024-01-23 Thread Richard Sandiford
In the original fix for this PR, I'd made sure that including didn't reach the final return in simulate_builtin_function_decl (which would indicate duplicate function definitions). But it seems I forgot to do the same thing for C++, which defines all of its overloads directly. This patch fixes a

Re: [PATCH 3/4] rtl-ssa: Ensure new defs get inserted [PR113070]

2024-01-23 Thread Richard Sandiford
Alex Coplan writes: > On 22/01/2024 13:49, Richard Sandiford wrote: >> Alex Coplan writes: >> > In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to >> > RTL-SSA for inserting new insns, which included support for users >> > creating new de

Re: [PATCH 3/3] aarch64: Fix up debug uses in ldp/stp pass [PR113089]

2024-01-23 Thread Richard Sandiford
Alex Coplan writes: >> > + writeback_pats[i] = orig_rtl[i]; >> > + >> > + // Now that we've characterized the defs involved, go through the >> > + // debug uses and determine how to update them (if needed). >> > + for (auto use : set->debug_insn_uses ()) >> > + { >> > +if

Re: [PATCH] aarch64: enforce lane checking for intrinsics

2024-01-23 Thread Richard Sandiford
Alexandre Oliva writes: > Calling arm_neon.h functions that take lanes as arguments may fail to > report malformed values if the intrinsic happens to be optimized away, > e.g. because it is pure or const and the result is unused. > > Adding __AARCH64_LANE_CHECK calls to the always_inline functions

Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-23 Thread Richard Sandiford
Radek Barton writes: > Hello Richard. > > Thank you for your suggestion. I am sending a patch update according to it. > >> How about avoiding the clash by using the names HIDDEN, SYMBOL_TYPE and >> SYMBOL_SIZE, with SYMBOL_TYPE taking the symbol type as argument? > > Yes, unless the symbol is expl

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-23 Thread Richard Sandiford
Matthias Kretz writes: > On Sunday, 10 December 2023 14:29:45 CET Richard Sandiford wrote: >> Thanks for the patch and sorry for the slow review. > > Sorry for my slow reaction. I needed a long vacation. For now I'll focus on > the design question wrt. multi-arch compi

Re: [PATCH]AArch64: Do not allow SIMD clones with simdlen 1 [PR113552]

2024-01-24 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > The AArch64 vector PCS does not allow simd calls with simdlen 1, > however due to a bug we currently do allow it for num == 0. > > This causes us to emit a symbol that doesn't exist and we fail to link. > > Bootstrapped Regtested on aarch64-none-linux-gnu and

Re: [PATCH]AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]

2024-01-24 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > As suggested in the ticket this replaces the expansion by converting the > Advanced SIMD types to SVE types by simply printing out an SVE register for > these instructions. > > This fixes the subreg issues since there are no subregs involved anymore. > > Boots

Re: [PATCH] fold-const: Handle AND, IOR, XOR with stepped vectors [PR112971].

2024-01-24 Thread Richard Sandiford
Richard Biener writes: > On Mon, 15 Jan 2024, Robin Dapp wrote: > >> I gave it another shot now by introducing a separate function as >> Richard suggested. It's probably not at the location he intended. >> >> The way I read the discussion there hasn't been any consensus >> on how (or rather wher

Re: [PATCH] AArch64: aarch64_class_max_nregs mishandles 64-bit structure modes [PR112577]

2024-01-24 Thread Richard Sandiford
Tejas Belagod writes: > The target hook aarch64_class_max_nregs returns the incorrect result for > 64-bit > structure modes like V31DImode or V41DFmode etc. The calculation of the nregs > is based on the size of AdvSIMD vector register for 64-bit modes which ought > to > be UNITS_PER_VREG / 2.

Re: [PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-24 Thread Richard Sandiford
Manos Anagnostakis writes: > The current ldp/stp policy framework implementation was missing cases, where > the memory operands were reversed. Therefore the call to the framework > function > is moved after the lower mem check with the suitable parameters. Also removes > the mode of aarch64_opera

Re: [PATCH v1 2/4] C++: Support clang compatible [[musttail]]

2024-01-24 Thread Richard Sandiford
Thanks for doing this. I'm not qualified to review the patch properly, but was just curious... Andi Kleen writes: > This patch implements a clang compatible [[musttail]] attribute for > returns. > > musttail is useful as an alternative to computed goto for interpreters. > With computed goto the

Re: [PATCH] aarch64: Fix eh_return for -mtrack-speculation [PR112987]

2024-01-24 Thread Richard Sandiford
Szabolcs Nagy writes: > Recent commit introduced a conditional branch in eh_return epilogues > that is not compatible with speculation tracking: > > commit 426fddcbdad6746fe70e031f707fb07f55dfb405 > Author: Szabolcs Nagy > CommitDate: 2023-11-27 15:52:48 + > > aarch64: Use br inst

Re: [PATCH] aarch64: Fix __builtin_apply with -mgeneral-regs-only [PR113486]

2024-01-24 Thread Richard Sandiford
Andrew Pinski writes: > The problem here is the builtin apply mechanism thinks the FP registers > are to be used due to get_raw_arg_mode not returning VOIDmode. This > fixes that oversight and the backend now returns VOIDmode for non-general-regs > if TARGET_GENERAL_REGS_ONLY is true. > > Built an

Re: [PATCH] Fix vect_long_mult for aarch64 [PR109705]

2024-01-24 Thread Richard Sandiford
Andrew Pinski writes: > On aarch64, vectorization of `long` multiply can be done if SVE is enabled > or if long is 32bit (ILP32). It can also be done for constants too but there > is no effective target test for that just yet. > > Built and tested on aarch64-linux-gnu with no regressions (also tes

[pushed] aarch64: Avoid paradoxical subregs in UXTL split [PR113485]

2024-01-25 Thread Richard Sandiford
g:74e3e839ab2d36841320 handled the UXTL{,2}-ZIP[12] optimisation in split1. The UXTL input is a 64-bit vector of N-bit elements and the result is a 128-bit vector of 2N-bit elements. The corresponding ZIP1 operates on 128-bit vectors of N-bit elements. This meant that the ZIP1 input had to be a

Re: [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]

2024-01-25 Thread Richard Sandiford
Andrew Pinski writes: > The split for movv8di is not ready to handle the case where the setting > register overlaps with the address of the memory that is being loaded. > Fixing the split than just making the output constraint as an early clobber > for this alternative. The split would first need to

[pushed] aarch64: Fix out-of-bounds ENCODED_ELT access [PR113572]

2024-01-25 Thread Richard Sandiford
When generalising vector_cst_all_same, I'd forgotten to update VECTOR_CST_ENCODED_ELT to VECTOR_CST_ELT. The check deliberately looks at implicitly encoded elements in some cases. Tested on aarch64-linux-gnu & pushed. Richard gcc/ PR target/113572 * config/aarch64/aarch64-sve-b

Re: [PATCH v2] aarch64: Fix eh_return for -mtrack-speculation [PR112987]

2024-01-25 Thread Richard Sandiford
Szabolcs Nagy writes: > Recent commit introduced a conditional branch in eh_return epilogues > that is not compatible with speculation tracking: > > commit 426fddcbdad6746fe70e031f707fb07f55dfb405 > Author: Szabolcs Nagy > CommitDate: 2023-11-27 15:52:48 + > > aarch64: Use br inst

Re: [PATCH] aarch64: Fix function multiversioning mangling

2024-01-25 Thread Richard Sandiford
Andrew Carlotti writes: > It would be neater if the middle end for target_clones used a target > hook for version name mangling, so we only do version name mangling > once. However, that would require more intrusive refactoring that will > have to wait till Stage 1. > > > This patch builds upon t

Re: [PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-25 Thread Richard Sandiford
Victor Do Nascimento writes: > The introduction of further architectural-feature dependent ifuncs > for AArch64 makes hard-coding ifunc `_i' suffixes to functions > cumbersome to work with. It is awkward to remember which ifunc maps > onto which arch feature and makes the code harder to maintain

Re: [PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-25 Thread Richard Sandiford
Victor Do Nascimento writes: > With support for new atomic features in Armv9.4-a being indicated by > HWCAP2 bits, Libatomic's ifunc resolver must now query its second > argument, of type __ifunc_arg_t*. > > We therefore make this argument known to libatomic, allowing us to > query hwcap2 bits in
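A minimal sketch of the resolver shape described in the message above, assuming the glibc AArch64 convention in which a flag bit in the first argument says whether the second argument is valid. The struct, the flag constant, the `kHwcap2ExampleFeature` bit and the function names are local stand-ins for illustration only; they are not libatomic's actual code, and real code would use `<sys/ifunc.h>` on AArch64.

```cpp
#include <cstdint>

// Local stand-in for glibc's __ifunc_arg_t (assumption for illustration).
struct ifunc_arg_sketch {
  unsigned long size;    // size of this struct in bytes
  unsigned long hwcap;   // AT_HWCAP bits
  unsigned long hwcap2;  // AT_HWCAP2 bits
};

// Flag in the first resolver argument saying the second argument is present.
constexpr std::uint64_t kIfuncArgPresent = 1ULL << 62;

// Hypothetical HWCAP2 feature bit; the real bit positions are not shown here.
constexpr unsigned long kHwcap2ExampleFeature = 1UL << 5;

enum class Impl { Core, Extended };

// Pick an implementation. hwcap2 is only consulted when the caller has
// signalled that `arg` is valid and large enough to contain that field.
Impl select_impl(std::uint64_t hwcap, const ifunc_arg_sketch *arg) {
  if ((hwcap & kIfuncArgPresent) != 0 && arg != nullptr
      && arg->size >= sizeof(ifunc_arg_sketch)
      && (arg->hwcap2 & kHwcap2ExampleFeature) != 0)
    return Impl::Extended;
  return Impl::Core;
}
```

Falling back to the core implementation whenever the second argument is absent keeps the sketch safe on systems that only pass the first argument.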

Re: [PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-25 Thread Richard Sandiford
Victor Do Nascimento writes: > The armv9.4-a architectural revision adds three new atomic operations > associated with the LSE128 feature: > > * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit > value held in a pair of registers, with original data loaded into > the same 2 regi

Re: [PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-25 Thread Richard Sandiford
Victor Do Nascimento writes: > At present, evaluation of both `has_lse2(hwcap)' and > `has_lse128(hwcap)' may require issuing an `mrs' instruction to query > a system register. This instruction, when issued from user-space, > results in a trap by the kernel which then returns the value read in > b

Re: [PATCH 2/2] aarch64: Add support for _BitInt

2024-01-25 Thread Richard Sandiford
Andre Vieira writes: > This patch adds support for C23's _BitInt for the AArch64 port when compiling > for little endianness. Big Endianness requires further target-agnostic > support and we therefore disable it for now. > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (TARGET_C_BITINT_TYP

Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Richard Sandiford
Victor Do Nascimento writes: > @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16) > END (libat_test_and_set_16) > > > +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations, > + that is, map it to LSE128, LRCPC or CORE as appropriate. */ > + > +ALIAS (libat_exchange_16, LSE12

[PATCH] vect: Tighten vect_determine_precisions_from_range [PR113281]

2024-01-27 Thread Richard Sandiford
This was another PR caused by the way that vect_determine_precisions_from_range handles shifts. We tried to narrow 32768 >> x to a 16-bit shift based on range information for the inputs and outputs, with vect_recog_over_widening_pattern (after PR110828) adjusting the shift amount. But this doesn't
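A small sketch of why narrowing a shift such as 32768 >> x is delicate. It assumes, purely for illustration, that the narrowed 16-bit form clamps the shift amount to 15; the truncated message above does not show the exact adjustment, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Reference: the shift performed at 32 bits, valid for x in [0, 31].
std::uint32_t wide_shift(unsigned x) {
  return std::uint32_t{32768} >> x;
}

// Hypothetical narrowed form: a 16-bit value shifted by an amount clamped
// to 15 so that the shift stays in range for a 16-bit type.
std::uint16_t narrow_shift_clamped(unsigned x) {
  unsigned amount = std::min(x, 15u);
  return static_cast<std::uint16_t>(std::uint16_t{32768} >> amount);
}
```

For x up to 15 the two functions agree, but for x = 16 the 32-bit shift gives 0 while the clamped 16-bit shift gives 1, so input and output ranges alone do not justify the narrowing.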

Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-27 Thread Richard Sandiford
Prathamesh Kulkarni writes: > Hi, > The test passes -mlittle-endian option but doesn't have target check > for aarch64_little_endian and thus fails to compile on > aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian > target check, which makes it unsupported on the target. > OK

Re: [PATCH] aarch64: Ensure iterator validity when updating debug uses [PR113616]

2024-01-29 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > The fix for PR113089 introduced range-based for loops over the > debug_insn_uses of an RTL-SSA set_info, but in the case that we reset a > debug insn, the use would get removed from the use list, and thus we > would end up using an invalidated iterator in the next ite
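The hazard described above can be sketched generically: erasing the current element inside a range-based for loop leaves the loop holding an invalidated iterator. The version below uses a plain `std::list<int>` as a hypothetical stand-in for the use list (it is not the RTL-SSA API) and saves the successor before any removal.

```cpp
#include <iterator>
#include <list>

// Remove every even element from `uses` and return how many were removed.
// The next iterator is captured before a possible erase, so the loop never
// advances through an iterator that has been invalidated.
int remove_even(std::list<int> &uses) {
  int removed = 0;
  for (auto it = uses.begin(); it != uses.end();) {
    auto next = std::next(it);  // successor stays valid after erasing *it
    if (*it % 2 == 0) {
      uses.erase(it);
      ++removed;
    }
    it = next;
  }
  return removed;
}
```

This relies on `std::list::erase` invalidating only iterators to the erased element; a container with different invalidation rules would need a different pattern.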

Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-29 Thread Richard Sandiford
Prathamesh Kulkarni writes: > On Sat, 27 Jan 2024 at 21:19, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi, >> > The test passes -mlittle-endian option but doesn't have target check >> > for aarch64_little_endian and thus

Re: [PATCH]AArch64: relax cbranch tests to accepted inverted branches [PR113502]

2024-01-29 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > Recently something in the midend had started inverting the branches by > inverting > the condition and the branches. > > While this is fine, it makes it hard to actually test. In RTL I disable > scheduling and BB reordering to prevent this. But in GIMPLE th

Re: [PATCH] aarch64: enforce lane checking for intrinsics

2024-01-29 Thread Richard Sandiford
Alexandre Oliva writes: > On Jan 23, 2024, Richard Sandiford wrote: > >> Performing the check in expand is itself wrong > > *nod* > >> So I think we should enforce the immediate range within the frontend >> instead, via TARGET_CHECK_BUILTIN_CALL. > >

[pushed] aarch64: Handle debug references to removed registers [PR113636]

2024-01-30 Thread Richard Sandiford
In this PR, we entered early-ra with quite a bit of dead code. The code was duly removed (to avoid wasting registers), but there was a dangling reference in debug instructions, which caused an ICE later. Fixed by resetting a debug instruction if it references a register that is no longer needed by

[pushed] aarch64: Avoid allocating FPRs to address registers [PR113623]

2024-01-30 Thread Richard Sandiford
For something like: void foo (void) { int *ptr; asm volatile ("%0" : "=w" (ptr)); asm volatile ("%0" :: "m" (*ptr)); } early-ra would allocate ptr to an FPR for the first asm, thus leaving an FPR address in the second asm. The address was then reloaded by LRA to make it valid. But early-r

Re: [PATCH]AArch64: relax cbranch tests to accepted inverted branches [PR113502]

2024-01-30 Thread Richard Sandiford
Richard Biener writes: > On Mon, Jan 29, 2024 at 5:00 PM Richard Sandiford > wrote: >> >> Tamar Christina writes: >> > Hi All, >> > >> > Recently something in the midend had started inverting the branches by >> > inverting >> > t

Re: [PATCH] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-01-31 Thread Richard Sandiford
Alex Coplan writes: > Hi, > > The PR shows us ICEing due to an unrecognizable TFmode save emitted by > aarch64_process_components. The problem is that for T{I,F,D}mode we > conservatively require mems to be in range for x-register ldp/stp. That > is because (at least for TImode) it can be alloca

Re: [PATCH][GCC 13] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]

2024-01-31 Thread Richard Sandiford
Alex Coplan writes: > Bootstrapped/regtested on aarch64-linux-gnu, OK for the 13 branch after > a week of the trunk fix being in? OK for the other active branches if > the same changes test cleanly there? > > GCC 14 patch for reference: > https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644

Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-13 Thread Richard Sandiford
Robin Dapp writes: > @@ -1758,16 +1759,19 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 > bitsize, poly_uint64 bitnum, >if (VECTOR_MODE_P (outermode) && !MEM_P (op0)) > { >scalar_mode innermode = GET_MODE_INNER (outermode); >enum insn_code icode > = convert_optab

Re: [PATCH v4] aarch64: SVE/NEON Bridging intrinsics

2023-12-13 Thread Richard Sandiford
Richard Ball writes: > ACLE has added intrinsics to bridge between SVE and Neon. > > The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and > SVE vectors. > > This patch adds support to GCC for the following 3 intrinsics: > svset_neonq, svget_neonq and svdup_neonq > > gcc/Chan

Re: [PATCH v2 09/11] aarch64: Rewrite non-writeback ldp/stp patterns

2023-12-13 Thread Richard Sandiford
Alex Coplan writes: > On 12/12/2023 15:58, Richard Sandiford wrote: >> Alex Coplan writes: >> > Hi, >> > >> > This is a v2 version which addresses feedback from Richard's review >> > here: >> > >> > https://gcc.gnu.org/piperma
