Hi Richard,
> Could you give details? I thought it was always known that trapped
> system register accesses were slow. In the previous versions, the
> checks seemed to be presented as an up-front price worth paying for
> faster atomic operations, on the systems that would use those paths.
> Now
Hi Richard,
> That was also what I was trying to say. In the worst case, the linked
> object has to meet the requirements of the lowest common denominator.
>
> And my supposition was that that isn't a property of static vs dynamic.
But it is. Dynamic linking supports mixing different code models
Hi Richard,
>> Basically the small and large model are fundamentally incompatible. The infamous
>> "dumb linker" approach means it doesn't try to sort sections, so an ADRP relocation
>> will be out of reach if its data is placed after a huge array. Static linking with GLIBC or
>> enabl
Hi Ramana,
> -Generate code for the large code model. This makes no assumptions about
> -addresses and sizes of sections. Programs can be statically linked only. The
> +Generate code for the large code model. This allows large .bss and .data
> +sections, however .text and .rodata must still
Hi Kyrill,
> This restriction should be documented in invoke.texi IMO.
> I also think it would be more user friendly to warn them about the
> incompatibility if an explicit -moutline-atomics option is passed.
> It’s okay though to silently turn off the implicit default-on option though.
I've upd
Hi Richard&Kyrill,
>> I’m in favour of this.
>
> Yeah, seems ok to me too. I suppose we ought to update the documentation too:
I've added a note to the documentation. However it is impossible to be complete here
since many targets switch off early scheduling under various circumstances. So I'v
Feedback from the kernel team suggests that it's best to only use HWCAPs
rather than also using low-level checks as done by has_lse128() and has_rcpc3().
So change these to just use HWCAPs, which simplifies the code and speeds up
ifunc selection by avoiding expensive system register accesses.
Passes
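For illustration, a minimal sketch of the idea (not the actual libatomic host-config.h code; the helper name and the HWCAP2_LSE128 macro are assumptions here):

#include <sys/auxv.h>      /* getauxval */
#include <asm/hwcap.h>     /* Linux HWCAP/HWCAP2 bits */

/* Sketch: decide the LSE128 ifunc variant from HWCAPs alone, with no
   MRS-based probing of ID registers.  */
static inline int
has_lse128_hwcap_only (void)
{
#ifdef HWCAP2_LSE128
  return (getauxval (AT_HWCAP2) & HWCAP2_LSE128) != 0;
#else
  return 0;   /* older headers: cannot detect the feature this way */
#endif
}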
Change AArch64 cpuinfo to follow the latest updates to the FMV spec [1]:
Remove FEAT_PREDRES and FEAT_LS64*. Preserve the ordering in enum CPUFeatures.
Passes regress, OK for commit?
[1] https://github.com/ARM-software/acle/pull/382
gcc:
* common/config/aarch64/cpuinfo.h: Remove FEAT_PR
Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits
from much faster build times with -O2, but avoids the regressions in lbm which
is very sensitive to minor scheduling changes due to long FMA chains. We can
then revisit this for GCC16.
gcc:
PR target/118351
Outline atomics is not designed to be used with -mcmodel=large, so disable
it automatically if the large code model is used.
Passes regress, OK for commit?
gcc:
PR target/112465
* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Turn off outline atomic
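A hedged sketch of the shape of such a fix (the function name comes from the ChangeLog above; the exact guard is an assumption, not the committed patch):

/* In aarch64_override_options_after_change_1: outline atomics is not
   designed to be used with -mcmodel=large, so drop the implicit
   default.  An explicitly requested -moutline-atomics is left alone so
   it can be diagnosed separately, per the review discussion above.  */
if (aarch64_cmodel == AARCH64_CMODEL_LARGE
    && !OPTION_SET_P (aarch64_flag_outline_atomics))
  aarch64_flag_outline_atomics = 0;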
Hi Richard,
> Sorry to be awkward, but I don't think we should put
> AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT in base.
> CHEAP_SHIFT_EXTEND is a good base flag because it means we can make full
> use of a certain group of instructions. FULLY_PIPELINED_FMA similarly
> means that FMA chains beh
Hi Richard,
>> + if (TARGET_ILP32)
>> + warning (OPT_Wdeprecated, "%<-mabi=ilp32%> is deprecated.");
>
> There should be no "." at the end of the message.
Right, fixed in v2 below.
> Otherwise it looks good to me, although like Kyrill says, it'll also
> need a release note.
I've added one,
As suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673558.html
update the gcc-15 Changes page:
Add ILP32 deprecation to Caveats section.
---
diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 1c690c4a168f4d6297ad33dd5b798e9200792dc5..d5037efb34cc8e6
Hi all,
> In that case, I'm coming round to the idea of deprecating ILP32.
> I think it was already common ground that the GNU/Linux support is dead.
> watchOS would use Mach objects rather than ELF. As you say, it isn't
> clear how much of the current ILP32 support would be relevant for it.
> An
Hi Richard,
> It looks like you committed the original version instead, with no extra
> explanation. I suppose I should have asked for another review round
> instead.
Did you check the commit log?
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
future change
Hi Richard,
> Yeah, somewhat. But won't we go on to test has_lse2 anyway, due to:
>
> # elif defined (LSE2_LRCPC3_ATOP)
> # define IFUNC_NCOND(N) 2
> # define IFUNC_COND_1 (has_rcpc3 (hwcap, features))
> # define IFUNC_COND_2 (has_lse2 (hwcap, features))
>
> If we want to reduce the
Hi Andrew,
> Personally I would like this deprecated even for bare-metal. Yes the
> iwatch ABI is an ILP32 ABI but I don't see GCC implementing that any
> time soon and I suspect it would not be hard to resurrect the code at
> that point.
My patch deprecates it in all cases currently. It will be
Hi Richard,
>> + /* LSE2 is a prerequisite for atomic LDIAPP/STILP. */
>> + if (!(hwcap & HWCAP_USCAT))
>> return false;
>
> Is there a reason for not using has_lse2 here? It'd be good to have
> a comment if so.
Yes, the MRS instructions cause expensive traps, so we try to avoid them whe
Hi Kyrill,
>> Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
>> to the baseline tuning since all modern cores use it. Fix the neoverse512tvb tuning to be
>> like Neoverse V1/V2.
>
> For neoversev512tvb this means adding AARCH64_EXTRA_TUNE_AVOI
ping
Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
to the baseline tuning since all modern cores use it. Fix the neoverse512tvb tuning to be
like Neoverse V1/V2.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TU
ping
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
Passes regress & bootstrap, OK for commit?
gcc/ChangeLo
ping
Simplify and cleanup ifunc selection logic. Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check LSE2 is enabled.
Passes regress and bootstrap, OK for commit?
libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(
ILP32 was originally intended to make porting to AArch64 easier. Support was
never merged in the Linux kernel or GLIBC, so it has been unsupported for many
years. There isn't a benefit in keeping unsupported features forever, so
deprecate it now (and it could be removed in a future release).
Pa
As a minor cleanup, remove the Cortex-A57 FMA steering pass. Since Cortex-A57 is
pretty old, there isn't any benefit in keeping this.
Passes regress & bootstrap, OK for commit?
gcc:
* config.gcc (extra_objs): Remove cortex-a57-fma-steering.o.
* config/aarch64/aarch64-passes.def: Remo
Hi Richard,
> The patch below is what I meant. It passes bootstrap & regression-test
> on aarch64-linux-gnu (and so produces the same results for the tests
> that you changed). Do you see any problems with this version?
> If not, I think we should go with it.
Thanks for the detailed example - u
Hi Richard,
>> A common case is a constant string which is compared against some
>> argument. Most string functions work on 8 or 16-byte quantities. If we
>> ensure the whole array fits in one aligned load, we save time in the
>> string function.
>>
>> Runtime data collected for strlen calls shows
Hi Richard,
> So just to be sure I understand: we still want to align (say) an array
> of 4 chars to 32 bits so that the LDR & STR are aligned, and an array of
> 3 chars to 32 bits so that the LDRH & STRH for the leading two bytes are
> aligned? Is that right? We don't seem to take advantage of
The register indexed variants of LDRD have complex register overlap constraints
which make them hard to use without using output_move_double (which can't be
used for atomics as it doesn't guarantee to emit atomic LDRD/STRD when
required).
Add a new predicate and constraint for plain LDRD/STRD wi
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
future changes easier. Use the existing alignment settings, however avoid
overaligning small arrays or structs to 64 bits when there is no benefit.
This gives a small reduction in data and stack size.
Passes regress & b
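As a hedged illustration of the intent described above (the exact thresholds and cut-offs live in the patch, not here):

/* Before, a small object like this could be bumped to 64-bit alignment
   by the expand-alignment macro even though nothing benefits; with the
   change it keeps its natural alignment, saving a little data and
   stack space.  */
static char small_tag[3] = "ab";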
Simplify and cleanup ifunc selection logic. Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check LSE2 is enabled.
Passes regress and bootstrap, OK for commit?
libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(has_rcp
Hi Kyrill,
> This would make USE_NEW_VECTOR_COSTS effectively the default.
> Jennifer has been trying to do that as well and then to remove it (as it
> would be always true) but there are some codegen regressions that still
> need to be addressed.
Yes, that's the goal - we should use good tun
Add FULLY_PIPELINED_FMA to tune baseline - this is a generic feature that is
already enabled for some cores, but benchmarking it shows it is faster on all
modern cores (SPECFP improves ~0.17% on Neoverse V1 and 0.04% on Neoverse N1).
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a
common base supported by all modern cores. Initially set it to
AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code.
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarc
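Taken from the description above, the new define presumably looks something like this (a sketch, not the committed hunk):

/* Common tuning bits assumed valid for all modern cores; later patches
   in the series move further flags (e.g. FULLY_PIPELINED_FMA) into it.  */
#define AARCH64_EXTRA_TUNE_BASE AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND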
Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
to the baseline tuning since all modern cores use it. Fix the neoverse512tvb tuning to be
like Neoverse V1/V2.
gcc/ChangeLog:
* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE
Hi Richard,
> ...I still think we should avoid testing can_create_pseudo_p.
> Does it work with the last part replaced by:
>
> if (!DECIMAL_FLOAT_MODE_P (mode))
> {
> if (aarch64_can_const_movi_rtx_p (src, mode)
> || aarch64_float_const_representable_p (src)
> || aarch64
Hi,
>>> What do you think about disabling late scheduling as well?
>>
>> I think this would definitely need separate consideration and evaluation
>> given the above.
>>
>> Another thing to consider is the macro fusion machinery. IIRC it works
>> during scheduling so if we don’t run any schedulin
Hi Richard,
> The idea was that, if we did the split during expand, the movsf/df
> define_insns would then only accept the immediates that their
> constraints can handle.
Right, always disallowing these immediates works fine too (it seems
reload doesn't require all immediates to be valid), and th
Cleanup the fusion defines by introducing AARCH64_FUSE_BASE as a common base
level of fusion supported by almost all cores. Add AARCH64_FUSE_MOVK as a
shortcut for all MOVK fusion. In most cases there is no change. It enables
AARCH64_FUSE_CMP_BRANCH for a few older cores since it has no measura
Remove duplicated addr_cost tables - use generic_armv9_a_addrcost_table for
Armv9-a cores and generic_armv8_a_addrcost_table for recent Armv8-a cores.
No changes in generated code.
OK for commit?
gcc/ChangeLog:
* config/aarch64/tuning_models/cortexx925.h
(cortexx925_addrcost_table): Re
Hi Richard,
> That's because, once an instruction matches, the instruction should
> continue to match. It should always be possible to set the INSN_CODE of
> an existing instruction to -1, rerun recog, and get the same instruction
> code back.
>
> Because of that, insn conditions shouldn't depend
Hi Richard,
> It's ok for instructions to require properties that are false during
> early RTL passes and then transition to true. But they can't require
> properties that go from true to false, since that would mean that
> existing instructions become unrecognisable at certain points during
> th
v2: split off movsf/df pattern fixes, remove some guality xfails that now pass
The early scheduler takes up ~33% of the total build time, however it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends
The IRA combine_and_move pass runs if the scheduler is disabled and aggressively
combines moves. The movsf/df patterns allow all FP immediates since they rely
on a split pattern. However splits do not happen during IRA, so the result is
extra literal loads. To avoid this, use a more accurate ch
Hi Kyrill,
> I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH
> hooks like x86 does for bdver1-4.
> That would try to exploit the dispatch constraints information in the SWOGs
> rather than the instruction latency and throughput tables.
> That would still require some
Hi Andrew,
> I suspect the following scheduling models could be removed due either
> to hw never going to production or no longer being used by anyone:
> thunderx3t110.md
> falkor.md
> saphira.md
If you're planning to remove these, it would also be good to remove the
falkor-tag-collision-avoidanc
The early scheduler takes up ~33% of the total build time, however it doesn't
provide a meaningful performance gain. This is partly because modern OoO cores
need far less scheduling, partly because the scheduler tends to create many
unnecessary spills by increasing register pressure. Building ap
Hi Vineet,
> I agree the NARROW/WIDE stuff is obfuscating things in technicalities.
Is there evidence this change would make things significantly worse for
some targets? I did a few runs on Neoverse V2 with various options and
it looks beneficial both for integer and FP. On the example and option
As shown in the PR, reload may only check the constraint in some cases and
not check that the predicate is still valid for the resulting instruction.
To fix the issue, add a new constraint which matches the predicate exactly.
Passes regress & bootstrap, OK for commit?
gcc/ChangeLog:
PR ta
The split condition in aarch64_simd_mov uses aarch64_simd_special_constant_p. While
doing the split, it checks the mode before calling aarch64_maybe_generate_simd_constant.
This is risky since it may result in unexpectedly calling aarch64_split_simd_move instead
of aarch64_maybe_generate_simd_con
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the expander to always use vector i
Add support for SVE xor immediate when generating AdvSIMD code and SVE is
available.
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64.cc (enum simd_immediate_check): Add
AARCH64_CHECK_XOR.
(aarch64_simd_valid_xor_imm): New function.
(a
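A hedged example of the kind of code this targets (GNU vector extensions used for brevity; the assumption is that AdvSIMD has no EOR-immediate form while SVE does):

typedef unsigned int v4si __attribute__ ((vector_size (16)));

v4si
toggle (v4si x)
{
  /* 0x0f0f0f0f is a valid SVE bitmask immediate, so with +sve this can
     become a single EOR (immediate) on the Z-register view of the
     vector instead of materialising the constant separately.  */
  return x ^ 0x0f0f0f0f;
}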
Allow use of SVE immediates when generating AdvSIMD code and SVE is available.
First check for a valid AdvSIMD immediate, and if SVE is available, try using
an SVE move or bitmask immediate.
Passes bootstrap & regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (ior3
Cleanup the various interfaces related to SIMD immediate generation. Introduce new functions
that make it clear which operation (AND, OR, MOV) we are testing for rather than guessing the
final instruction. Reduce the use of overly long names, unused and default parameters for
clarity. No cha
Hi Saurabh,
This looks good, one little nit:
> gcc/ChangeLog:
>
> * config/aarch64/iterators.md: Move UNSPEC_COND_SMAX and
> UNSPEC_COND_SMIN to correct iterators.
This should also have the PR target/116934 before it - it's fine to change it
when you commit.
Speaking of which,
v2: Add more testcase fixes.
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the e
The current copysign pattern has a mismatch in the predicates and constraints -
operand[2] is a register_operand but also has an alternative X which allows any
operand. Since it is a floating point operation, having an integer alternative
makes no sense. Change the expander to always use the vec
Hi Richard,
> The Linaro CI is reporting an ICE while building libgfortran with this change.
So it looks like Thumb-2 oddly enough restricts the negative range of DFmode
even though that is unnecessary and inefficient. The easiest workaround turned
out to be to avoid using the checked adjust_address.
Cheer
Hi Richard,
> Doing just this will mean that the register allocator will have to undo a
> pre/post memory operand that was accepted by the predicate (memory_operand).
> I think we really need a tighter predicate (lets call it noautoinc_mem_op)
> here to avoid that. Note that the existing uses
OK to backport to GCC13 (it applies cleanly and regress/bootstrap passes)?
Cheers,
Wilco
On 29/11/2023 18:09, Richard Sandiford wrote:
> Wilco Dijkstra writes:
>> v2: Use UINTVAL, rename max_mops_size.
>>
>> The cpymemdi/setmemdi implementation doesn't fully support
v2: use a new arm_arch_v7ve_neon, fix use of DImode in output_move_neon
The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.
Passes bootstrap & regress, OK for
Hi Christophe,
> PR target/115153
I guess this is a typo (should be 115188)?
Correct.
> +/* { dg-options "-O2 -mthumb" } */
> -mthumb is included in arm_arch_v6m, so I think you don't need to add it here?
Indeed, it's not strictly necessary. Fixed in v2:
A Thumb-1 memory operand allows
Hi Richard,
>> Essentially anything covered by HWCAP doesn't need an explicit check. So I
>> kept
>> the LS64 and PREDRES checks since they don't have a HWCAP allocated (I'm not
>> entirely convinced we need these, let alone having 3 individual bits for
>> LS64, but
>> that's something for the A
Hi Richard,
I've reworded the commit message a bit:
The CPU features initialization code uses CPUID registers (rather than
HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE
is not set if SVE2 is available. Using HWCAPs for these is both simpler and
correct. The initi
Fix CPU features initialization. Use HWCAP rather than explicit accesses
to CPUID registers. Perform the initialization atomically to avoid multi-
threading issues.
Passes regress, OK for commit and backport?
libgcc:
PR target/115342
* config/aarch64/cpuinfo.c (__init_cpu_featu
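A minimal sketch of the approach, assuming field names, bit positions and helper names (not the actual libgcc code):

#include <sys/auxv.h>
#include <asm/hwcap.h>   /* HWCAP_ATOMICS etc. */

/* Compute the feature word locally from HWCAPs, then publish it with a
   single atomic store so concurrent first callers never observe a
   partly initialised value.  */
static void
init_cpu_features_sketch (unsigned long long *features)
{
  unsigned long hwcap = getauxval (AT_HWCAP);
  unsigned long long f = 1;   /* "initialised" marker bit, assumed */

  if (hwcap & HWCAP_ATOMICS)
    f |= 1ULL << 2;           /* e.g. FEAT_LSE; bit position assumed */
  /* ... remaining HWCAP-derived bits ... */

  __atomic_store_n (features, f, __ATOMIC_RELAXED);
}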
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected. To work around this, use the 'Uw'
constraint that blocks writeback.
Passes bootstrap & regress, OK for
The valid offset range of LDRD in arm_legitimate_index_p is increased to
-1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode.
Fix this by moving the LDRD check earlier.
Passes bootstrap & regress, OK for commit?
gcc:
PR target/115153
* config/arm/arm.cc (arm
Hi Richard,
> I think this should be in a push_options/pop_options block, as for other
> intrinsics that require certain features.
But then the intrinsic would always be defined, which is contrary to what the
ACLE spec demands - it would not give a compilation error at the callsite
but give assem
Add __ARM_FEATURE_MOPS predefine. Add support for ACLE __arm_mops_memset_tag.
Passes regress, OK for commit?
gcc:
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Add __ARM_FEATURE_MOPS predefine.
* config/aarch64/arm_acle.h: Add __arm_mops_memset_tag().
gc
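For reference, a hedged usage example of the new predefine and intrinsic (the additional guard on __ARM_FEATURE_MEMORY_TAGGING follows the ACLE requirement for the tag-setting variant, as I understand it):

#include <stddef.h>
#include <string.h>
#if defined (__ARM_FEATURE_MOPS) && defined (__ARM_FEATURE_MEMORY_TAGGING)
#include <arm_acle.h>
#endif

void *
zero_block (void *p, size_t n)
{
#if defined (__ARM_FEATURE_MOPS) && defined (__ARM_FEATURE_MEMORY_TAGGING)
  /* Set both memory contents and MTE allocation tags using the MOPS
     SETG* sequence.  */
  return __arm_mops_memset_tag (p, 0, n);
#else
  return memset (p, 0, n);   /* fallback: contents only */
#endif
}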
Improve check-function-bodies by allowing single-character function names.
Also skip '#' comments which may be emitted from inline assembler.
Passes regress, OK for commit?
gcc/testsuite:
* lib/scanasm.exp (configure_check-function-bodies): Allow single-char
function names. Skip
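An illustrative (assumed, not from the patch) AArch64 test shape that exercises both changes, a one-character function name and a '#' comment emitted by inline asm:

/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-final { check-function-bodies "**" "" } } */

/*
** f:
**	...
**	ret
*/
int
f (void)
{
  asm ("# just a comment in the output");
  return 0;
}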
Hi Andrew,
A few comments on the implementation, I think it can be simplified a lot:
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> AARCH64_FL_SM_OFF;
> #define DWARF2_UNWIND_INFO 1
>
> /* Use R0 through R3 to pass exception handling
Hi Andrew,
> I should note popcount has a similar issue which I hope to fix next week.
> Popcount cost is used during expand so it is very useful to be slightly more
> correct.
It's useful to set the cost so that all of the special cases still apply - even
if popcount is relatively fast, it's s
Improve costing of ctz - both TARGET_CSSC and vector cases were not handled yet.
Passes regress & bootstrap - OK for commit?
gcc:
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ costing.
---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index
f
Add missing '\' in 2-instruction movsi/di alternatives so that they are
printed on separate lines.
Passes bootstrap and regress, OK for commit once stage 1 reopens?
gcc:
* config/aarch64/aarch64.md (movsi_aarch64): Use '\;' to force
newline in 2-instruction pattern.
(movdi
Use LDP/STP for large struct types as they have useful immediate offsets and
are typically faster.
This removes differences between little and big endian and allows use of
LDP/STP without UNSPEC.
Passes regress and bootstrap, OK for commit?
gcc:
* config/aarch64/aarch64.cc (aarch64_clas
Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.
Passes regress & bootstrap, OK for commit?
gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal):
Use
According to documentation, '^' should only have an effect during reload.
However ira-costs.cc treats it in the same way as '?' during early costing.
As a result using '^' can accidentally disable valid alternatives and cause
significant regressions (see PR114741). Avoid this by ignoring '^' duri
A few HWCAP entries are missing from aarch64/cpuinfo.c. This results in build
errors on older machines.
This counts as a trivial build fix, but since it's late in stage 4 I'll let
maintainers chip in.
OK for commit?
libgcc/
* config/aarch64/cpuinfo.c: Add HWCAP_EVTSTRM, HWCAP_CRC32,
HWC
As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648397.html ,
do some additional cleanup of the macros and aliases:
Cleanup the macros to add the libat_ prefixes in atomic_16.S. Emit the
alias to __atomic_ when ifuncs are not enabled in the ENTRY macro.
Passes regress and
Hi Richard,
> This description is too brief for me. Could you say in detail how the
> new scheme works? E.g. the description doesn't explain:
>
> -if ARCH_AARCH64_HAVE_LSE128
> -AM_CPPFLAGS = -DHAVE_FEAT_LSE128
> -endif
That is not needed because we can include auto-config.h in atomic_16.
On Thumb-2 the use of CBZ blocks conditional execution, so change the
test to compare with a non-zero value.
gcc/testsuite/ChangeLog:
PR target/113915
* gcc.target/arm/builtin-bswap.x: Fix test to avoid emitting CBZ.
---
diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap.x
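A hedged sketch of the kind of change meant here (the real test is builtin-bswap.x; the values below are an assumed shape):

extern void g (void);

void
f (unsigned int x)
{
  /* Comparing against a non-zero constant forces CMP plus a conditional
     branch, which Thumb-2 can place in an IT block, instead of CBZ,
     which cannot be conditionally executed.  */
  if (__builtin_bswap32 (x) != 1)
    g ();
}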
Hi Richard,
> Did you test this on a thumb1 target? It seems to me that the target parts
> that you've
> removed were likely related to that. In fact, I don't see why this test
> would need to be changed at all.
The testcase explicitly forces a Thumb-2 target (arm_arch_v6t2). The patterns
wer
Fix libatomic build to support --disable-gnu-indirect-function on AArch64.
Always build atomic_16.S and add aliases to the __atomic_* functions if
!HAVE_IFUNC.
Passes regress and bootstrap, OK for commit?
libatomic:
PR target/113986
* Makefile.in: Regenerated.
* Makefile.
Hi Richard,
> This bit isn't. The correct fix here is to fix the pattern(s) concerned to
> add the missing predicate.
>
> Note that builtin-bswap.x explicitly mentions predicated mnemonics in the
> comments.
I fixed the patterns in v2. There are likely some more, plus we could likely
merge ma
Hi Richard,
> It looks like this is really doing two things at once: disabling the
> direct emission of LDP/STP Qs, and switching the GPR handling from using
> pairs of DImode moves to single TImode moves. At least, that seems to be
> the effect of...
No it still uses TImode for the !TARGET_SIMD
By default most patterns can be conditionalized on Arm targets. However
Thumb-2 predication requires the "predicable" attribute be explicitly
set to "yes". Most patterns are shared between Arm and Thumb(-2) and are
marked with "predicable". Given this sharing, it does not make sense to
use a di
The new RTL introduced for LDP/STP results in regressions due to use of UNSPEC.
Given the new LDP fusion pass is good at finding LDP opportunities, change the
memcpy, memmove and memset expansions to emit single vector loads/stores.
This fixes the regression and enables more RTL optimization on th
Hi Richard,
>> That tune is only used by an obsolete core. I ran the memcpy and memset
>> benchmarks from Optimized Routines on xgene-1 with and without LDP/STP.
>> There is no measurable penalty for using LDP/STP. I'm not sure why it was
>> ever added given it does not do anything useful. I'll po
(follow-on based on review comments on
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641913.html)
Remove the tune AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS since it is only
used by an old core and doesn't properly support -Os. SPECINT_2017
shows that removing it has no performance difference
Hi,
>> Add support for -mcpu=cobalt-100 (Neoverse N2 with a different implementer
>> ID).
>>
>> Passes regress, OK for commit?
>
> Ok.
Also OK to backport to GCC 13, 12 and 11?
Cheers,
Wilco
Add support for -mcpu=cobalt-100 (Neoverse N2 with a different implementer ID).
Passes regress, OK for commit?
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add 'cobalt-100' CPU.
* config/aarch64/aarch64-tune.md: Regenerated.
* doc/invoke.texi (-mcpu):
Hi Richard,
>> + rtx base = strip_offset_and_salt (XEXP (x, 1), &offset);
>
> This should be just strip_offset, so that we don't lose the salt
> during optimisation.
Fixed.
> +
> + if (offset.is_constant ())
> I'm not sure this is really required. Logically the same thing
> would app
GCC tends to optimistically create CONST of globals with an immediate offset.
However it is almost always better to CSE addresses of globals and add immediate
offsets separately (the offset could be merged later in single-use cases).
Splitting CONST expressions with an index in aarch64_legitimize_
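An illustrative (assumed) example of the pattern in question:

extern int table[1024];

int
sum_two (void)
{
  /* Preferred codegen keeps one ADRP/ADD anchor for "table" (which can
     be CSEd) and folds the +12 / +40 byte offsets into the loads,
     rather than forming two independent CONST(table+offset) addresses.  */
  return table[3] + table[10];
}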
Hi Richard,
>> +#define MAX_SET_SIZE(speed) (speed ? 256 : 96)
>
> Since this isn't (AFAIK) a standard macro, there doesn't seem to be
> any need to put it in the header file. It could just go at the head
> of aarch64.cc instead.
Sure, I've moved it in v4.
>> + if (len <= 24 || (aarch64_tune_p
Hi Richard,
>> Benchmarking showed that LSE and LSE2 RMW atomics have similar performance once
>> the atomic is acquire, release or both. Given there is already a significant overhead due
>> to the function call, PLT indirection and argument setup, it doesn't make sense to add
>> extra
Hi,
>> Is there no benefit to using SWPPL for RELEASE here? Similarly for the
>> others.
>
> We started off implementing all possible memory orderings available.
> Wilco saw value in merging less restricted orderings into more
> restricted ones - mainly to reduce codesize in less frequently use
v3: rebased to latest trunk
Cleanup memset implementation. Similar to memcpy/memmove, use an offset and
bytes throughout. Simplify the complex calculations when optimizing for size
by using a fixed limit.
Passes regress & bootstrap.
gcc/ChangeLog:
* config/aarch64/aarch64.h (MAX_SET_SI
Hi Richard,
>> Enable lock-free 128-bit atomics on AArch64. This is backwards compatible with
>> existing binaries, gives better performance than locking atomics and is what
>> most users expect.
>
> Please add a justification for why it's backwards compatible, rather
> than just stating that
Hi Richard,
> + rtx load[max_ops], store[max_ops];
>
> Please either add a comment explaining why 40 is guaranteed to be
> enough, or (my preference) use:
>
> auto_vec, ...> ops;
I've changed to using auto_vec since that should help reduce conflicts
with Alex' LDP changes. I double-checked maxi
Hi Richard,
Thanks for the review, now committed.
> The new aarch64_split_compare_and_swap code looks a bit twisty.
> The approach in lse.S seems more obvious. But I'm guessing you
> didn't want to spend any time restructuring the pre-LSE
> -mno-outline-atomics code, and I agree the patch in its