Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-11-30 Thread Wilco Dijkstra
Hi Richard, Thanks for the review, now committed. > The new aarch64_split_compare_and_swap code looks a bit twisty. > The approach in lse.S seems more obvious.  But I'm guessing you > didn't want to spend any time restructuring the pre-LSE > -mno-outline-atomics code, and I agree the patch in its

Re: [PATCH v3] AArch64: Add inline memmove expansion

2023-12-01 Thread Wilco Dijkstra
Hi Richard, > +  rtx load[max_ops], store[max_ops]; > > Please either add a comment explaining why 40 is guaranteed to be > enough, or (my preference) use: > >  auto_vec, ...> ops; I've changed to using auto_vec since that should help reduce conflicts with Alex' LDP changes. I double-checked maxi

Re: [PATCH v2] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-12-04 Thread Wilco Dijkstra
Hi Richard, >> Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible >> with >> existing binaries, gives better performance than locking atomics and is what >> most users expect. > > Please add a justification for why it's backwards compatible, rather > than just stating that

Re: [PATCH] AArch64: Cleanup memset expansion

2023-11-10 Thread Wilco Dijkstra
Hi Kyrill, > +  /* Reduce the maximum size with -Os.  */ > +  if (optimize_function_for_size_p (cfun)) > +    max_set_size = 96; > + > This is a new "magic" number in this code. It looks sensible, but how > did you arrive at it? We need 1 instruction to create the value to store (DUP or MO

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-11-10 Thread Wilco Dijkstra
Hi Kyrill, > +  if (!(hwcap & HWCAP_CPUID)) > +    return false; > + > +  unsigned long midr; > +  asm volatile ("mrs %0, midr_el1" : "=r" (midr)); > From what I recall that midr_el1 register is emulated by the kernel and so > userspace software > has to check that the kernel supports that emula

Re: [PATCH v2] AArch64: Cleanup memset expansion

2023-11-14 Thread Wilco Dijkstra
Hi, >>> I checked codesize on SPECINT2017, and 96 had practically identical size. >>> Using 128 would also be a reasonable Os value with a very slight size >>> increase, >>> and 384 looks good for O2 - however I didn't want to tune these values >>> as this >>> is a cleanup patch. >>> >>> Cheers, >

Re: [PATCH v2] AArch64: Cleanup memset expansion

2023-11-14 Thread Wilco Dijkstra
Hi Richard, > +/* Maximum bytes set for an inline memset expansion.  With -Os use 3 STP > +   and 1 MOVI/DUP (same size as a call).  */ > +#define MAX_SET_SIZE(speed) (speed ? 256 : 96) > So it looks like this assumes we have AdvSIMD.  What about > -mgeneral-regs-only? After my strictalign bugf
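
The -Os limit in the quoted macro can be checked with simple arithmetic. The sketch below assumes each STP stores a pair of 16-byte Q registers (32 bytes), so three STPs cover 96 bytes. The helper names `stp_count` and `inline_memset_p` are illustrative only and are not part of GCC.

```c
#include <stdbool.h>

/* Mirrors the macro quoted in the patch: 256 bytes for speed, 96 for size.  */
#define MAX_SET_SIZE(speed) ((speed) ? 256 : 96)

/* Assumption: one STP of two Q registers writes 2 * 16 = 32 bytes.  */
enum { STP_Q_BYTES = 32 };

/* Hypothetical helper: number of STP instructions needed for SIZE bytes,
   ignoring any overlapping-store tricks for a tail that is not a
   multiple of 32.  */
static unsigned stp_count (unsigned size)
{
  return (size + STP_Q_BYTES - 1) / STP_Q_BYTES;
}

/* Hypothetical helper: would a memset of SIZE bytes be expanded inline
   under the quoted limit?  */
static bool inline_memset_p (unsigned size, bool speed)
{
  return size <= (unsigned) MAX_SET_SIZE (speed);
}
```

Under these assumptions the -Os case is 3 STP plus the one MOVI/DUP that creates the value, which matches the comment in the quoted patch.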

Re: [PATCH] AArch64: Improve immediate expansion [PR105928]

2023-09-19 Thread Wilco Dijkstra
Hi Richard, >> Note that aarch64_internal_mov_immediate may be called after reload, >> so it would end up even more complex. > > The sequence I quoted was supposed to work before and after reload.  The: > >    rtx tmp = aarch64_target_reg (dest, DImode); > > would create a fresh tempor

[PATCH] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-09-20 Thread Wilco Dijkstra
The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS. Passes regress/bootstrap, OK for commit? gcc/ChangeLog/ PR target/103100 * con

[PATCH v2] AArch64: Fix memmove operand corruption [PR111121]

2023-09-20 Thread Wilco Dijkstra
A MOPS memmove may corrupt registers since there is no copy of the input operands to temporary registers. Fix this by calling aarch64_expand_cpymem_mops. Passes regress/bootstrap, OK for commit? gcc/ChangeLog/ PR target/111121 * config/aarch64/aarch64.md (aarc

Re: [PATCH] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-09-20 Thread Wilco Dijkstra
Hi Richard, > * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition. > Shouldn't this be a separate patch?  It's not immediately obvious that this > is a necessary part of this change. You mean this? @@ -1627,7 +1627,7 @@ (define_expand "cpymemdi" (match_operand:BLK 1 "m

[PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-09-21 Thread Wilco Dijkstra
v2: Use UINTVAL, rename max_mops_size. The cpymemdi/setmemdi implementation doesn't fully support strict alignment. Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT. Clean up the condition when to use MOPS. Passes regress/bootstrap, OK for commit? gcc/ChangeLog/

[PATCH] AArch64: Add inline memmove expansion

2023-09-21 Thread Wilco Dijkstra
Add support for inline memmove expansions. The generated code is the same as for memcpy, except that all loads are emitted before stores rather than being interleaved. The maximum size is 256 bytes which requires at most 16 registers. Passes regress/bootstrap, OK for commit? gcc/ChangeLog
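
The "all loads before stores" idea can be sketched in plain C. This is only a model of the ordering, not the GCC expansion: 16-byte chunks stand in for Q registers (an assumption), and `small_memmove` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 16-byte chunks stand in for Q registers; 16 of them cover
   the 256-byte maximum mentioned in the patch description.  */
enum { CHUNK = 16, MAX_CHUNKS = 16 };

/* Hypothetical sketch: copy N bytes (a multiple of CHUNK, at most 256)
   between possibly overlapping buffers.  Returns 0 on success, or -1 if
   the size is not handled and the caller should fall back to memmove.  */
static int small_memmove (void *dst, const void *src, size_t n)
{
  uint8_t tmp[MAX_CHUNKS][CHUNK];
  size_t chunks = n / CHUNK;

  if (n % CHUNK != 0 || chunks > MAX_CHUNKS)
    return -1;

  /* Phase 1: all loads.  Nothing has been written yet, so overlap
     cannot corrupt the source.  */
  for (size_t i = 0; i < chunks; i++)
    memcpy (tmp[i], (const uint8_t *) src + i * CHUNK, CHUNK);

  /* Phase 2: all stores.  */
  for (size_t i = 0; i < chunks; i++)
    memcpy ((uint8_t *) dst + i * CHUNK, tmp[i], CHUNK);

  return 0;
}
```

Because every byte is read before any byte is written, the copy direction no longer matters, which is why the same sequence can serve both memcpy and memmove up to the size limit.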

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-09-25 Thread Wilco Dijkstra
Hi Ramana, >> __sync_val_compare_and_swap may be used on 128-bit types and either calls the >> outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic >> if >> the value is stored successfully using STXP, but the current implementations >> do not perform the store if the compa

[PATCH] AArch64: Remove BTI from outline atomics

2023-09-26 Thread Wilco Dijkstra
The outline atomic functions have hidden visibility and can only be called directly.  Therefore we can remove the BTI at function entry.  This improves security by reducing the number of indirect entry points in a binary. The BTI markings on the objects are still emitted. Passes regress, OK for c

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2023-09-27 Thread Wilco Dijkstra
Hi Ramana, > Hope this helps. Yes definitely! >> Passes regress/bootstrap, OK for commit? > > Target ? armhf ? --with-arch , -with-fpu , -with-float parameters ? > Please be specific. I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf --build=arm-none-linux-gnueabihf --wit

[BACKPORT] AArch64: Fix strict-align cpymem/setmem [PR103100]

2024-06-27 Thread Wilco Dijkstra
OK to backport to GCC13 (it applies cleanly and regress/bootstrap passes)? Cheers, Wilco On 29/11/2023 18:09, Richard Sandiford wrote: > Wilco Dijkstra writes: >> v2: Use UINTVAL, rename max_mops_size. >> >> The cpymemdi/setmemdi implementation doesn't fully support

Re: [PATCH v3] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-27 Thread Wilco Dijkstra
Hi Richard, > Doing just this will mean that the register allocator will have to undo a > pre/post memory operand that was accepted by the predicate (memory_operand).  > I think we really need a tighter predicate (lets call it noautoinc_mem_op) > here to avoid that.  Note that the existing uses

Re: [PATCH v3] Arm: Fix ldrd offset range [PR115153]

2024-06-27 Thread Wilco Dijkstra
Hi Richard, > The Linaro CI is reporting an ICE while building libgfortran with this change. So it looks like Thumb-2 oddly enough restricts the negative range of DFmode even though that is unnecessary and inefficient. The easiest workaround turned out to be to avoid using checked adjust_address. Cheer

[PATCH] AArch64: Use UZP1 instead of INS

2024-05-15 Thread Wilco Dijkstra
Use UZP1 instead of INS when combining low and high halves of vectors. UZP1 has 3 operands which improves register allocation, and is faster on some microarchitectures. Passes regress & bootstrap, OK for commit? gcc: * config/aarch64/aarch64-simd.md (aarch64_combine_internal): Use

[PATCH] AArch64: Use LDP/STP for large struct types

2024-05-15 Thread Wilco Dijkstra
Use LDP/STP for large struct types as they have useful immediate offsets and are typically faster. This removes differences between little and big endian and allows use of LDP/STP without UNSPEC. Passes regress and bootstrap, OK for commit? gcc: * config/aarch64/aarch64.cc (aarch64_clas

[PATCH] AArch64: Fix printing of 2-instruction alternatives

2024-05-15 Thread Wilco Dijkstra
Add missing '\' in 2-instruction movsi/di alternatives so that they are printed on separate lines. Passes bootstrap and regress, OK for commit once stage 1 reopens? gcc: * config/aarch64/aarch64.md (movsi_aarch64): Use '\;' to force newline in 2-instruction pattern. (movdi

[PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Wilco Dijkstra
Improve costing of ctz - both TARGET_CSSC and vector cases were not handled yet. Passes regress & bootstrap - OK for commit? gcc: * config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ costing. --- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index f

Re: [PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Wilco Dijkstra
Hi Andrew, > I should note popcount has a similar issue which I hope to fix next week. > Popcount cost is used during expand so it is very useful to be slightly more > correct. It's useful to set the cost so that all of the special cases still apply - even if popcount is relatively fast, it's s

[PATCH v3] aarch64: Fix normal returns inside functions which use eh_returns [PR114843]

2024-05-20 Thread Wilco Dijkstra
Hi Andrew, A few comments on the implementation, I think it can be simplified a lot: > +++ b/gcc/config/aarch64/aarch64.h > @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = > AARCH64_FL_SM_OFF; > #define DWARF2_UNWIND_INFO 1 > > /* Use R0 through R3 to pass exception handling

[PATCH] testsuite: Improve check-function-bodies

2024-05-31 Thread Wilco Dijkstra
Improve check-function-bodies by allowing single-character function names. Also skip '#' comments which may be emitted from inline assembler. Passes regress, OK for commit? gcc/testsuite: * lib/scanasm.exp (configure_check-function-bodies): Allow single-char function names. Skip

[PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Wilco Dijkstra
Add __ARM_FEATURE_MOPS predefine. Add support for ACLE __arm_mops_memset_tag. Passes regress, OK for commit? gcc: * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Add __ARM_FEATURE_MOPS predefine. * config/aarch64/arm_acle.h: Add __arm_mops_memset_tag(). gc

Re: [PATCH] AArch64: Add ACLE MOPS support

2024-05-31 Thread Wilco Dijkstra
Hi Richard, > I think this should be in a push_options/pop_options block, as for other > intrinsics that require certain features. But then the intrinsic would always be defined, which is contrary to what the ACLE spec demands - it would not give a compilation error at the callsite but give assem

[PATCH] Arm: Fix ldrd offset range [PR115153]

2024-06-03 Thread Wilco Dijkstra
The valid offset range of LDRD in arm_legitimate_index_p is increased to -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode. Fix this by moving the LDRD check earlier. Passes bootstrap & regress, OK for commit? gcc: PR target/115153 * config/arm/arm.cc (arm

[PATCH] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-03 Thread Wilco Dijkstra
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get printed as LDR/STR with writeback in unified syntax, resulting in strange assembler errors if writeback is selected. To work around this, use the 'Uw' constraint that blocks writeback. Passes bootstrap & regress, OK for

PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-04 Thread Wilco Dijkstra
Fix CPU features initialization. Use HWCAP rather than explicit accesses to CPUID registers. Perform the initialization atomically to avoid multi-threading issues. Passes regress, OK for commit and backport? libgcc: PR target/115342 * config/aarch64/cpuinfo.c (__init_cpu_featu

Re: PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-04 Thread Wilco Dijkstra
Hi Richard, I've reworded the commit message a bit: The CPU features initialization code uses CPUID registers (rather than HWCAP). The equality comparisons it uses are incorrect: for example FEAT_SVE is not set if SVE2 is available. Using HWCAPs for these is both simpler and correct. The initi
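
The equality-comparison problem described above can be illustrated with a small model. The field layout here is hypothetical (an unsigned 4-bit field where 0 means absent, 1 the base feature, 2 the base feature plus an extension); it is not the actual libgcc code or a specific architectural register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of a 4-bit feature field in an ID register:
   0 = not implemented, 1 = base feature, 2 = base feature plus extension.  */
static unsigned id_field (uint64_t reg, unsigned shift)
{
  return (unsigned) ((reg >> shift) & 0xf);
}

/* Buggy pattern: equality misses CPUs that report a higher value.  */
static bool has_base_eq (uint64_t reg, unsigned shift)
{
  return id_field (reg, shift) == 1;
}

/* Ordered comparison: any value >= 1 implies the base feature.  */
static bool has_base_ge (uint64_t reg, unsigned shift)
{
  return id_field (reg, shift) >= 1;
}
```

With a field value of 2, `has_base_eq` reports the base feature as missing while `has_base_ge` reports it as present. Reading a HWCAP bit avoids the field decoding altogether, which is the simplification the message argues for.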

Re: PATCH] AArch64: Fix cpu features initialization [PR115342]

2024-06-05 Thread Wilco Dijkstra
Hi Richard, >> Essentially anything covered by HWCAP doesn't need an explicit check. So I >> kept >> the LS64 and PREDRES checks since they don't have a HWCAP allocated (I'm not >> entirely convinced we need these, let alone having 3 individual bits for >> LS64, but >> that's something for the A

[PATCH v2] Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]

2024-06-11 Thread Wilco Dijkstra
Hi Christophe, >  PR target/115153 I guess this is typo (should be 115188) ? Correct. > +/* { dg-options "-O2 -mthumb" } */-mthumb is included in arm_arch_v6m, so I > think you don't need to add it here? Indeed, it's not strictly necessary. Fixed in v2: A Thumb-1 memory operand allows

[PATCH v2] Arm: Fix ldrd offset range [PR115153]

2024-06-11 Thread Wilco Dijkstra
v2: use a new arm_arch_v7ve_neon, fix use of DImode in output_move_neon The valid offset range of LDRD in arm_legitimate_index_p is increased to -1024..1020 if NEON is enabled since VALID_NEON_DREG_MODE includes DImode. Fix this by moving the LDRD check earlier. Passes bootstrap & regress, OK for

[PATCH] regalloc: Ignore '^' in early costing [PR114766]

2024-04-29 Thread Wilco Dijkstra
According to documentation, '^' should only have an effect during reload. However ira-costs.cc treats it in the same way as '?' during early costing. As a result using '^' can accidentally disable valid alternatives and cause significant regressions (see PR114741). Avoid this by ignoring '^' duri

Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-01-21 Thread Wilco Dijkstra
James Greenhalgh wrote: > If we don't have any targets which care about the fccmps/fccmpd split in > the code base, do we really need it? Can we just follow the example of > fcsel? If we do that then we should also change fcmps/d to fcmp to keep the f(c)cmp attributes orthogonal. However it seems

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-01-22 Thread Wilco Dijkstra
On 12/16/2015 03:30 PM, Evandro Menezes wrote: > >    On 10/30/2015 05:24 AM, Marcus Shawcroft wrote: > >    On 20 October 2015 at 00:40, Evandro Menezes >wrote: > >    In the existing targets, it seems that it's always faster to zero >up a DF > >    register with "movi %d0,

[PATCH] Improve TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS callback

2016-01-22 Thread Wilco Dijkstra
cno FP_REGS a1 (r79,l0) best FP_REGS, allocno FP_REGS As a result it is now no longer a requirement to use register move costs that are larger than the memory move cost. So it will be feasible to use realistic costs for both without a huge penalty. ChangeLog: 2016-01-22 Wilco Dijk

Re: [PATCH] Fix aarch64 bootstrap (pr69416)

2016-01-25 Thread Wilco Dijkstra
Richard Henderson wrote: > On 01/25/2016 05:28 AM, Christophe Lyon wrote: > > After this, I'm seeing this test now FAILs: > > gcc.target/aarch64/ccmp_1.c scan-assembler adds\t > > That test case is badly written. In addition to that one, several of the > other > failures that I see within that fi

Re: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2016-01-25 Thread Wilco Dijkstra
;wzr' on cmp - BTW is there a regular expression that correctly implements (0|xzr)? If I use that the test still fails somehow but \[0wzr\]+ works fine... Is the correct syntax documented somewhere? Finally to ensure FCCMPE is emitted on relational compares, add -ffinite-math-only. Cha

Re: [PATCH][AArch64] Add vector permute cost

2016-01-26 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 16 December 2015 11:37 To: Richard Biener; James Greenhalgh Cc: GCC Patches; nd Subject: RE: [PATCH][AArch64] Add vector permute cost Richard Biener wrote: > On Wed, Dec 16, 2015 at 10:32 AM, James Greenhalgh >

Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2016-01-26 Thread Wilco Dijkstra
ping (note the regressions discussed below are addressed by https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html) From: Wilco Dijkstra Sent: 17 December 2015 13:37 To: James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd Subject: RE: [PATCH][AArch64] Add

Re: [PATCH][ARM] Enable fusion of AES instructions

2016-01-26 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 19 November 2015 18:12 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH][ARM] Enable fusion of AES instructions > > Enable instruction fusion of AES instructions on ARM for Cor

[COMMITTED] Add myself as GCC maintainer

2016-01-27 Thread Wilco Dijkstra
I've added myself to the "Write After Approval" maintainers (Committed revision 232880): Index: ChangeLog === --- ChangeLog (revision 232874) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2015-01-27

Re: [PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2016-02-03 Thread Wilco Dijkstra
James Greenhalgh wrote: > I'm still seeing: > > FAIL: gcc.target/aarch64/ccmp_1.c scan-assembler-times \\tcmp\\tw[0-9]+, > (0|wzr) 4 That's because "(0|wzr)" is not correctly matching due to the weird regular expression syntax used in the testsuite (I tried with several escapes to no avail). I

[COMMITTED][AArch64] Fix ccmp_1.c test

2016-02-03 Thread Wilco Dijkstra
Fix the ccmp_1.c test back to use '0' as regular expressions don't work correctly. '0' is right due to compare with zero now printing as 'CMP w0, 0' rather than 'CMP w0, wzr' (since r232921). Committed as trivial patch in r233102. ChangeLog

[PATCH] PR69619: Fix exponential issue in ccmp.c

2016-02-03 Thread Wilco Dijkstra
don't see why the backend should expand tree expressions, especially when they are not part of the CCMP sequence. OK for commit? ChangeLog: 2016-02-03 Wilco Dijkstra gcc/ PR target/69619 * ccmp.c (expand_ccmp_expr_1): Avoid evaluating gs0/gs1 twice when co

[COMMITTED][AArch64] Add missing return in aarch64_internal_mov_immediate

2016-02-17 Thread Wilco Dijkstra
4. Adding the return fixes the regressions. Committed as trivial in revision 233490. 2016-02-17 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Add missing return. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-02-26 Thread Wilco Dijkstra
Evandro Menezes wrote: > > I have a question though: is it necessary to add the "fp" and "simd" > attributes to both movsf_aarch64 and movdf_aarch64 as well? You need at least the "simd" attribute, but providing "fp" as well is clearer (in principle the TARGET_FLOAT check in the pattern condition

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-02-29 Thread Wilco Dijkstra
Evandro Menezes wrote: > > Please, verify the new "simd" and "fp" attributes for SF and DF. Both movsf and movdf should be: (set_attr "simd" "*,yes,*,*,*,*,*,*,*,*") (set_attr "fp" "*,*,*,yes,yes,yes,yes,*,*,*") Did you check that with -mcpu=generic+nosimd you get fmov s0, wzr? In my version

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-01 Thread Wilco Dijkstra
Evandro Menezes wrote: > > The meaning of these attributes are not clear to me. Is there a > reference somewhere about which insns are FP or SIMD or neither? The meaning should be clear, "fp" is a floating point instruction, "simd" a SIMD one as defined in ARM-ARM. > Indeed, I had to add the Y

[PATCH][AArch64] Improve spill code - swap order in shr patterns

2015-07-27 Thread Wilco Dijkstra
the extra int<->FP moves. Placing the integer variant first in the shr pattern generates far more optimal spill code. 2015-07-27 Wilco Dijkstra * gcc/config/aarch64/aarch64.md (aarch64_lshr_sisd_or_int_3): Place integer variant first. (aarch64_ashr_sisd_or

RE: [PATCH][AArch64] Improve spill code - swap order in shl pattern

2015-07-27 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wdijk...@arm.com] > Sent: 27 April 2015 14:37 > To: GCC Patches > Subject: [PATCH][AArch64] Improve spill code - swap order in shl pattern > > Various instructions are supported as integer operations as well

[PATCH][AArch64] Improve add immediate expansion

2015-09-25 Thread Wilco Dijkstra
d on AArch64. OK for commit? ChangeLog: 2015-09-25 Wilco Dijkstra * gcc/config/aarch64/aarch64.md (add3): Block early expansion into 2 add instructions. (add3_pluslong): New pattern to combine complex immediates into 2 additions. --- gcc/config/aarch64/aarch64.md

[PATCH][AArch64] Update patterns to support FP zero

2015-10-08 Thread Wilco Dijkstra
This patch improves support for instructions that allow FP zero immediate. All FP compares generated by various patterns should use aarch64_fp_compare_operand. LDP/STP uses aarch64_reg_or_fp_zero. Passes regression on AArch64. OK for commit? ChangeLog: 2015-10-08 Wilco Dijkstra

[PATCH][AArch64] Enable fusion of AES instructions

2015-10-14 Thread Wilco Dijkstra
Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. This can give up to 2x speedup on many AArch64 implementations. Also model the crypto instructions on Cortex-A57 according to the Optimization Guide. Passes regression tests. ChangeLog: 2015-10-14 Wilco Dijkstra

[PATCH][AArch64] Avoid emitting zero immediate as zero register

2015-10-28 Thread Wilco Dijkstra
Several instructions accidentally emit wzr/xzr even when the pattern specifies an immediate. Fix this by removing the register specifier in patterns that emit immediates. Passes regression tests. OK for commit? ChangeLog: 2015-10-28 Wilco Dijkstra * gcc/config/aarch64/aarch64.md

[PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-11-06 Thread Wilco Dijkstra
of the register. This results in better register allocation overall, fewer spills and reduced codesize - particularly in SPEC2006 gamess. GCC regression passes with several minor fixes. OK for commit? ChangeLog: 2015-11-06 Wilco Dijkstra * gcc/config/aarch64/aarch64.c

[PATCH] Fix IRA register preferencing

2015-11-10 Thread Wilco Dijkstra
reg_pref to illegal register classes so this kind of issue can be trivially found with an assert? Also would it not be a good idea to have a single register copy function that ensures all data is copied? ChangeLog: 2014-12-09  Wilco Dijkstra  wdijk...@arm.com     * gcc/ira-emit.c

[PATCH 1/4][AArch64] Generalize CCMP support

2015-11-13 Thread Wilco Dijkstra
compare with zero can be merged into an ALU operation: int f (int a, int b) { a += b; return a == 0 || a == 3; } f: adds w0, w0, w1 ccmp w0, 3, 4, ne cset w0, eq ret Passes GCC regression tests. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra
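
To see why the three-instruction sequence in this message computes `a == 0 || a == 3`, it can be modelled in C by tracking only the Z flag. `f_model` is an illustrative name; the model assumes the usual CCMP semantics, where the immediate 4 (binary 0100 in NZCV order) sets Z when the condition fails.

```c
#include <stdbool.h>

/* The function from the patch description.  */
static int f (int a, int b)
{
  a += b;
  return a == 0 || a == 3;
}

/* Model of the generated sequence, tracking only the Z flag.  */
static int f_model (int a, int b)
{
  /* adds w0, w0, w1: the add also sets Z when the result is zero.  */
  int sum = (int) ((unsigned) a + (unsigned) b);
  bool z = (sum == 0);

  /* ccmp w0, 3, 4, ne: if NE holds (Z clear), compare w0 with 3;
     otherwise load NZCV with 4, which sets Z.  */
  z = !z ? (sum == 3) : true;

  /* cset w0, eq: result is 1 when Z is set.  */
  return z;
}
```

The separate compare with zero disappears because ADDS already produces the flag that the first comparison needs.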

[PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
This patch adds support for FCCMP. This is trivial with the new CCMP representation - remove the restriction of FP in ccmp.c and add FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra * gcc/ccmp.c

[PATCH 3/4][AArch64] Add CCMP to rtx costs

2015-11-13 Thread Wilco Dijkstra
This patch adds support for rtx costing of CCMP. The cost is the same as int/FP compare, however comparisons with zero get a slightly larger cost. This means we prefer emitting compares with zero so they can be merged with ALU operations. OK for commit? ChangeLog: 2015-11-13 Wilco Dijkstra

[PATCH 4/4][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-11-13 Thread Wilco Dijkstra
This patch adds CCMP selection based on rtx costs. This is based on Jiong's already approved patch https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01434.html with some minor refactoring and the tests updated. OK for commit? ChangeLog: 2015-11-13 Jiong Wang gcc/ * ccmp.c (expand_ccmp_exp

RE: [PATCH 2/4][AArch64] Add support for FCCMP

2015-11-13 Thread Wilco Dijkstra
> Evandro Menezes wrote: > Hi, Wilco. > > It looks good to me, but FCMP is quite different from FCCMP on Exynos M1, > so it'd be helpful to have distinct types for them. Say, "fcmp{s,d}" > and "fccmp{s,d}". Would it be acceptable to add this with this patch or > later? It would be easy to add f

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-18 Thread Wilco Dijkstra
Bernd Schmidt wrote: > Sent: 17 November 2015 22:16 > To: Wilco Dijkstra; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH 1/4][AArch64] Generalize CCMP support > > On 11/13/2015 05:02 PM, Wilco Dijkstra wrote: > > * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from

[PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-11-18 Thread Wilco Dijkstra
(v2 version removes 4 enums) This patch adds support for FCCMP. This is trivial with the new CCMP representation - remove the restriction of FP in ccmp.c and add FCCMP patterns. Add a test to ensure FCCMP/FCCMPE are emitted as expected. OK for commit? ChangeLog: 2015-11-18 Wilco Dijkstra

[PATCH 4/4 v2][AArch64] Cost CCMP instruction sequences to choose better expand order

2015-11-18 Thread Wilco Dijkstra
Jiong Wang 2015-11-18 Wilco Dijkstra gcc/ * ccmp.c (expand_ccmp_expr_1): Cost the instruction sequences generated from different expand order. Cleanup enum use. gcc/testsuite/ * gcc.target/aarch64/ccmp_1.c: Update test. --- gcc/ccmp.c

[PATCH][ARM] Enable fusion of AES instructions

2015-11-20 Thread Wilco Dijkstra
Enable instruction fusion of AES instructions on ARM for Cortex-A53 and Cortex-A57. OK for commit? ChangeLog: 2015-11-20 Wilco Dijkstra * gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion. (arm_cortex_a57_tune): Likewise. (aarch_macro_fusion_pair_p): Add

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-24 Thread Wilco Dijkstra
ch compares the previously set CC register. The then part does the compare like a normal compare. The else part contains the integer value of the AArch64 condition that must be set if the if condition is false. ChangeLog: 2015-11-12 Wilco Dijkstra * gcc/target.def (gen_ccmp_fir

RE: [PATCH][ARM] Enable fusion of AES instructions

2015-11-25 Thread Wilco Dijkstra
Yvan Roux wrote: > I've a question regarding Cortex-A35, I don't see the same > documentation for it on ARM website as we have for the other cores > yet, but is AES fusion not beneficial for it or is it planned to do it > later ? It's early days for Cortex-A35, GCC 6 just has initial support. When

RE: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-11-27 Thread Wilco Dijkstra
> James Greenhalgh wrote: > > Could you please repost this with the word-wrapping issues fixed. > > I can't apply it to my tree for review or to commit it on your behalf in > > the current form. So it looks like Outlook no longer supports sending emails without wrapping and the maximum is only

[COMMITTED][AArch64] Fix PR93565 testcase for ILP32.

2020-02-17 Thread Wilco Dijkstra
Fix PR93565 testcase for ILP32. Committed as obvious. testsuite/ * gcc.target/aarch64/pr93565.c: Fix test for ilp32. -- diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c b/gcc/testsuite/gcc.target/aarch64/pr93565.c index 7200f80..fb64f5c 100644 --- a/gcc/testsuite/gcc.target/aarch

Re: [PATCH][AARCH64] Fix for PR86901

2020-03-03 Thread Wilco Dijkstra
Hi Modi, > The zero extract now matching against other modes would generate a test + > branch rather > than the combined instruction which led to the code size regression. I've > updated the patch > so that tbnz etc. matches GPI and that brings code size down to <0.2% in > spec2017 and <0.4% in

[PATCH][AArch64] Fix lane specifier syntax

2020-03-06 Thread Wilco Dijkstra
The syntax for lane specifiers uses a vector element rather than a vector: fmls v0.2s, v1.2s, v1.s[1] // rather than v1.2s[2] Fix all the lane specifiers to use Vetype which uses the correct element type. Regress&bootstrap pass. ChangeLog: 2020-03-06 Wilco Dijkstra * aar

[PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-06 Thread Wilco Dijkstra
cores. Fix this by adding new patterns and intrinsics for widening multiplies, which results in a 63% speedup for the example in the PR. This fixes the performance regression. Passes regress&bootstrap. ChangeLog: 2020-03-06 Wilco Dijkstra PR target/91598 * config/aarch6

Re: [PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-09 Thread Wilco Dijkstra
Hi Christophe, > I noticed a regression introduced by Delia's patch "aarch64: ACLE > intrinsics for BFCVTN, BFCVTN2 and BFCVT": > (on aarch64-linux-gnu) > FAIL: g++.dg/cpp0x/variadic-sizeof4.C  -std=c++14 (internal compiler error) > > I couldn't reproduce it with current ToT, until I realized that

Re: [PING^2][PATCH] Fix documentation of -mpoke-function-name ARM option

2020-03-09 Thread Wilco Dijkstra
Hi, There is no single PC offset that is correct given CPUs may use different offsets. GCC may also schedule the instruction that stores the PC. This feature used to work on early Arms but is no longer functional or useful today, so the best way forward is to remove it altogether. There are many

Re: [PATCH 2/2] [aarch64] Rework fpcr fpsr getter/setter builtins

2020-03-17 Thread Wilco Dijkstra
Hi Andrea, I think the first part is fine when approved, but the 2nd part is problematic, as Szabolcs already pointed out. We can't just change the ABI or semantics, and these builtins are critical for GLIBC performance. We would first need to change GLIBC back to using inline assembler so it

Re: [PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Wilco Dijkstra
Hi Richard, Thanks for these patches - yes TI mode expansions can certainly be improved! So looking at your expansions for signed compares, why not copy the optimal sequence from 32-bit Arm? Any compare can be done in at most 2 instructions: void doit(void); void f(long long a) { if (a <= 1)

Re: [PATCH 0/6] aarch64: Implement TImode comparisons

2020-03-19 Thread Wilco Dijkstra
Hi Richard, > Any compare can be done in at most 2 instructions: > > void doit(void); > void f(long long a) > { > if (a <= 1) > doit(); > } > > f: > cmp r0, #2 > sbcs    r3, r1, #0 > blt .L4 > Well, this one requires that you be able to add 1 to an in
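
The quoted 32-bit Arm sequence (cmp, sbcs, blt) can be modelled in C to show how a double-word signed compare takes two instructions. The sketch rewrites `a <= 1` as `a < 2` and tracks the carry, N and V flags by hand; `le_one_model` is an illustrative name, not code from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of: cmp lo, #2 ; sbcs tmp, hi, #0 ; blt
   The branch is taken when the 64-bit value A is less than 2.  */
static bool le_one_model (int64_t a)
{
  uint32_t lo = (uint32_t) a;
  uint32_t hi = (uint32_t) ((uint64_t) a >> 32);

  /* cmp lo, #2: carry is set when no borrow occurs, i.e. lo >= 2.  */
  bool carry = lo >= 2;

  /* sbcs tmp, hi, #0: tmp = hi - 0 - !carry.  */
  uint32_t borrow = carry ? 0 : 1;
  uint32_t res = hi - borrow;

  /* N is the sign bit of the result.  V is signed overflow, which for
     hi - borrow only happens when hi is INT32_MIN and borrow is 1.  */
  bool n = (res >> 31) != 0;
  bool v = (hi == 0x80000000u) && (borrow != 0);

  /* blt: taken when N != V.  */
  return n != v;
}
```

The same pattern extends to TImode on AArch64 by treating the two halves as 64-bit words, which is the suggestion being discussed.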

Re: [PATCH PR00002] aarch64:Add an error message in large code model for ilp32

2020-04-13 Thread Wilco Dijkstra
Hi Duanbo, > This is a simple fix for pr94577. > The option -mabi=ilp32 should not be used in large code model. Like x86, > using -mx32 and -mcmodel=large together will result in an error message. > On aarch64, there is no error message for this option conflict. > A solution to this problem can b

Re: [PATCH][ARM] Remove support for MULS

2019-10-10 Thread Wilco Dijkstra
Any further comments? Note GCC doesn't support S/UMULLS either since it is equally useless. It's no surprise that Thumb-2 removed support for flag-setting 64-bit multiplies, while AArch64 didn't add flag-setting multiplies. So there is no argument that these instructions are in any way useful to

Re: [PATCH][ARM] Correctly set SLOW_BYTE_ACCESS

2019-10-10 Thread Wilco Dijkstra
K, OK for commit? ChangeLog: 2019-09-11 Wilco Dijkstra * config/arm/arm.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8b92c830de09a3ad49420fdfacde02d8efc2a89b..11212d988a0f56299c2266bace80170d074be56c 100644 --- a/gcc/config/arm/ar

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-10 Thread Wilco Dijkstra
s. OK for commit until we get rid of it? ChangeLog: 2017-11-17 Wilco Dijkstra gcc/ * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 056110afb228fb919e837c04aa5e55

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-10 Thread Wilco Dijkstra
nces. Bootstrapped on AArch64, passes regress, OK for commit? ChangeLog: 2018-11-09 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.ta

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-10 Thread Wilco Dijkstra
one-linux-gnueabihf --with-cpu=cortex-a57 ChangeLog: 2019-07-29 Wilco Dijkstra * config/arm/arm.c (arm_option_override): Don't override sched pressure algorithm. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/

Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-10 Thread Wilco Dijkstra
? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.h (HONOR_REG_ALLOC_ORDER): Set when optimizing for size. -- diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 8d023389eec469ad9c8a4e88edebdad5f3c23769..e3473e29fbbb964ff1136c226fbe30d35dbf7b39 100644 --- a/gcc

Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-10 Thread Wilco Dijkstra
while SPECFP improves 0.2%. Bootstrap OK, OK for commit? ChangeLog: 2019-09-09 Wilco Dijkstra * config/arm/arm.c (arm_legitimize_address): Remove Thumb-2 bailout. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a5a6a0fab1b4b7ef07931522e7d47e59842

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-10-10 Thread Wilco Dijkstra
testcase - libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22 Wilco Dijkstra PR target/79262 * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

Re: [PATCH][ARM] Tweak HONOR_REG_ALLOC_ORDER

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, > My only question would be whether it's more suitable to use > optimize_function_for_size_p(cfun) instead as IIRC that gives us a > chance with lto rather than the global optimize_size. Yes that is even better and that defaults to optimize_size if cfun isn't set. I've committed this:

Re: [PATCH][ARM] Enable arm_legitimize_address for Thumb-2

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, >On Mon, Sep 9, 2019 at 6:03 PM Wilco Dijkstra wrote: >> >> Currently arm_legitimize_address doesn't handle Thumb-2 at all, resulting in >> inefficient code. Since Thumb-2 supports similar address offsets use the Arm >> legitimization code for Thumb-2

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi Ramana, > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to > spread the range across some v7-a CPUs as well ? While they aren't that > popular today I > would suggest you look at them because the defaults for v7-a are still to use > the > Cortex-A8 scheduler and the Cor

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-11 Thread Wilco Dijkstra
Hi Richard, > If global_char really is a char then isn't that UB? No why? We can do all kinds of arithmetic based on pointers, either using pointer types or converted to uintptr_t. Note that the optimizer actually creates these expressions, for example arr[N-x] can be evaluated as (&arr[0] + N
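The regrouping mentioned above can be illustrated with a small C sketch. The preview is cut off, so the exact expression the author had in mind is not shown; the code assumes the natural reading, where the constant `N` is folded into the base address first and `x` is subtracted afterwards. The array, its size and the function names are hypothetical and chosen so that every intermediate pointer stays inside the object.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 16 };

/* Hypothetical array with N + 1 elements, so that &arr[0] + N is still
   a pointer to an element of the array. */
static char arr[N + 1];

/* The expression as a programmer would write it: index N - x. */
static char *elem_direct(size_t x)
{
    return &arr[N - x];
}

/* The regrouped form (an assumption about what the optimizer produces):
   add the constant to the base first, then subtract the variable part.
   In a compiler, "&arr[0] + N" becomes a symbol plus a constant offset. */
static char *elem_rebased(size_t x)
{
    char *base_plus_n = &arr[0] + N;
    return base_plus_n - x;
}

/* Both forms must yield the same address for every valid x in [0, N]. */
static void regroup_self_check(void)
{
    for (size_t x = 0; x <= N; x++)
        assert(elem_direct(x) == elem_rebased(x));
}
```

In this sketch the folded constant stays within the array. The case discussed in the thread is the one where the folded constant is large, so the symbol-plus-offset value can end up far away from the object it refers to.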

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-11 Thread Wilco Dijkstra
Hi, > the defaults for v7-a are still to use the > Cortex-A8 scheduler I missed that part, but that's a serious bug btw - Cortex-A8 is 15 years old now so way beyond obsolete. Even Cortex-A53 is ancient now, but it has an accurate scheduler that performs surprisingly well on both in-order and

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-14 Thread Wilco Dijkstra
Hi Richard, >> No - the testcases fail with that. > > Hmm, OK. Could you give more details? What does the motivating case > actually look like? Well it's now a very long time ago since I first posted this patch but the failure was in SPEC. It did something like &array[0xff000 - x], presuma

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-15 Thread Wilco Dijkstra
Hi Richard, > Sure, the "extern array of unknown size" case isn't about section anchors. > But this part of my message (snipped above) was about the other case > (objects of known size), and applied to individual objects as well as > section anchors. > > What I was trying to say is: yes, we need b

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Wilco Dijkstra
Hi Christophe, > I've noticed that your patch caused a regression: > FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments > "internal loop alignment added" 1 That's just a testism - it only tests for loop alignment and doesn't consider the possibility of the loop being jumped into like
