Re: [PATCH v3 2/2] AArch64: Add SME LUTv2 intrinsics

2025-09-05 Thread Kyrylo Tkachov
> On 4 Sep 2025, at 16:13, Karl Meakin wrote: > > Add intrinsic functions for the SME LUTv2 architecture extension > (`svluti4_zt`, `svwrite_lane_zt` and `svwrite_zt`). > > gcc/ChangeLog: > > * config/aarch64/aarch64-sme.md (@aarch64_sme_write_zt): New > insn. > (aarch64_sme_lut_zt): Likewis

[PATCH] aarch64: Use SVE for V2DImode integer min/max operations

2025-09-04 Thread Kyrylo Tkachov
immediate operand case is obviously better as the SVE immediate form doesn't require a predicate operand. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/iterators.md (sve_di_suf): New mode attr

Re: [PATCH] aarch64: PR target/121749: Use correct predicate for narrowing shift amounts

2025-09-03 Thread Kyrylo Tkachov
> On 3 Sep 2025, at 17:32, Kyrylo Tkachov wrote: > > > >> On 3 Sep 2025, at 17:26, Alex Coplan wrote: >> >> On 03/09/2025 09:40, Kyrylo Tkachov wrote: >>> Hi all, >>> >>> With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narro

Re: [PATCH] aarch64: PR target/121749: Use correct predicate for narrowing shift amounts

2025-09-03 Thread Kyrylo Tkachov
> On 3 Sep 2025, at 17:26, Alex Coplan wrote: > > On 03/09/2025 09:40, Kyrylo Tkachov wrote: >> Hi all, >> >> With g:d20b2ad845876eec0ee80a3933ad49f9f6c4ee30 the narrowing shift >> instructions >> are now represented with standard RTL and more merging

Re: [PATCH v1 2/2] AArch64: Add LUTv2 intrinsics

2025-09-03 Thread Kyrylo Tkachov
Hi Karl, > On 2 Sep 2025, at 16:16, Karl Meakin wrote: > > gcc/ChangeLog: > > * config/aarch64/aarch64-sme.md (@aarch64_sme_write_zt): New > insn. > (aarch64_sme_lut_zt): Likewise. > * config/aarch64/aarch64-sve-builtins-shapes.cc (parse_type): New type format > "%T". > (struct luti_lane_zt_b

[PATCH] aarch64: PR target/121749: Use correct predicate for narrowing shift amounts

2025-09-03 Thread Kyrylo Tkachov
trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR target/121749 * config/aarch64/aarch64-simd.md (aarch64_shrn_n): Use aarch64_simd_shift_imm_offset_ instead of aarch64_simd_shift_imm_offset_ predicate. (aarch64_shrn_n VQN define_expand): Lik

Re: [PATCH v2 2/3]AArch64: Add support for addhn vectorizer optabs for Adv.SIMD

2025-09-03 Thread Kyrylo Tkachov
> On 2 Sep 2025, at 12:57, Tamar Christina wrote: > > This implements the new vector optabs vec_addh_narrow > adding support for in-vectorizer use for early break. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues.

Re: [PATCH 3/3]middle-end: Use addhn for compression instead of inclusive OR when reducing comparison values

2025-09-02 Thread Kyrylo Tkachov
> On 2 Sep 2025, at 11:46, Tamar Christina wrote: > > Given a sequence such as > > int foo () > { > #pragma GCC unroll 4 > for (int i = 0; i < N; i++) >if (a[i] == 124) > return 1; > > return 0; > } > > where a[i] is long long, we will unroll the loop and use an OR reduction for >

Re: [PATCH 2/3]AArch64: Add support for addhn vectorizer optabs for Adv.SIMD

2025-09-02 Thread Kyrylo Tkachov
Hi Tamar, > On 2 Sep 2025, at 11:46, Tamar Christina wrote: > > This implements the new vector optabs vec_addh_narrow > adding support for in-vectorizer use for early break. The Advanced SIMD ADDHN instruction doesn’t perform a widening of the operands for the addition as far as I know. That i

Re: [PATCH][PR121599] aarch64: Fix ICE when op2 is zero for SVE2 saturating add intrinsics.

2025-08-25 Thread Kyrylo Tkachov
Hi Jennifer, > On 25 Aug 2025, at 12:56, Jennifer Schmitz wrote: > > When op2 in SVE2 saturating add intrinsics (svuqadd, svsqadd) is a zero > vector and predication is _z, an ICE in vregs occurs, e.g. for > > svuint8_t foo (svbool_t pg, svuint8_t op1) > { >return svsqadd_u8_z (pg, op1, svd

Re: [PATCH] MAINTAINERS: Update my email address and stand down as AArch64 maintainer

2025-08-21 Thread Kyrylo Tkachov
aarch64 ldp/stp Alex Coplan > aarch64 portRichard Earnshaw > -aarch64 portRichard Sandiford > aarch64 portTamar Christina > aarch64 portKyrylo Tkachov > alpha port Richard Henderson > -- > 2.50.1 >

Re: [PATCH] AArch64: Add isinf expander [PR 66462]

2025-08-15 Thread Kyrylo Tkachov
> On 13 Aug 2025, at 17:34, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> Add an expander for isinf using integer arithmetic. This is >> typically faster and avoids generating spurious exceptions on >> signaling NaNs. >> >> int isinf1 (float x) { return __builtin_isinf (x); } >> >
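
For readers skimming this thread, the idea of testing for infinity with integer arithmetic can be sketched in C as below. This is only an illustration of the general technique for IEEE 754 binary32 values, not the sequence the patch emits; the names `isinf_bits` and `float_from_bits` are hypothetical helpers introduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): build a float from a raw
   IEEE 754 binary32 bit pattern.  */
static float
float_from_bits (uint32_t bits)
{
  float x;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Hypothetical helper (not from the patch): return nonzero if X is
   +Inf or -Inf, using only integer operations on its bit pattern.
   No floating-point comparison is performed, so a signaling NaN input
   cannot raise an exception here.  */
static int
isinf_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  /* Clear the sign bit.  Infinity has all exponent bits set and a
     zero mantissa, which is exactly 0x7f800000.  */
  return (bits & 0x7fffffffu) == 0x7f800000u;
}
```

Any NaN has a nonzero mantissa, so it fails the equality test even though its exponent field matches that of infinity.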

Re: [PATCH v2] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-08 Thread Kyrylo Tkachov
> On 8 Aug 2025, at 14:23, Tamar Christina wrote: > >> -Original Message- >> From: Pengfei Li >> Sent: Friday, August 8, 2025 11:00 AM >> To: Kyrylo Tkachov >> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ; >> Tamar Christina >> Su

Re: [PATCH] AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]

2025-08-07 Thread Kyrylo Tkachov
Hi Pengfei, > On 7 Aug 2025, at 18:45, Pengfei Li wrote: > > This patch fixes incorrect constraints in RTL patterns for AArch64 SVE > gather/scatter with type widening/narrowing and vector-plus-immediate > addressing. The bug leads to below "immediate offset out of range" > errors during assembl

Re: [PATCH 1/2] aarch64: Fix predication of FP8 FDOT insns [PR120986]

2025-08-05 Thread Kyrylo Tkachov
> On 5 Aug 2025, at 11:00, Alex Coplan wrote: > > Hi Kyrill, > > Sorry for the slow reply, I was away on holiday until yesterday. > > On 15/07/2025 13:08, Kyrylo Tkachov wrote: >> Hi Alex, >> >>> On 15 Jul 2025, at 14:59, Alex Coplan wrote: >>

Re: [PATCH 3/3] AArch64: Enable dispatch scheduling for Neoverse V2.

2025-08-01 Thread Kyrylo Tkachov
> On 31 Jul 2025, at 17:20, Tamar Christina wrote: > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Thursday, July 31, 2025 3:47 PM >> To: Jennifer Schmitz >> Cc: GCC Patches ; Andrew Pinski >> ; Richard Earnshaw ; Richard >&

Re: [PATCH 12/12] aarch64: Check the mode of SVE ACLE function results

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > After previous patches, we should always get a VNx16BI result > for ACLE intrinsics that return svbool_t. This patch adds > an assert that checks a more general condition than that. > Ok. Thanks, Kyrill > gcc/ > * config/aarch64/aarc

Re: [PATCH 11/12] aarch64: Use VNx16BI for svdupq_b*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the predicate forms of svdupq. > > The general predicate expansion builds an equivalent integer vector > and then compares it wit

Re: [PATCH 10/12] aarch64: Use VNx16BI for svdup_b*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the predicate forms of svdup. > Ok. Thanks, Kyrill > gcc/ > * config/aarch64/aarch64-protos.h > (aarch64_emit_sve_pred_vec_dupl

Re: [PATCH 09/12] aarch64: Use VNx16BI for svpnext*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the svpnext* intrinsics. > I wonder if the new patterns need pred_clobber alternatives in this and the other patches? If they d

Re: [PATCH 08/12] aarch64: Use VNx16BI for sv(n)match*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the svmatch* and svnmatch* > intrinsics. > Ok. Thanks, Kyrill > gcc/ > * config/aarch64/aarch64-sve2.md (@aarch64_pred_): > Spl

Re: [PATCH 07/12] aarch64: Use VNx16BI for svac*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the svac* intrinsics (floating- > point compare absolute). Ok. Thanks, Kyrill > > gcc/ > * config/aarch64/aarch64-sve.md (@aarc

Re: [PATCH 05/12] aarch64: Use VNx16BI for svcmp*_wide

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the svcmp*_wide intrinsics. > > Since the only uses of these patterns are for ACLE intrinsics, > there didn't seem much point add

Re: [PATCH 04/12] aarch64: Drop unnecessary GPs in svcmp_wide PTEST patterns

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > Patterns that fuse a predicate operation P with a PTEST use > aarch64_sve_same_pred_for_ptest_p to test whether the governing > predicates of P and the PTEST are compatible. Most patterns were also > written as define_insn_and_rewrites,

Re: [PATCH 03/12] aarch64: Use the correct GP mode in the svcmp_wide patterns

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > The patterns for the svcmp_wide intrinsics used a VNx16BI > input predicate for all modes, instead of the usual . > That unnecessarily made some input bits significant, but more > importantly, it triggered an ICE in aarch64_sve_same_pred

Re: [PATCH 02/12] aarch64: Use VNx16BI for non-widening integer svcmp*

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the non-widening integer forms > of svcmp*. The handling of the PTEST patterns is similar to that > for the earlier svwhile* patc

Re: [PATCH 3/3] AArch64: Enable dispatch scheduling for Neoverse V2.

2025-07-31 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 17:14, Jennifer Schmitz wrote: > > This patch adds dispatch constraints for Neoverse V2 and illustrates the steps > necessary to enable dispatch scheduling for an AArch64 core. > > The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG. > Please note tha

Re: [PATCH] AArch64: Fix test for vector length safety

2025-07-31 Thread Kyrylo Tkachov
> On 31 Jul 2025, at 14:34, Tejas Belagod wrote: > > The test was unsafe when tested on different vector lengths. This patch > fixes it to work on all lengths. > Ok. I’ve seen this test fail on GCC 15 branch too, do we want this fix there as well? Thanks, Kyrill > gcc/testsuite/ChangeLog >

Re: [PATCH 01/12] aarch64: Use VNx16BI for svunpklo/hi_b

2025-07-30 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the svunpk* intrinsics. > LGTM. Thanks, Kyrill > gcc/ > * config/aarch64/aarch64-sve.md (@aarch64_sve_punpk_acle) > (*aarch64_s

Re: [PATCH 06/12] aarch64: Use VNx16BI for floating-point svcmp*

2025-07-29 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 18:41, Richard Sandiford wrote: > > This patch continues the work of making ACLE intrinsics use VNx16BI > for svbool_t results. It deals with the floating-point forms of svcmp*. > > gcc/ > * config/aarch64/aarch64-sve.md (@aarch64_pred_fcm_acle) > (*aarch64_pred_fcm_acle,

Re: [PATCH 1/2] testsuite: Make aarch64/cmpbr.c more forgiving

2025-07-29 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 15:31, Richard Sandiford wrote: > > The 8-bit and 16-bit tests in cmpbr.c assumed an inverted operand > order ("w1, w0"), but it's possible to use the uninverted operand > order too. This patch generalises the tests to support both forms. > > This is a prerequisite for a

Re: [PATCH] aarch64: Fix function_expander::get_reg_target

2025-07-29 Thread Kyrylo Tkachov
> On 29 Jul 2025, at 15:15, Remi Machet wrote: > > > On 7/29/25 14:44, Richard Sandiford wrote: >> External email: Use caution opening links or attachments >> >> >> function_expander::get_reg_target didn't actually check for a register, >> meaning that it could return a memory target instea

Re: [PATCH 2/2] aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]

2025-07-28 Thread Kyrylo Tkachov
> On 28 Jul 2025, at 17:10, Remi Machet wrote: > > > On 7/28/25 17:02, Kyrylo Tkachov wrote: >> External email: Use caution opening links or attachments >> >> >> Hi Spencer, >> >>> On 28 Jul 2025, at 16:25, Spencer Abson wrote: >

Re: [PATCH 2/2] aarch64: Prevent streaming-compatible code from assembler rejection [PR121028]

2025-07-28 Thread Kyrylo Tkachov
Hi Spencer, > On 28 Jul 2025, at 16:25, Spencer Abson wrote: > > Streaming-compatible functions can be compiled without SME enabled, but need > to use "SMSTART SM" and "SMSTOP SM" to temporarily switch into the streaming > mode of a callee. These switches are conditional on the current mode be

Re: [PATCH] aarch64: Add tuning model for Olympus core.

2025-07-28 Thread Kyrylo Tkachov
> On 27 Jul 2025, at 03:31, Andrew Pinski wrote: > > On Fri, Jul 25, 2025 at 5:14 AM Jennifer Schmitz wrote: >> >> This patch adds a new tuning model for the NVIDIA Olympus core. >> The values used here are based on the Software Optimization Guide >> that will be published imminently. >> >>

Re: [PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-21 Thread Kyrylo Tkachov
> On 21 Jul 2025, at 11:43, Kyrylo Tkachov wrote: > > Hi Tamar, > >> On 21 Jul 2025, at 11:12, Tamar Christina wrote: >> >> Hi Kyrill, >> >>> -Original Message- >>> From: Kyrylo Tkachov >>> Sent: Friday, July 18, 2025

Re: [PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-21 Thread Kyrylo Tkachov
Hi Tamar, > On 21 Jul 2025, at 11:12, Tamar Christina wrote: > > Hi Kyrill, > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Friday, July 18, 2025 10:40 AM >> To: GCC Patches >> Cc: Tamar Christina ; Richard Sandiford >> ; Andr

Re: [PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-18 Thread Kyrylo Tkachov
Hi Jennifer, > On 18 Jul 2025, at 17:08, Jennifer Schmitz wrote: > > > >> On 18 Jul 2025, at 11:39, Kyrylo Tkachov wrote: >> >> External email: Use caution opening links or attachments >> >> >> Hi all, >> >> For insertin

Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

2025-07-18 Thread Kyrylo Tkachov
Hi Tamar, > On 18 Jul 2025, at 18:25, Tamar Christina wrote: > > Hi Kyrill, > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Friday, July 18, 2025 10:40 AM >> To: GCC Patches >> Cc: Tamar Christina ; Richard Sandiford >> ; Alex C

[PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-18 Thread Kyrylo Tkachov
Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/arm/aarch-common-protos.h (vector_cost_table): Add ins_gp field. Add comments to other vector cost fields. * config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle VEC_MERGE case. * config/aarch64/aarch6

[PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

2025-07-18 Thread Kyrylo Tkachov
-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT cases. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs

Re: [PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Kyrylo Tkachov
> On 15 Jul 2025, at 15:50, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi all, >> >> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes: >> (x & z) | (~x & ~z) which is ~(x ^ z). >> Thus, we can use it
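
The identity quoted above can be checked with a small scalar model in C. This is a sketch that assumes the BSL2N formula exactly as written in the message; `bsl2n_model`, `eon_model` and `bsl2n_matches_eon` are hypothetical names, not GCC or ACLE functions.

```c
#include <stdint.h>

/* Scalar model of the formula quoted in the message:
   BSL2N (x, y, z) = (x & z) | (~y & ~z).  */
static uint64_t
bsl2n_model (uint64_t x, uint64_t y, uint64_t z)
{
  return (x & z) | (~y & ~z);
}

/* The value the message says results when x == y: ~(x ^ z).  */
static uint64_t
eon_model (uint64_t x, uint64_t z)
{
  return ~(x ^ z);
}

/* Return 1 if bsl2n_model (x, x, z) equals eon_model (x, z) for all
   8-bit x and z.  The operations are purely bitwise, so this covers
   every per-bit input combination.  */
static int
bsl2n_matches_eon (void)
{
  for (uint64_t x = 0; x < 256; x++)
    for (uint64_t z = 0; z < 256; z++)
      if (bsl2n_model (x, x, z) != eon_model (x, z))
        return 0;
  return 1;
}
```

Because each output bit depends only on the corresponding input bits, the four single-bit cases would already be enough; the wider loop is just a convenient way to exercise them.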

Re: [PATCH 2/2] aarch64: Relax fpm_t assert to allow const_ints [PR120986]

2025-07-15 Thread Kyrylo Tkachov
> On 15 Jul 2025, at 15:01, Alex Coplan wrote: > > Hi, > > This relaxes an overzealous assert that required the fpm_t argument to > be in DImode when expanding FP8 intrinsics. Of course this fails to > account for modeless const_ints. > > Bootstrapped/regtested on aarch64-linux-gnu, OK for

Re: [PATCH 1/2] aarch64: Fix predication of FP8 FDOT insns [PR120986]

2025-07-15 Thread Kyrylo Tkachov
Hi Alex, > On 15 Jul 2025, at 14:59, Alex Coplan wrote: > > Hi, > > The predication of the SVE2 FP8 dot product insns was relying on the > architectural dependency: > > FEAT_FP8DOT2 => FEAT_FP8DOT4 > > which was relaxed in GCC as of > r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus l

[PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Kyrylo Tkachov
not z0.d, p3/m, z0.d ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_eon): New pattern. (*aarch64_sve2_eon_bsl2n_unpred)

[PATCH] aarch64: Use SVE2 NBSL for vector NOR and NAND for Advanced SIMD modes

2025-07-15 Thread Kyrylo Tkachov
nerate the MOVPRFX when the operands fall that way, but I guess having a 2-insn MOVPRFX form is not worse than the current 2-insn codegen at least, and the MOVPRFX can be fused by many cores. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tka

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-15 Thread Kyrylo Tkachov
> On 8 Jul 2025, at 17:43, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Thanks for your comments, do you mean something like the following? > > Yeah, the patch LGTM, thanks. So it turned out that doing this in the EOR3 pattern in patch 4/7 caused wrong-co

Re: [PATCH 4/7] aarch64: Use EOR3 for DImode values

2025-07-15 Thread Kyrylo Tkachov
I had pushed this patch on Friday but have reverted it on trunk now because it seems to be causing miscomputes in 531.deepsjeng_r. Thanks, Kyrill > On 8 Jul 2025, at 08:28, Tamar Christina wrote: > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Monda

Re: [PATCH] arm: avoid gcc_s dependency

2025-07-14 Thread Kyrylo Tkachov
+ arm maintainers. Hi Pierre, > On 14 Jul 2025, at 14:07, Pierre Ossman wrote: > > Suggested fix for this issue: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60428 > > Did not get any response there, so seeing if this is a better forum for > suggested changes. > > We've been using this

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov
> On 11 Jul 2025, at 16:48, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote: >>> >>> >>> >>>> On 10 Jul 2025, at 10:40, Richard Sandiford >>>> wrote: >>>

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote: > > > >> On 10 Jul 2025, at 10:40, Richard Sandiford >> wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> While the SVE2 NBSL instruction accepts MOVPRFX to add more f

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 10:40, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi all, >> >> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility >> due to its tied operands, the destination of the movprfx cannot be also >> a so

Re: [PATCH] aarch64: Add support for NVIDIA GB10

2025-07-10 Thread Kyrylo Tkachov
> On 18 Jun 2025, at 17:26, Kyrylo Tkachov wrote: > > Hi all, > > This adds support for -mcpu=gb10. This is a big.LITTLE configuration > involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers > are added to detect them in -mcpu=native. We did not add a

[PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov
nbsl z0.d, z0.d, z2.d, z0.d ret which generated a gas warning. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Do we want to backport it? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR target/120999 * config/aarch64/aarch64-sve2.md (*aa

Re: [PATCH] Change bellow in comments to below

2025-07-10 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 08:09, Jakub Jelinek wrote: > > Hi! > > While I'm not a native English speaker, I believe all the uses > of bellow (roar/bark/...) in comments in gcc are meant to be > below (beneath/under/...). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > >

Re: [PATCH] aarch64: Implement sme2+faminmax extension.

2025-07-09 Thread Kyrylo Tkachov
Hi Alfie, > On 7 Jul 2025, at 10:46, Alfie Richards wrote: > > Hello all, > > This patch implements the couple of amin/amax instructions that are part of > SME2 + faminmax. > > Regression tested and bootstrapped for Aarch64. > > Thanks, > Alfie > > -- >8 -- > > Implements the sme2+faminmax

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-08 Thread Kyrylo Tkachov
> On 8 Jul 2025, at 12:39, Tamar Christina wrote: > >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, July 8, 2025 10:07 AM >> To: Tamar Christina >> Cc: Kyrylo Tkachov ; GCC Patches > patc...@gcc.gnu.org>; Richard

Re: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
> On 7 Jul 2025, at 13:27, Richard Sandiford wrote: > > Tamar Christina writes: >>> -Original Message- >>> From: Kyrylo Tkachov >>> Sent: Monday, July 7, 2025 10:38 AM >>> To: GCC Patches >>> Cc: Richard Sandiford ; Richard Earns

[PATCH 6/7] aarch64: Use SVE2 BSL1N for DImode arguments

2025-07-07 Thread Kyrylo Tkachov
for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl1n_unpreddi): New define_insn_and_split. gcc/testsuite/ * gcc.target/aarch64/sve2/bsl1n_d.c: New test. 0006-aarch64-Use-SVE2-BSL1N-for-DImode-arguments.patch

[PATCH 4/7] aarch64: Use EOR3 for DImode values

2025-07-07 Thread Kyrylo Tkachov
x1_t a, uint64x1_t b, uint64x1_t c) { return EOR3 (a, b, c); } We generate the desired: eor3_d_gp: eor x1, x1, x2 eor x0, x1, x0 ret eor3_d: eor3 v0.16b, v0.16b, v1.16b, v2.16b ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov

[PATCH 7/7] aarch64: Use BSL2N for DImode operands

2025-07-07 Thread Kyrylo Tkachov
ested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New define_insn_and_split. * config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define. (aarch64_rt

[PATCH 5/7] aarch64: Use SVE2 NBSL for DImode arguments

2025-07-07 Thread Kyrylo Tkachov
trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve.md (*aarch64_sve2_nbsl_unpreddi): New define_insn_and_split. gcc/testsuite/ * gcc.target/aarch64/sve2/nbsl_d.c: New test. 0005-aarch64-Use-SVE2-NBSL-for-DImode-arguments.patch Description:

[PATCH 2/7] aarch64: Use EOR3 for 64-bit vector modes

2025-07-07 Thread Kyrylo Tkachov
of: bcax_s: eor v1.8b, v1.8b, v2.8b eor v0.8b, v1.8b, v0.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd.md (eor3q4): Use VDQ_I mode iterator. gcc/testsuite

[PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-07 Thread Kyrylo Tkachov
b ret When the inputs are in SIMD regs we use BCAX and when they are in GP regs we don't force them to SIMD with extra moves. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd

[PATCH 1/7] aarch64: Allow 64-bit vector modes in pattern for BCAX instruction

2025-07-07 Thread Kyrylo Tkachov
rovement always. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd.md (bcaxq4): Use VDQ_I mode iterator. gcc/testsuite/ * gcc.target/aarch64/simd/bcax_d.c: New test. 0001-a

Re: [PATCH 0/7] Improve bit-manipulation SIMD codegen for 64-bit types

2025-07-07 Thread Kyrylo Tkachov
Resending due to difficulties with my email > On 7 Jul 2025, at 11:56, Kyrylo Tkachov wrote: > > Hi all, > > This series improves code generation for 64-bit vector types as well as the > scalar DImode types. > It makes use of SHA3 and SVE2 instructions like BCAX, EOR3

[PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
cheap itself and can be scheduled away from the critical path or even CSE'd with other PTRUE constants. As this sequence is larger code size-wise it is avoided for -Os. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov

Re: [PATCH 2/2] aarch64: Drop const_int from aarch64_maskload_else_operand

2025-07-02 Thread Kyrylo Tkachov
> On 1 Jul 2025, at 18:37, Alex Coplan wrote: > > The "else operand" to maskload should always be a const_vector, never a > const_int. > > This was just an issue I noticed while looking through the code, I don't > have a testcase which shows a concrete problem due to this. > > Testing of tha

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-01 Thread Kyrylo Tkachov
> On 1 Jul 2025, at 17:36, Richard Sandiford wrote: > > Soumya AR writes: >> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001 >> From: Soumya AR >> Date: Mon, 30 Jun 2025 12:17:30 -0700 >> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with >> RCP

Re: [PATCH] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2025-07-01 Thread Kyrylo Tkachov
> On 17 Jun 2025, at 12:19, Kyrylo Tkachov wrote: > > > >> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote: >> >> From: Ezra Sitorus >> >> This patch updates `aarch64-sys-regs.def', bringing it into sync with >> the Binutil

[PATCH] aarch64: Add support for NVIDIA GB10

2025-06-18 Thread Kyrylo Tkachov
trunk and GCC 15 when I’m back. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-cores.def (gb10): New entry. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document the above. 0001-aarch64-Add-support-for

Re: [PATCH] aarch64: Add vec_set/extract for tuple modes [PR113027]

2025-06-17 Thread Kyrylo Tkachov
> On 16 Jun 2025, at 09:54, Richard Sandiford wrote: > > We generated inefficient code for bitfield references to Advanced > SIMD structure modes. In RTL, these modes are just extra-long > vectors, and so inserting and extracting an element is simply > a vec_set or vec_extract operation. > >

Re: [PATCH] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2025-06-17 Thread Kyrylo Tkachov
> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote: > > From: Ezra Sitorus > > This patch updates `aarch64-sys-regs.def', bringing it into sync with > the Binutils source after this change: > https://sourceware.org/pipermail/binutils/2025-March/139894.html Ok. I think these changes are co

Re: [PATCH] aarch64: Fold NOT+PTEST to NOTS [PR118150]

2025-06-13 Thread Kyrylo Tkachov
Hi Spencer, Thanks for the patch. > On 13 Jun 2025, at 14:46, Spencer Abson wrote: > > Add the missing combiner patterns for folding NOT+PTEST to NOTS when > they share the same GP. > I guess GP here means “governing predicate”? GP usually means “General Purpose (register)” in aarch64 so it’d

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-12 Thread Kyrylo Tkachov
> On 12 Jun 2025, at 18:20, Remi Machet wrote: > > > On 6/12/25 12:02, Richard Sandiford wrote: >> External email: Use caution opening links or attachments >> >> >> Remi Machet writes: >>> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn >>> which >>> allows for bett

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-12 Thread Kyrylo Tkachov
> On 12 Jun 2025, at 18:02, Richard Sandiford wrote: > > Remi Machet writes: >> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn >> which >> allows for better optimization when the code is inside a loop by using a >> constant. >> >> Bootstrapped and regtested on aarch6

Re: [PATCH] aarch64: Incorrect removal of ZA restore [PR120624]

2025-06-12 Thread Kyrylo Tkachov
> On 11 Jun 2025, at 16:22, Richard Sandiford wrote: > > The PCS defines a lazy save scheme for managing ZA across normal > "private-ZA" functions. GCC currently uses this scheme for calls > to all private-ZA functions (rather than using caller-save). > > Therefore, before a sequence of call

Re: AArch64 promote aarch64-autovec-preference to mautovec-preference

2025-06-03 Thread Kyrylo Tkachov
> On 3 Jun 2025, at 17:56, Richard Sandiford wrote: > > Tamar Christina writes: >> As requested in my patch for -mmax-vectorization this promotes the parameter >> --param aarch64-autovec-preference to a first class top target flag. >> >> If both the parameter and the flag is specified the par

Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-29 Thread Kyrylo Tkachov
> On 28 May 2025, at 13:36, Kyrylo Tkachov wrote: > > Hi Yuta-san > >> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) >> wrote: >> >> Hello, >> >> We would like to enable features for FUJITSU-MONAKA that were implemented in >> GC

Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-28 Thread Kyrylo Tkachov
Hi Yuta-san > On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) wrote: > > Hello, > > We would like to enable features for FUJITSU-MONAKA that were implemented in > GCC after we added support for FUJITSU-MONAKA. > As the features were implemented in GCC15, we also want to backport it to > GCC15.

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Kyrylo Tkachov
> On 16 May 2025, at 12:35, Richard Sandiford wrote: > > Jennifer Schmitz writes: >> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using >> partial_subreg_p in the function copy_value during the RTL pass >> regcprop, failing the assertion in >> >> inline bool >> partial_su

Re: [PATCH] aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 06:17, Andrew Pinski wrote: > > Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for > the variant, it is just easier to add the cast to unsigned in the usage > in driver-aarch64.cc. > > Build and tested on aarch64-linux-gnu. Ok. Thanks, Kyrill > > gcc/Ch

Re: [PATCH] aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtype

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 05:59, Andrew Pinski wrote: > > There is a narrowing warning in aarch64_detect_vector_stmt_subtype > about gather_load_x32_cost and gather_load_x64_cost converting from int to > unsigned. > These fields are always unsigned and even the constructor for sve_vec_cost > take

Re: [PATCH 8/9] AArch64: rules for CMPBR instructions

2025-05-09 Thread Kyrylo Tkachov
> On 8 May 2025, at 21:10, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when > CMPBR extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): Emit CMPBR > instructions if possible. > (BRANCH_LEN_P_1Kib): New constant. > (BRANCH_LEN_N_1Kib)

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-08 Thread Kyrylo Tkachov
Hi Richard, > On 7 May 2025, at 18:15, Richard Earnshaw wrote: > > > The header file for the Arm implementation of mmintrin.h was changed in GCC-15 > to disable access to the intrinsics. This patch removes the internal code > as well. > > We still allow -mcpu/-march options for the wmmx cpus,

Re: [PATCH 3/8] AArch64: rename branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Give the `define_insn` rules used in lowering `cbranch4` to RTL > more descriptive and consistent names: from now on, each rule is named > after the AArch64 instruction that it generates. Also add comments to > document each rule. > > gcc/Chang

Re: [PATCH 1/8] AArch64: place branch instruction rules together

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The rules for conditional branches were spread throughout `aarch64.md`. > Group them together so it is easier to understand how `cbranch4` > is lowered to RTL. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (condjump): move. > (*compare_co

Re: [PATCH 0/8] AArch64: CMPBR support

2025-05-07 Thread Kyrylo Tkachov
Hi Karl, > On 7 May 2025, at 12:27, Karl Meakin wrote: > > This patch series adds support for the CMPBR extension. It includes the > new `+cmpbr` option and rules to generate the new instructions when > lowering conditional branches. Thanks for the series. You didn’t state it explicitly, but ha

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR > extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): emit CMPBR > instructions if possible. > (cbranch4): new expand rule. > (aarch64_cb): likewise. >

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Commit the test file `cmpbr.c` before rules for generating the new > instructions are added, so that the changes in codegen are more obvious > in the next commit. I guess that’s an LLVM best practice. In GCC since we have the check-function-bod

Re: [PATCH 6/8] AArch64: recognize `+cmpbr` option

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add the `+cmpbr` option to enable the FEAT_CMPBR architectural > extension. > > gcc/ChangeLog: > > * config/aarch64/aarch64-option-extensions.def (cmpbr): new > option. > * config/aarch64/aarch64.h (TARGET_CMPBR): new macro. > * doc/invoke.tex

Re: [PATCH 5/8] AArch64: make `far_branch` attribute a boolean

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The `far_branch` attribute only ever takes the values 0 or 1, so make it > a `no/yes` valued string attribute instead. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (far_branch): replace 0/1 with > no/yes. > (aarch64_bcond): handle renam

Re: [PATCH 4/8] AArch64: add constants for branch displacements

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Extract the hardcoded values for the minimum PC-relative displacements > into named constants and document them. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. > (BRANCH_LEN_N_128MiB): likewise. > (BRA

Re: [PATCH 2/8] AArch64: reformat branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): reformat. > (cbranchcc4): likewise. > (condjump): likewise. > (*co

Re: [RFC PATCH 3/5] json: Add get_map() method to JSON object class

2025-05-07 Thread Kyrylo Tkachov
> On 6 May 2025, at 10:30, Soumya AR wrote: > > From: Soumya AR > > This patch adds a get_map () method to the JSON object class to provide access > to the underlying hash map that stores the JSON key-value pairs. > > It also reorganizes the private and public sections of the class to expos
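As a rough illustration of the accessor pattern described here, the sketch below uses a toy class that exposes its underlying key-value map read-only. It is not GCC's `json::object`: `toy_json_object`, `set`, `sum_values`, the use of `std::unordered_map`, the `int` value type and the const-reference return are all assumptions made for this example.

```cpp
#include <string>
#include <unordered_map>

// Toy stand-in for a JSON object class; values are ints for brevity.
class toy_json_object {
public:
  using map_t = std::unordered_map<std::string, int>;

  // Insert or overwrite a key-value pair.
  void set (const std::string &key, int value) { m_map[key] = value; }

  // Read-only access to the underlying map so callers can iterate over it.
  const map_t &get_map () const { return m_map; }

private:
  map_t m_map;
};

// Example consumer: walks every key-value pair through get_map ().
inline int sum_values (const toy_json_object &obj)
{
  int total = 0;
  for (const auto &kv : obj.get_map ())
    total += kv.second;
  return total;
}
```

Returning a const reference lets a consumer iterate over all members without the class needing a dedicated visitor for each use.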

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-07 Thread Kyrylo Tkachov
Hi Richard, > On 6 May 2025, at 12:34, Richard Sandiford wrote: > > writes: >> From: Soumya AR >> >> Hi, >> >> This RFC and subsequent patch series introduces support for printing and >> parsing >> of aarch64 tuning parameters in the form of JSON. > > Thanks for doing this. It looks r

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-05 Thread Kyrylo Tkachov
> On 4 May 2025, at 19:19, Yangyu Chen wrote: > > Hi everyone, > > This patch series introduces support for the target_clones profile > option in GCC. This option enables users to specify target_clones > attributes in a separate file, allowing GCC to generate multiple > versions of the functio

[AArch64] changes.html: Fix typo

2025-05-02 Thread Kyrylo Tkachov
Pushing as obvious. Signed-off-by: Kyrylo Tkachov 0001-AArch64-changes.html-Fix-typo.patch Description: 0001-AArch64-changes.html-Fix-typo.patch

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 1 May 2025, at 14:02, Ayan Shafqat wrote: > > On Thu, May 01, 2025 at 08:09:18AM +0000, Kyrylo Tkachov wrote: >> >> I was going to ask why not use the standard __builtin_sqrt builtins but I >> guess those don’t guarantee that we avoid a libcall in

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 28 Apr 2025, at 21:29, Ayan Shafqat wrote: > > Rebased with gcc 15.1 > > This patch introduces two new inline functions, __sqrt and __sqrtf, in > arm_acle.h for Aarch64 targets. These functions wrap the new builtins > __builtin_aarch64_sqrtdf and __builtin_aarch64_sqrtsf, respectively, >
