Re: [PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-18 Thread Kyrylo Tkachov
Hi Jennifer, > On 18 Jul 2025, at 17:08, Jennifer Schmitz wrote: > > > >> On 18 Jul 2025, at 11:39, Kyrylo Tkachov wrote: >> >> Hi all, >> >> For insertin

Re: [PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

2025-07-18 Thread Kyrylo Tkachov
Hi Tamar, > On 18 Jul 2025, at 18:25, Tamar Christina wrote: > > Hi Kyrill, > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Friday, July 18, 2025 10:40 AM >> To: GCC Patches >> Cc: Tamar Christina ; Richard Sandiford >> ; Alex C

[PATCH 2/2] aarch64: Allow CPU tuning to avoid INS-(W|X)ZR instructions

2025-07-18 Thread Kyrylo Tkachov
Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/arm/aarch-common-protos.h (vector_cost_table): Add ins_gp field. Add comments to other vector cost fields. * config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle VEC_MERGE case. * config/aarch64/aarch6

[PATCH 1/2] aarch64: NFC - Make vec_* rtx costing logic consistent

2025-07-18 Thread Kyrylo Tkachov
-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT cases. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs

Re: [PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Kyrylo Tkachov
> On 15 Jul 2025, at 15:50, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi all, >> >> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes: >> (x & z) | (~x & ~z) which is ~(x ^ z). >> Thus, we can use it
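
The identity quoted in this message can be checked exhaustively on a small bit-width. The following is a minimal sketch in Python, not GCC code: the helper names `bsl2n` and `eon` are hypothetical and only model the bitwise formula from the message on 8-bit values.

```python
from itertools import product

BITS = 8                      # illustrative lane width (assumption)
MASK = (1 << BITS) - 1


def bsl2n(x, y, z):
    """Model of the formula quoted in the thread: (x & z) | (~y & ~z)."""
    return ((x & z) | (~y & ~z)) & MASK


def eon(x, z):
    """Exclusive-OR-NOT, ~(x ^ z), truncated to BITS bits."""
    return ~(x ^ z) & MASK


def identity_holds():
    """Check bsl2n(x, x, z) == eon(x, z) for every 8-bit x and z."""
    return all(bsl2n(x, x, z) == eon(x, z)
               for x, z in product(range(MASK + 1), repeat=2))
```

Because every bit position is independent, the 8-bit check covers the same cases as a wider lane would.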

Re: [PATCH 2/2] aarch64: Relax fpm_t assert to allow const_ints [PR120986]

2025-07-15 Thread Kyrylo Tkachov
> On 15 Jul 2025, at 15:01, Alex Coplan wrote: > > Hi, > > This relaxes an overzealous assert that required the fpm_t argument to > be in DImode when expanding FP8 intrinsics. Of course this fails to > account for modeless const_ints. > > Bootstrapped/regtested on aarch64-linux-gnu, OK for

Re: [PATCH 1/2] aarch64: Fix predication of FP8 FDOT insns [PR120986]

2025-07-15 Thread Kyrylo Tkachov
Hi Alex, > On 15 Jul 2025, at 14:59, Alex Coplan wrote: > > Hi, > > The predication of the SVE2 FP8 dot product insns was relying on the > architectural dependency: > > FEAT_FP8DOT2 => FEAT_FP8DOT4 > > which was relaxed in GCC as of > r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus l

[PATCH] aarch64: Use SVE2 BSL2N for vector EON

2025-07-15 Thread Kyrylo Tkachov
not z0.d, p3/m, z0.d ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_eon): New pattern. (*aarch64_sve2_eon_bsl2n_unpred)

[PATCH] aarch64: Use SVE2 NBSL for vector NOR and NAND for Advanced SIMD modes

2025-07-15 Thread Kyrylo Tkachov
nerate the MOVPRFX when the operands fall that way, but I guess having a 2-insn MOVPRFX form is not worse than the current 2-insn codegen at least, and the MOVPRFX can be fused by many cores. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tka
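
As a rough illustration of why NBSL can express NOR and NAND, here is a small Python model. It assumes the usual description of NBSL as the inverted bitwise select, `~((x & z) | (y & ~z))`; the function names and the operand choices below are hypothetical and need not match the operand order the patch actually uses.

```python
from itertools import product

BITS = 8                      # illustrative lane width (assumption)
MASK = (1 << BITS) - 1


def nbsl(x, y, z):
    """Assumed NBSL model: inverted bitwise select, ~((x & z) | (y & ~z))."""
    return ~((x & z) | (y & ~z)) & MASK


def nor_via_nbsl(a, b):
    """Using a as the select mask gives a | (b & ~a) == a | b before inversion."""
    return nbsl(a, b, a)


def nand_via_nbsl(a, b):
    """Using b as the select mask gives (a & b) | (b & ~b) == a & b before inversion."""
    return nbsl(a, b, b)


def both_hold():
    """Compare against plain NOR and NAND for every pair of 8-bit inputs."""
    return all(nor_via_nbsl(a, b) == (~(a | b) & MASK)
               and nand_via_nbsl(a, b) == (~(a & b) & MASK)
               for a, b in product(range(MASK + 1), repeat=2))
```

In both cases one input is reused as the select operand, which is why a single three-input instruction is enough.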

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-15 Thread Kyrylo Tkachov
> On 8 Jul 2025, at 17:43, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Thanks for your comments, do you mean something like the following? > > Yeah, the patch LGTM, thanks. So it turned out that doing this in the EOR3 pattern in patch 4/7 caused wrong-co

Re: [PATCH 4/7] aarch64: Use EOR3 for DImode values

2025-07-15 Thread Kyrylo Tkachov
I had pushed this patch on Friday but have reverted it on trunk now because it seems to be causing miscomputes in 531.deepsjeng_r. Thanks, Kyrill > On 8 Jul 2025, at 08:28, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Monda

Re: [PATCH] arm: avoid gcc_s dependency

2025-07-14 Thread Kyrylo Tkachov
+ arm maintainers. Hi Pierre, > On 14 Jul 2025, at 14:07, Pierre Ossman wrote: > > Suggested fix for this issue: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60428 > > Did not get any response there, so seeing if this is a better forum for > suggested changes. > > We've been using this

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov
> On 11 Jul 2025, at 16:48, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote: >>> >>> >>> >>>> On 10 Jul 2025, at 10:40, Richard Sandiford >>>> wrote: >>>

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-11 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote: > > > >> On 10 Jul 2025, at 10:40, Richard Sandiford >> wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> While the SVE2 NBSL instruction accepts MOVPRFX to add more f

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 10:40, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi all, >> >> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility >> due to its tied operands, the destination of the movprfx cannot be also >> a so

Re: [PATCH] aarch64: Add support for NVIDIA GB10

2025-07-10 Thread Kyrylo Tkachov
> On 18 Jun 2025, at 17:26, Kyrylo Tkachov wrote: > > Hi all, > > This adds support for -mcpu=gb10. This is a big.LITTLE configuration > involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers > are added to detect them in -mcpu=native. We did not add a

[PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-10 Thread Kyrylo Tkachov
nbsl z0.d, z0.d, z2.d, z0.d ret which generated a gas warning. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Do we want to backport it? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR target/120999 * config/aarch64/aarch64-sve2.md (*aa

Re: [PATCH] Change bellow in comments to below

2025-07-10 Thread Kyrylo Tkachov
> On 10 Jul 2025, at 08:09, Jakub Jelinek wrote: > > Hi! > > While I'm not a native English speaker, I believe all the uses > of bellow (roar/bark/...) in comments in gcc are meant to be > below (beneath/under/...). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > >

Re: [PATCH] aarch64: Implement sme2+faminmax extension.

2025-07-09 Thread Kyrylo Tkachov
Hi Alfie, > On 7 Jul 2025, at 10:46, Alfie Richards wrote: > > Hello all, > > This patch implements the couple of amin/amax instructions that are part of > SME2 + faminmax. > > Regression tested and bootstrapped for AArch64. > > Thanks, > Alfie > > -- >8 -- > > Implements the sme2+faminmax

Re: [PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-08 Thread Kyrylo Tkachov
> On 8 Jul 2025, at 12:39, Tamar Christina wrote: > >> -----Original Message----- >> From: Richard Sandiford >> Sent: Tuesday, July 8, 2025 10:07 AM >> To: Tamar Christina >> Cc: Kyrylo Tkachov ; GCC Patches > patc...@gcc.gnu.org>; Richard

Re: [PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
> On 7 Jul 2025, at 13:27, Richard Sandiford wrote: > > Tamar Christina writes: >>> -----Original Message----- >>> From: Kyrylo Tkachov >>> Sent: Monday, July 7, 2025 10:38 AM >>> To: GCC Patches >>> Cc: Richard Sandiford ; Richard Earns

[PATCH 6/7] aarch64: Use SVE2 BSL1N for DImode arguments

2025-07-07 Thread Kyrylo Tkachov
for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl1n_unpreddi): New define_insn_and_split. gcc/testsuite/ * gcc.target/aarch64/sve2/bsl1n_d.c: New test. 0006-aarch64-Use-SVE2-BSL1N-for-DImode-arguments.patch

[PATCH 4/7] aarch64: Use EOR3 for DImode values

2025-07-07 Thread Kyrylo Tkachov
x1_t a, uint64x1_t b, uint64x1_t c) { return EOR3 (a, b, c); } We generate the desired: eor3_d_gp: eor x1, x1, x2 eor x0, x1, x0 ret eor3_d: eor3 v0.16b, v0.16b, v1.16b, v2.16b ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov

[PATCH 7/7] aarch64: Use BSL2N for DImode operands

2025-07-07 Thread Kyrylo Tkachov
ested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New define_insn_and_split. * config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define. (aarch64_rt

[PATCH 5/7] aarch64: Use SVE2 NBSL for DImode arguments

2025-07-07 Thread Kyrylo Tkachov
trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-sve.md (*aarch64_sve2_nbsl_unpreddi): New define_insn_and_split. gcc/testsuite/ * gcc.target/aarch64/sve2/nbsl_d.c: New test. 0005-aarch64-Use-SVE2-NBSL-for-DImode-arguments.patch Description:

[PATCH 2/7] aarch64: Use EOR3 for 64-bit vector modes

2025-07-07 Thread Kyrylo Tkachov
of: bcax_s: eor v1.8b, v1.8b, v2.8b eor v0.8b, v1.8b, v0.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd.md (eor3q4): Use VDQ_I mode iterator. gcc/testsuite

[PATCH 3/7] aarch64: Handle DImode BCAX operations

2025-07-07 Thread Kyrylo Tkachov
b ret When the inputs are in SIMD regs we use BCAX and when they are in GP regs we don't force them to SIMD with extra moves. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd

[PATCH 1/7] aarch64: Allow 64-bit vector modes in pattern for BCAX instruction

2025-07-07 Thread Kyrylo Tkachov
rovement always. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-simd.md (bcaxq4): Use VDQ_I mode iterator. gcc/testsuite/ * gcc.target/aarch64/simd/bcax_d.c: New test. 0001-a

Re: [PATCH 0/7] Improve bit-manipulation SIMD codegen for 64-bit types

2025-07-07 Thread Kyrylo Tkachov
Resending due to difficulties with my email > On 7 Jul 2025, at 11:56, Kyrylo Tkachov wrote: > > Hi all, > > This series improves code generation for 64-bit vector types as well as the > scalar DImode types. > It makes use of SHA3 and SVE2 instructions like BCAX, EOR3

[PATCH] aarch64: Improve popcountti2 with SVE

2025-07-07 Thread Kyrylo Tkachov
cheap itself and can be scheduled away from the critical path or even CSE'd with other PTRUE constants. As this sequence is larger code size-wise it is avoided for -Os. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov

Re: [PATCH 2/2] aarch64: Drop const_int from aarch64_maskload_else_operand

2025-07-02 Thread Kyrylo Tkachov
> On 1 Jul 2025, at 18:37, Alex Coplan wrote: > > The "else operand" to maskload should always be a const_vector, never a > const_int. > > This was just an issue I noticed while looking through the code, I don't > have a testcase which shows a concrete problem due to this. > > Testing of tha

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-01 Thread Kyrylo Tkachov
> On 1 Jul 2025, at 17:36, Richard Sandiford wrote: > > Soumya AR writes: >> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001 >> From: Soumya AR >> Date: Mon, 30 Jun 2025 12:17:30 -0700 >> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with >> RCP

Re: [PATCH] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2025-07-01 Thread Kyrylo Tkachov
> On 17 Jun 2025, at 12:19, Kyrylo Tkachov wrote: > > > >> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote: >> >> From: Ezra Sitorus >> >> This patch updates `aarch64-sys-regs.def', bringing it into sync with >> the Binutil

[PATCH] aarch64: Add support for NVIDIA GB10

2025-06-18 Thread Kyrylo Tkachov
trunk and GCC 15 when I’m back. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-cores.def (gb10): New entry. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document the above. 0001-aarch64-Add-support-for

Re: [PATCH] aarch64: Add vec_set/extract for tuple modes [PR113027]

2025-06-17 Thread Kyrylo Tkachov
> On 16 Jun 2025, at 09:54, Richard Sandiford wrote: > > We generated inefficient code for bitfield references to Advanced > SIMD structure modes. In RTL, these modes are just extra-long > vectors, and so inserting and extracting an element is simply > a vec_set or vec_extract operation. > >

Re: [PATCH] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2025-06-17 Thread Kyrylo Tkachov
> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote: > > From: Ezra Sitorus > > This patch updates `aarch64-sys-regs.def', bringing it into sync with > the Binutils source after this change: > https://sourceware.org/pipermail/binutils/2025-March/139894.html Ok. I think these changes are co

Re: [PATCH] aarch64: Fold NOT+PTEST to NOTS [PR118150]

2025-06-13 Thread Kyrylo Tkachov
Hi Spencer, Thanks for the patch. > On 13 Jun 2025, at 14:46, Spencer Abson wrote: > > Add the missing combiner patterns for folding NOT+PTEST to NOTS when > they share the same GP. > I guess GP here means “governing predicate”? GP usually means “General Purpose (register)” in aarch64 so it’d

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-12 Thread Kyrylo Tkachov
> On 12 Jun 2025, at 18:20, Remi Machet wrote: > > > On 6/12/25 12:02, Richard Sandiford wrote: >> Remi Machet writes: >>> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn >>> which >>> allows for bett

Re: [PATCH] AArch64 SIMD: convert mvn+shrn into mvni+subhn

2025-06-12 Thread Kyrylo Tkachov
> On 12 Jun 2025, at 18:02, Richard Sandiford wrote: > > Remi Machet writes: >> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn >> which >> allows for better optimization when the code is inside a loop by using a >> constant. >> >> Bootstrapped and regtested on aarch6
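
The equivalence behind this transformation can be sketched with a scalar model. The following Python is illustrative only and is not taken from the patch: it assumes 16-bit lanes narrowed to 8 bits and a shift amount equal to half the lane width (that condition is an assumption here, the snippet does not state it), and the helper names are hypothetical.

```python
WIDE = 16                      # source lane width (assumption)
NARROW = WIDE // 2             # narrowed lane width
WIDE_MASK = (1 << WIDE) - 1
NARROW_MASK = (1 << NARROW) - 1
ALL_ONES = WIDE_MASK           # the constant an all-ones MVNI would materialise


def mvn_then_shrn(x):
    """Bitwise NOT, then shift right by NARROW and keep the low NARROW bits."""
    return ((~x & WIDE_MASK) >> NARROW) & NARROW_MASK


def subhn(a, b):
    """Subtract and keep the high half of the WIDE-bit difference."""
    return (((a - b) & WIDE_MASK) >> NARROW) & NARROW_MASK


def equivalent_for_all_inputs():
    """~x equals ALL_ONES - x in WIDE bits, so both forms should agree."""
    return all(mvn_then_shrn(x) == subhn(ALL_ONES, x)
               for x in range(WIDE_MASK + 1))
```

The all-ones operand does not depend on the input, which fits the message's point that a constant can be used when the code is inside a loop.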

Re: [PATCH] aarch64: Incorrect removal of ZA restore [PR120624]

2025-06-12 Thread Kyrylo Tkachov
> On 11 Jun 2025, at 16:22, Richard Sandiford wrote: > > The PCS defines a lazy save scheme for managing ZA across normal > "private-ZA" functions. GCC currently uses this scheme for calls > to all private-ZA functions (rather than using caller-save). > > Therefore, before a sequence of call

Re: AArch64 promote aarch64-autovec-peference to mautovec-preference

2025-06-03 Thread Kyrylo Tkachov
> On 3 Jun 2025, at 17:56, Richard Sandiford wrote: > > Tamar Christina writes: >> As requested in my patch for -mmax-vectorization this promotes the parameter >> --param aarch64-autovec-preference to a first class top target flag. >> >> If both the parameter and the flag is specified the par

Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-29 Thread Kyrylo Tkachov
> On 28 May 2025, at 13:36, Kyrylo Tkachov wrote: > > Hi Yuta-san > >> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) >> wrote: >> >> Hello, >> >> We would like to enable features for FUJITSU-MONAKA that were implemented in >> GC

Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-28 Thread Kyrylo Tkachov
Hi Yuta-san > On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) wrote: > > Hello, > > We would like to enable features for FUJITSU-MONAKA that were implemented in > GCC after we added support for FUJITSU-MONAKA. > As the features were implemented in GCC15, we also want to backport it to > GCC15.

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Kyrylo Tkachov
> On 16 May 2025, at 12:35, Richard Sandiford wrote: > > Jennifer Schmitz writes: >> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using >> partial_subreg_p in the function copy_value during the RTL pass >> regcprop, failing the assertion in >> >> inline bool >> partial_su

Re: [PATCH] aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 06:17, Andrew Pinski wrote: > > Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for > the variant, it is just easier to add the cast to unsigned in the usage > in driver-aarch64.cc. > > Build and tested on aarch64-linux-gnu. Ok. Thanks, Kyrill > > gcc/Ch

Re: [PATCH] aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtype

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 05:59, Andrew Pinski wrote: > > There is a narrowing warning in aarch64_detect_vector_stmt_subtype > about gather_load_x32_cost and gather_load_x64_cost converting from int to > unsigned. > These fields are always unsigned and even the constructor for sve_vec_cost > take

Re: [PATCH 8/9] AArch64: rules for CMPBR instructions

2025-05-09 Thread Kyrylo Tkachov
> On 8 May 2025, at 21:10, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when > CMPBR extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): Emit CMPBR > instructions if possible. > (BRANCH_LEN_P_1Kib): New constant. > (BRANCH_LEN_N_1Kib)

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-08 Thread Kyrylo Tkachov
Hi Richard, > On 7 May 2025, at 18:15, Richard Earnshaw wrote: > > > The header file for the Arm implementation of mmintrin.h was changed in GCC-15 > to disable access to the intrinsics. This patch removes the internal code > as well. > > We still allow -mcpu/-march options for the wmmx cpus,

Re: [PATCH 3/8] AArch64: rename branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Give the `define_insn` rules used in lowering `cbranch4` to RTL > more descriptive and consistent names: from now on, each rule is named > after the AArch64 instruction that it generates. Also add comments to > document each rule. > > gcc/Chang

Re: [PATCH 1/8] AArch64: place branch instruction rules together

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The rules for conditional branches were spread throughout `aarch64.md`. > Group them together so it is easier to understand how `cbranch4` > is lowered to RTL. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (condjump): move. > (*compare_co

Re: [PATCH 0/8] AArch64: CMPBR support

2025-05-07 Thread Kyrylo Tkachov
Hi Karl, > On 7 May 2025, at 12:27, Karl Meakin wrote: > > This patch series adds support for the CMPBR extension. It includes the > new `+cmpbr` option and rules to generate the new instructions when > lowering conditional branches. Thanks for the series. You didn’t state it explicitly, but ha

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR > extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): emit CMPBR > instructions if possible. > (cbranch4): new expand rule. > (aarch64_cb): likewise. >

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Commit the test file `cmpbr.c` before rules for generating the new > instructions are added, so that the changes in codegen are more obvious > in the next commit. I guess that’s an LLVM best practice. In GCC since we have the check-function-bod

Re: [PATCH 6/8] AArch64: recognize `+cmpbr` option

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add the `+cmpbr` option to enable the FEAT_CMPBR architectural > extension. > > gcc/ChangeLog: > > * config/aarch64/aarch64-option-extensions.def (cmpbr): new > option. > * config/aarch64/aarch64.h (TARGET_CMPBR): new macro. > * doc/invoke.tex

Re: [PATCH 5/8] AArch64: make `far_branch` attribute a boolean

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The `far_branch` attribute only ever takes the values 0 or 1, so make it > a `no/yes` valued string attribute instead. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (far_branch): replace 0/1 with > no/yes. > (aarch64_bcond): handle renam

Re: [PATCH 4/8] AArch64: add constants for branch displacements

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Extract the hardcoded values for the minimum PC-relative displacements > into named constants and document them. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. > (BRANCH_LEN_N_128MiB): likewise. > (BRA

Re: [PATCH 2/8] AArch64: reformat branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): reformat. > (cbranchcc4): likewise. > (condjump): likewise. > (*co

Re: [RFC PATCH 3/5] json: Add get_map() method to JSON object class

2025-05-07 Thread Kyrylo Tkachov
> On 6 May 2025, at 10:30, Soumya AR wrote: > > From: Soumya AR > > This patch adds a get_map () method to the JSON object class to provide access > to the underlying hash map that stores the JSON key-value pairs. > > It also reorganizes the private and public sections of the class to expos

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-07 Thread Kyrylo Tkachov
Hi Richard, > On 6 May 2025, at 12:34, Richard Sandiford wrote: > > writes: >> From: Soumya AR >> >> Hi, >> >> This RFC and subsequent patch series introduces support for printing and >> parsing >> of aarch64 tuning parameters in the form of JSON. > > Thanks for doing this. It looks r

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-05 Thread Kyrylo Tkachov
> On 4 May 2025, at 19:19, Yangyu Chen wrote: > > Hi everyone, > > This patch series introduces support for the target_clones profile > option in GCC. This option enables users to specify target_clones > attributes in a separate file, allowing GCC to generate multiple > versions of the functio

[AArch64] changes.html: Fix typo

2025-05-02 Thread Kyrylo Tkachov
Pushing as obvious. Signed-off-by: Kyrylo Tkachov 0001-AArch64-changes.html-Fix-typo.patch Description: 0001-AArch64-changes.html-Fix-typo.patch

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 1 May 2025, at 14:02, Ayan Shafqat wrote: > > On Thu, May 01, 2025 at 08:09:18AM +0000, Kyrylo Tkachov wrote: >> >> I was going to ask why not use the standard __builtin_sqrt builtins but I >> guess those don’t guarantee that we avoid a libcall in

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 28 Apr 2025, at 21:29, Ayan Shafqat wrote: > > Rebased with gcc 15.1 > > This patch introduces two new inline functions, __sqrt and __sqrtf, in > arm_acle.h for Aarch64 targets. These functions wrap the new builtins > __builtin_aarch64_sqrtdf and __builtin_aarch64_sqrtsf, respectively, >

Re: [PATCH v4 1/2] Aarch64: Use BUILTIN_VHSDF_HSDF for vector and scalar sqrt builtins

2025-05-01 Thread Kyrylo Tkachov
> On 28 Apr 2025, at 21:27, Ayan Shafqat wrote: > > Rebased with gcc 15.1 > > This patch changes the `sqrt` builtin definition from `BUILTIN_VHSDF_DF` > to `BUILTIN_VHSDF_HSDF` in `aarch64-simd-builtins.def`, ensuring the > builtin covers half, single, and double precision variants. The redun

Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-28 Thread Kyrylo Tkachov
> On 25 Apr 2025, at 19:55, Richard Sandiford wrote: > > Jennifer Schmitz writes: >> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a >> ptrue predicate can be replaced by neon instructions (LDR and STR), >> thus avoiding the predicate altogether. This also enables formation

Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-25 Thread Kyrylo Tkachov
> On 25 Apr 2025, at 12:06, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard, >> >>> On 23 Apr 2025, at 13:47, Richard Sandiford >>> wrote: >>> >>> Thanks for all the feedback. I've tried to address it in

Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-24 Thread Kyrylo Tkachov
> On 23 Apr 2025, at 13:47, Richard Sandiford wrote: > > Thanks for all the feedback. I've tried to address it in the version > below. I'll push later today if there are no further comments. > > Richard > > > The list is structured as: > > - new configurations > - command-line changes > -

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 14:44, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 12:39:59PM +0000, Kyrylo Tkachov wrote: >>> The third case looks undesirable, -fno-ipa-reorder-for-locality is the >>> default and shouldn't affect anything, whether explicit or im

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 14:28, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 12:05:06PM +0000, Kyrylo Tkachov wrote: >>>>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: >>>>> >>>>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo T

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 12:18, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 10:15:08AM +0000, Kyrylo Tkachov wrote: >> >> >>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: >>> >>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkach

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkachov wrote: >>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be >>> testing !opts_set->x_flag_lto_partition (i.e. -flto-p

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
lt >>> up to that point. We should also be testing opts instead of opts_set here. >>> >>> Bootstrapped and tested on aarch64-none-linux-gnu. >>> >>> Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC >>> 15 branch as

[PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
instead of opts_set here. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 15 branch as well. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * opts.cc (finish_options): Check for == against

Re: [PATCH] Introduce -flto-partition=locality

2025-04-24 Thread Kyrylo Tkachov
>> opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = >> LTO_PARTITION_BALANCED; > Hmm, yes I think the condition should be == instead of !=. I’ll test a patch momentarily. Thanks, Kyrill > Regards, > Feng > > From:

Re: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Kyrylo Tkachov
> On 23 Apr 2025, at 08:37, Tamar Christina wrote: > > Hi All, > > This patch proposes a new vector cost model called "max". The cost model is > an > intersection between two of our existing cost models. Like `unlimited` it > disables the costing vs scalar and assumes all vectorization to

Re: [PATCH] Document AArch64 changes for GCC 15

2025-04-22 Thread Kyrylo Tkachov
> On 22 Apr 2025, at 15:31, Tamar Christina wrote: > >> -----Original Message----- >> From: Richard Sandiford >> Sent: Tuesday, April 22, 2025 2:28 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ; >> ktkac...@nvidia.com >> Subject: Re: [PATCH] Document AArch64 cha

[PATCH] aarch64: Update FP8 dependencies for -mcpu=olympus

2025-04-22 Thread Kyrylo Tkachov
ed on aarch64-none-linux-gnu. I’m pushing this to trunk, is it also ok for the GCC 15 branch? I’d like to have the right CPU features enabled for the release. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-cores.def (olympus): Add fp8fma, fp8dot4 expli

[PATCH] Document locality partitioning params in invoke.texi

2025-04-22 Thread Kyrylo Tkachov
: Kyrylo Tkachov * invoke.texi (lto-partition-locality-frequency-cutoff, lto-partition-locality-size-cutoff, lto-max-locality-partition): Document. 0001-Document-locality-partitioning-params-in-invoke.texi.patch Description: 0001-Document-locality-partitioning-params-in

Regenerate common.opt.urls

2025-04-15 Thread Kyrylo Tkachov
Pushing as obvious. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov * common.opt.urls: Regenerate. 0001-Regenerate-common.opt.urls.patch Description: 0001-Regenerate-common.opt.urls.patch

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-15 Thread Kyrylo Tkachov
> On 15 Apr 2025, at 15:42, Richard Biener wrote: > > On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov wrote: >> >> Hi Honza, >> >>> On 13 Apr 2025, at 23:19, Jan Hubicka wrote: >>> >>>> +@opindex fipa-reorder-for-locality >>>

Re: [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-14 Thread Kyrylo Tkachov
Hi Tejas, > On 14 Apr 2025, at 16:04, Tejas Belagod wrote: > > The operand order to gen_vcond_mask call in the vec_extract pattern is wrong. > Fix the order where predicate is operand 3. > > Tested and bootstrapped on aarch64-linux-gnu. OK for trunk? > > gcc/ChangeLog > > * config/aarch64/aar

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-14 Thread Kyrylo Tkachov
Hi Honza, > On 13 Apr 2025, at 23:19, Jan Hubicka wrote: > >> +@opindex fipa-reorder-for-locality >> +@item -fipa-reorder-for-locality >> +Group call chains close together in the binary layout to improve code >> +locality. This option is incompatible with an explicit >> +@option{-flto-part

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-10 Thread Kyrylo Tkachov
> On 26 Mar 2025, at 08:42, Kyrylo Tkachov wrote: > > Ping. Ping. https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676958.html I’ve run a profiled LTO bootstrap of GCC with the new bootstrap-lto-locality bootstrap config and compared it against a GCC produced by the exi

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-04-07 Thread Kyrylo Tkachov
> On 7 Apr 2025, at 10:21, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Monday, March 31, 2025 1:43 PM >> To: i...@sandoe.co.uk >> Cc: Tamar Christina ; GCC Patches > patc...@gcc.gnu.org>; Alice Carlotti ;

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-04-05 Thread Kyrylo Tkachov
> On 31 Mar 2025, at 09:43, Richard Biener wrote: > > On Mon, Mar 31, 2025 at 9:41 AM Richard Biener > wrote: >> >> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov wrote: >>> >>> Ping. >> >> Can you reference the patch please? I'

[PATCH] aarch64: Deprecate -march= for the month of April

2025-04-05 Thread Kyrylo Tkachov
Hi all, As we're starting a new month, introduce a more appropriate -mapril= to specify the compilation target instead. This helps keep GCC more up to date with the passage of time. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov gcc/ * config/aa

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-03-31 Thread Kyrylo Tkachov
Hi Iain, > On 22 Mar 2025, at 15:31, Iain Sandoe wrote: > > 0. Sorry this has taken some time to close off; partly because of waiting > for input, but mostly that I've been stretched with other work. > 1. As per the commit message, the apparent non-conformance with 8.5/6 > because FEAT_SPECR

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-31 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 24 Mar 2025, at 14:28, Kyrylo Tkachov wrote: > > Hi all, > > In this testcase GCC tries to expand a VNx4BI vector: > vector(4) _40; > _39 = () _24; > _40 = {_39, _39, _39, _39}; > > This ends up in a scalarised sequence of bitfiel

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-26 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 6 Mar 2025, at 09:25, Kyrylo Tkachov wrote: > > Hi all, > > Implement partitioning and cloning in the callgraph to help locality. > A new -fipa-reorder-for-locality flag is used to enable this. > The majority of the logic is in the new IPA

[PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-24 Thread Kyrylo Tkachov
bfis are gone. Bootstrapped and tested on aarch64-none-linux-gnu. Given this is a regression from GCC 13, is this ok for trunk now? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR middle-end/119442 * expr.cc (store_constructor): Also allow element modes explicitly accepted by

Re: [PATCH] aarch64: Add support for -mcpu=olympus

2025-03-21 Thread Kyrylo Tkachov
Hi Dhruv, > On 21 Mar 2025, at 11:11, Dhruv Chawla wrote: > > This adds support for the NVIDIA Olympus core to the AArch64 backend. The > initial patch does not add any special tuning decisions, and those may come > later. > > Bootstrapped and tested on aarch64-none-linux-gnu. > Thanks, given

[PATCH] aarch64: Add +sve2p1 to -march=armv9.4-a flags

2025-03-19 Thread Kyrylo Tkachov
g to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-arches.def (...): Add SVE2p1. * doc/invoke.texi (AArch64 Options): Document +sve2p1 in -march=armv9.4-a. 0001-aarch64-Add-sve2p1-to-march-armv9.4-a-flags.patch Description: 0001-a

Re: [PATCH v3 1/2] Aarch64: Add FMA and FMAF intrinsic and corresponding tests

2025-03-17 Thread Kyrylo Tkachov
> On 16 Mar 2025, at 20:15, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for AArch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements [1]. >

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-13 Thread Kyrylo Tkachov
Hi Ayan, > On 11 Mar 2025, at 14:53, Ayan Shafqat wrote: > > Hello Kyrylo, > > On Tue, Mar 11, 2025 at 08:55:46AM +, Kyrylo Tkachov wrote: >> This looks ok to me. >> GCC is currently in a regression fixing stage so normally such a change >> would wait u

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-11 Thread Kyrylo Tkachov
Hi Ayan, > On 9 Mar 2025, at 21:46, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for AArch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements

[PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-06 Thread Kyrylo Tkachov
ality, but we'd appreciate wider performance evaluation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole Co-authored-by: Kyrylo Tkachov config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc

Re: [PATCH] Introduce -flto-partition=locality

2025-03-06 Thread Kyrylo Tkachov
both (normal LTO bootstrap and profiledbootstrap). >> >> With this optimization we are seeing good performance gains on some large >> internal workloads that stress the parts of the processor that are sensitive >> to code locality, but we'd appreciate wider performance eva

[PATCH] PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion

2025-03-05 Thread Kyrylo Tkachov
. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov PR rtl-optimization/119046 * config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for PARALLEL. 0001-PR-rtl-optimization-119046-aarch64-Fix-PARALLEL

Re: [PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-05 Thread Kyrylo Tkachov
> On 5 Mar 2025, at 11:14, Richard Biener wrote: > > On Tue, Mar 4, 2025 at 10:01 PM Richard Sandiford > wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> In this testcase late-combine was failing to merge: >>> dup v31.4s

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 19:52, Wilco Dijkstra wrote: > > > Outline atomics is not designed to be used with -mcmodel=large, so disable > it automatically if the large code model is used. > > Passes regress, OK for commit? > This restriction should be documented in invoke.texi IMO. I also think i