Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-29 Thread Kyrylo Tkachov
> On 28 May 2025, at 13:36, Kyrylo Tkachov wrote: > > Hi Yuta-san > >> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) >> wrote: >> >> Hello, >> >> We would like to enable features for FUJITSU-MONAKA that were implemented in >> GC

Re: [PATCH][GCC16][GCC15] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2025-05-28 Thread Kyrylo Tkachov
Hi Yuta-san > On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) wrote: > > Hello, > > We would like to enable features for FUJITSU-MONAKA that were implemented in > GCC after we added support for FUJITSU-MONAKA. > As the features were implemented in GCC15, we also want to backport it to > GCC15.

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Kyrylo Tkachov
> On 16 May 2025, at 12:35, Richard Sandiford wrote: > > Jennifer Schmitz writes: >> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using >> partial_subreg_p in the function copy_value during the RTL pass >> regcprop, failing the assertion in >> >> inline bool >> partial_su

Re: [PATCH] aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 06:17, Andrew Pinski wrote: > > Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for > the variant, it is just easier to add the cast to unsigned in the usage > in driver-aarch64.cc. > > Build and tested on aarch64-linux-gnu. Ok. Thanks, Kyrill > > gcc/Ch
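For readers unfamiliar with this class of warning, here is a minimal sketch of the idea, not the code from driver-aarch64.cc. A table that spells "no specific variant" as -1 while the consuming field is unsigned triggers a narrowing diagnostic under brace initialisation; an explicit cast to unsigned at the point of use makes the wrap-around intentional. The names core_entry, make_entry and kNoVariant are hypothetical and exist only for this illustration.

```cpp
#include <climits>

// Hypothetical stand-in for a per-core table entry; not the real GCC type.
struct core_entry
{
  const char *name;
  unsigned variant;   // stored as unsigned, but the table spells "any" as -1
};

// Hypothetical sentinel, mirroring the use of -1 for "no specific variant".
constexpr int kNoVariant = -1;

// Brace-initialising the unsigned field directly from an int would be a
// narrowing conversion; the explicit cast states that wrapping is intended.
constexpr core_entry
make_entry (const char *name, int variant)
{
  return core_entry{name, static_cast<unsigned> (variant)};
}

// -1 converts to the largest unsigned value (conversion is modulo 2^N).
static_assert (make_entry ("example", kNoVariant).variant == UINT_MAX,
               "-1 cast to unsigned wraps to UINT_MAX");
```

Non-negative values pass through the cast unchanged, so only the sentinel is affected.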

Re: [PATCH] aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtype

2025-05-16 Thread Kyrylo Tkachov
> On 10 May 2025, at 05:59, Andrew Pinski wrote: > > There is a narrowing warning in aarch64_detect_vector_stmt_subtype > about gather_load_x32_cost and gather_load_x64_cost converting from int to > unsigned. > These fields are always unsigned and even the constructor for sve_vec_cost > take

Re: [PATCH 8/9] AArch64: rules for CMPBR instructions

2025-05-09 Thread Kyrylo Tkachov
> On 8 May 2025, at 21:10, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when > CMPBR extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): Emit CMPBR > instructions if possible. > (BRANCH_LEN_P_1Kib): New constant. > (BRANCH_LEN_N_1Kib)

Re: [PATCH 00/13] arm: Remove iWMMXT code generation

2025-05-08 Thread Kyrylo Tkachov
Hi Richard, > On 7 May 2025, at 18:15, Richard Earnshaw wrote: > > > The header file for the Arm implementation of mmintrin.h was changed in GCC-15 > to disable access to the intrinsics. This patch removes the internal code > as well. > > We still allow -mcpu/-march options for the wmmx cpus,

Re: [PATCH 3/8] AArch64: rename branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Give the `define_insn` rules used in lowering `cbranch4` to RTL > more descriptive and consistent names: from now on, each rule is named > after the AArch64 instruction that it generates. Also add comments to > document each rule. > > gcc/Chang

Re: [PATCH 1/8] AArch64: place branch instruction rules together

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The rules for conditional branches were spread throughout `aarch64.md`. > Group them together so it is easier to understand how `cbranch4` > is lowered to RTL. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (condjump): move. > (*compare_co

Re: [PATCH 0/8] AArch64: CMPBR support

2025-05-07 Thread Kyrylo Tkachov
Hi Karl, > On 7 May 2025, at 12:27, Karl Meakin wrote: > > This patch series adds support for the CMPBR extension. It includes the > new `+cmpbr` option and rules to generate the new instructions when > lowering conditional branches. Thanks for the series. You didn’t state it explicitly, but ha

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR > extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): emit CMPBR > instructions if possible. > (cbranch4): new expand rule. > (aarch64_cb): likewise. >

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Commit the test file `cmpbr.c` before rules for generating the new > instructions are added, so that the changes in codegen are more obvious > in the next commit. I guess that’s an LLVM best practice. In GCC since we have the check-function-bod

Re: [PATCH 6/8] AArch64: recognize `+cmpbr` option

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Add the `+cmpbr` option to enable the FEAT_CMPBR architectural > extension. > > gcc/ChangeLog: > > * config/aarch64/aarch64-option-extensions.def (cmpbr): new > option. > * config/aarch64/aarch64.h (TARGET_CMPBR): new macro. > * doc/invoke.tex

Re: [PATCH 5/8] AArch64: make `far_branch` attribute a boolean

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > The `far_branch` attribute only ever takes the values 0 or 1, so make it > a `no/yes` valued string attribute instead. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (far_branch): replace 0/1 with > no/yes. > (aarch64_bcond): handle renam

Re: [PATCH 4/8] AArch64: add constants for branch displacements

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Extract the hardcoded values for the minimum PC-relative displacements > into named constants and document them. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. > (BRANCH_LEN_N_128MiB): likewise. > (BRA
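As background on where such displacement limits come from, the sketch below derives the reachable ranges from the AArch64 branch encodings, in which a signed immediate is scaled by the 4-byte instruction size. This is an illustration under that assumption only; the names (branch_range, range_128MiB, in_range and so on) are hypothetical and are not the constants added by the patch, and the way aarch64.md measures a displacement may differ in detail.

```cpp
#include <cstdint>

// Hypothetical names; the ranges follow from the AArch64 encodings, where
// a signed N-bit immediate is scaled by 4 (instructions are 4 bytes wide).
constexpr std::int64_t KiB = 1024;
constexpr std::int64_t MiB = 1024 * KiB;

struct branch_range
{
  std::int64_t min;   // most negative reachable byte offset
  std::int64_t max;   // most positive reachable byte offset
};

// B / BL: 26-bit immediate -> [-128 MiB, 128 MiB - 4].
constexpr branch_range range_128MiB = { -128 * MiB, 128 * MiB - 4 };
// B.cond / CBZ / CBNZ: 19-bit immediate -> [-1 MiB, 1 MiB - 4].
constexpr branch_range range_1MiB = { -1 * MiB, 1 * MiB - 4 };
// TBZ / TBNZ: 14-bit immediate -> [-32 KiB, 32 KiB - 4].
constexpr branch_range range_32KiB = { -32 * KiB, 32 * KiB - 4 };

// True if OFFSET is a multiple of 4 that the given branch form can reach.
constexpr bool
in_range (const branch_range &r, std::int64_t offset)
{
  return offset >= r.min && offset <= r.max && offset % 4 == 0;
}

static_assert (in_range (range_128MiB, -128 * MiB), "lower bound reachable");
static_assert (!in_range (range_128MiB, 128 * MiB), "upper bound is exclusive");
```

The asymmetry (the positive limit is 4 bytes short of the power of two) is the usual two's-complement effect on a signed immediate.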

Re: [PATCH 2/8] AArch64: reformat branch instruction rules

2025-05-07 Thread Kyrylo Tkachov
> On 7 May 2025, at 12:27, Karl Meakin wrote: > > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): reformat. > (cbranchcc4): likewise. > (condjump): likewise. > (*co

Re: [RFC PATCH 3/5] json: Add get_map() method to JSON object class

2025-05-07 Thread Kyrylo Tkachov
> On 6 May 2025, at 10:30, Soumya AR wrote: > > From: Soumya AR > > This patch adds a get_map () method to the JSON object class to provide access > to the underlying hash map that stores the JSON key-value pairs. > > It also reorganizes the private and public sections of the class to expos

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-07 Thread Kyrylo Tkachov
Hi Richard, > On 6 May 2025, at 12:34, Richard Sandiford wrote: > > writes: >> From: Soumya AR >> >> Hi, >> >> This RFC and subsequent patch series introduces support for printing and >> parsing >> of aarch64 tuning parameters in the form of JSON. > > Thanks for doing this. It looks r

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-05 Thread Kyrylo Tkachov
> On 4 May 2025, at 19:19, Yangyu Chen wrote: > > Hi everyone, > > This patch series introduces support for the target_clones profile > option in GCC. This option enables users to specify target_clones > attributes in a separate file, allowing GCC to generate multiple > versions of the functio

[AArch64] changes.html: Fix typo

2025-05-02 Thread Kyrylo Tkachov
Pushing as obvious. Signed-off-by: Kyrylo Tkachov 0001-AArch64-changes.html-Fix-typo.patch Description: 0001-AArch64-changes.html-Fix-typo.patch

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 1 May 2025, at 14:02, Ayan Shafqat wrote: > > On Thu, May 01, 2025 at 08:09:18AM +0000, Kyrylo Tkachov wrote: >> >> I was going to ask why not use the standard __builtin_sqrt builtins but I >> guess those don’t guarantee that we avoid a libcall in

Re: [PATCH v4 2/2] Aarch64: Add __sqrt and __sqrtf intrinsics and corresponding tests

2025-05-01 Thread Kyrylo Tkachov
> On 28 Apr 2025, at 21:29, Ayan Shafqat wrote: > > Rebased with gcc 15.1 > > This patch introduces two new inline functions, __sqrt and __sqrtf, in > arm_acle.h for Aarch64 targets. These functions wrap the new builtins > __builtin_aarch64_sqrtdf and __builtin_aarch64_sqrtsf, respectively, >

Re: [PATCH v4 1/2] Aarch64: Use BUILTIN_VHSDF_HSDF for vector and scalar sqrt builtins

2025-05-01 Thread Kyrylo Tkachov
> On 28 Apr 2025, at 21:27, Ayan Shafqat wrote: > > Rebased with gcc 15.1 > > This patch changes the `sqrt` builtin definition from `BUILTIN_VHSDF_DF` > to `BUILTIN_VHSDF_HSDF` in `aarch64-simd-builtins.def`, ensuring the > builtin covers half, single, and double precision variants. The redun

Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-28 Thread Kyrylo Tkachov
> On 25 Apr 2025, at 19:55, Richard Sandiford wrote: > > Jennifer Schmitz writes: >> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a >> ptrue predicate can be replaced by neon instructions (LDR and STR), >> thus avoiding the predicate altogether. This also enables formation

Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-25 Thread Kyrylo Tkachov
> On 25 Apr 2025, at 12:06, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard, >> >>> On 23 Apr 2025, at 13:47, Richard Sandiford >>> wrote: >>> >>> Thanks for all the feedback. I've tried to address it in

Re: [PATCH v2] Document AArch64 changes for GCC 15

2025-04-24 Thread Kyrylo Tkachov
> On 23 Apr 2025, at 13:47, Richard Sandiford wrote: > > Thanks for all the feedback. I've tried to address it in the version > below. I'll push later today if there are no further comments. > > Richard > > > The list is structured as: > > - new configurations > - command-line changes > -

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 14:44, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 12:39:59PM +0000, Kyrylo Tkachov wrote: >>> The third case looks undesirable, -fno-ipa-reorder-for-locality is the >>> default and shouldn't affect anything, whether explicit or im

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 14:28, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 12:05:06PM +0000, Kyrylo Tkachov wrote: >>>>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: >>>>> >>>>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo T

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 12:18, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 10:15:08AM +0000, Kyrylo Tkachov wrote: >> >> >>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: >>> >>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkach

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote: > > On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkachov wrote: >>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be >>> testing !opts_set->x_flag_lto_partition (i.e. -flto-p

Re: [PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
lt >>> up to that point. We should also be testing opts instead of opts_set here. >>> >>> Bootstrapped and tested on aarch64-none-linux-gnu. >>> >>> Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC >>> 15 branch as

[PATCH] opts.cc Fix thinko with default handling of -flto-partition=

2025-04-24 Thread Kyrylo Tkachov
instead of opts_set here. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 15 branch as well. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * opts.cc (finish_options): Check for == against

Re: [PATCH] Introduce -flto-partition=locality

2025-04-24 Thread Kyrylo Tkachov
>> opts_set->x_flag_lto_partition = opts->x_flag_lto_partition = >> LTO_PARTITION_BALANCED; > Hmm, yes I think the condition should be == instead of !=. I’ll test a patch momentarily. Thanks, Kyrill > Regards, > Feng > > From:

Re: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Kyrylo Tkachov
> On 23 Apr 2025, at 08:37, Tamar Christina wrote: > > Hi All, > > This patch proposes a new vector cost model called "max". The cost model is > an > intersection between two of our existing cost models. Like `unlimited` it > disables the costing vs scalar and assumes all vectorization to

Re: [PATCH] Document AArch64 changes for GCC 15

2025-04-22 Thread Kyrylo Tkachov
> On 22 Apr 2025, at 15:31, Tamar Christina wrote: > >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, April 22, 2025 2:28 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ; >> ktkac...@nvidia.com >> Subject: Re: [PATCH] Document AArch64 cha

[PATCH] aarch64: Update FP8 dependencies for -mcpu=olympus

2025-04-22 Thread Kyrylo Tkachov
ed on aarch64-none-linux-gnu. I’m pushing this to trunk, is it also ok for the GCC 15 branch? I’d like to have the right CPU features enabled for the release. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-cores.def (olympus): Add fp8fma, fp8dot4 expli

[PATCH] Document locality partitioning params in invoke.texi

2025-04-22 Thread Kyrylo Tkachov
: Kyrylo Tkachov * invoke.texi (lto-partition-locality-frequency-cutoff, lto-partition-locality-size-cutoff, lto-max-locality-partition): Document. 0001-Document-locality-partitioning-params-in-invoke.texi.patch Description: 0001-Document-locality-partitioning-params-in

Regenerate common.opt.urls

2025-04-15 Thread Kyrylo Tkachov
Pushing as obvious. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov * common.opt.urls: Regenerate. 0001-Regenerate-common.opt.urls.patch Description: 0001-Regenerate-common.opt.urls.patch

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-15 Thread Kyrylo Tkachov
> On 15 Apr 2025, at 15:42, Richard Biener wrote: > > On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov wrote: >> >> Hi Honza, >> >>> On 13 Apr 2025, at 23:19, Jan Hubicka wrote: >>> >>>> +@opindex fipa-reorder-for-locality >>>

Re: [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-14 Thread Kyrylo Tkachov
Hi Tejas, > On 14 Apr 2025, at 16:04, Tejas Belagod wrote: > > The operand order to gen_vcond_mask call in the vec_extract pattern is wrong. > Fix the order where predicate is operand 3. > > Tested and bootstrapped on aarch64-linux-gnu. OK for trunk? > > gcc/ChangeLog > > * config/aarch64/aar

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-14 Thread Kyrylo Tkachov
Hi Honza, > On 13 Apr 2025, at 23:19, Jan Hubicka wrote: > >> +@opindex fipa-reorder-for-locality >> +@item -fipa-reorder-for-locality >> +Group call chains close together in the binary layout to improve code >> +locality. This option is incompatible with an explicit >> +@option{-flto-part

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-10 Thread Kyrylo Tkachov
> On 26 Mar 2025, at 08:42, Kyrylo Tkachov wrote: > > Ping. Ping. https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676958.html I’ve run a profiled LTO bootstrap of GCC with the new bootstrap-lto-locality bootstrap config and compared it against a GCC produced by the exi

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-04-07 Thread Kyrylo Tkachov
> On 7 Apr 2025, at 10:21, Tamar Christina wrote: > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Monday, March 31, 2025 1:43 PM >> To: i...@sandoe.co.uk >> Cc: Tamar Christina ; GCC Patches > patc...@gcc.gnu.org>; Alice Carlotti ;

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-04-05 Thread Kyrylo Tkachov
> On 31 Mar 2025, at 09:43, Richard Biener wrote: > > On Mon, Mar 31, 2025 at 9:41 AM Richard Biener > wrote: >> >> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov wrote: >>> >>> Ping. >> >> Can you reference the patch please? I'

[PATCH] aarch64: Deprecate -march= for the month of April

2025-04-05 Thread Kyrylo Tkachov
Hi all, As we're starting a new month, introduce a more appropriate -mapril= to specify the compilation target instead. This helps keep GCC more up to date with the passage of time. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov gcc/ * config/aa

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-03-31 Thread Kyrylo Tkachov
Hi Iain, > On 22 Mar 2025, at 15:31, Iain Sandoe wrote: > > 0. Sorry this has taken some time to close off; partly because of waiting > for input, but mostly that I've been stretched with other work. > 1. As per the commit message, the apparent non-conformance with 8.5/6 > because FEAT_SPECR

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-31 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 24 Mar 2025, at 14:28, Kyrylo Tkachov wrote: > > Hi all, > > In this testcase GCC tries to expand a VNx4BI vector: > vector(4) _40; > _39 = () _24; > _40 = {_39, _39, _39, _39}; > > This ends up in a scalarised sequence of bitfiel

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-26 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 6 Mar 2025, at 09:25, Kyrylo Tkachov wrote: > > Hi all, > > Implement partitioning and cloning in the callgraph to help locality. > A new -fipa-reorder-for-locality flag is used to enable this. > The majority of the logic is in the new IPA

[PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-24 Thread Kyrylo Tkachov
bfis are gone. Bootstrapped and tested on aarch64-none-linux-gnu. Given this is a regression from GCC 13, is this ok for trunk now? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR middle-end/119442 * expr.cc (store_constructor): Also allow element modes explicitly accepted by

Re: [PATCH] aarch64: Add support for -mcpu=olympus

2025-03-21 Thread Kyrylo Tkachov
Hi Dhruv, > On 21 Mar 2025, at 11:11, Dhruv Chawla wrote: > > This adds support for the NVIDIA Olympus core to the AArch64 backend. The > initial patch does not add any special tuning decisions, and those may come > later. > > Bootstrapped and tested on aarch64-none-linux-gnu. > Thanks, given

[PATCH] aarch64: Add +sve2p1 to -march=armv9.4-a flags

2025-03-19 Thread Kyrylo Tkachov
g to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-arches.def (...): Add SVE2p1. * doc/invoke.texi (AArch64 Options): Document +sve2p1 in -march=armv9.4-a. 0001-aarch64-Add-sve2p1-to-march-armv9.4-a-flags.patch Description: 0001-a

Re: [PATCH v3 1/2] Aarch64: Add FMA and FMAF intrinsic and corresponding tests

2025-03-17 Thread Kyrylo Tkachov
> On 16 Mar 2025, at 20:15, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for Aarch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements [1]. >
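The shape of such wrappers can be sketched as follows. This is an illustration only, not the definitions from arm_acle.h: the names my_fma and my_fmaf are hypothetical, and the sketch assumes a GCC-compatible compiler that provides __builtin_fma and __builtin_fmaf. Whether the call becomes a single fused multiply-add instruction or a library call depends on the target and options.

```cpp
// Hypothetical wrapper names; the real intrinsics live in arm_acle.h.
// Both compute x * y + z with a single rounding step.
static inline double
my_fma (double x, double y, double z)
{
  return __builtin_fma (x, y, z);
}

static inline float
my_fmaf (float x, float y, float z)
{
  return __builtin_fmaf (x, y, z);
}
```

Declaring the wrappers inline in a header, rather than as out-of-line functions, is what lets the compiler expand the builtin at each call site.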

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-13 Thread Kyrylo Tkachov
Hi Ayan, > On 11 Mar 2025, at 14:53, Ayan Shafqat wrote: > > Hello Kyrylo, > > On Tue, Mar 11, 2025 at 08:55:46AM +0000, Kyrylo Tkachov wrote: >> This looks ok to me. >> GCC is currently in a regression fixing stage so normally such a change >> would wait u

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-11 Thread Kyrylo Tkachov
Hi Ayan, > On 9 Mar 2025, at 21:46, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for AArch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements

[PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-06 Thread Kyrylo Tkachov
ality, but we'd appreciate wider performance evaluation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole Co-authored-by: Kyrylo Tkachov config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc

Re: [PATCH] Introduce -flto-partition=locality

2025-03-06 Thread Kyrylo Tkachov
both (normal LTO bootstrap and profiledbootstrap). >> >> With this optimization we are seeing good performance gains on some large >> internal workloads that stress the parts of the processor that is sensitive >> to code locality, but we'd appreciate wider performance eva

[PATCH] PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion

2025-03-05 Thread Kyrylo Tkachov
. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov PR rtl-optimization/119046 * config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for PARALLEL. 0001-PR-rtl-optimization-119046-aarch64-Fix-PARALLEL

Re: [PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-05 Thread Kyrylo Tkachov
> On 5 Mar 2025, at 11:14, Richard Biener wrote: > > On Tue, Mar 4, 2025 at 10:01 PM Richard Sandiford > wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> In this testcase late-combine was failing to merge: >>> dup v31.4s

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 19:52, Wilco Dijkstra wrote: > > > Outline atomics is not designed to be used with -mcmodel=large, so disable > it automatically if the large code model is used. > > Passes regress, OK for commit? > This restriction should be documented in invoke.texi IMO. I also think i

Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 19:58, Wilco Dijkstra wrote: > > > Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits > from much faster build times with -O2, but avoids the regressions in lbm which > is very sensitive to minor scheduling changes due to long FMA chains. We can

Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 09:49, Andrew Pinski wrote: > > On Mon, Mar 3, 2025 at 12:43 AM Kyrylo Tkachov wrote: >> >> >> >>> On 28 Feb 2025, at 19:06, Andrew Pinski wrote: >>> >>> On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov wrote: >

Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
> On 28 Feb 2025, at 19:06, Andrew Pinski wrote: > > On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov wrote: >> >> Hi all, >> >> In this PR late-combine was failing to merge: >> dup v31.4s, v31.s[3] >> fmla v30.4s, v31.4s, v29.4s >> in

[PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
d and tested on aarch64-none-linux-gnu. Apparently this also fixes a regression in gcc.target/aarch64/vmul_element_cost.c that I observed. Signed-off-by: Kyrylo Tkachov gcc/ PR rtl-optimization/119046 * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. gcc

[PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-02-28 Thread Kyrylo Tkachov
igned-off-by: Kyrylo Tkachov gcc/ PR rtl-optimization/119046 * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. gcc/testsuite/ PR rtl-optimization/119046 * g++.target/aarch64/pr119046.C: New test. 0001-PR-rtl-optimization-119046-

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
> On 18 Feb 2025, at 09:48, Kyrylo Tkachov wrote: > > > >> On 18 Feb 2025, at 09:41, Richard Sandiford >> wrote: >> >> Kyrylo Tkachov writes: >>> Hi Soumya >>> >>>> On 18 Feb 2025, at 09:12, Soumya AR wrote: >>>

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
> On 18 Feb 2025, at 09:41, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Soumya >> >>> On 18 Feb 2025, at 09:12, Soumya AR wrote: >>> >>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses >>> generi

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
Hi Soumya > On 18 Feb 2025, at 09:12, Soumya AR wrote: > > generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses > generic_prefetch_tune in generic_armv8_a_tunings. > > This patch updates the pointer to generic_armv8_a_prefetch_tune. > > This patch was bootstrapped and regtest

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Kyrylo Tkachov
Hi Spencer, > On 17 Feb 2025, at 20:07, Spencer Abson wrote: > > Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin > if the arguments are better suited to it. This helps us avoid copying data > between lanes before operation. > > E.g. We prefer to use UMULL2 rather t

Re: [PATCH] aarch64: Fix bootstrap with --enable-checking=release [PR118771]

2025-02-07 Thread Kyrylo Tkachov
> On 7 Feb 2025, at 01:04, Andrew Pinski wrote: > > With release checking we get an uninitialization warning > inside aarch64_split_move because of jump threading for the case of > `npieces==0` > but `npieces` is never 0 (but there is no way the compiler can know that). > So this fixes the iss

Re: [PATCH] aarch64: Fix sve/acle/general/ldff1_8.c failures

2025-02-05 Thread Kyrylo Tkachov
Hi Richard, > On 5 Feb 2025, at 09:57, Richard Sandiford wrote: > > gcc.target/aarch64/sve/acle/general/ldff1_8.c and > gcc.target/aarch64/sve/ptest_1.c were failing because the > aarch64 port was giving a zero (unknown) cost to instructions > that compute two results in parallel. This was late

Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov
> On 22 Jan 2025, at 13:53, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard, >> >>> On 22 Jan 2025, at 13:21, Richard Sandiford >>> wrote: >>> >>> GCC 15 is the first release to support FP8 intrinsics. >>>

Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov
Hi Richard, > On 22 Jan 2025, at 13:21, Richard Sandiford wrote: > > GCC 15 is the first release to support FP8 intrinsics. > The underlying instructions depend on the value of a new register, > FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save > register rather than a global regis

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-20 Thread Kyrylo Tkachov
> On 20 Jan 2025, at 19:43, Tamar Christina wrote: > >> -Original Message- >> From: Tamar Christina >> Sent: Friday, January 17, 2025 5:07 PM >> To: Kyrylo Tkachov ; Richard Sandiford >> >> Cc: GCC Patches ; nd ; Richard >> Earnsh

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 15:01, Richard Sandiford wrote: > > Tamar Christina writes: >>> -Original Message- >>> From: Richard Sandiford >>> Sent: Friday, January 10, 2025 4:50 PM >>> To: Akram Ahmad >>> Cc: ktkac...@nvidia.com; gcc-patches@gcc.gnu.org >>> Subject: Re: [PATCH v3 1/2] aar

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 14:47, Richard Sandiford wrote: > > Tamar Christina writes: >>> -Original Message- >>> From: Kyrylo Tkachov >>> Sent: Friday, January 17, 2025 1:22 PM >>> To: Tamar Christina >>> Cc: GCC Patches ; nd

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 14:06, Tamar Christina wrote: > >> -Original Message----- >> From: Kyrylo Tkachov >> Sent: Friday, January 17, 2025 1:04 PM >> To: Tamar Christina >> Cc: GCC Patches ; nd ; Richard >> Earnshaw ; ktkac...@gcc.gnu.org; Ri

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 13:56, Tamar Christina wrote: > > Hi All, > > Following the deprecation of ILP32 *-elf builds fail now due to -Werror on the > deprecation warning. This is because on embedded builds ILP32 is part of the > default multilib. > > This patch removed it from the default targ

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-14 Thread Kyrylo Tkachov
> On 13 Jan 2025, at 18:51, Richard Sandiford wrote: > > Iain Sandoe writes: >> Hi Folks, >> >>> On 10 Jan 2025, at 18:30, Wilco Dijkstra wrote: >>> >>> Hi Andrew, >>> Personally I would like this deprecated even for bare-metal. Yes the iwatch ABI is an ILP32 ABI but I don't see

Re: [PATCH] aarch64: Provide initial specifications for Apple CPU cores.

2025-01-13 Thread Kyrylo Tkachov
Hi Iain, > On 11 Jan 2025, at 14:21, Iain Sandoe wrote: > > Hi, > > I originally made this patch for the Darwin Arm64 development branch, > however in discussions on IRC, it seems that it is also relevant to > Linux - since there are implementations running on Apple hardware with > the M1..3 CP

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 15:54, Wilco Dijkstra wrote: > > ping > > > Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > to the baseline tuning since all modern cores use it. Fix the neoverse512tvb > tuning to be > like Neoverse V1/V2. For neovers

Re: [PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 15:30, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> As a minor cleanup remove Cortex-A57 FMA steering pass. Since Cortex-A57 is >> pretty old, there isn't any benefit of keeping this. >> >> Passes regress & bootstrap, OK for commit? >> >> gcc: >> * config.gcc

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Kyrylo Tkachov
Hi Wilco, > On 10 Jan 2025, at 15:05, Wilco Dijkstra wrote: > > > ILP32 was originally intended to make porting to AArch64 easier. Support was > never merged in the Linux kernel or GLIBC, so it has been unsupported for many > years. There isn't a benefit in keeping unsupported features foreve

Re: [PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 11:22, Richard Sandiford wrote: > > writes: >> This patch adds a warning when FMV is used for Aarch64. >> >> The reasoning for this is the ACLE [1] spec for FMV has diverged >> significantly from the current implementation and we want to prevent >> potential future compat

Re: [PATCH]AArch64: correct Cortex-X4 MIDR

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 00:07, Tamar Christina wrote: > > Hi All, > > The Parts Num field for the MIDR for Cortex-X4 is wrong. It's currently the > parts number for a Cortex-A720 (which does have the right number). > > The correct number can be found in the Cortex-X4 Technical Reference Manual

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-09 Thread Kyrylo Tkachov
Hi Akram > On 8 Jan 2025, at 16:23, Akram Ahmad wrote: > > Hi Kyrill, > > Thanks for the feedback on V2. I found a pattern which works for > the open-coded signed arithmetic, and I've implemented the other > feedback you provided as well. > > I've sent the modified patch in this thread as the

Re: [PATCH] Add warning for use of non-spec FMV in Aarch64

2025-01-09 Thread Kyrylo Tkachov
Hi Alfie, > On 9 Jan 2025, at 10:58, alfie.richa...@arm.com wrote: > > This patch adds a warning whenever FMV is used for Aarch64. > > The reasoning for this is the ACLE [1] spec for FMV has diverged > significantly from the current implementation and we want to prevent > future compatibility is

Re: [PATCH] Introduce -flto-partition=locality

2024-12-20 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 13 Dec 2024, at 16:47, Kyrylo Tkachov wrote: > > Ping. > Thanks, > Kyrill > >> On 28 Nov 2024, at 11:22, Kyrylo Tkachov wrote: >> >> Ping. >> >>> On 15 Nov 2024, at 17:04, Kyrylo Tkachov wrote: >>> >

Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-12-17 Thread Kyrylo Tkachov
Hi Akram, > On 14 Nov 2024, at 16:53, Akram Ahmad wrote: > > This renames the existing {s,u}q{add,sub} instructions to use the > standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and > IFN_SAT_SUB. > > The NEON intrinsics for saturating arithmetic and their corresponding > builtins
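
As background for the saturating operations named above, the following is a scalar sketch of what saturating add and subtract compute: the result is clamped to the representable range instead of wrapping. The function names are hypothetical and only model the semantics; they are not the patterns or builtins from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating add: compute in a wider type, then clamp to
   [INT32_MIN, INT32_MAX] instead of wrapping.  */
static inline int32_t sat_add_s32 (int32_t a, int32_t b)
{
  int64_t wide = (int64_t) a + (int64_t) b;
  if (wide > INT32_MAX)
    return INT32_MAX;
  if (wide < INT32_MIN)
    return INT32_MIN;
  return (int32_t) wide;
}

/* Unsigned saturating add: a wrapped sum is smaller than either operand.  */
static inline uint32_t sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}

/* Unsigned saturating subtract: clamp at zero.  */
static inline uint32_t sat_sub_u32 (uint32_t a, uint32_t b)
{
  return a > b ? a - b : 0;
}

static inline void sat_example (void)
{
  assert (sat_add_s32 (INT32_MAX, 1) == INT32_MAX);
  assert (sat_sub_u32 (3, 5) == 0);
}
```

Open-coded forms like these are the kind of source a compiler can recognise and map to a single saturating instruction when the target provides one.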

Re: [PATCH] Introduce -flto-partition=locality

2024-12-13 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 28 Nov 2024, at 11:22, Kyrylo Tkachov wrote: > > Ping. > >> On 15 Nov 2024, at 17:04, Kyrylo Tkachov wrote: >> >> Hi all, >> >> This is a patch submission following-up from the RFC at: >> https://gcc.gnu.org/piperma

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

2024-12-12 Thread Kyrylo Tkachov
Thanks for doing this Tamar, > On 11 Dec 2024, at 10:54, Tamar Christina wrote: > >> -----Original Message----- >> From: Richard Sandiford >> Sent: Wednesday, December 11, 2024 9:50 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Sub

Re: [PATCH 2/8]AArch64: Add Neoverse V3 core definition and cost model

2024-12-05 Thread Kyrylo Tkachov
> On 3 Dec 2024, at 11:32, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Tuesday, December 3, 2024 10:19 AM >> To: Tamar Christina >> Cc: GCC Patches ; nd ; Richard >> Earnshaw ; Marcus Shawcroft

Re: [PATCH 0/4] Rename the Advanced SIMD intrinsic flags

2024-12-05 Thread Kyrylo Tkachov
> On 4 Dec 2024, at 19:02, Richard Sandiford wrote: > > The arm_neon.h intrinsic definitions use a bitmask of flags to > indicate what side-effects the intrinsic might have. However, > their names are a bit confusing: > > - FLAG_AUTO_FP was originally suggested as a way of saying > "automati

[PATCH] aarch64: Update cpuinfo strings for some arch features

2024-12-03 Thread Kyrylo Tkachov
next week if there are no objections. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-option-extensions.def (sve-b16b16, f32mm, f64mm, sve2p1, sme-f64f64, sme-i16i64, sme-b16b16, sme-f16f16, mops): Update FEATURE_STRING field. 0001-aarc

Re: [PATCH v1 1/1] aarch64: fix fp8 cpuinfo feature names

2024-12-03 Thread Kyrylo Tkachov
> On 3 Dec 2024, at 11:41, Claudio Bantaloukas > wrote: > > > > On 12/3/2024 10:24 AM, Kyrylo Tkachov wrote: >> Hi Claudio, >>> On 2 Dec 2024, at 19:14, Claudio Bantaloukas >>> wrote: >>> >>> >>> The previous version o

Re: [PATCH v1 1/1] aarch64: fix fp8 cpuinfo feature names

2024-12-03 Thread Kyrylo Tkachov
Hi Claudio, > On 2 Dec 2024, at 19:14, Claudio Bantaloukas > wrote: > > > The previous version of the patch was based on the mistaken assumption that > features in /proc/cpuinfo had matching names to the feature names that gcc and > gas accept. > This patch enables the fp8 feature when the f8c
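
To illustrate the kind of matching at issue, here is a rough sketch of checking whether every name in a feature string appears as a whole token in a /proc/cpuinfo "Features" line. It assumes the detection requires all space-separated names to be present, in any order; the function names and the feature names "featx", "featy" and "featz" are hypothetical and are not taken from the patch or from GCC's driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the LEN-byte WORD is a whole space-separated token of LINE.  */
static inline bool has_token (const char *line, const char *word, size_t len)
{
  const char *p = line;
  while (*p)
    {
      while (*p == ' ')
        p++;
      const char *start = p;
      while (*p && *p != ' ')
        p++;
      if (len > 0 && (size_t) (p - start) == len
          && memcmp (start, word, len) == 0)
        return true;
    }
  return false;
}

/* Return true if every token of FEATURE_STRING occurs in FEATURES_LINE.
   An empty FEATURE_STRING is vacuously satisfied in this sketch; a real
   detector would probably want to treat that case specially.  */
static inline bool all_features_present (const char *features_line,
                                         const char *feature_string)
{
  const char *p = feature_string;
  while (*p)
    {
      while (*p == ' ')
        p++;
      const char *start = p;
      while (*p && *p != ' ')
        p++;
      if (p != start && !has_token (features_line, start, (size_t) (p - start)))
        return false;
    }
  return true;
}

static inline void features_example (void)
{
  /* Whole-token matching: "featx" must not match inside "featxy".  */
  assert (all_features_present ("fp asimd featx featy", "featy featx"));
  assert (!all_features_present ("fp asimd featxy", "featx"));
}
```

Because the match is on the kernel's spelling of each feature, a compiler-side extension name that differs from the cpuinfo name will never be detected unless the table records the kernel's spelling.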

Re: [PATCH 2/8]AArch64: Add Neoverse V3 core definition and cost model

2024-12-03 Thread Kyrylo Tkachov
Hi Tamar, Something I noticed when looking at the various tuning files…. > On 26 Jul 2024, at 11:20, Tamar Christina wrote: > > Hi All, > > This adds a cost model and core definition for Neoverse V3. > > It also makes Cortex-X4

Re: [PATCH 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-02 Thread Kyrylo Tkachov
Hi Akram, > On 2 Dec 2024, at 15:54, Akram Ahmad wrote: > > GIMPLE code which performs a narrowing truncation on the result of a > vector concatenation currently results in an unnecessary XTN being > emitted following a UZP1 to concatenate the operands. In cases such as this, > UZP1 should instead u

Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-12-02 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:16, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >>> On 27 Nov 2024, at 09:34, Richard Sandiford >>> wrote: >>> >>> Soumya AR writes: >>>> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE

Re: [PATCH v2] aarch64: Fix build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:49, Yury Khrustalev wrote: > > Including the "arm_acle.h" header in aarch64-unwind.h requires > stdint.h to be present and it may not be available during the > first stage of cross-compilation of GCC. > > When cross-building GCC for the aarch64-none-linux-gnu target >

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:25, Yury Khrustalev wrote: > > Hi Kyrill, > > On Fri, Nov 29, 2024 at 02:06:17PM +, Kyrylo Tkachov wrote: >> Hi Yury, >> >>> On 29 Nov 2024, at 13:57, Yury Khrustalev wrote: >>> >>> Inclusion of "arm_ac
