Regenerate common.opt.urls

2025-04-15 Thread Kyrylo Tkachov
Pushing as obvious. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov * common.opt.urls: Regenerate. 0001-Regenerate-common.opt.urls.patch

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-15 Thread Kyrylo Tkachov
> On 15 Apr 2025, at 15:42, Richard Biener wrote: > > On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov wrote: >> >> Hi Honza, >> >>> On 13 Apr 2025, at 23:19, Jan Hubicka wrote: >>> >>>> +@opindex fipa-reorder-for-locality >>>

Re: [PATCH] AArch64: Fix operands order in vec_extract expander

2025-04-14 Thread Kyrylo Tkachov
Hi Tejas, > On 14 Apr 2025, at 16:04, Tejas Belagod wrote: > > The operand order to gen_vcond_mask call in the vec_extract pattern is wrong. > Fix the order where predicate is operand 3. > > Tested and bootstrapped on aarch64-linux-gnu. OK for trunk? > > gcc/ChangeLog > > * config/aarch64/aar

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-14 Thread Kyrylo Tkachov
Hi Honza, > On 13 Apr 2025, at 23:19, Jan Hubicka wrote: > >> +@opindex fipa-reorder-for-locality >> +@item -fipa-reorder-for-locality >> +Group call chains close together in the binary layout to improve code code >> +locality. This option is incompatible with an explicit >> +@option{-flto-part

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-04-10 Thread Kyrylo Tkachov
> On 26 Mar 2025, at 08:42, Kyrylo Tkachov wrote: > > Ping. Ping. https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676958.html I’ve run a profiled LTO bootstrap of GCC with the new bootstrap-lto-locality bootstrap config and compared it against a GCC produced by the exi

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-04-07 Thread Kyrylo Tkachov
> On 7 Apr 2025, at 10:21, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Monday, March 31, 2025 1:43 PM >> To: i...@sandoe.co.uk >> Cc: Tamar Christina ; GCC Patches > patc...@gcc.gnu.org>; Alice Carlotti ;

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-04-05 Thread Kyrylo Tkachov
> On 31 Mar 2025, at 09:43, Richard Biener wrote: > > On Mon, Mar 31, 2025 at 9:41 AM Richard Biener > wrote: >> >> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov wrote: >>> >>> Ping. >> >> Can you reference the patch please? I'

[PATCH] aarch64: Deprecate -march= for the month of April

2025-04-05 Thread Kyrylo Tkachov
Hi all, As we're starting a new month, introduce a more appropriate -mapril= to specify the compilation target instead. This helps keep GCC more up to date with the passage of time. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov gcc/ * config/aa

Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-03-31 Thread Kyrylo Tkachov
Hi Iain, > On 22 Mar 2025, at 15:31, Iain Sandoe wrote: > > 0. Sorry this has taken some time to close off; partly because of waiting > for input, but mostly that I've been stretched with other work. > 1. As per the commit message, the apparent non-conformance with 8.5/6 > because FEAT_SPECR

Re: [PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-31 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 24 Mar 2025, at 14:28, Kyrylo Tkachov wrote: > > Hi all, > > In this testcase GCC tries to expand a VNx4BI vector: > vector(4) _40; > _39 = () _24; > _40 = {_39, _39, _39, _39}; > > This ends up in a scalarised sequence of bitfiel

Re: [PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-26 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 6 Mar 2025, at 09:25, Kyrylo Tkachov wrote: > > Hi all, > > Implement partitioning and cloning in the callgraph to help locality. > A new -fipa-reorder-for-locality flag is used to enable this. > The majority of the logic is in the new IPA

[PATCH] PR middle-end/119442: expr.cc: Fix vec_duplicate into vector boolean modes

2025-03-24 Thread Kyrylo Tkachov
bfis are gone. Bootstrapped and tested on aarch64-none-linux-gnu. Given this is a regression from GCC 13 is this ok for trunk now? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR middle-end/119442 * expr.cc (store_constructor): Also allow element modes explicitly accepted by

Re: [PATCH] aarch64: Add support for -mcpu=olympus

2025-03-21 Thread Kyrylo Tkachov
Hi Dhruv, > On 21 Mar 2025, at 11:11, Dhruv Chawla wrote: > > This adds support for the NVIDIA Olympus core to the AArch64 backend. The > initial patch does not add any special tuning decisions, and those may come > later. > > Bootstrapped and tested on aarch64-none-linux-gnu. > Thanks, given

[PATCH] aarch64: Add +sve2p1 to -march=armv9.4-a flags

2025-03-19 Thread Kyrylo Tkachov
g to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-arches.def (...): Add SVE2p1. * doc/invoke.texi (AArch64 Options): Document +sve2p1 in -march=armv9.4-a. 0001-aarch64-Add-sve2p1-to-march-armv9.4-a-flags.patch

Re: [PATCH v3 1/2] Aarch64: Add FMA and FMAF intrinsic and corresponding tests

2025-03-17 Thread Kyrylo Tkachov
> On 16 Mar 2025, at 20:15, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for AArch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements [1]. >

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-13 Thread Kyrylo Tkachov
Hi Ayan, > On 11 Mar 2025, at 14:53, Ayan Shafqat wrote: > > Hello Kyrylo, > > On Tue, Mar 11, 2025 at 08:55:46AM +, Kyrylo Tkachov wrote: >> This looks ok to me. >> GCC is currently in a regression fixing stage so normally such a change >> would wait u

Re: [PATCH 1/2] aarch64: Add FMA and FMAF intrinsics and tests

2025-03-11 Thread Kyrylo Tkachov
Hi Ayan, > On 9 Mar 2025, at 21:46, Ayan Shafqat wrote: > > This patch introduces inline definitions for the __fma and __fmaf > functions in arm_acle.h for AArch64 targets. These definitions rely on > __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet > the ACLE requirements

[PATCH] Locality cloning pass (was: Introduce -flto-partition=locality)

2025-03-06 Thread Kyrylo Tkachov
ality, but we'd appreciate wider performance evaluation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole Co-authored-by: Kyrylo Tkachov config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc

Re: [PATCH] Introduce -flto-partition=locality

2025-03-06 Thread Kyrylo Tkachov
both (normal LTO bootstrap and profiledbootstrap). >> >> With this optimization we are seeing good performance gains on some large >> internal workloads that stress the parts of the processor that are sensitive >> to code locality, but we'd appreciate wider performance eva

[PATCH] PR rtl-optimization/119046: aarch64: Fix PARALLEL mode for vec_perm DUP expansion

2025-03-05 Thread Kyrylo Tkachov
. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov PR rtl-optimization/119046 * config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for PARALLEL. 0001-PR-rtl-optimization-119046-aarch64-Fix-PARALLEL

Re: [PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-05 Thread Kyrylo Tkachov
> On 5 Mar 2025, at 11:14, Richard Biener wrote: > > On Tue, Mar 4, 2025 at 10:01 PM Richard Sandiford > wrote: >> >> Kyrylo Tkachov writes: >>> Hi all, >>> >>> In this testcase late-combine was failing to merge: >>> dup v31.4s

Re: AArch64: Turn off outline atomics with -mcmodel=large (PR112465)

2025-03-04 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 19:52, Wilco Dijkstra wrote: > > > Outline atomics is not designed to be used with -mcmodel=large, so disable > it automatically if the large code model is used. > > Passes regress, OK for commit? > This restriction should be documented in invoke.texi IMO. I also think i

Re: AArch64: Enable early scheduling for -O3 and higher (PR118351)

2025-03-04 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 19:58, Wilco Dijkstra wrote: > > > Enable the early scheduler on AArch64 for O3/Ofast. This means GCC15 benefits > from much faster build times with -O2, but avoids the regressions in lbm which > is very sensitive to minor scheduling changes due to long FMA chains. We can

Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
> On 3 Mar 2025, at 09:49, Andrew Pinski wrote: > > On Mon, Mar 3, 2025 at 12:43 AM Kyrylo Tkachov wrote: >> >> >> >>> On 28 Feb 2025, at 19:06, Andrew Pinski wrote: >>> >>> On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov wrote: >

Re: [PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
> On 28 Feb 2025, at 19:06, Andrew Pinski wrote: > > On Fri, Feb 28, 2025 at 5:25 AM Kyrylo Tkachov wrote: >> >> Hi all, >> >> In this PR late-combine was failing to merge: >> dup v31.4s, v31.s[3] >> fmla v30.4s, v31.4s, v29.4s >> in

[PATCH][v2] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-03-03 Thread Kyrylo Tkachov
d and tested on aarch64-none-linux-gnu. Apparently this also fixes a regression in gcc.target/aarch64/vmul_element_cost.c that I observed. Signed-off-by: Kyrylo Tkachov gcc/ PR rtl-optimization/119046 * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. gcc

[PATCH] PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

2025-02-28 Thread Kyrylo Tkachov
igned-off-by: Kyrylo Tkachov gcc/ PR rtl-optimization/119046 * rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping. gcc/testsuite/ PR rtl-optimization/119046 * g++.target/aarch64/pr119046.C: New test. 0001-PR-rtl-optimization-119046-

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
> On 18 Feb 2025, at 09:48, Kyrylo Tkachov wrote: > > > >> On 18 Feb 2025, at 09:41, Richard Sandiford >> wrote: >> >> Kyrylo Tkachov writes: >>> Hi Soumya >>> >>>> On 18 Feb 2025, at 09:12, Soumya AR wrote: >>>

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
> On 18 Feb 2025, at 09:41, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Soumya >> >>> On 18 Feb 2025, at 09:12, Soumya AR wrote: >>> >>> generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses >>> generi

Re: [PATCH] aarch64: Use generic_armv8_a_prefetch_tune in generic_armv8_a.h

2025-02-18 Thread Kyrylo Tkachov
Hi Soumya > On 18 Feb 2025, at 09:12, Soumya AR wrote: > > generic_armv8_a.h defines generic_armv8_a_prefetch_tune but still uses > generic_prefetch_tune in generic_armv8_a_tunings. > > This patch updates the pointer to generic_armv8_a_prefetch_tune. > > This patch was bootstrapped and regtest

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Kyrylo Tkachov
Hi Spencer, > On 17 Feb 2025, at 20:07, Spencer Abson wrote: > > Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin > if the arguments are better suited to it. This helps us avoid copying data > between lanes before operation. > > E.g. We prefer to use UMULL2 rather t

Re: [PATCH] aarch64: Fix bootstrap with --enable-checking=release [PR118771]

2025-02-07 Thread Kyrylo Tkachov
> On 7 Feb 2025, at 01:04, Andrew Pinski wrote: > > With release checking we get an uninitialization warning > inside aarch64_split_move because of jump threading for the case of > `npieces==0` > but `npieces` is never 0 (but there is no way the compiler can know that. > So this fixes the iss

Re: [PATCH] aarch64: Fix sve/acle/general/ldff1_8.c failures

2025-02-05 Thread Kyrylo Tkachov
Hi Richard, > On 5 Feb 2025, at 09:57, Richard Sandiford wrote: > > gcc.target/aarch64/sve/acle/general/ldff1_8.c and > gcc.target/aarch64/sve/ptest_1.c were failing because the > aarch64 port was giving a zero (unknown) cost to instructions > that compute two results in parallel. This was late

Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov
> On 22 Jan 2025, at 13:53, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard, >> >>> On 22 Jan 2025, at 13:21, Richard Sandiford >>> wrote: >>> >>> GCC 15 is the first release to support FP8 intrinsics. >>>

Re: [PATCH 3/3] aarch64: Avoid redundant writes to FPMR

2025-01-22 Thread Kyrylo Tkachov
Hi Richard, > On 22 Jan 2025, at 13:21, Richard Sandiford wrote: > > GCC 15 is the first release to support FP8 intrinsics. > The underlying instructions depend on the value of a new register, > FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save > register rather than a global regis

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-20 Thread Kyrylo Tkachov
> On 20 Jan 2025, at 19:43, Tamar Christina wrote: > >> -----Original Message----- >> From: Tamar Christina >> Sent: Friday, January 17, 2025 5:07 PM >> To: Kyrylo Tkachov ; Richard Sandiford >> >> Cc: GCC Patches ; nd ; Richard >> Earnsh

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 15:01, Richard Sandiford wrote: > > Tamar Christina writes: >>> -----Original Message----- >>> From: Richard Sandiford >>> Sent: Friday, January 10, 2025 4:50 PM >>> To: Akram Ahmad >>> Cc: ktkac...@nvidia.com; gcc-patches@gcc.gnu.org >>> Subject: Re: [PATCH v3 1/2] aar

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 14:47, Richard Sandiford wrote: > > Tamar Christina writes: >>> -----Original Message----- >>> From: Kyrylo Tkachov >>> Sent: Friday, January 17, 2025 1:22 PM >>> To: Tamar Christina >>> Cc: GCC Patches ; nd

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 14:06, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Friday, January 17, 2025 1:04 PM >> To: Tamar Christina >> Cc: GCC Patches ; nd ; Richard >> Earnshaw ; ktkac...@gcc.gnu.org; Ri

Re: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Kyrylo Tkachov
> On 17 Jan 2025, at 13:56, Tamar Christina wrote: > > Hi All, > > Following the deprecation of ILP32 *-elf builds fail now due to -Werror on the > deprecation warning. This is because on embedded builds ILP32 is part of the > default multilib. > > This patch removed it from the default targ

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-14 Thread Kyrylo Tkachov
> On 13 Jan 2025, at 18:51, Richard Sandiford wrote: > > Iain Sandoe writes: >> Hi Folks, >> >>> On 10 Jan 2025, at 18:30, Wilco Dijkstra wrote: >>> >>> Hi Andrew, >>> Personally I would like this deprecated even for bare-metal. Yes the iwatch ABI is an ILP32 ABI but I don't see

Re: [PATCH] aarch64: Provide initial specifications for Apple CPU cores.

2025-01-13 Thread Kyrylo Tkachov
Hi Iain, > On 11 Jan 2025, at 14:21, Iain Sandoe wrote: > > Hi, > > I originally made this patch for the Darwin Arm64 development branch, > however in discussions on IRC, it seems that it is also relevant to > Linux - since there are implementations running on Apple hardware with > the M1..3 CP

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 15:54, Wilco Dijkstra wrote: > > ping > > > Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > to the baseline tuning since all modern cores use it. Fix the neoverse512tvb > tuning to be > like Neoverse V1/V2. For neovers

Re: [PATCH] AArch64: Remove Cortex-A57 FMA steering pass

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 15:30, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> As a minor cleanup remove Cortex-A57 FMA steering pass. Since Cortex-A57 is >> pretty old, there isn't any benefit of keeping this. >> >> Passes regress & bootstrap, OK for commit? >> >> gcc: >> * config.gcc

Re: [PATCH] AArch64: Deprecate -mabi=ilp32

2025-01-10 Thread Kyrylo Tkachov
Hi Wilco, > On 10 Jan 2025, at 15:05, Wilco Dijkstra wrote: > > > ILP32 was originally intended to make porting to AArch64 easier. Support was > never merged in the Linux kernel or GLIBC, so it has been unsupported for many > years. There isn't a benefit in keeping unsupported features foreve

Re: [PATCH v2] Add warning for non-spec compliant FMV in Aarch64

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 11:22, Richard Sandiford wrote: > > writes: >> This patch adds a warning when FMV is used for AArch64. >> >> The reasoning for this is the ACLE [1] spec for FMV has diverged >> significantly from the current implementation and we want to prevent >> potential future compat

Re: [PATCH]AArch64: correct Cortex-X4 MIDR

2025-01-10 Thread Kyrylo Tkachov
> On 10 Jan 2025, at 00:07, Tamar Christina wrote: > > Hi All, > > The Parts Num field for the MIDR for Cortex-X4 is wrong. It's currently the > parts number for a Cortex-A720 (which does have the right number). > > The correct number can be found in the Cortex-X4 Technical Reference Manual

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-09 Thread Kyrylo Tkachov
Hi Akram > On 8 Jan 2025, at 16:23, Akram Ahmad wrote: > > Hi Kyrill, > > Thanks for the feedback on V2. I found a pattern which works for > the open-coded signed arithmetic, and I've implemented the other > feedback you provided as well. > > I've sent the modified patch in this thread as the

Re: [PATCH] Add warning for use of non-spec FMV in Aarch64

2025-01-09 Thread Kyrylo Tkachov
Hi Alfie, > On 9 Jan 2025, at 10:58, alfie.richa...@arm.com wrote: > > This patch adds a warning whenever FMV is used for AArch64. > > The reasoning for this is the ACLE [1] spec for FMV has diverged > significantly from the current implementation and we want to prevent > future compatibility is

Re: [PATCH] Introduce -flto-partition=locality

2024-12-20 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 13 Dec 2024, at 16:47, Kyrylo Tkachov wrote: > > Ping. > Thanks, > Kyrill > >> On 28 Nov 2024, at 11:22, Kyrylo Tkachov wrote: >> >> Ping. >> >>> On 15 Nov 2024, at 17:04, Kyrylo Tkachov wrote: >>> >>

Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-12-17 Thread Kyrylo Tkachov
Hi Akram, > On 14 Nov 2024, at 16:53, Akram Ahmad wrote: > > This renames the existing {s,u}q{add,sub} instructions to use the > standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and > IFN_SAT_SUB. > > The NEON intrinsics for saturating arithmetic and their corresponding > builtins
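For readers less familiar with the operations being renamed: a saturating add or subtract clamps its result to the range of the type instead of wrapping around. The following is only a minimal C sketch of that semantics, not GCC's machine-description patterns and not code from the patch; the helper names usadd32 and ssadd32 are hypothetical and merely echo the {s,u}s{add,sub}3 naming.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned saturating add.
   If the wrapped sum is smaller than an operand, the add overflowed,
   so clamp to the maximum value.  */
static uint32_t usadd32(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;              /* well-defined wrap-around */
    return sum < a ? UINT32_MAX : sum;
}

/* Hypothetical helper: signed saturating add.
   Compute in a wider type, then clamp to the int32_t range.  */
static int32_t ssadd32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

Widening for the signed case avoids signed overflow, which is undefined behaviour in C; an open-coded form that stays within the original width needs explicit overflow checks instead.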

Re: [PATCH] Introduce -flto-partition=locality

2024-12-13 Thread Kyrylo Tkachov
Ping. Thanks, Kyrill > On 28 Nov 2024, at 11:22, Kyrylo Tkachov wrote: > > Ping. > >> On 15 Nov 2024, at 17:04, Kyrylo Tkachov wrote: >> >> Hi all, >> >> This is a patch submission following-up from the RFC at: >> https://gcc.gnu.org/piperma

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

2024-12-12 Thread Kyrylo Tkachov
Thanks for doing this Tamar, > On 11 Dec 2024, at 10:54, Tamar Christina wrote: > >> -----Original Message----- >> From: Richard Sandiford >> Sent: Wednesday, December 11, 2024 9:50 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Sub

Re: [PATCH 2/8]AArch64: Add Neoverse V3 core definition and cost model

2024-12-05 Thread Kyrylo Tkachov
> On 3 Dec 2024, at 11:32, Tamar Christina wrote: > >> -----Original Message----- >> From: Kyrylo Tkachov >> Sent: Tuesday, December 3, 2024 10:19 AM >> To: Tamar Christina >> Cc: GCC Patches ; nd ; Richard >> Earnshaw ; Marcus Shawcroft >

Re: [PATCH 0/4] Rename the Advanced SIMD intrinsic flags

2024-12-05 Thread Kyrylo Tkachov
> On 4 Dec 2024, at 19:02, Richard Sandiford wrote: > > The arm_neon.h intrinsic definitions use a bitmask of flags to > indicate what side-effects the intrinsic might have. However, > their names are a bit confusing: > > - FLAG_AUTO_FP was originally suggested as a way of saying > "automati

[PATCH] aarch64: Update cpuinfo strings for some arch features

2024-12-03 Thread Kyrylo Tkachov
next week if there are no objections. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/aarch64-option-extensions.def (sve-b16b16, f32mm, f64mm, sve2p1, sme-f64f64, sme-i16i64, sme-b16b16, sme-f16f16, mops): Update FEATURE_STRING field. 0001-aarc

Re: [PATCH v1 1/1] aarch64: fix fp8 cpuinfo feature names

2024-12-03 Thread Kyrylo Tkachov
> On 3 Dec 2024, at 11:41, Claudio Bantaloukas > wrote: > > > > On 12/3/2024 10:24 AM, Kyrylo Tkachov wrote: >> Hi Claudio, >>> On 2 Dec 2024, at 19:14, Claudio Bantaloukas >>> wrote: >>> >>> >>> The previous version o

Re: [PATCH v1 1/1] aarch64: fix fp8 cpuinfo feature names

2024-12-03 Thread Kyrylo Tkachov
Hi Claudio, > On 2 Dec 2024, at 19:14, Claudio Bantaloukas > wrote: > > > The previous version of the patch was based on the mistaken assumption that > features in /proc/cpuinfo had matching names to the feature names that gcc and > gas accept. > This patch enables the fp8 feature when the f8c

Re: [PATCH 2/8]AArch64: Add Neoverse V3 core definition and cost model

2024-12-03 Thread Kyrylo Tkachov
Hi Tamar, Something I noticed when looking at the various tuning files…. > On 26 Jul 2024, at 11:20, Tamar Christina wrote: > > Hi All, > > This adds a cost model and core definition for Neoverse V3. > > It also makes Cortex-X4

Re: [PATCH 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-02 Thread Kyrylo Tkachov
Hi Akram, > On 2 Dec 2024, at 15:54, Akram Ahmad wrote: > > GIMPLE code which performs a narrowing truncation on the result of a > vector concatenation currently results in an unnecessary XTN being > emitted following a UZP1 to concatenate the operands. In cases such as this, > UZP1 should instead u

Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-12-02 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:16, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >>> On 27 Nov 2024, at 09:34, Richard Sandiford >>> wrote: >>> >>> Soumya AR writes: >>>> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE

Re: [PATCH v2] aarch64: Fix build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:49, Yury Khrustalev wrote: > > Including the "arm_acle.h" header in aarch64-unwind.h requires > stdint.h to be present and it may not be available during the > first stage of cross-compilation of GCC. > > When cross-building GCC for the aarch64-none-linux-gnu target >

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 14:25, Yury Khrustalev wrote: > > Hi Kyrill, > > On Fri, Nov 29, 2024 at 02:06:17PM +, Kyrylo Tkachov wrote: >> Hi Yury, >> >>> On 29 Nov 2024, at 13:57, Yury Khrustalev wrote: >>> >>> Inclusion of "arm_ac

Re: [PATCH v5 5/5] aarch64: add SVE2 FP8DOT2 and FP8DOT4 intrinsics

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 13:00, Richard Sandiford wrote: > > Thanks for the update! > > Claudio Bantaloukas writes: >> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >> index 2a4f016e2df..f7440113570 100644 >> --- a/gcc/doc/invoke.texi >> +++ b/gcc/doc/invoke.texi >> @@ -21957,6 +21957,18

Re: [PATCH] aarch64: Fix bootstrap build failure due to missing header

2024-11-29 Thread Kyrylo Tkachov
Hi Yury, > On 29 Nov 2024, at 13:57, Yury Khrustalev wrote: > > Inclusion of "arm_acle.h" would require stdint.h that may > not be available during first stage of cross-compilation. Do you mean when trying to build a big-endian cross-compiler or something? The change seems harmless to me but t

Re: [PATCH] aarch64: Add ISA requirements to some SVE/SME md comments

2024-11-29 Thread Kyrylo Tkachov
> On 29 Nov 2024, at 13:04, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard >>> On 6 Nov 2024, at 18:16, Richard Sandiford >>> wrote: >>> >>> This series adds support for FEAT_SVE2p1 (-march=...+sve2p1). >>>

Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov
g that one? I've noted the documentation comment you > mentioned :) Ah, I did review the latest one, but I had clicked reply on the wrong one in the thread. I’ve ok’ed that explicitly separately. Kyrill > > Thanks, > Tamar > >> -----Original Message----- >> From: Kyrylo

Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov
> On 21 Nov 2024, at 10:13, Tamar Christina wrote: > >>> I tried writing automated testcases for these, however the testsuite doesn't >>> want to scan the output of -### and it makes the excess error tests always >>> fail >>> unless you use dg-error, which also looks for "error:". So tested ma

Re: [PATCH]AArch64 Suppress default options when march or mcpu used is not affected by it.

2024-11-29 Thread Kyrylo Tkachov
Hi Tamar, > On 15 Nov 2024, at 14:24, Tamar Christina wrote: > > Hi All, > > This patch makes it so that when you use any of the Cortex-A53 errata > workarounds but have specified an -march or -mcpu we know is not affected by > it > that we suppress the errata workaround. > > This is a driver

Re: [PATCH] Introduce -flto-partition=locality

2024-11-28 Thread Kyrylo Tkachov
Ping. > On 15 Nov 2024, at 17:04, Kyrylo Tkachov wrote: > > Hi all, > > This is a patch submission following-up from the RFC at: > https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html > The patch is rebased and retested against current trunk, some debugging cod

Re: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes.

2024-11-27 Thread Kyrylo Tkachov
> On 27 Nov 2024, at 09:34, Richard Sandiford wrote: > > Soumya AR writes: >> NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain >> operands >> inverted. These can be extended to work with Neon modes. >> >> Since these instructions are unpredicated, duplicate patterns wer
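As background on the instructions named above, here is a scalar C model of what the bit-select family computes on each element, as I understand the SVE2 definitions. It is an illustration only, not the patch's machine-description patterns; the function names and the uint32_t element type are assumptions made for the sketch.

```c
#include <stdint.h>

/* Scalar model of bit-select: take bits of a where the mask k is set,
   and bits of b where it is clear.  */
static uint32_t bsl(uint32_t a, uint32_t b, uint32_t k)
{
    return (a & k) | (b & ~k);
}

/* NBSL: the whole select result is inverted.  */
static uint32_t nbsl(uint32_t a, uint32_t b, uint32_t k)
{
    return ~bsl(a, b, k);
}

/* BSL1N: the first input is inverted before selecting.  */
static uint32_t bsl1n(uint32_t a, uint32_t b, uint32_t k)
{
    return bsl(~a, b, k);
}

/* BSL2N: the second input is inverted before selecting.  */
static uint32_t bsl2n(uint32_t a, uint32_t b, uint32_t k)
{
    return bsl(a, ~b, k);
}
```

Because the operation is purely bitwise, the same model applies to any element width, which is consistent with the idea of reusing the instructions for other vector modes.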

Re: [PATCH 00/15] aarch64: Add support for SVE2.1

2024-11-25 Thread Kyrylo Tkachov
Hi Richard > On 6 Nov 2024, at 18:16, Richard Sandiford wrote: > > This series adds support for FEAT_SVE2p1 (-march=...+sve2p1). > One thing that the extension does is make some SME and SME2 instructions > available outside of streaming mode. It also adds quite a few new > instructions. Some o

[PATCH] Introduce -flto-partition=locality

2024-11-15 Thread Kyrylo Tkachov
Ok for mainline? Thanks, Kyrill Signed-off-by: Prachi Godbole Co-authored-by: Kyrylo Tkachov config/ChangeLog: * bootstrap-lto-locality.mk: New file. gcc/ChangeLog: * Makefile.in (OBJS): Add ipa-locality-cloning.o (GTFILES): Add ipa-localit

Re: [PATCH 1/3] AArch64: Add baseline tune

2024-11-15 Thread Kyrylo Tkachov
> On 14 Nov 2024, at 18:40, Wilco Dijkstra wrote: > > > Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a > common base supported by all modern cores. Initially set it to > AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code. > > Passes regress & boo

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2024-11-15 Thread Kyrylo Tkachov
> On 15 Nov 2024, at 12:33, Wilco Dijkstra wrote: > > Hi Kyrill, > >> This would make USE_NEW_VECTOR_COSTS effectively the default. >> Jennifer has been trying to do that as well and then to remove it (as it >> would be always true) but there are some codegen regressions that still > >> need

Re: [PATCH 3/3] AArch64: Add SVE vector cost to baseline tuning

2024-11-15 Thread Kyrylo Tkachov
Hi Wilco, > On 14 Nov 2024, at 18:44, Wilco Dijkstra wrote: > > > Add AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS and > AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > to the baseline tuning since all modern cores use it. Fix the neoverse512tvb > tuning to be > like Neoverse V1/V2. > This would

Re: [PATCH] AArch64: Switch off early scheduling

2024-11-13 Thread Kyrylo Tkachov
> On 12 Nov 2024, at 18:55, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> Hi, >> > What do you think about disabling late scheduling as well? I think this would definitely need separate consideration and evaluation given the above. Another thing to con

Re: [PATCH 1/3] aarch64: Add support for fp8 convert and scale

2024-11-07 Thread Kyrylo Tkachov
Hi Saurabh, > On 6 Nov 2024, at 11:03, saurabh@arm.com wrote: > > > The AArch64 FEAT_FP8 extension introduces instructions for conversion > and scaling. > > This patch introduces the following intrinsics: > 1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm. > 2. vcvt{q}_mf8_f16_fpm. > 3. vcvt_

Re: [PATCH] aarch64: Extend support for the AE family of Cortex CPUs

2024-11-07 Thread Kyrylo Tkachov
Hi Victor, > On 31 Oct 2024, at 22:40, Victor Do Nascimento > wrote: > > Implement -mcpu options for: > > - Cortex-A520AE > - Cortex-A720AE > - Cortex-R82AE > > These all implement the same feature sets as their non-AE > counterparts, using the same scheduler and costs and differing only i

Re: [PATCH 2/2] aarch64: Add AdvSIMD LUT extension and vluti2{q}_lane{q} intrinsics

2024-11-06 Thread Kyrylo Tkachov
Hi Vladimir, Thanks for the patches! > On 6 Nov 2024, at 08:50, vladimir.miloser...@arm.com wrote: > > > The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory > from Armv9.5-a. This extension introduces instructions for lookup table > read with 2-bit indices. > > This patch ad

Fwd: [PATCH] PR target/117449: Restrict vector rotate match and split to pre-reload

2024-11-05 Thread Kyrylo Tkachov
Forwarding to the correct ML... > Begin forwarded message: > > From: Kyrylo Tkachov via Gcc > Subject: [PATCH] PR target/117449: Restrict vector rotate match and split to > pre-reload > Date: 5 November 2024 at 17:57:40 GMT+1 > To: gcc mailing list > Reply-To: Ky

Re: [PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+, ^] (X >> C2) into ROTATE

2024-11-04 Thread Kyrylo Tkachov
> On 4 Nov 2024, at 16:03, Kyrylo Tkachov wrote: > > > >> On 4 Nov 2024, at 15:20, Jakub Jelinek wrote: >> >> On Mon, Nov 04, 2024 at 02:31:29PM +0100, Jakub Jelinek wrote: >>> On Mon, Nov 04, 2024 at 01:07:33PM +, Kyrylo Tkachov wrote: >>>>

Re: [PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+, ^] (X >> C2) into ROTATE

2024-11-04 Thread Kyrylo Tkachov
> On 4 Nov 2024, at 15:20, Jakub Jelinek wrote: > > On Mon, Nov 04, 2024 at 02:31:29PM +0100, Jakub Jelinek wrote: >> On Mon, Nov 04, 2024 at 01:07:33PM +, Kyrylo Tkachov wrote: >>>> This seems to have broken bootstrap on multiple targets and is caus

Re: [PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+, ^] (X >> C2) into ROTATE

2024-11-04 Thread Kyrylo Tkachov
> On 4 Nov 2024, at 13:55, Richard Biener wrote: > > On Thu, Oct 31, 2024 at 4:30 PM Jeff Law wrote: >> >> >> >> On 10/27/24 10:21 AM, Kyrylo Tkachov wrote: >>> Hi all, >>> >>> simplify-rtx can transform (X << C1) | (X &
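The subject line of this thread names a rewrite of (X << C1) combined with (X >> C2) by + or ^ into a rotate. The C sketch below illustrates why such forms can be treated alike, assuming the rewrite applies when C1 + C2 equals the bit width (the truncated text above does not show the exact condition). The function names are hypothetical and this is not the simplify-rtx code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by c1 bits; valid for 0 < c1 < 32.  */
static uint32_t rotl32(uint32_t x, unsigned c1)
{
    return (x << c1) | (x >> (32 - c1));
}

/* Check that the |, + and ^ forms all equal the rotate.
   With c2 = 32 - c1 the two shifted values have no set bits in common:
   the left shift clears the low c1 bits and the right shift can only
   populate those same low c1 bits.  With disjoint bits, OR, ADD (no
   carries can occur) and XOR all produce the same result.  */
static void check_rotate_forms(uint32_t x, unsigned c1)
{
    unsigned c2 = 32 - c1;
    uint32_t hi = x << c1;
    uint32_t lo = x >> c2;
    uint32_t rot = rotl32(x, c1);

    assert((hi & lo) == 0);
    assert((hi | lo) == rot);
    assert((hi + lo) == rot);
    assert((hi ^ lo) == rot);
}
```

For example, calling check_rotate_forms(0xDEADBEEFu, 8) passes all four assertions.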

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Kyrylo Tkachov
> On 31 Oct 2024, at 18:06, Richard Sandiford wrote: > > Wilco Dijkstra writes: >> The early scheduler takes up ~33% of the total build time, however it doesn't >> provide a meaningful performance gain. This is partly because modern OoO >> cores >> need far less scheduling, partly because th

Re: [PATCH 4/6] expmed, aarch64: Optimize vector rotates as vector permutes where possible

2024-10-31 Thread Kyrylo Tkachov
Hi Jeff, > On 31 Oct 2024, at 16:25, Jeff Law wrote: > > > > On 10/27/24 10:22 AM, Kyrylo Tkachov wrote: >> Hi all, >> Some vector rotate operations can be implemented in a single instruction >> rather than using the fallback SHL+USRA sequence. >> In par

Re: [PATCH v2 04/21] aarch64: Add __builtin_aarch64_chkfeat

2024-10-31 Thread Kyrylo Tkachov
> On 31 Oct 2024, at 14:23, Yury Khrustalev wrote: > > From: Szabolcs Nagy > > Builtin for chkfeat: the input argument is used to initialize x16 then > execute chkfeat and return the updated x16. > > Note: ACLE __chkfeat(x) plans to flip the bits to be more intuitive > (xor the input to outp

Re: [PATCH v2 07/21] aarch64: Add GCS builtins

2024-10-31 Thread Kyrylo Tkachov
Hi Yury, > On 31 Oct 2024, at 14:23, Yury Khrustalev wrote: > > From: Szabolcs Nagy > > Add new builtins for GCS: > > void *__builtin_aarch64_gcspr (void) > uint64_t __builtin_aarch64_gcspopm (void) > void *__builtin_aarch64_gcsss (void *) > > The builtins are always enabled, but should b

Re: [PATCH] aarch64: Add support for FUJITSU-MONAKA (-mcpu=fujitsu-monaka) CPU

2024-10-31 Thread Kyrylo Tkachov
> On 31 Oct 2024, at 11:50, Richard Sandiford wrote: > > "Yuta Mukai (Fujitsu)" writes: >> Hello, >> >> This patch adds initial support for FUJITSU-MONAKA CPU, which we are >> developing. >> This is the slides for the CPU: >> https://www.fujitsu.com/downloads/SUPER/topics/isc24/next-arm-bas

Re: [PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)

2024-10-29 Thread Kyrylo Tkachov
> On 27 Oct 2024, at 20:42, Jeff Law wrote: > > > > On 10/24/24 12:24 AM, Kyrylo Tkachov wrote: >>> On 24 Oct 2024, at 07:36, Jeff Law wrote: >>> >>> >>> >>> On 10/22/24 2:26 PM, Kyrylo Tkachov wrote: >>>> Hi all, &g

[PATCH][committed] aarch64: Use implementation namespace for vxarq_u64 immediate argument

2024-10-28 Thread Kyrylo Tkachov
Hi all, Looks like this immediate variable was missed out when I last fixed the namespace issues in arm_neon.h. Fixed in the obvious manner. Bootstrapped and tested on aarch64-none-linux-gnu. Pushing to trunk. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov * config/aarch64/arm_neon.h

Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible

2024-10-27 Thread Kyrylo Tkachov
> On 25 Oct 2024, at 15:25, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >>> On 25 Oct 2024, at 13:46, Richard Sandiford >>> wrote: >>> >>> Kyrylo Tkachov writes: >>>> Thank you for the suggestions! I’m trying them

[PATCH 6/6] simplify-rtx: Simplify ROTATE:HI (X:HI, 8) into BSWAP:HI (X)

2024-10-27 Thread Kyrylo Tkachov
This change is not enough to generate the equivalent sequence in SVE, but that is something that should be tackled separately. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov gcc/ * simplify-rtx.cc (simplify_context::simplify_binary_operat
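For readers skimming the archive, the scalar identity behind the subject line can be sketched in plain C: rotating a 16-bit value by 8 bits swaps its two bytes. This is only an illustration, not code from the patch; the function names rotl16_by_8 and bswap16_ref are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 16-bit value left by 8 bits.
   The operands are promoted to int, so the shifts cannot overflow. */
uint16_t rotl16_by_8(uint16_t x)
{
  return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical reference byte swap, written by extracting the two bytes. */
uint16_t bswap16_ref(uint16_t x)
{
  uint16_t hi = (uint16_t)(x >> 8);    /* upper byte */
  uint16_t lo = (uint16_t)(x & 0xff);  /* lower byte */
  return (uint16_t)((lo << 8) | hi);
}
```

Both functions return the same result for every 16-bit input, which is why a ROTATE by 8 in HImode can be expressed as a BSWAP.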

[PATCH 4/6] expmed, aarch64: Optimize vector rotates as vector permutes where possible

2024-10-27 Thread Kyrylo Tkachov
ensure the permute indices are not messed up. Bootstrapped and tested on aarch64-none-linux-gnu. Richard had approved these changes in the previous iteration, but I’ll only push this after the prerequisites in the series. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * expmed.h
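A rough sketch of the idea in the subject line, assuming four 32-bit lanes and a little-endian byte order within each lane: when the rotate amount is a whole number of bytes, the rotate is just a byte permute. The names and the fixed vector shape are assumptions made for this illustration and are not taken from expmed.cc.

```c
#include <stdint.h>

enum { LANES = 4, LANE_BYTES = 4, VEC_BYTES = LANES * LANE_BYTES };

/* Fill sel[] so that dst[i] = src[sel[i]] rotates every 32-bit lane left
   by rot_bits.  Returns -1 if the amount is not a whole number of bytes.  */
int rotate_permute_indices(unsigned rot_bits, uint8_t sel[VEC_BYTES])
{
  if (rot_bits % 8 != 0 || rot_bits >= 32)
    return -1;
  unsigned k = rot_bits / 8;
  for (unsigned lane = 0; lane < LANES; lane++)
    for (unsigned i = 0; i < LANE_BYTES; i++)
      /* Result byte i of a lane comes from source byte (i - k) mod 4.  */
      sel[lane * LANE_BYTES + i]
        = (uint8_t)(lane * LANE_BYTES + (i + LANE_BYTES - k) % LANE_BYTES);
  return 0;
}

/* Apply the permute to a vector of lanes and write the rotated lanes.  */
int rotate_lanes_by_permute(const uint32_t in[LANES], unsigned rot_bits,
                            uint32_t out[LANES])
{
  uint8_t sel[VEC_BYTES], src[VEC_BYTES], dst[VEC_BYTES];
  if (rotate_permute_indices(rot_bits, sel) != 0)
    return -1;
  /* Serialise the lanes as little-endian bytes.  */
  for (unsigned i = 0; i < VEC_BYTES; i++)
    src[i] = (uint8_t)(in[i / LANE_BYTES] >> (8 * (i % LANE_BYTES)));
  for (unsigned i = 0; i < VEC_BYTES; i++)
    dst[i] = src[sel[i]];
  /* Reassemble the lanes from the permuted bytes.  */
  for (unsigned lane = 0; lane < LANES; lane++)
    {
      out[lane] = 0;
      for (unsigned i = 0; i < LANE_BYTES; i++)
        out[lane] |= (uint32_t)dst[lane * LANE_BYTES + i] << (8 * i);
    }
  return 0;
}
```

The sketch rejects amounts that are not a multiple of 8, because only whole-byte rotates can be expressed this way.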

[PATCH 5/6] aarch64: Emit XAR for vector rotates where possible

2024-10-27 Thread Kyrylo Tkachov
usra v31.4s, v0.4s, 23 mov v0.16b, v31.16b ret G2: shl v31.8b, v0.8b, 3 usra v31.8b, v0.8b, 5 mov v0.8b, v31.8b ret Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov gcc/
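As a scalar illustration of the sequences in the snippet above, the following sketch models one 8-bit lane. It assumes that XAR XORs its two inputs and rotates the result right by an immediate, so a rotate left by c can be written as an XAR against zero with a right-rotate of (8 - c). The function names are hypothetical and this is not the aarch64 backend code.

```c
#include <stdint.h>

/* Rotate an 8-bit lane left by c, for 0 < c < 8.  */
uint8_t rotl8(uint8_t x, unsigned c)
{
  return (uint8_t)((x << c) | (x >> (8 - c)));
}

/* Model of the fallback: SHL produces x << c, then USRA accumulates
   x >> (8 - c).  The two parts share no set bits, so the add acts as OR.  */
uint8_t shl_usra8(uint8_t x, unsigned c)
{
  uint8_t acc = (uint8_t)(x << c);
  acc = (uint8_t)(acc + (x >> (8 - c)));
  return acc;
}

/* Assumed model of XAR on one lane: rotate right (a ^ b) by ror,
   for 0 < ror < 8.  */
uint8_t xar8_model(uint8_t a, uint8_t b, unsigned ror)
{
  uint8_t t = (uint8_t)(a ^ b);
  return (uint8_t)((t >> ror) | (t << (8 - ror)));
}
```

With these definitions, rotl8 (x, 3), shl_usra8 (x, 3) and xar8_model (x, 0, 5) agree for every x, which matches the shl 3 / usra 5 pair shown for G2.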

[PATCH 3/6] PR 117048: aarch64: Add define_insn_and_split for vector ROTATE

2024-10-27 Thread Kyrylo Tkachov
-none-linux-gnu. I’ll push this if the prerequisites are approved. Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ PR target/117048 * config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm): New define_insn_and_split. gcc/testsuite/ PR target/117048

[PATCH 1/6] PR 117048: simplify-rtx: Simplify (X << C1) [+,^] (X >> C2) into ROTATE

2024-10-27 Thread Kyrylo Tkachov
lf-tests in this patch to validate the transformation. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov PR target/117048 * simplify-rtx.cc (extract_ashift_operands_p): Define. (simplif
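The transformation named in the subject can be checked at the source level with a small sketch, assuming a 32-bit value and C1 + C2 == 32: the two shifted halves share no set bits, so combining them with |, + or ^ gives the same rotate. The function names below are hypothetical and this is not the simplify-rtx code.

```c
#include <stdint.h>

/* Rotate left by c written with IOR, for 0 < c < 32.  */
uint32_t rotl_ior(uint32_t x, unsigned c)
{
  return (x << c) | (x >> (32 - c));
}

/* Same operands combined with PLUS: no carries occur because the two
   shifted values occupy disjoint bit positions.  */
uint32_t rotl_plus(uint32_t x, unsigned c)
{
  return (x << c) + (x >> (32 - c));
}

/* Same operands combined with XOR: identical to IOR on disjoint bits.  */
uint32_t rotl_xor(uint32_t x, unsigned c)
{
  return (x << c) ^ (x >> (32 - c));
}
```

All three forms compute the same value, which is what allows the + and ^ variants to be treated as a ROTATE as well.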

[PATCH 2/6] aarch64: Use canonical RTL representation for SVE2 XAR and extend it to fixed-width modes

2024-10-27 Thread Kyrylo Tkachov
ed on aarch64-none-linux-gnu. Ok for mainline? Thanks, Kyrill Signed-off-by: Kyrylo Tkachov gcc/ * config/aarch64/iterators.md (SVE_ASIMD_FULL_I): New mode iterator. * config/aarch64/aarch64-sve2.md (@aarch64_sve2_xar): Use SVE_ASIMD_FULL_I modes. Use ROTATE code for the r

Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible

2024-10-25 Thread Kyrylo Tkachov
> On 25 Oct 2024, at 13:46, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Thank you for the suggestions! I’m trying them out now. >> >>>> + if (rotamnt % BITS_PER_UNIT != 0) >>>> +return NULL_RTX; >>>> + machine_mo

Re: [PATCH 4/6] aarch64: Optimize vector rotates into REV* instructions where possible

2024-10-25 Thread Kyrylo Tkachov
Thank you for the suggestions! I’m trying them out now. > On 24 Oct 2024, at 21:11, Richard Sandiford wrote: > > Kyrylo Tkachov writes: >> Hi Richard, >> >>> On 23 Oct 2024, at 11:30, Richard Sandiford >>> wrote: >>> >>> Kyrylo Tk
