Hi Jennifer,
> On 18 Jul 2025, at 17:08, Jennifer Schmitz wrote:
>
>
>
>> On 18 Jul 2025, at 11:39, Kyrylo Tkachov wrote:
>>
>>
>> Hi all,
>>
>> For insertin
Hi Tamar,
> On 18 Jul 2025, at 18:25, Tamar Christina wrote:
>
> Hi Kyrill,
>
>> -----Original Message-----
>> From: Kyrylo Tkachov
>> Sent: Friday, July 18, 2025 10:40 AM
>> To: GCC Patches
>> Cc: Tamar Christina ; Richard Sandiford
>> ; Alex C
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/arm/aarch-common-protos.h (vector_cost_table): Add ins_gp
field. Add comments to other vector cost fields.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle VEC_MERGE case.
* config/aarch64/aarch6
-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values
only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT
cases.
* config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs,
thunderx_extra_costs
> On 15 Jul 2025, at 15:50, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi all,
>>
>> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes:
>> (x & z) | (~x & ~z) which is ~(x ^ z).
>> Thus, we can use it
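The identity quoted above can be checked mechanically. The following is a minimal sketch in plain C that models the BSL2N formula from the message on 64-bit scalars; the helper names `bsl2n`, `xnor` and `check_bsl2n_identity` are illustrative only and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the formula quoted in the message:
   BSL2N (x, y, z) = (x & z) | (~y & ~z).  */
static uint64_t
bsl2n (uint64_t x, uint64_t y, uint64_t z)
{
  return (x & z) | (~y & ~z);
}

/* The value the message says BSL2N (x, x, z) reduces to.  */
static uint64_t
xnor (uint64_t x, uint64_t z)
{
  return ~(x ^ z);
}

/* The operation is bitwise, so checking every pair of 8-bit patterns
   covers all per-bit input combinations many times over.  */
static void
check_bsl2n_identity (void)
{
  for (uint64_t x = 0; x < 256; x++)
    for (uint64_t z = 0; z < 256; z++)
      assert (bsl2n (x, x, z) == xnor (x, z));
}
```

Calling `check_bsl2n_identity ()` from a test driver completes without tripping an assertion, which matches the claim that the x == y case computes ~(x ^ z).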
> On 15 Jul 2025, at 15:01, Alex Coplan wrote:
>
> Hi,
>
> This relaxes an overzealous assert that required the fpm_t argument to
> be in DImode when expanding FP8 intrinsics. Of course this fails to
> account for modeless const_ints.
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for
Hi Alex,
> On 15 Jul 2025, at 14:59, Alex Coplan wrote:
>
> Hi,
>
> The predication of the SVE2 FP8 dot product insns was relying on the
> architectural dependency:
>
> FEAT_FP8DOT2 => FEAT_FP8DOT4
>
> which was relaxed in GCC as of
> r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus l
not z0.d, p3/m, z0.d
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_eon):
New pattern.
(*aarch64_sve2_eon_bsl2n_unpred)
nerate the MOVPRFX
when the operands fall that way, but I guess having a 2-insn MOVPRFX form is
not worse than the current 2-insn codegen at least, and the MOVPRFX can be
fused by many cores.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tka
> On 8 Jul 2025, at 17:43, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Thanks for your comments, do you mean something like the following?
>
> Yeah, the patch LGTM, thanks.
So it turned out that doing this in the EOR3 pattern in patch 4/7 caused
wrong-co
I had pushed this patch on Friday but have reverted it on trunk now because it
seems to be causing miscomputes in 531.deepsjeng_r.
Thanks,
Kyrill
> On 8 Jul 2025, at 08:28, Tamar Christina wrote:
>
>> -----Original Message-----
>> From: Kyrylo Tkachov
>> Sent: Monda
+ arm maintainers.
Hi Pierre,
> On 14 Jul 2025, at 14:07, Pierre Ossman wrote:
>
> Suggested fix for this issue:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60428
>
> Did not get any response there, so seeing if this is a better forum for
> suggested changes.
>
> We've been using this
> On 11 Jul 2025, at 16:48, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote:
>>>
>>>
>>>
>>>> On 10 Jul 2025, at 10:40, Richard Sandiford
>>>> wrote:
>>>
> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote:
>
>
>
>> On 10 Jul 2025, at 10:40, Richard Sandiford
>> wrote:
>>
>> Kyrylo Tkachov writes:
>>> Hi all,
>>>
>>> While the SVE2 NBSL instruction accepts MOVPRFX to add more f
> On 10 Jul 2025, at 10:40, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi all,
>>
>> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
>> due to its tied operands, the destination of the movprfx cannot be also
>> a so
> On 18 Jun 2025, at 17:26, Kyrylo Tkachov wrote:
>
> Hi all,
>
> This adds support for -mcpu=gb10. This is a big.LITTLE configuration
> involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers
> are added to detect them in -mcpu=native. We did not add a
nbsl z0.d, z0.d, z2.d, z0.d
ret
which generated a gas warning.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Do we want to backport it?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
PR target/120999
* config/aarch64/aarch64-sve2.md (*aa
> On 10 Jul 2025, at 08:09, Jakub Jelinek wrote:
>
> Hi!
>
> While I'm not a native English speaker, I believe all the uses
> of bellow (roar/bark/...) in comments in gcc are meant to be
> below (beneath/under/...).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>
Hi Alfie,
> On 7 Jul 2025, at 10:46, Alfie Richards wrote:
>
> Hello all,
>
> This patch implements the couple of amin/amax instructions that are part of
> SME2 + faminmax.
>
> Regression tested and bootstrapped for AArch64.
>
> Thanks,
> Alfie
>
> -- >8 --
>
> Implements the sme2+faminmax
> On 8 Jul 2025, at 12:39, Tamar Christina wrote:
>
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Tuesday, July 8, 2025 10:07 AM
>> To: Tamar Christina
>> Cc: Kyrylo Tkachov ; GCC Patches > patc...@gcc.gnu.org>; Richard
> On 7 Jul 2025, at 13:27, Richard Sandiford wrote:
>
> Tamar Christina writes:
>>> -----Original Message-----
>>> From: Kyrylo Tkachov
>>> Sent: Monday, July 7, 2025 10:38 AM
>>> To: GCC Patches
>>> Cc: Richard Sandiford ; Richard Earns
for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl1n_unpreddi): New
define_insn_and_split.
gcc/testsuite/
* gcc.target/aarch64/sve2/bsl1n_d.c: New test.
0006-aarch64-Use-SVE2-BSL1N-for-DImode-arguments.patch
x1_t a, uint64x1_t b, uint64x1_t c) { return EOR3 (a,
b, c); }
We generate the desired:
eor3_d_gp:
eor x1, x1, x2
eor x0, x1, x0
ret
eor3_d:
eor3 v0.16b, v0.16b, v1.16b, v2.16b
ret
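As a sanity check on the two sequences above, here is a small scalar sketch in C. It assumes the `EOR3` macro in the test is a plain three-way XOR (its definition is not shown in the excerpt), and `eor3_two_step` is a hypothetical helper that mirrors the two-instruction general-purpose sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the testcase's EOR3 macro is a three-way XOR.
   Its definition is not visible in the excerpt.  */
#define EOR3(x, y, z) ((x) ^ (y) ^ (z))

/* Scalar model of the eor3_d_gp sequence: two dependent EORs
   on general-purpose registers.  */
static uint64_t
eor3_two_step (uint64_t x0, uint64_t x1, uint64_t x2)
{
  x1 = x1 ^ x2;   /* eor x1, x1, x2 */
  x0 = x1 ^ x0;   /* eor x0, x1, x0 */
  return x0;
}
```

Under that assumption, `eor3_two_step (a, b, c)` and `EOR3 (a, b, c)` agree for any inputs, since XOR is associative and commutative.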
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
ested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-sve2.md (*aarch64_sve2_bsl2n_unpreddi): New
define_insn_and_split.
* config/aarch64/aarch64.cc (aarch64_bsl2n_rtx_form_p): Define.
(aarch64_rt
trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-sve.md (*aarch64_sve2_nbsl_unpreddi): New
define_insn_and_split.
gcc/testsuite/
* gcc.target/aarch64/sve2/nbsl_d.c: New test.
0005-aarch64-Use-SVE2-NBSL-for-DImode-arguments.patch
Description:
of:
bcax_s:
eor v1.8b, v1.8b, v2.8b
eor v0.8b, v1.8b, v0.8b
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-simd.md (eor3q4): Use VDQ_I mode
iterator.
gcc/testsuite
b
ret
When the inputs are in SIMD regs we use BCAX and when they are in GP regs we
don't force them to SIMD with extra moves.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-simd
rovement
always.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-simd.md (bcaxq4): Use VDQ_I mode
iterator.
gcc/testsuite/
* gcc.target/aarch64/simd/bcax_d.c: New test.
0001-a
Resending due to difficulties with my email
> On 7 Jul 2025, at 11:56, Kyrylo Tkachov wrote:
>
> Hi all,
>
> This series improves code generation for 64-bit vector types as well as the
> scalar DImode types.
> It makes use of SHA3 and SVE2 instructions like BCAX, EOR3
cheap itself and can be scheduled away from the critical path or even CSE'd
with other PTRUE constants.
As this sequence is larger code size-wise it is avoided for -Os.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
> On 1 Jul 2025, at 18:37, Alex Coplan wrote:
>
> The "else operand" to maskload should always be a const_vector, never a
> const_int.
>
> This was just an issue I noticed while looking through the code, I don't
> have a testcase which shows a concrete problem due to this.
>
> Testing of tha
> On 1 Jul 2025, at 17:36, Richard Sandiford wrote:
>
> Soumya AR writes:
>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
>> From: Soumya AR
>> Date: Mon, 30 Jun 2025 12:17:30 -0700
>> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with
>> RCP
> On 17 Jun 2025, at 12:19, Kyrylo Tkachov wrote:
>
>
>
>> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote:
>>
>> From: Ezra Sitorus
>>
>> This patch updates `aarch64-sys-regs.def', bringing it into sync with
>> the Binutil
trunk and GCC 15 when I’m back.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-cores.def (gb10): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
0001-aarch64-Add-support-for
> On 16 Jun 2025, at 09:54, Richard Sandiford wrote:
>
> We generated inefficient code for bitfield references to Advanced
> SIMD structure modes. In RTL, these modes are just extra-long
> vectors, and so inserting and extracting an element is simply
> a vec_set or vec_extract operation.
>
>
> On 4 Apr 2025, at 20:28, ezra.sito...@arm.com wrote:
>
> From: Ezra Sitorus
>
> This patch updates `aarch64-sys-regs.def', bringing it into sync with
> the Binutils source after this change:
> https://sourceware.org/pipermail/binutils/2025-March/139894.html
Ok. I think these changes are co
Hi Spencer,
Thanks for the patch.
> On 13 Jun 2025, at 14:46, Spencer Abson wrote:
>
> Add the missing combiner patterns for folding NOT+PTEST to NOTS when
> they share the same GP.
>
I guess GP here means “governing predicate”?
GP usually means “General Purpose (register)” in aarch64 so it’d
> On 12 Jun 2025, at 18:20, Remi Machet wrote:
>
>
> On 6/12/25 12:02, Richard Sandiford wrote:
>>
>> Remi Machet writes:
>>> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn which
>>> allows for bett
> On 12 Jun 2025, at 18:02, Richard Sandiford wrote:
>
> Remi Machet writes:
>> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn which
>> allows for better optimization when the code is inside a loop by using a
>> constant.
>>
>> Bootstrapped and regtested on aarch6
> On 11 Jun 2025, at 16:22, Richard Sandiford wrote:
>
> The PCS defines a lazy save scheme for managing ZA across normal
> "private-ZA" functions. GCC currently uses this scheme for calls
> to all private-ZA functions (rather than using caller-save).
>
> Therefore, before a sequence of call
> On 3 Jun 2025, at 17:56, Richard Sandiford wrote:
>
> Tamar Christina writes:
>> As requested in my patch for -mmax-vectorization this promotes the parameter
>> --param aarch64-autovec-preference to a first class top target flag.
>>
>> If both the parameter and the flag are specified the par
> On 28 May 2025, at 13:36, Kyrylo Tkachov wrote:
>
> Hi Yuta-san
>
>> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu)
>> wrote:
>>
>> Hello,
>>
>> We would like to enable features for FUJITSU-MONAKA that were implemented in
>> GC
Hi Yuta-san
> On 23 May 2025, at 07:49, Yuta Mukai (Fujitsu) wrote:
>
> Hello,
>
> We would like to enable features for FUJITSU-MONAKA that were implemented in
> GCC after we added support for FUJITSU-MONAKA.
> As the features were implemented in GCC15, we also want to backport it to
> GCC15.
> On 16 May 2025, at 12:35, Richard Sandiford wrote:
>
> Jennifer Schmitz writes:
>> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
>> partial_subreg_p in the function copy_value during the RTL pass
>> regcprop, failing the assertion in
>>
>> inline bool
>> partial_su
> On 10 May 2025, at 06:17, Andrew Pinski wrote:
>
> Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for
> the variant, it is just easier to add the cast to unsigned in the usage
> in driver-aarch64.cc.
>
> Built and tested on aarch64-linux-gnu.
Ok.
Thanks,
Kyrill
>
> gcc/Ch
> On 10 May 2025, at 05:59, Andrew Pinski wrote:
>
> There is a narrowing warning in aarch64_detect_vector_stmt_subtype
> about gather_load_x32_cost and gather_load_x64_cost converting from int to
> unsigned.
> These fields are always unsigned and even the constructor for sve_vec_cost
> take
> On 8 May 2025, at 21:10, Karl Meakin wrote:
>
> Add rules for lowering `cbranch4` to CBB/CBH/CB when
> CMPBR extension is enabled.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (cbranch4): Emit CMPBR
> instructions if possible.
> (BRANCH_LEN_P_1Kib): New constant.
> (BRANCH_LEN_N_1Kib)
Hi Richard,
> On 7 May 2025, at 18:15, Richard Earnshaw wrote:
>
>
> The header file for the Arm implementation of mmintrin.h was changed in GCC-15
> to disable access to the intrinsics. This patch removes the internal code
> as well.
>
> We still allow -mcpu/-march options for the wmmx cpus,
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Give the `define_insn` rules used in lowering `cbranch4` to RTL
> more descriptive and consistent names: from now on, each rule is named
> after the AArch64 instruction that it generates. Also add comments to
> document each rule.
>
> gcc/Chang
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> The rules for conditional branches were spread throughout `aarch64.md`.
> Group them together so it is easier to understand how `cbranch4`
> is lowered to RTL.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (condjump): move.
> (*compare_co
Hi Karl,
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> This patch series adds support for the CMPBR extension. It includes the
> new `+cmpbr` option and rules to generate the new instructions when
> lowering conditional branches.
Thanks for the series.
You didn’t state it explicitly, but ha
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
> extension is enabled.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (cbranch4): emit CMPBR
> instructions if possible.
> (cbranch4): new expand rule.
> (aarch64_cb): likewise.
>
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Commit the test file `cmpbr.c` before rules for generating the new
> instructions are added, so that the changes in codegen are more obvious
> in the next commit.
I guess that’s an LLVM best practice.
In GCC since we have the check-function-bod
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Add the `+cmpbr` option to enable the FEAT_CMPBR architectural
> extension.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-option-extensions.def (cmpbr): new
> option.
> * config/aarch64/aarch64.h (TARGET_CMPBR): new macro.
> * doc/invoke.tex
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> The `far_branch` attribute only ever takes the values 0 or 1, so make it
> a `no/yes` valued string attribute instead.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (far_branch): replace 0/1 with
> no/yes.
> (aarch64_bcond): handle renam
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Extract the hardcoded values for the minimum PC-relative displacements
> into named constants and document them.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
> (BRANCH_LEN_N_128MiB): likewise.
> (BRA
> On 7 May 2025, at 12:27, Karl Meakin wrote:
>
> Make the formatting of the RTL templates in the rules for branch
> instructions more consistent with each other.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (cbranch4): reformat.
> (cbranchcc4): likewise.
> (condjump): likewise.
> (*co
> On 6 May 2025, at 10:30, Soumya AR wrote:
>
> From: Soumya AR
>
> This patch adds a get_map () method to the JSON object class to provide access
> to the underlying hash map that stores the JSON key-value pairs.
>
> It also reorganizes the private and public sections of the class to expos
Hi Richard,
> On 6 May 2025, at 12:34, Richard Sandiford wrote:
>
> writes:
>> From: Soumya AR
>>
>> Hi,
>>
>> This RFC and subsequent patch series introduces support for printing and
>> parsing of aarch64 tuning parameters in the form of JSON.
>
> Thanks for doing this. It looks r
> On 4 May 2025, at 19:19, Yangyu Chen wrote:
>
> Hi everyone,
>
> This patch series introduces support for the target_clones profile
> option in GCC. This option enables users to specify target_clones
> attributes in a separate file, allowing GCC to generate multiple
> versions of the functio
Pushing as obvious.
Signed-off-by: Kyrylo Tkachov
0001-AArch64-changes.html-Fix-typo.patch
Description: 0001-AArch64-changes.html-Fix-typo.patch
> On 1 May 2025, at 14:02, Ayan Shafqat wrote:
>
> On Thu, May 01, 2025 at 08:09:18AM +0000, Kyrylo Tkachov wrote:
>>
>> I was going to ask why not use the standard __builtin_sqrt builtins but I
>> guess those don’t guarantee that we avoid a libcall in
> On 28 Apr 2025, at 21:29, Ayan Shafqat wrote:
>
> Rebased with gcc 15.1
>
> This patch introduces two new inline functions, __sqrt and __sqrtf, in
> arm_acle.h for AArch64 targets. These functions wrap the new builtins
> __builtin_aarch64_sqrtdf and __builtin_aarch64_sqrtsf, respectively,
>
> On 28 Apr 2025, at 21:27, Ayan Shafqat wrote:
>
> Rebased with gcc 15.1
>
> This patch changes the `sqrt` builtin definition from `BUILTIN_VHSDF_DF`
> to `BUILTIN_VHSDF_HSDF` in `aarch64-simd-builtins.def`, ensuring the
> builtin covers half, single, and double precision variants. The redun
> On 25 Apr 2025, at 19:55, Richard Sandiford wrote:
>
> Jennifer Schmitz writes:
>> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
>> ptrue predicate can be replaced by neon instructions (LDR and STR),
>> thus avoiding the predicate altogether. This also enables formation
> On 25 Apr 2025, at 12:06, Richard Sandiford wrote:
>
> Kyrylo Tkachov writes:
>> Hi Richard,
>>
>>> On 23 Apr 2025, at 13:47, Richard Sandiford
>>> wrote:
>>>
>>> Thanks for all the feedback. I've tried to address it in
> On 23 Apr 2025, at 13:47, Richard Sandiford wrote:
>
> Thanks for all the feedback. I've tried to address it in the version
> below. I'll push later today if there are no further comments.
>
> Richard
>
>
> The list is structured as:
>
> - new configurations
> - command-line changes
> -
> On 24 Apr 2025, at 14:44, Jakub Jelinek wrote:
>
> On Thu, Apr 24, 2025 at 12:39:59PM +0000, Kyrylo Tkachov wrote:
>>> The third case looks undesirable, -fno-ipa-reorder-for-locality is the
>>> default and shouldn't affect anything, whether explicit or im
> On 24 Apr 2025, at 14:28, Jakub Jelinek wrote:
>
> On Thu, Apr 24, 2025 at 12:05:06PM +0000, Kyrylo Tkachov wrote:
>>>>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote:
>>>>>
>>>>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo T
> On 24 Apr 2025, at 12:18, Jakub Jelinek wrote:
>
> On Thu, Apr 24, 2025 at 10:15:08AM +0000, Kyrylo Tkachov wrote:
>>
>>
>>> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote:
>>>
>>> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkach
> On 24 Apr 2025, at 12:09, Jakub Jelinek wrote:
>
> On Thu, Apr 24, 2025 at 09:54:09AM +0000, Kyrylo Tkachov wrote:
>>> I'd have expected instead of the LTO_PARTITION_DEFAULT checks one should be
>>> testing !opts_set->x_flag_lto_partition (i.e. -flto-p
lt
>>> up to that point. We should also be testing opts instead of opts_set here.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC
>>> 15 branch as
instead of opts_set here.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk? Sorry for the late patch, but I guess we want this in the GCC 15
branch as well.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* opts.cc (finish_options): Check for == against
>> opts_set->x_flag_lto_partition = opts->x_flag_lto_partition =
>> LTO_PARTITION_BALANCED;
>
Hmm, yes I think the condition should be == instead of !=. I’ll test a patch
momentarily.
Thanks,
Kyrill
> Regards,
> Feng
>
> From:
> On 23 Apr 2025, at 08:37, Tamar Christina wrote:
>
> Hi All,
>
> This patch proposes a new vector cost model called "max". The cost model is an
> intersection between two of our existing cost models. Like `unlimited` it
> disables the costing vs scalar and assumes all vectorization to
> On 22 Apr 2025, at 15:31, Tamar Christina wrote:
>
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Tuesday, April 22, 2025 2:28 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ;
>> ktkac...@nvidia.com
>> Subject: Re: [PATCH] Document AArch64 cha
ed on aarch64-none-linux-gnu.
I’m pushing this to trunk, is it also ok for the GCC 15 branch? I’d like to
have the right CPU features enabled for the release.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-cores.def (olympus): Add fp8fma, fp8dot4
expli
: Kyrylo Tkachov
* invoke.texi (lto-partition-locality-frequency-cutoff,
lto-partition-locality-size-cutoff, lto-max-locality-partition):
Document.
0001-Document-locality-partitioning-params-in-invoke.texi.patch
Description: 0001-Document-locality-partitioning-params-in
Pushing as obvious.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
* common.opt.urls: Regenerate.
0001-Regenerate-common.opt.urls.patch
Description: 0001-Regenerate-common.opt.urls.patch
> On 15 Apr 2025, at 15:42, Richard Biener wrote:
>
> On Mon, Apr 14, 2025 at 3:11 PM Kyrylo Tkachov wrote:
>>
>> Hi Honza,
>>
>>> On 13 Apr 2025, at 23:19, Jan Hubicka wrote:
>>>
>>>> +@opindex fipa-reorder-for-locality
>>>
Hi Tejas,
> On 14 Apr 2025, at 16:04, Tejas Belagod wrote:
>
> The operand order to gen_vcond_mask call in the vec_extract pattern is wrong.
> Fix the order where predicate is operand 3.
>
> Tested and bootstrapped on aarch64-linux-gnu. OK for trunk?
>
> gcc/ChangeLog
>
> * config/aarch64/aar
Hi Honza,
> On 13 Apr 2025, at 23:19, Jan Hubicka wrote:
>
>> +@opindex fipa-reorder-for-locality
>> +@item -fipa-reorder-for-locality
>> +Group call chains close together in the binary layout to improve code
>> +locality. This option is incompatible with an explicit
>> +@option{-flto-part
> On 26 Mar 2025, at 08:42, Kyrylo Tkachov wrote:
>
> Ping.
Ping.
https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676958.html
I’ve run a profiled LTO bootstrap of GCC with the new bootstrap-lto-locality
bootstrap config
And compared it against a GCC produced by the exi
> On 7 Apr 2025, at 10:21, Tamar Christina wrote:
>
>> -----Original Message-----
>> From: Kyrylo Tkachov
>> Sent: Monday, March 31, 2025 1:43 PM
>> To: i...@sandoe.co.uk
>> Cc: Tamar Christina ; GCC Patches > patc...@gcc.gnu.org>; Alice Carlotti ;
> On 31 Mar 2025, at 09:43, Richard Biener wrote:
>
> On Mon, Mar 31, 2025 at 9:41 AM Richard Biener
> wrote:
>>
>> On Mon, Mar 31, 2025 at 9:36 AM Kyrylo Tkachov wrote:
>>>
>>> Ping.
>>
>> Can you reference the patch please? I'
Hi all,
As we're starting a new month, introduce a more appropriate -mapril=
to specify the compilation target instead.
This helps keep GCC more up to date with the passage of time.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aa
Hi Iain,
> On 22 Mar 2025, at 15:31, Iain Sandoe wrote:
>
> 0. Sorry this has taken some time to close off; partly because of waiting
> for input, but mostly that I've been stretched with other work.
> 1. As per the commit message, the apparent non-conformance with 8.5/6
> because FEAT_SPECR
Ping.
Thanks,
Kyrill
> On 24 Mar 2025, at 14:28, Kyrylo Tkachov wrote:
>
> Hi all,
>
> In this testcase GCC tries to expand a VNx4BI vector:
> vector(4) _40;
> _39 = () _24;
> _40 = {_39, _39, _39, _39};
>
> This ends up in a scalarised sequence of bitfiel
Ping.
Thanks,
Kyrill
> On 6 Mar 2025, at 09:25, Kyrylo Tkachov wrote:
>
> Hi all,
>
> Implement partitioning and cloning in the callgraph to help locality.
> A new -fipa-reorder-for-locality flag is used to enable this.
> The majority of the logic is in the new IPA
bfis are gone.
Bootstrapped and tested on aarch64-none-linux-gnu.
Given this is a regression from GCC 13, is this ok for trunk now?
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
PR middle-end/119442
* expr.cc (store_constructor): Also allow element modes explicitly
accepted by
Hi Dhruv,
> On 21 Mar 2025, at 11:11, Dhruv Chawla wrote:
>
> This adds support for the NVIDIA Olympus core to the AArch64 backend. The
> initial patch does not add any special tuning decisions, and those may come
> later.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
Thanks, given
g to trunk.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
gcc/
* config/aarch64/aarch64-arches.def (...): Add SVE2p1.
* doc/invoke.texi (AArch64 Options): Document +sve2p1 in
-march=armv9.4-a.
0001-aarch64-Add-sve2p1-to-march-armv9.4-a-flags.patch
Description: 0001-a
> On 16 Mar 2025, at 20:15, Ayan Shafqat wrote:
>
> This patch introduces inline definitions for the __fma and __fmaf
> functions in arm_acle.h for Aarch64 targets. These definitions rely on
> __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet
> the ACLE requirements [1].
>
Hi Ayan,
> On 11 Mar 2025, at 14:53, Ayan Shafqat wrote:
>
> Hello Kyrylo,
>
> On Tue, Mar 11, 2025 at 08:55:46AM +0000, Kyrylo Tkachov wrote:
>> This looks ok to me.
>> GCC is currently in a regression fixing stage so normally such a change
>> would wait u
Hi Ayan,
> On 9 Mar 2025, at 21:46, Ayan Shafqat wrote:
>
> This patch introduces inline definitions for the __fma and __fmaf
> functions in arm_acle.h for AArch64 targets. These definitions rely on
> __builtin_fma and __builtin_fmaf to ensure proper inlining and to meet
> the ACLE requirements
ality, but we'd appreciate wider performance evaluation.
Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill
Signed-off-by: Prachi Godbole
Co-authored-by: Kyrylo Tkachov
config/ChangeLog:
* bootstrap-lto-locality.mk: New file.
gcc
both (normal LTO bootstrap and profiledbootstrap).
>>
>> With this optimization we are seeing good performance gains on some large
>> internal workloads that stress the parts of the processor that are sensitive
>> to code locality, but we'd appreciate wider performance eva
.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill
Signed-off-by: Kyrylo Tkachov
PR rtl-optimization/119046
* config/aarch64/aarch64.cc (aarch64_evpc_dup): Use VOIDmode for
PARALLEL.
0001-PR-rtl-optimization-119046-aarch64-Fix-PARALLEL
> On 5 Mar 2025, at 11:14, Richard Biener wrote:
>
> On Tue, Mar 4, 2025 at 10:01 PM Richard Sandiford
> wrote:
>>
>> Kyrylo Tkachov writes:
>>> Hi all,
>>>
>>> In this testcase late-combine was failing to merge:
>>> dup v31.4s
> On 3 Mar 2025, at 19:52, Wilco Dijkstra wrote:
>
>
> Outline atomics is not designed to be used with -mcmodel=large, so disable
> it automatically if the large code model is used.
>
> Passes regress, OK for commit?
>
This restriction should be documented in invoke.texi IMO.
I also think i