[PATCH v3 2/2]AArch64: propose -mmax-vectorization as an option to override vector costing

2025-06-03 Thread Tamar Christina
Hi All, With the middle-end providing a way to make vectorization more profitable by scaling vect-scalar-cost-multiplier this adds a more user-friendly option that is easier to use. I propose making it an actual -m option that we document and retain vs using the parameter name. In the future

[PATCH 1/2]AArch64 docs: add itemx for outline-atomics docs

2025-06-03 Thread Tamar Christina
The documentation for outline atomics is missing the entry for -mno-outline-atomics which this patch adds. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * doc/extend.texi (outline-atomics): Document the inverse -mno flag. -

AArch64 promote aarch64-autovec-preference to mautovec-preference

2025-06-03 Thread Tamar Christina
As requested in my patch for -mmax-vectorization this promotes the parameter --param aarch64-autovec-preference to a first class top target flag. If both the parameter and the flag are specified the parameter takes precedence with the reasoning that it may already be embedded in build systems. Boo

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-06-02 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Monday, May 26, 2025 2:56 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer > > On Mon, 19 May

RE: [PATCH 1/2]middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-19 Thread Tamar Christina
> > +-param=vect-scalar-cost-multiplier= > > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1) > IntegerRange(0, 10) Param Optimization > > +The scaling multiplier to add to all scalar loop costing when performing > vectorization profitability analysis. The default value i

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-19 Thread Tamar Christina
> >/* Complete the target-specific cost calculations. */ > >loop_vinfo->vector_costs->finish_cost (loop_vinfo->scalar_costs); > >vec_prologue_cost = loop_vinfo->vector_costs->prologue_cost (); > > @@ -12373,6 +12394,13 @@ vect_transform_loop (loop_vec_info loop_vinfo, > gimple *loop_ve

RE: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-18 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 16, 2025 11:35 AM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Tamar Christina > > Subject: [PATCH][RFC] Allow the target to request a masked vector epilogue > > Targets recently got

[PATCH 2/2]AArch64: Use vectorizer initial unrolling as default

2025-05-14 Thread Tamar Christina
Hi All, The vectorizer now tries to maintain the target VF that a user wanted through increasing the unroll factor if the user used pragma GCC unroll and we've vectorized the loop. This change makes the AArch64 backend honor this initial value being set by the vectorizer. Consider the loop void

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-14 Thread Tamar Christina
> > > > > > > > - /* Loops vectorized with a variable factor won't benefit from > > > > + /* Loops vectorized would have already taken into account unrolling > specified > > > > + by the user as the suggested unroll factor, as such we need to > > > > prevent the > > > > + RTL unroller fr

RE: [PATCH v2 1/2]middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-14 Thread Tamar Christina
> -Original Message- > From: Tamar Christina > Sent: Wednesday, May 14, 2025 12:19 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; rguent...@suse.de > Subject: [PATCH v2 1/2]middle-end: Add new parameter to scale scalar loop > costing in vectorizer > > Hi All,

[PATCH v2 2/2]AArch64: propose -mmax-vectorization as an option to override vector costing

2025-05-14 Thread Tamar Christina
Hi All, With the middle-end providing a way to make vectorization more profitable by scaling vect-scalar-cost-multiplier this adds a more user-friendly option that is easier to use. I propose making it an actual -m option that we document and retain vs using the parameter name. In the future

RE: [PATCH][RFC] Add vector_costs::add_vector_cost vector stmt grouping hook

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, May 13, 2025 12:08 PM > To: Richard Sandiford > Cc: gcc-patches@gcc.gnu.org; Tamar Christina > Subject: Re: [PATCH][RFC] Add vector_costs::add_vector_cost vector stmt > grouping hook > > On Tue, 13

RE: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, May 13, 2025 1:59 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer > > On Tue, 13 May 2025, Tamar Chr

RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, May 13, 2025 1:36 PM > To: Jakub Jelinek > Cc: Tamar Christina ; Jonathan Wakely > ; gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n > [PR116140] >

RE: [PATCH 2/4][c-frontend]: implement pragma unroll n for C [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Joseph Myers > Sent: Tuesday, May 13, 2025 12:35 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH 2/4][c-frontend]: implement pragma unroll n > for C [PR116140] > > On Tue, 13 May 2025, Tamar Chri

RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, May 13, 2025 12:44 PM > To: Eric Botcazou > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd > > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n > [PR116140] > > On Tue, 13

RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Jakub Jelinek > Sent: Tuesday, May 13, 2025 11:49 AM > To: Tamar Christina > Cc: Jonathan Wakely ; gcc-patches@gcc.gnu.org; nd > ; rguent...@suse.de > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n > [PR116140] >

RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Jonathan Wakely > Sent: Tuesday, May 13, 2025 11:34 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n > [PR116140] > > On Tue, 13 May 202

RE: [PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
> -Original Message- > From: Jonathan Wakely > Sent: Tuesday, May 13, 2025 11:01 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de > Subject: Re: [PATCH 1/4]middle-end: document pragma unroll n > [PR116140] > > On 13/05/25 10:39 +

[PATCH 2/4][c-frontend]: implement pragma unroll n for C [PR116140]

2025-05-13 Thread Tamar Christina
Hi All, In PR116140 it was brought up that adding pragma GCC unroll in std::find makes it so that you can't use a larger unroll factor if you wanted to. This is because the value can't be overridden by the other unrolling flags such as -funroll-loops. To know whether this should be possible to do

[PATCH 1/4]middle-end: document pragma unroll n [PR116140]

2025-05-13 Thread Tamar Christina
Hi All, In PR116140 it was brought up that adding pragma GCC unroll in std::find makes it so that you can't use a larger unroll factor if you wanted to. This is because the value can't be overridden by the other unrolling flags such as -funroll-loops. To know whether this should be possible to do

[PATCH 1/2]middle-end: Apply loop->unroll directly in vectorizer

2025-05-13 Thread Tamar Christina
Hi All, Consider the loop void f1 (int *restrict a, int n) { #pragma GCC unroll 4 requested for (int i = 0; i < n; i++) a[i] *= 2; } Which today is vectorized and then unrolled 3x by the RTL unroller due to the use of the pragma. This is unfortunate because the pragma was intended for the
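For reference, the loop from the message above can be written out as a compilable unit. This is only a minimal sketch: it uses the plain `#pragma GCC unroll 4` spelling and omits the trailing `requested` token shown in the snippet, on the assumption that only the plain form is accepted by released compilers. The `f1_demo` helper is a hypothetical addition used to check the result and is not part of the patch.

```c
#include <assert.h>

/* Doubles each element of a[0..n).  The pragma asks GCC to unroll the
   loop 4x; compilers that do not know it are expected to ignore it,
   possibly with a warning.  */
void f1 (int *restrict a, int n)
{
#pragma GCC unroll 4
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}

/* Hypothetical self-check: fills a small array, runs f1 and returns
   the sum of the doubled values.  */
int f1_demo (void)
{
  int a[10];
  int sum = 0;

  for (int i = 0; i < 10; i++)
    a[i] = i;

  f1 (a, 10);

  for (int i = 0; i < 10; i++)
    {
      assert (a[i] == 2 * i);
      sum += a[i];
    }
  return sum;
}
```

Whether the loop ends up vectorized, unrolled, or both depends on the compiler version and flags; the sketch only fixes the source-level behaviour.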

[PATCH 2/2]AArch64: Use vectorizer initial unrolling as default

2025-05-13 Thread Tamar Christina
Hi All, The vectorizer now tries to maintain the target VF that a user wanted through increasing the unroll factor if the user used pragma GCC unroll and we've vectorized the loop. This change makes the AArch64 backend honor this initial value being set by the vectorizer. Consider the loop void

[PATCH 4/4][libstdc++] use pragma GCC 4 preferred for std::find [PR116140]

2025-05-13 Thread Tamar Christina
Hi All, In PR116140 it was brought up that adding pragma GCC unroll in std::find makes it so that you can't use a larger unroll factor if you wanted to. This is because the value can't be overridden by the other unrolling flags such as -funroll-loops. To know whether this should be possible to do

[PATCH 3/4][c++-frontend]: implement pragma unroll n for C++ [PR116140]

2025-05-13 Thread Tamar Christina
Hi All, In PR116140 it was brought up that adding pragma GCC unroll in std::find makes it so that you can't use a larger unroll factor if you wanted to. This is because the value can't be overridden by the other unrolling flags such as -funroll-loops. To know whether this should be possible to do

[PATCH 2/2]AArch64: propose -mmax-vectorization as an option to override vector costing

2025-05-13 Thread Tamar Christina
Hi All, With the middle-end providing a way to make vectorization more profitable by scaling vect-scalar-cost-multiplier this adds a more user-friendly option that is easier to use. I propose making it an actual -m option that we document and retain vs using the parameter name. In the future

[PATCH 1/2]middle-end: Add new parameter to scale scalar loop costing in vectorizer

2025-05-13 Thread Tamar Christina
Hi All, This patch adds a new param vect-scalar-cost-multiplier to scale the scalar costing during vectorization. If the cost is set high enough and when using the dynamic cost model it effectively disables the costing vs scalar and assumes all vectorization to be profitable.
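As a rough illustration of the idea in this message, the sketch below models the profitability decision as a comparison between a vector cost and a scaled scalar cost. The function `toy_vect_profitable`, its cost units and the comparison itself are assumptions made for illustration; this is not GCC's costing code, only a toy showing why a large multiplier makes nearly every loop look profitable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): vectorization counts as "profitable" when the
   vector cost is below the scalar cost scaled by the multiplier.  */
bool toy_vect_profitable (unsigned scalar_cost, unsigned vector_cost,
                          unsigned multiplier)
{
  return vector_cost < scalar_cost * multiplier;
}

/* Hypothetical demonstration of the effect of the multiplier.  */
int toy_demo (void)
{
  /* With a multiplier of 1 a vector cost of 12 loses to a scalar cost
     of 10, so the loop would stay scalar in this model.  */
  assert (!toy_vect_profitable (10, 12, 1));

  /* With a multiplier of 10 the scaled scalar cost is 100, so the same
     loop is now considered profitable.  */
  assert (toy_vect_profitable (10, 12, 10));
  return 1;
}
```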

RE: [PATCH] Cleanup internal vectorizer costing API

2025-05-12 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Monday, May 12, 2025 1:46 PM > To: gcc-patches@gcc.gnu.org > Cc: Tamar Christina ; RISC-V CI c...@rivosinc.com> > Subject: [PATCH] Cleanup internal vectorizer costing API > > This tries to cleanup the API ava

RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 9, 2025 2:44 PM > To: Tamar Christina > Cc: Richard Sandiford ; Pengfei Li > ; gcc-patches@gcc.gnu.org; ktkac...@nvidia.com > Subject: RE: [PATCH] vect: Improve vectorization for small-trip-count loops

RE: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) for vectors

2025-05-09 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 9, 2025 8:31 AM > To: Pengfei Li > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > Subject: Re: [PATCH v2] match.pd: Fold (x + y) >> 1 into IFN_AVG_FLOOR (x, y) > for > vectors > > On Thu, 8 May 2025, Pengfei Li wrote:

RE: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Friday, May 9, 2025 11:08 AM > To: Richard Sandiford > Cc: Pengfei Li ; gcc-patches@gcc.gnu.org; > ktkac...@nvidia.com > Subject: Re: [PATCH] vect: Improve vectorization for small-trip-count loops > using > subvectors > > On Fri, 9 May

RE: [PATCH 0/3] Remove non-SLP path from vectorizable_conversion

2025-05-06 Thread Tamar Christina
> > This is an example on how I'd like to see cleanup for SLP happening > in the vectorizable_* and related functions. While this example, > vectorizable_conversion, is quite straight-forward it helps to > isolate errors. I've done this in 3 steps: Happy to help with this if you let me know whi

RE: [PATCH] tree-optimization/120089 - force all PHIs live for early-break vect

2025-05-06 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, May 6, 2025 9:51 AM > To: gcc-patches@gcc.gnu.org > Cc: Tamar Christina ; RISC-V CI c...@rivosinc.com> > Subject: [PATCH] tree-optimization/120089 - force all PHIs live for > early-break vect >

RE: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract for 128-bit VLS.

2025-04-28 Thread Tamar Christina
> -Original Message- > From: Jennifer Schmitz > Sent: Monday, April 28, 2025 11:40 AM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Tamar Christina > > Subject: Re: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract > for > 128-bit

[PATCH][committed][14 backport]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-04-28 Thread Tamar Christina
Hi All, When the input is already a subreg and we try to make a paradoxical subreg out of it for copysign this can fail if it violates the subreg relationship. Use force_lowpart_subreg instead of lowpart_subreg to then force the results to a register instead of ICEing. Bootstrapped Regtested on

[PATCH v2][committed][14 backport]middle-end: fix masking for partial vectors and early break [PR119351]

2025-04-28 Thread Tamar Christina
Hi All, The following testcase shows an incorrect masked codegen: #define N 512 #define START 1 #define END 505 int x[N] __attribute__((aligned(32))); int __attribute__((noipa)) foo (void) { int z = 0; for (unsigned int i = START; i < END; ++i) { z++; if (x[i] > 0)

RE: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS

2025-04-27 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Friday, April 25, 2025 6:55 PM > To: Jennifer Schmitz > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for 128-bit > VLS > > Jennifer Schmitz writes: > > If -msve-vector-bits=128, SVE

RE: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract for 128-bit VLS.

2025-04-26 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Friday, April 25, 2025 4:45 PM > To: Jennifer Schmitz > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] aarch64: Optimize SVE extract last to Neon lane extract > for > 128-bit VLS. > > Jennifer Schmitz writes: > > For the test c

RE: [PATCH] Add a bootstrap-native build config

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Jakub Jelinek > Sent: Wednesday, April 23, 2025 10:39 AM > To: Tamar Christina > Cc: Richard Biener ; Andi Kleen > ; GCC Patches > Subject: Re: [PATCH] Add a bootstrap-native build config > > On Wed, Apr 23, 2025 at 09:36:11AM +

RE: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 23, 2025 9:37 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > Subject: Re: [PATCH]middle-end: Add new "max" vector cost model > > On Wed, 23 Apr

RE: [PATCH] Add a bootstrap-native build config

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 23, 2025 9:19 AM > To: Andi Kleen ; GCC Patches > Subject: Re: [PATCH] Add a bootstrap-native build config > > On Tue, Apr 22, 2025 at 5:43 PM Andi Kleen wrote: > > > > On 2025-04-22 13:22, Richard Biener wrote: > > >

RE: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 23, 2025 10:14 AM > To: Tamar Christina > Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org; > nd > Subject: RE: [PATCH]middle-end: Add new "max" vector cost model > > On We

RE: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, April 23, 2025 9:45 AM > To: Tamar Christina > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd > > Subject: Re: [PATCH]middle-end: Add new "max" vector cost model > > Tamar Christin

RE: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 23, 2025 9:46 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > Subject: RE: [PATCH]middle-end: Add new "max" vector cost model > > On We

RE: [PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 23, 2025 9:31 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford > > Subject: Re: [PATCH]middle-end: Add new "max" vector cost model > > On We

[PATCH]middle-end: Add new "max" vector cost model

2025-04-23 Thread Tamar Christina
Hi All, This patch proposes a new vector cost model called "max". The cost model is an intersection between two of our existing cost models. Like `unlimited` it disables the costing vs scalar and assumes all vectorization to be profitable. But unlike unlimited it does not fully disable the vect

docs: Document PFA support in GCC-15 changes

2025-04-23 Thread Tamar Christina
Hi All, This documents the PFA support in GCC-15. Ok for master? Thanks, Tamar --- diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index f03e29c8581f2749a968e592eae2e40ce3ca8521..7fb70b993c56ff43c09aeb7bfaa4479385679dec 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs

[PATCH] testsuite: AMDGCN test for vect-early-break_38.c as well to consistent architecture [PR119286]

2025-04-22 Thread Tamar Christina
Hi All, I had missed this one during the AMDGCN test failures. Like vect-early-break_18.c this test is also scalarizing the loads and thus leading to unexpected vectorization for this testcase. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Cross checked the failing case on amdgc

RE: [PATCH] Document AArch64 changes for GCC 15

2025-04-22 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, April 22, 2025 2:28 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw ; > ktkac...@nvidia.com > Subject: Re: [PATCH] Document AArch64 changes for GCC 15 > > Tamar Christina

RE: [PATCH] Document AArch64 changes for GCC 15

2025-04-22 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, April 22, 2025 1:31 PM > To: gcc-patches@gcc.gnu.org > Cc: Richard Earnshaw ; ktkac...@nvidia.com; > Tamar Christina > Subject: [PATCH] Document AArch64 changes for GCC 15 > > The list i

RE: [PATCH]middle-end: fix masking for partial vectors and early break [PR119351]

2025-04-16 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, April 16, 2025 9:57 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH]middle-end: fix masking for partial vectors and early > break > [PR119351] > > On Wed, 16 A

[PATCH] testsuite: force AMDGCN test for vect-early-break_18.c to consistent architecture [PR119286]

2025-04-16 Thread Tamar Christina
Hi All, The given test is intended to test vectorization of a strided access done by having a step of > 1. GCN target doesn't support load lanes, so the testcase is expected to fail, other targets create a permuted load here which we then reject. However some GCN arch don't seem to support

[PATCH]middle-end: fix masking for partial vectors and early break [PR119351]

2025-04-16 Thread Tamar Christina
Hi All, The following testcase shows an incorrect masked codegen: #define N 512 #define START 1 #define END 505 int x[N] __attribute__((aligned(32))); int __attribute__((noipa)) foo (void) { int z = 0; for (unsigned int i = START; i < END; ++i) { z++; if (x[i] > 0)

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, April 15, 2025 12:50 PM > To: Tamar Christina > Cc: Richard Sandiford ; gcc-patches@gcc.gnu.org; > nd > Subject: RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS > [PR119351] >

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, April 15, 2025 12:49 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS > [PR119351] > > On Tue, 15 Apr 2025, Tamar

RE: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Tuesday, April 15, 2025 10:52 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; rguent...@suse.de > Subject: Re: [PATCH]middle-end: Fix incorrect codegen with PFA and VLS > [PR119351] > > Tamar

[PATCH]middle-end: Fix incorrect codegen with PFA and VLS [PR119351]

2025-04-15 Thread Tamar Christina
Hi All, The following example: #define N 512 #define START 2 #define END 505 int x[N] __attribute__((aligned(32))); int __attribute__((noipa)) foo (void) { for (signed int i = START; i < END; ++i) { if (x[i] == 0) return i; } return -1; } generates incorrect code with
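The flattened example in this message can be laid out as follows. The definition of `foo` is as given in the snippet; the `foo_check` helper is a hypothetical addition for illustration. Since `x` is a zero-initialized global, a correctly compiled build is expected to return `START`, which is presumably what the miscompiled version fails to do.

```c
#include <assert.h>

#define N 512
#define START 2
#define END 505

int x[N] __attribute__((aligned(32)));

/* Returns the first index in [START, END) whose element is zero,
   or -1 if there is none.  */
int __attribute__((noipa))
foo (void)
{
  for (signed int i = START; i < END; ++i)
    {
      if (x[i] == 0)
        return i;
    }
  return -1;
}

/* Hypothetical check, not part of the original test case: with x all
   zeros the very first element inspected matches.  */
int foo_check (void)
{
  assert (foo () == START);
  return 1;
}
```

The `noipa` attribute is GCC-specific; other compilers may ignore it with a warning.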

RE: [PATCH][contrib]: support json output from check_GNU_style_lib.py

2025-04-09 Thread Tamar Christina
Ping > -Original Message- > From: Tamar Christina > Sent: Tuesday, July 23, 2024 3:30 PM > To: Jonathan Wakely ; Filip Kastl > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH][contrib]: support json output from check_GNU_style_lib.py > > Hi Both, >

RE: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

2025-04-07 Thread Tamar Christina
> -Original Message- > From: Kyrylo Tkachov > Sent: Monday, March 31, 2025 1:43 PM > To: i...@sandoe.co.uk > Cc: Tamar Christina ; GCC Patches patc...@gcc.gnu.org>; Alice Carlotti ; Richard > Sandiford > ; s...@gentoo.org > Subject: Re: [PATCH v2] aarch64, Dar

RE: [PATCH] testsuite: update early-break tests for non-load-lanes targets [PR119286]

2025-03-18 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, March 18, 2025 10:48 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH] testsuite: update early-break tests for non-load-lanes > targets > [PR119286] > > On Mon

[PATCH] testsuite: update early-break tests for non-load-lanes targets [PR119286]

2025-03-17 Thread Tamar Christina
Hi All, Broadly speaking, these tests were failing because the BB limitation for SLP'ing loads in an || in an early break makes the loads end up in different BBs and so today we can't SLP them. This results in load_lanes being required to vectorize them because the alternative is loads with permu

RE: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-03-11 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, March 5, 2025 11:27 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last

RE: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-03-06 Thread Tamar Christina
> > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > > index > a93bc463a909ea28460cc7877275fce16e05f7e6..205eeec2e35544de848e0dbb > 48e3f5ae59391a88 100644 > > --- a/gcc/config/aarch64/aarch64-sve.md > > +++ b/gcc/config/aarch64/aarch64-sve.md > > @@ -3107,12

RE: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-03-06 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Thursday, March 6, 2025 10:40 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last

RE: [1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-03-05 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Monday, March 3, 2025 11:53 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [1/3 PATCH]AArch64: add support for partial modes to last

RE: [3/3 PATCH v4]middle-end: delay checking for alignment to load [PR118464]

2025-03-03 Thread Tamar Christina
> >/* For now assume all conditional loads/stores support unaligned > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > index > 6bbb16beff2c627fca11a7403ba5ee3a5faa21c1..b661dd400e5826fc1c4f70 > 957b335d1741fa 100644 > > --- a/gcc/tree-vect-stmts.cc > > +++ b/gcc/tree-vect-

RE: [PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-03-03 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Monday, March 3, 2025 10:12 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH]AArch64: force operand to fresh register to avoid subr

[PATCH]AArch64: force operand to fresh register to avoid subreg issues [PR118892]

2025-02-27 Thread Tamar Christina
Hi All, When the input is already a subreg and we try to make a paradoxical subreg out of it for copysign this can fail if it violates the subreg relationship. Use force_lowpart_subreg instead of lowpart_subreg to then force the results to a register instead of ICEing. Bootstrapped Regtested on

RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load [PR118464]

2025-02-26 Thread Tamar Christina
> > > > > > No, I don't think so. The code that eventually performs a > > > contiguous sub-group access directly should never extend > > > the load beyond GROUP_SIZE - or should be gated on the DR > > > not executed speculatively. That is, we should "fix" this > > > elsewhere. > > > > > > > It do

RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load [PR118464]

2025-02-26 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, February 26, 2025 1:52 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load > [PR118464] > > On Wed, 26 Feb

RE: [3/3 PATCH v3]middle-end: delay checking for alignment to load [PR118464]

2025-02-26 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, February 26, 2025 12:30 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [3/3 PATCH v3]middle-end: delay checking for alignment to load > [PR118464] > > On Tue, 25 Feb

[3/3 PATCH v3]middle-end: delay checking for alignment to load [PR118464]

2025-02-25 Thread Tamar Christina
Hi All, This fixes two PRs on Early break vectorization by delaying the safety checks to vectorizable_load when the VF, VMAT and vectype are all known. This patch does add two new restrictions: 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven group sizes, as they are

[2/3 PATCH][committed] testsuite: Add pragma novector to more tests [PR118464]

2025-02-25 Thread Tamar Christina
Hi All, These loops will now vectorize the entry finding loops. As such we get more failures because they were not expecting to be vectorized. Fixed by adding #pragma GCC novector. Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no

[1/3 PATCH]AArch64: add support for partial modes to last extractions [PR118464]

2025-02-25 Thread Tamar Christina
Hi All, The last extraction instructions work for both full and partial SVE vectors, however we currently only define them for FULL vectors. Early break code for VLA now however requires partial vector support, which relies on extract_last support. I have not added any new testcases as they ov

RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-13 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Thursday, February 13, 2025 4:55 PM > To: Tamar Christina > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd > > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load > [PR118464] >

RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-12 Thread Tamar Christina
> -Original Message- > From: Tamar Christina > Sent: Wednesday, February 12, 2025 3:20 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH v2]middle-end: delay checking for alignment to load > [PR118464] > > > -Original

RE: [PATCH v2]middle-end: delay checking for alignment to load [PR118464]

2025-02-12 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, February 12, 2025 2:58 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH v2]middle-end: delay checking for alignment to load > [PR118464] > > On Tue, 11 Feb

[PATCH]AArch64: Fix GCC 13 backport of big.Little CPU detection [PR118800]

2025-02-10 Thread Tamar Christina
Hi All, It seems I ran regressions but forgot to check them last time `(*>?<*)? On the GCC-13 branch the backport caused a failure due to the branch not having generic-armv8-a and also it still treating the generic cpu special. This made it return NULL when trying to find the default CPU. In GC

[PATCH]middle-end: Fix two testisms on x86 after PFA [PR118754]

2025-02-10 Thread Tamar Christina
Hi All, These two tests now vectorize the result finding loop with PFA and so the number of loops checked fails. This fixes them by adding #pragma GCC novector to the testcases. Regtested on x86_64-pc-linux-gnu on an AVX512 machine with -m32, -m64 and test pass again. Ok for master? Thanks, Ta

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-07 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, February 5, 2025 1:15 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH]middle-end: delay checking for alignment to load > [PR118464] > > On Wed, 5 Feb

[PATCH]middle-end: Remove unused internal function after IVopts cleanup [PR118756]

2025-02-06 Thread Tamar Christina
Hi All, It seems that after my IVopts patches the function contain_complex_addr_expr became unused and clang is rightfully complaining about it. This removes the unused internal function. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLo

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-05 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, February 5, 2025 10:16 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH]middle-end: delay checking for alignment to load > [PR118464] > > On Wed, 5 Feb

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-05 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Tuesday, February 4, 2025 12:49 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: RE: [PATCH]middle-end: delay checking for alignment to load > [PR118464] > > On Tue, 4 Feb

RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]

2025-02-05 Thread Tamar Christina
> -Original Message- > From: Jan Hubicka > Sent: Tuesday, February 4, 2025 4:25 PM > To: Alex Coplan > Cc: gcc-patches@gcc.gnu.org; Richard Biener ; Tamar > Christina > Subject: Re: [PATCH 1/4] vect: Set counts of early break exit blocks correctly > [PR

RE: [PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-03 Thread Tamar Christina
Looks like a last-minute change I made accidentally blocked SVE. Fixed and re-sending: Hi All, This fixes two PRs on early break vectorization by delaying the safety checks until vectorizable_load, when the VF, VMAT and vectype are all known. This patch does add two new restrictions: 1. On LOAD_LA

RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790]

2025-02-03 Thread Tamar Christina
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, January 24, 2025 9:18 AM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple > exits &

RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog guard [PR117790]

2025-02-03 Thread Tamar Christina
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, January 24, 2025 9:18 AM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog > guard

[PATCH]middle-end: delay checking for alignment to load [PR118464]

2025-02-03 Thread Tamar Christina
Hi All, This fixes two PRs on early break vectorization by delaying the safety checks until vectorizable_load, when the VF, VMAT and vectype are all known. This patch does add two new restrictions: 1. On LOAD_LANES targets, where the buffer size is known, we reject uneven group sizes, as they are
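For context, a minimal sketch of the kind of loop early break vectorization deals with: a search over a buffer of known size with a data-dependent exit. A vectorized version loads a whole vector of elements before it knows whether the exit is taken, which is why the safety of each load (alignment, buffer size) has to be established. The buffer, its length and the function names are assumptions for illustration and do not come from the patch or the PRs.

```c
#include <assert.h>

#define BUF_LEN 128

/* Fixed-size buffer: its size is known at compile time.  */
static int buf[BUF_LEN];

/* Hypothetical helper: fill the buffer with multiples of 3.  */
void fill_buf (void)
{
  for (int i = 0; i < BUF_LEN; i++)
    buf[i] = 3 * i;
}

/* Returns the index of the first element equal to KEY, or -1 if there
   is none.  The return inside the loop is the early break.  */
int find_first (int key)
{
  for (int i = 0; i < BUF_LEN; i++)
    if (buf[i] == key)
      return i;
  return -1;
}

/* Small self-check of the sketch.  */
void self_test (void)
{
  fill_buf ();
  assert (find_first (30) == 10);   /* 3 * 10 == 30 */
  assert (find_first (31) == -1);   /* 31 is not a multiple of 3 */
}
```

Whether GCC actually vectorizes such a loop depends on the target, the options and the compiler version, so the sketch only shows the loop shape, not a guaranteed outcome.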

RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-02-03 Thread Tamar Christina
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, January 24, 2025 9:18 AM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of > multi-exit > lo

RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]

2025-02-03 Thread Tamar Christina
Ping > -Original Message- > From: Tamar Christina > Sent: Friday, January 24, 2025 9:17 AM > To: Alex Coplan ; 'gcc-patches@gcc.gnu.org' patc...@gcc.gnu.org> > Cc: 'Richard Biener' ; 'Jan Hubicka' > Subject: RE: [PATCH 1/4] ve

RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple exits [PR117790]

2025-01-24 Thread Tamar Christina
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, January 15, 2025 2:08 PM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 4/4] vect: Fix scale_profile_for_vect_loop for multiple > exits &

RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of multi-exit loops [PR117790]

2025-01-24 Thread Tamar Christina
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, January 15, 2025 2:08 PM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 2/4] cfgloopmanip: Add infrastructure for scaling of > multi-e

RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly [PR117790]

2025-01-24 Thread Tamar Christina
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, January 15, 2025 2:07 PM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 1/4] vect: Set counts of early break exit blocks correctly >

RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog guard [PR117790]

2025-01-24 Thread Tamar Christina
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, January 15, 2025 2:08 PM > To: Alex Coplan ; gcc-patches@gcc.gnu.org > Cc: Richard Biener ; Jan Hubicka > Subject: RE: [PATCH 3/4] vect: Ensure profile consistency when adding epilog > guard

RE: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-20 Thread Tamar Christina
> -Original Message- > From: Tamar Christina > Sent: Friday, January 17, 2025 5:07 PM > To: Kyrylo Tkachov ; Richard Sandiford > > Cc: GCC Patches ; nd ; Richard > Earnshaw ; ktkac...@gcc.gnu.org > Subject: RE: [PATCH]AArch64: Drop ILP32 from default elf multi

RE: [PATCH] aarch64: Provide initial specifications for Apple CPU cores.

2025-01-20 Thread Tamar Christina
> -Original Message- > From: Iain Sandoe > Sent: Monday, January 20, 2025 6:15 PM > To: Andrew Carlotti > Cc: Kyrylo Tkachov ; GCC Patches patc...@gcc.gnu.org>; Tamar Christina ; Richard > Sandiford ; Sam James > Subject: Re: [PATCH] aarch64: Provide initial

[PATCH]middle-end: use ncopies both when registering and reading masks [PR118273]

2025-01-20 Thread Tamar Christina
Hi All, When registering masks for a SIMD clone, we end up using nmasks instead of nvectors, where nmasks seems to compute the number of input masks required for the call given the current simdlen. This is, however, wrong, as vect_record_loop_mask wants to know how many masks you want to create from the

RE: [gcc r15-6807] vect: Force alignment peeling to vectorize more early break loops [PR118211]

2025-01-20 Thread Tamar Christina
> -Original Message- > From: Thomas Schwinge > Sent: Monday, January 13, 2025 9:54 AM > To: Tamar Christina ; Alex Coplan > ; gcc-patches@gcc.gnu.org > Cc: Andrew Stubbs > Subject: Re: [gcc r15-6807] vect: Force alignment peeling to vectorize more > early

RE: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Tamar Christina
> -Original Message- > From: Kyrylo Tkachov > Sent: Friday, January 17, 2025 3:10 PM > To: Richard Sandiford > Cc: Tamar Christina ; GCC Patches patc...@gcc.gnu.org>; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH]AArch64: Drop ILP32 fr

RE: [PATCH]AArch64: Drop ILP32 from default elf multilibs after deprecation

2025-01-17 Thread Tamar Christina
> -Original Message- > From: Kyrylo Tkachov > Sent: Friday, January 17, 2025 1:22 PM > To: Tamar Christina > Cc: GCC Patches ; nd ; Richard > Earnshaw ; ktkac...@gcc.gnu.org; Richard > Sandiford > Subject: Re: [PATCH]AArch64: Drop ILP32 from default elf multi
