[aarch64] Update reg-costs to differentiate between memmove costs

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
cost to ensure the behaviour remains the same. 2022-03-16  Tamar Christina                Andre Vieira gcc/ChangeLog:     * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New struct.     (struct tune_params): Change type of memmov_cost to use cpu_memmov_cost

[aarch64] Implement determine_suggested_unroll_factor

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch implements the costing function determine_suggested_unroll_factor for aarch64. It determines the unrolling factor by dividing the number of X operations we can do per cycle by the number of X operations in the loop body, taking this information from the vec_ops analysis during v
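The division described in this message can be sketched as follows. This is only an illustration, not the code from the patch: the function name, the parameter names, the treatment of an empty loop body and the clamping to a caller-supplied limit are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the patch): estimate an unroll factor for one
// kind of operation from issue-rate information.
//   ops_per_cycle: how many such operations the core can issue per cycle
//   ops_in_body:   how many such operations one vector loop iteration contains
//   limit:         assumed upper bound on the result, must be at least 1
unsigned estimate_unroll_factor(unsigned ops_per_cycle, unsigned ops_in_body,
                                unsigned limit) {
  assert(limit >= 1);
  if (ops_in_body == 0)
    return 1;  // assumption: no operations of this kind means no unrolling
  // Integer division rounds down; a body that already saturates the issue
  // width yields 0 here and is clamped up to 1 (no unrolling).
  unsigned factor = ops_per_cycle / ops_in_body;
  return std::max(1u, std::min(factor, limit));
}
```

For instance, a core that issues four operations per cycle and a loop body containing two of them would give a factor of 2 under these assumptions.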

[aarch64] Update Neoverse N2 core definition

2022-03-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, As requested, I updated the Neoverse N2 entry to use the AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the ARCH_IDENT to 9A and moved it under the Armv9 cores. gcc/ChangeLog:     * config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry. diff --git a/g

Re: [aarch64] Update Neoverse N2 core definition

2022-03-24 Thread Andre Vieira (lists) via Gcc-patches
Ping. On 16/03/2022 15:00, Andre Vieira (lists) via Gcc-patches wrote: Hi, As requested, I updated the Neoverse N2 entry to use the AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the ARCH_IDENT to 9A and moved it under the Armv9 cores. gcc/ChangeLog

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-25 Thread Andre Vieira (lists) via Gcc-patches
.     (aarch64_vector_costs::add_stmt_cost): Check for a qualifying pattern     to set m_nosve_pattern.     (aarch64_vector_costs::finish_costs): Use determine_suggested_unroll_factor.     * config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New. On 16/03/2022 18:01, Richard Sandiford wrote: "

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-31 Thread Andre Vieira (lists) via Gcc-patches
On 28/03/2022 15:59, Richard Sandiford wrote: "Andre Vieira (lists)" writes: Hi, Addressed all of your comments bar the pred ops one. Is this OK? gcc/ChangeLog:     * config/aarch64/aarch64.cc (aarch64_vector_costs): Define determine_suggested_unroll_factor and m_nos

[AArch64] PR target/105157 Increase number of cores TARGET_CPU_DEFAULT can encode

2022-04-07 Thread Andre Vieira (lists) via Gcc-patches
Hi, This addresses the compile-time increase seen in the PR target/105157. This was being caused by selecting the wrong core tuning, as when we added the latest AArch64 the TARGET_CPU_generic tuning was pushed beyond the 0x3f mask we used to encode both target cpu and attributes into TARGET_C

Re: [AArch64] PR target/105157 Increase number of cores TARGET_CPU_DEFAULT can encode

2022-04-08 Thread Andre Vieira (lists) via Gcc-patches
On 08/04/2022 08:04, Richard Sandiford wrote: I think this would be better as a static assert at the top level: static_assert (TARGET_CPU_generic < TARGET_CPU_MASK, "TARGET_CPU_NBITS is big enough"); The motivation being that you want this to be checked regardless of wheth
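The failure mode discussed in this thread, a core index outgrowing the bits reserved for it, and the suggested compile-time check can be sketched like this. The bit layout, the helper names and the tiny enum are assumptions for illustration; only the identifiers TARGET_CPU_generic, TARGET_CPU_MASK and TARGET_CPU_NBITS and the static_assert itself come from the thread, and the value 6 simply mirrors the 0x3f mask mentioned there.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder enum: the real list of cores is far longer.
enum target_cpu { TARGET_CPU_core_a, TARGET_CPU_core_b, TARGET_CPU_generic };

constexpr unsigned TARGET_CPU_NBITS = 6;                        // assumed width
constexpr std::uint64_t TARGET_CPU_MASK =
    (std::uint64_t{1} << TARGET_CPU_NBITS) - 1;                 // 0x3f

// The check quoted in the thread: fails to compile once the enum outgrows
// the reserved bits, regardless of which code paths are built.
static_assert (TARGET_CPU_generic < TARGET_CPU_MASK,
               "TARGET_CPU_NBITS is big enough");

// Assumed packing: core index in the low bits, attribute flags above them.
// Deliberately unchecked, to show what goes wrong when the index is too big.
inline std::uint64_t pack_default(std::uint64_t cpu, std::uint64_t flags) {
  return cpu | (flags << TARGET_CPU_NBITS);
}
inline std::uint64_t unpack_cpu(std::uint64_t packed) {
  return packed & TARGET_CPU_MASK;
}
inline std::uint64_t unpack_flags(std::uint64_t packed) {
  return packed >> TARGET_CPU_NBITS;
}

// Runtime-checked variant of the same packing.
inline std::uint64_t pack_default_checked(std::uint64_t cpu,
                                          std::uint64_t flags) {
  assert(cpu <= TARGET_CPU_MASK && "core index must fit in TARGET_CPU_NBITS");
  return pack_default(cpu, flags);
}
```

With this layout an index of 64 decodes as core 0 and leaks into the flag bits, which is the kind of silent mis-selection a top-level static_assert is meant to rule out.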

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-18 Thread Andre Vieira (lists) via Gcc-patches
On 14/01/2022 09:57, Richard Biener wrote: The 'used_vector_modes' is also a heuristic by itself since it registers every vector type we query, not only those that are used in the end ... So it's really all heuristics that can eventually go bad. IMHO remembering the VF that we ended up with (

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-19 Thread Andre Vieira (lists) via Gcc-patches
On 19/01/2022 11:04, Richard Biener wrote: On Tue, 18 Jan 2022, Andre Vieira (lists) wrote: On 14/01/2022 09:57, Richard Biener wrote: The 'used_vector_modes' is also a heuristic by itself since it registers every vector type we query, not only those that are used in the end ..

Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-19 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: At some point during the development of this patch series, it appeared that in some cases the register allocator wants “VPR or general” rather than “VPR or general or FP” (which is the same thing as ALL_REGS). The series

Re: [PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2022-01-19 Thread Andre Vieira (lists) via Gcc-patches
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: VPR_REG is the only register in its class, so it should be handled by TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling default_class_likely_spilled_p. No test fails without this patch, but it seems it should be implemented.

Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-19 Thread Andre Vieira (lists) via Gcc-patches
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use iterator instead of HI in mve_vmvnq_n_. 2022-01-13 Christophe Lyon gcc/ * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode for operand 1.

Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches
On 20/01/2022 09:14, Christophe Lyon wrote: On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches wrote: Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: > At some point during the development of this patch series, it appea

Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches
On 20/01/2022 10:40, Richard Sandiford wrote: "Andre Vieira (lists)" writes: On 20/01/2022 09:14, Christophe Lyon wrote: On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches wrote: Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-pat

Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Andre Vieira (lists) via Gcc-patches
On 20/01/2022 10:45, Richard Sandiford wrote: "Andre Vieira (lists)" writes: On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: The vmvnq_n* intrinsics and have [u]int[16|32]_t arguments, so use iterator instead of HI in mve_vmvnq_n_. 2022-01-13 Christophe Lyon

Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-21 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def index 6ba6f211531..920c2a68e4c 100644 --- a/gcc/config/arm/arm-simd-builtin-types.def +++ b/gcc/config/arm/arm-simd-built

[PATCH][gcc][middle-end] PR104498: Fix comparing symbol reference

2022-02-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, As reported on PR104498, the issue here is that compare_base_symbol_refs swaps x and y but doesn't take that into account when computing the distance. This patch makes sure that if x and y are swapped, we correct the distance computation by multiplying it by -1 to end up with the corr
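A minimal sketch of the sign correction described here, under stated assumptions: the Symbol struct, the anchored flag used to decide when to swap, and the function name are hypothetical stand-ins and do not reflect the actual data structures in GCC.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a symbol reference: the block it lives in, its
// offset inside that block, and a flag that only decides canonical ordering.
struct Symbol {
  int block;
  long offset;
  bool anchored;
};

// Returns the signed distance from x to y as the caller sees them, even when
// the operands are swapped internally to put the anchored one first.
long distance_from_x_to_y(Symbol x, Symbol y) {
  assert(x.block == y.block);  // offsets are only comparable within one block
  bool swapped = false;
  if (!x.anchored && y.anchored) {
    std::swap(x, y);
    swapped = true;
  }
  long distance = y.offset - x.offset;
  // Without this negation the result would have the wrong sign whenever the
  // operands were swapped above.
  return swapped ? -distance : distance;
}
```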

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-17 Thread Andre Vieira (lists) via Gcc-patches
On 16/11/2021 12:10, Richard Biener wrote: On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote: On 12/11/2021 10:56, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mapping

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-22 Thread Andre Vieira (lists) via Gcc-patches
On 18/11/2021 11:05, Richard Biener wrote: @@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) trapping behaviour, so require !flag_trapping_math. */ #if GIMPLE (simplify - (float (fix_trunc @0)) - (if (!flag_trapping_math - && types_match (type, TREE_TYPE (@0)) -

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-22 Thread Andre Vieira (lists) via Gcc-patches
On 12/11/2021 13:12, Richard Biener wrote: On Thu, 11 Nov 2021, Andre Vieira (lists) wrote: Hi, This is the rebased and reworked version of the unroll patch.  I wasn't entirely sure whether I should compare the costs of the unrolled loop_vinfo with the original loop_vinfo it was unroll

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-24 Thread Andre Vieira (lists) via Gcc-patches
On 22/11/2021 12:39, Richard Biener wrote: + if (first_loop_vinfo->suggested_unroll_factor > 1) +{ + if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, +

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-25 Thread Andre Vieira (lists) via Gcc-patches
On 24/11/2021 11:00, Richard Biener wrote: On Wed, 24 Nov 2021, Andre Vieira (lists) wrote: On 22/11/2021 12:39, Richard Biener wrote: + if (first_loop_vinfo->suggested_unroll_factor > 1) +{ + if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)) + { +

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-25 Thread Andre Vieira (lists) via Gcc-patches
On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, but the problem is if I add support for the scalar, it will stop the vectorizer. It

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-11-29 Thread Andre Vieira (lists) via Gcc-patches
On 18/11/2021 11:05, Richard Biener wrote: + (if (!flag_trapping_math + && direct_internal_fn_supported_p (IFN_TRUNC, type, +OPTIMIZE_FOR_BOTH)) + (IFN_TRUNC @0) #endif does IFN_FTRUNC_INT preserve the same exceptions as doing

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-30 Thread Andre Vieira (lists) via Gcc-patches
On 25/11/2021 12:46, Richard Biener wrote: Oops, my fault, yes, it does. I would suggest to refactor things so that the mode_i = first_loop_i case is there only once. I also wonder if all the argument about starting at 0 doesn't apply to the not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_

[vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
ree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up for epilogue costing.     (vect_analyze_loop): Re-analyze all modes for epilogues. gcc/testsuite/ChangeLog:     * gcc.target/aarch64/masked_epilogue.c: New test. On 30/11/2021 13:56, Richard Biener wrote: On Tue, 30 Nov 2021, Andr

Re: [AArch64] Enable generation of FRINTNZ instructions

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
ping On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote: On 22/11/2021 11:41, Richard Biener wrote: On 18/11/2021 11:05, Richard Biener wrote: This is a good shout and made me think about something I hadn't before... I thought I could handle the vector forms later, bu

Re: [vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
costs): Add new member m_suggested_unroll_factor.     (vector_costs::suggested_unroll_factor): New getter function.     (finish_cost): Set return argument suggested_unroll_factor. Regards, Andre On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote: Hi, I've split this

Re: [vect] Re-analyze all modes for epilogues

2021-12-07 Thread Andre Vieira (lists) via Gcc-patches
On 07/12/2021 11:45, Richard Biener wrote: Can you check whether, given we know the main VF, the epilogue analysis does not start with an autodetected vector mode that needs a too large VF? Hmm struggling to see how we could check this here. AFAIU before we analyze the loop for a given vector

ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-12 Thread Andre Vieira (lists) via Gcc-patches
Hi, The bitposition calculation for the bitfield lowering in loop if conversion was not taking DECL_FIELD_OFFSET into account, which meant that it would result in wrong bitpositions for bitfields that did not end up having representations starting at the beginning of the struct. Bootstrapped
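To illustrate why the byte offset matters, here is a small sketch of a bit-position calculation that combines both components. It assumes, as in GCC's field layout, that DECL_FIELD_OFFSET counts bytes and DECL_FIELD_BIT_OFFSET counts bits; the FieldPos struct, the helper names and the fixed value of 8 bits per unit are assumptions for the example, not code from the patch.

```cpp
#include <cassert>

constexpr long kBitsPerUnit = 8;  // assumed value of BITS_PER_UNIT

// Hypothetical mirror of a field's position: a byte offset (in the spirit of
// DECL_FIELD_OFFSET) plus a bit offset (DECL_FIELD_BIT_OFFSET).
struct FieldPos {
  long byte_offset;
  long bit_offset;
};

// Absolute position of the field from the start of the struct, in bits.
long absolute_bitpos(const FieldPos &f) {
  return f.byte_offset * kBitsPerUnit + f.bit_offset;
}

// Position of a bitfield inside its representative. Subtracting only the bit
// offsets is correct only when both share the same byte offset.
long bitpos_in_representative(const FieldPos &field, const FieldPos &rep) {
  long pos = absolute_bitpos(field) - absolute_bitpos(rep);
  assert(pos >= 0);  // a bitfield cannot start before its representative
  return pos;
}
```

When the byte offsets differ, for example a representative at byte 0, bit 64 and a field at byte 8, bit 3, the bit offsets alone would give -61, while the combined calculation gives 3.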

vect: Don't pattern match BITFIELD_REF's of non-integrals [PR107226]

2022-10-12 Thread Andre Vieira (lists) via Gcc-patches
Hi, The original patch supported matching the vect_recog_bitfield_ref_pattern for BITFIELD_REF's where the first operand didn't have an INTEGRAL_TYPE_P type. That means it would also match vectors, leading to regressions in targets that supported vectorization of those. Bootstrapped and regr

Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches
Added some extra comments to describe what is going on there. On 13/10/2022 09:14, Richard Biener wrote: On Wed, 12 Oct 2022, Andre Vieira (lists) wrote: Hi, The bitposition calculation for the bitfield lowering in loop if conversion was not taking DECL_FIELD_OFFSET into account, which meant

Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches
Hi Rainer, Thanks for reporting, I was actually expecting these! I thought about pre-empting them by using a positive filter on the tests for aarch64 and x86_64 as I knew those would pass, but I thought it would be better to let other targets report failures since then you get a testsuite that

Re: ifcvt: Fix bitpos calculation in bitfield lowering [PR107229]

2022-10-13 Thread Andre Vieira (lists) via Gcc-patches
On 13/10/2022 15:15, Richard Biener wrote: On Thu, 13 Oct 2022, Andre Vieira (lists) wrote: Hi Rainer, Thanks for reporting, I was actually expecting these! I thought about pre-empting them by using a positive filter on the tests for aarch64 and x86_64 as I knew those would pass, but I

[PATCH] ifcvt: Do not lower bitfields if we can't analyze dr's [PR107275]

2022-10-18 Thread Andre Vieira (lists) via Gcc-patches
The ifcvt dead code elimination code was not built to deal with inline assembly, as loops with such would never be if-converted in the past since we can't do data-reference analysis on them and vectorization would eventually fail. For this reason we now also do not lower bitfields if the data-ref

[PATCH]vect: Fix vectype when widening container type in bitfield pattern [PR107326]

2022-10-20 Thread Andre Vieira (lists) via Gcc-patches
Hi, The 'vect_recog_bitfield_ref_pattern' was not correctly adapting the vectype when widening the container. I thought the original tests covered that code-path but they didn't, so I added a new run-test that covers it too. Bootstrapped and regression tested on x86_64 and aarch64. gcc/Cha

vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-21 Thread Andre Vieira (lists) via Gcc-patches
Hi, The Ada failure reported in the PR was being caused by vect_check_gather_scatter failing to deal with bit offsets that weren't multiples of BITS_PER_UNIT. This patch makes vect_check_gather_scatter reject memory accesses with such offsets. Bootstrapped and regression tested on aarch64 an
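The rejection rule described here amounts to a divisibility check on the bit offset. The sketch below is illustrative only: the function name, the out-parameter convention and the fixed value of 8 bits per unit are assumptions, and the real check operates on GCC's own offset representation.

```cpp
#include <cassert>

constexpr long kBitsPerUnit = 8;  // assumed value of BITS_PER_UNIT

// Hypothetical helper: convert a bit offset to a byte offset, refusing
// offsets that do not fall on a byte boundary. Returns false (reject) and
// leaves *byte_offset untouched when the access cannot be expressed in bytes.
bool byte_offset_if_aligned(long bit_offset, long *byte_offset) {
  assert(byte_offset != nullptr);
  if (bit_offset % kBitsPerUnit != 0)
    return false;
  *byte_offset = bit_offset / kBitsPerUnit;
  return true;
}
```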

Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-24 Thread Andre Vieira (lists) via Gcc-patches
On 24/10/2022 08:17, Richard Biener wrote: Can you check why vect_find_stmt_data_reference doesn't trip on the if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1))) { free_data_ref (dr); return opt_result::failure_at (stmt

Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-24 Thread Andre Vieira (lists) via Gcc-patches
On 24/10/2022 13:46, Richard Biener wrote: On Mon, 24 Oct 2022, Andre Vieira (lists) wrote: On 24/10/2022 08:17, Richard Biener wrote: Can you check why vect_find_stmt_data_reference doesn't trip on the if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF && D

Re: vect: Make vect_check_gather_scatter reject offsets that aren't multiples of BITS_PER_UNIT [PR107346]

2022-10-28 Thread Andre Vieira (lists) via Gcc-patches
On 24/10/2022 14:29, Richard Biener wrote: On Mon, 24 Oct 2022, Andre Vieira (lists) wrote: Changing if-convert would merely change this testcase but we could still trigger using a different structure type, changing the size of Int24 to 32 bits rather than 24: package Loop_Optimization23_Pkg

[PATCH] ifcvt: Support bitfield lowering of multiple-exit loops

2022-11-03 Thread Andre Vieira (lists) via Gcc-patches
Hi, With Tamar's patch (https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604880.html) enabling the vectorization of early-breaks, I'd like to allow bitfield lowering in such loops, which requires the relaxation of allowing multiple exits when doing so.  In order to avoid a similar issu

Re: [PATCH][AArch64] Implement ACLE Data Intrinsics

2022-08-11 Thread Andre Vieira (lists) via Gcc-patches
OK to backport this to gcc-12? Applies cleanly and did a bootstrap and regression test on aarch64-linux-gnu. Regards, Andre On 01/07/2022 12:26, Richard Sandiford wrote: "Andre Vieira (lists)" writes: On 29/06/2022 08:18, Richard Sandiford wrote: + break; +case AA

Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

2022-08-16 Thread Andre Vieira (lists) via Gcc-patches
Hi, New version of the patch attached, but haven't recreated the ChangeLog yet, just waiting to see if this is what you had in mind. See also some replies to your comments in-line below: On 09/08/2022 15:34, Richard Biener wrote: @@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop

Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

2022-08-25 Thread Andre Vieira (lists) via Gcc-patches
On 17/08/2022 13:49, Richard Biener wrote: Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield access - that's the offset within the representative (by construction both fields share DECL_FIELD_OFFSET).

Re: [PATCH] Teach vectorizer to deal with bitfield accesses (was: [RFC] Teach vectorizer to deal with bitfield reads)

2022-09-08 Thread Andre Vieira (lists) via Gcc-patches
Ping. On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote: On 17/08/2022 13:49, Richard Biener wrote: Yes, of course.  What you need to do is subtract DECL_FIELD_BIT_OFFSET of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield access - that's the o

[PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch disables epilogue vectorization when we are peeling for alignment in the prologue and we can't guarantee the main vectorized loop is entered.  This is to prevent executing vectorized code with an unaligned access if the target has indicated it wants to peel for alignment. We ta

Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches
On 26/04/2022 15:43, Richard Sandiford wrote: "Andre Vieira (lists)" writes: Hi, This patch disables epilogue vectorization when we are peeling for alignment in the prologue and we can't guarantee the main vectorized loop is entered.  This is to prevent executing vectoriz

Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-26 Thread Andre Vieira (lists) via Gcc-patches
On 26/04/2022 16:12, Jakub Jelinek wrote: On Tue, Apr 26, 2022 at 03:43:13PM +0100, Richard Sandiford via Gcc-patches wrote: --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-c

Re: [PATCH] vect, tree-optimization/105219: Disable epilogue vectorization when peeling for alignment

2022-04-27 Thread Andre Vieira (lists) via Gcc-patches
On 27/04/2022 07:35, Richard Biener wrote: On Tue, 26 Apr 2022, Richard Sandiford wrote: "Andre Vieira (lists)" writes: Hi, This patch disables epilogue vectorization when we are peeling for alignment in the prologue and we can't guarantee the main vectorized loop is enter

Re: [PATCH] tree-optimization/105219 - bogus max iters for vectorized epilogue

2022-04-28 Thread Andre Vieira (lists) via Gcc-patches
On 27/04/2022 15:03, Richard Biener wrote: On Wed, 27 Apr 2022, Richard Biener wrote: The following makes sure to take into account prologue peeling when trying to narrow down the maximum number of iterations computed for the epilogue of a vectorized epilogue. Bootstrap & regtest running on x

[AArch64] Improve SVE dup intrinsics codegen

2022-05-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch teaches the aarch64 backend to improve codegen when using dup with NEON vectors with repeating patterns. It will attempt to use a smaller NEON vector (or element) to limit the number of instructions needed to construct the input vector. Bootstrapped and regression tested  aarc

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-18 Thread Andre Vieira (lists) via Gcc-patches
Hi Prathamesh, I am just looking at this as it interacts with a change I am trying to make, but I'm not a reviewer so take my comments with a pinch of salt ;) I copied in bits of your patch below to comment. > -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST (machine_mode @var{

[PATCH 0/3][vect] Enable vector unrolling of main loop

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi all, This patch series enables unrolling of an unpredicated main vectorized loop based on a target hook. The epilogue loop will have (at least) half the VF of the main loop and can be predicated. Andre Vieira (3): [vect] Add main vectorized loop unrolling [vect] Consider outside costs

[PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi all, This patch adds the ability to define a target hook to unroll the main vectorized loop. It also introduces --param's vect-unroll and vect-unroll-reductions to control this from the command line. I found this useful to experiment with and believe it can help when tuning, so I decided to leave

[PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch changes the order in which we check outside and inside costs for epilogue loops. This is to ensure that a predicated epilogue is more likely to be picked over an unpredicated one, since it saves having to enter a scalar epilogue loop. gcc/ChangeLog:     * tree-vect-loop.c

Re: [PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-21 Thread Andre Vieira (lists) via Gcc-patches
Hi Richi, Thanks for the review, see below some questions. On 21/09/2021 13:30, Richard Biener wrote: On Fri, 17 Sep 2021, Andre Vieira (lists) wrote: Hi all, This patch adds the ability to define a target hook to unroll the main vectorized loop. It also introduces --param's vect-unrol

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-09-30 Thread Andre Vieira (lists) via Gcc-patches
Hi, That just forces trying the vector modes we've tried before. Though I might need to revisit this now I think about it. I'm afraid it might be possible for this to generate an epilogue with a vf that is not lower than that of the main loop, but I'd need to think about this again. Either way

[AArch64] Fix NEON load/store gimple lowering and big-endian testisms

2021-11-04 Thread Andre Vieira (lists) via Gcc-patches
Hi, This should address the ubsan bootstrap build and big-endian testisms reported against the last NEON load/store gimple lowering patch. I also fixed a follow-up issue where the alias information was leading to a bad codegen transformation. The NEON intrinsics specifications do not forbid t

Re: [AArch64] Fix NEON load/store gimple lowering and big-endian testisms

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
Thank you both! Here is a reworked version, this OK for trunk? diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index a815e4cfbccab692ca688ba87c71b06c304abbfb..e06131a7c61d31c1be3278dcdccc49c3053c78cb 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +

[AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
Decided to split the patches up to make it clear that the testisms fixes had nothing to do with the TBAA fix. I'll be committing these two separately. First: [AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch This patch reverts the tests for big-endian after the NEON gim

[AArch64] Fix TBAA information when lowering NEON loads and stores to gimple

2021-11-09 Thread Andre Vieira (lists) via Gcc-patches
And second (also added a test): [AArch64] Fix TBAA information when lowering NEON loads and stores to gimple This patch fixes the wrong TBAA information when lowering NEON loads and stores to gimple that showed up when bootstrapping with UBSAN. gcc/ChangeLog:     * config/aarch64/aarch64

[committed][AArch64] Fix bootstrap failure due to missing ATTRIBUTE_UNUSED

2021-11-10 Thread Andre Vieira (lists) via Gcc-patches
Hi, Committed this as obvious. My earlier patch removed the need for the GSI to be used. gcc/ChangeLog:     * config/aarch64/aarch64-builtins.c     (aarch64_general_gimple_fold_builtin): Mark argument as unused. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-11 Thread Andre Vieira (lists) via Gcc-patches
Hi, This is the rebased and reworked version of the unroll patch.  I wasn't entirely sure whether I should compare the costs of the unrolled loop_vinfo with the original loop_vinfo it was unrolled of. I did now, but I wasn't too sure whether it was a good idea to... Any thoughts on this? Re

[AArch64] Enable generation of FRINTNZ instructions

2021-11-11 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding optabs and mappings. It also creates a backend pattern to implement them for aarch64 and a match.pd pattern to idiom recognize these. These IFN's (and optabs) represent a truncation towards zero, as if performed by fi

[PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-05-23 Thread Andre Vieira (lists) via Gcc-patches
Hi, When vectorizing with --param vect-partial-vector-usage=1 the vectorizer uses an unpredicated (all-true predicate for SVE) main loop and a predicated tail loop. The way this was implemented seems to mean it re-uses the same vector-mode for both loops, which means the tail loop isn't an ac

Re: [PATCH][vect] Use main loop's thresholds and vectorization factor to narrow upper_bound of epilogue

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Thank you Kewen!! I will apply this now. BR, Andre On 25/05/2021 09:42, Kewen.Lin wrote: on 2021/5/24 下午3:21, Kewen.Lin via Gcc-patches wrote: Hi Andre, on 2021/5/24 下午2:17, Andre Vieira (lists) via Gcc-patches wrote: Hi, When vectorizing with --param vect-partial-vector-usage=1 the

[RFC] Implementing detection of saturation and rounding arithmetic

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up IV's shared between multiple loops. When creating a similar problem with C code I noticed IVOPTs treated IV's with uses ou

[RFC][ivopts] Generate better code for IVs with uses outside the loop (was Re: [RFC] Implementing detection of saturation and rounding arithmetic)

2021-06-03 Thread Andre Vieira (lists) via Gcc-patches
Streams got crossed there and used the wrong subject ... On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote: Hi, This RFC is motivated by the IV sharing RFC in https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the need to have the IVOPTS pass be able to clean up

[RFC][ivopts] Generate better code for IVs with uses outside the loop

2021-06-10 Thread Andre Vieira (lists) via Gcc-patches
On 08/06/2021 16:00, Andre Simoes Dias Vieira via Gcc-patches wrote: Hi Bin, Thank you for the reply, I have some questions, see below. On 07/06/2021 12:28, Bin.Cheng wrote: On Fri, Jun 4, 2021 at 12:35 AM Andre Vieira (lists) via Gcc-patches wrote: Hi Andre, I didn't look int

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-06-14 Thread Andre Vieira (lists) via Gcc-patches
Hi, On 20/05/2021 11:22, Richard Biener wrote: On Mon, 17 May 2021, Andre Vieira (lists) wrote: Hi, So this is my second attempt at finding a way to improve how we generate the vector IV's and teach the vectorizer to share them between main loop and epilogues. On IRC we discussed my id

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches
gle_defuse_cyle when unrolling.     * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to finish_cost.     * tree-vectorizer.h (finish_cost): Change to pass new class vec_info parameter. On 01/10/2021 09:19, Richard Biener wrote: On Thu, 30 Sep 2021, Andre Vieira (lists) wr

[arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-12 Thread Andre Vieira (lists) via Gcc-patches
addressing modes. gcc/ChangeLog: 2021-10-12  Andre Vieira      * config/arm/arm.c (thumb2_legitimate_address_p): Use VALID_MVE_MODE     when checking mve addressing modes.     (mve_vector_mem_operand): Fix the way we handle pre, post and offset     addressing modes

Re: [arm] Fix MVE addressing modes for VLDR[BHW] and VSTR[BHW]

2021-10-13 Thread Andre Vieira (lists) via Gcc-patches
On 13/10/2021 13:37, Kyrylo Tkachov wrote: Hi Andre, @@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int code) else if (code == POST_MODIFY || code == PRE_MODIFY) { asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0))); - postinc_reg = XEX

Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-10-14 Thread Andre Vieira (lists) via Gcc-patches
Hi, I completely forgot I still had this patch out as well. I grouped it together with the unrolling because it was what motivated the change, but it is actually more widely applicable and can be reviewed separately. On 17/09/2021 16:32, Andre Vieira (lists) via Gcc-patches wrote: Hi, This patch

Re: [Patch][GCC][middle-end] - Lower store and load neon builtins to gimple

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote: On Mon, 27 Sep 2021, Jirui Wu wrote: Hi all, I now use the type based on the specification of the intrinsic instead of type based on formal argument. I use signed Int vector types because the outputs of the neon builtins that I am low

Re: FW: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 19/10/2021 00:22, Joseph Myers wrote: On Fri, 15 Oct 2021, Richard Biener via Gcc-patches wrote: On Fri, Sep 24, 2021 at 2:59 PM Jirui Wu via Gcc-patches wrote: Hi, Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html The patch is attached as text for ease of use. Is

Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-20 Thread Andre Vieira (lists) via Gcc-patches
On 15/10/2021 09:48, Richard Biener wrote: On Tue, 12 Oct 2021, Andre Vieira (lists) wrote: Hi Richi, I think this is what you meant, I now hide all the unrolling cost calculations in the existing target hooks for costs. I did need to adjust 'finish_cost' to take the loop_vi

[Aarch64] Fix alignment of neon loads & stores in gimple

2021-10-25 Thread Andre Vieira (lists) via Gcc-patches
Hi, This fixes the alignment on the memory access type for neon loads & stores in the gimple lowering. Bootstrap ubsan on aarch64 builds again with this change. 2021-10-25  Andre Vieira  gcc/ChangeLog:     * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_bui

[RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-04-30 Thread Andre Vieira (lists) via Gcc-patches
tion for which I haven't quite worked out a solution yet and does cause some minor regressions due to unfortunate spills. Let me know what you think and if you have ideas of how we can better achieve this. Kind regards, Andre Vieira diff --git a/gcc/tree-vect-loop-manip.c

Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: Since MVE has a different set of vector comparison operators from Neon, we have to update the expansion to take into account the new ones, for instance 'NE' for which MVE does not require to use 'EQ' with the inverted con

Re: [PATCH 7/9] arm: Auto-vectorization for MVE: add __fp16 support to VCMP

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
It would be good to also add tests for NEON as you also enable auto-vec for it. I checked and I do think the necessary 'neon_vc' patterns exist for 'VH', so we should be OK there. On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: This patch adds __fp16 support to the previous patch t

Re: [PATCH 9/9] arm: Auto-vectorization for MVE: vld4/vst4

2021-05-04 Thread Andre Vieira (lists) via Gcc-patches
Hi Christophe, The series LGTM but you'll need the approval of an arm port maintainer before committing. I only did code-review, did not try to build/run tests. Kind regards, Andre On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote: This patch enables MVE vld4/vst4 instructions for a

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches
th vector and scalar!) and then teach it to merge IV's if one ends where the other begins? On 04/05/2021 10:56, Richard Biener wrote: On Fri, 30 Apr 2021, Andre Vieira (lists) wrote: Hi, The aim of this RFC is to explore a way of cleaning up the codegen around data_references.  To be s

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-05 Thread Andre Vieira (lists) via Gcc-patches
On 05/05/2021 13:34, Richard Biener wrote: On Wed, 5 May 2021, Andre Vieira (lists) wrote: I tried to see what IVOPTs would make of this and it is able to analyze the IVs but it doesn't realize (not even sure it tries) that one IV's end (loop 1) could be used as the base for the o

[PATCH][AArch64]: Use UNSPEC_LD1_SVE for all LD1 loads

2021-05-14 Thread Andre Vieira (lists) via Gcc-patches
PEC_PRED_X. If there is a firm belief the UNSPEC_LD1_SVE will not be used for anything I am also happy to refactor it out. Bootstrapped and regression tested aarch64-none-linux-gnu. Is this OK for trunk? Kind regards, Andre Vieira gcc/ChangeLog: 2021-05-14  Andre Vieira      * config/aarch

Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-05-17 Thread Andre Vieira (lists) via Gcc-patches
Hi, So this is my second attempt at finding a way to improve how we generate the vector IV's and teach the vectorizer to share them between main loop and epilogues. On IRC we discussed my idea to use the loop's control_iv, but that was a terrible idea and I quickly threw it in the bin. The mai

Re: [PATCH][AArch64]: Use UNSPEC_LD1_SVE for all LD1 loads

2021-05-18 Thread Andre Vieira (lists) via Gcc-patches
the extending aarch64_load_* patterns accept both UNSPEC_LD1_SVE and UNSPEC_PRED_X. Is this OK for trunk? Kind regards, Andre Vieira gcc/ChangeLog: 2021-05-18  Andre Vieira      * config/aarch64/iterators.md (SVE_PRED_LOAD): New iterator.     (pred_load): New int attribute.     * con

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-29 Thread Andre Vieira (lists) via Gcc-patches
Hi, This patch enables the use of mixed-types for simd clones for AArch64, adds aarch64 as a target_vect_simd_clones and corrects the way the simdlen is chosen for non-specified simdlen clauses according to the 'Vector Function Application Binary Interface Specification for AArch64'. Additio

aarch64, vect, omp: Add SVE support for simd clones [PR 96342]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. I also tried building the patches separately, but that was before some further clean-up restructuring, so will do that again prior to pushing. Andre Vieira (8): parloops: Copy target and optimizations when creating a function clone parloops

[PATCH 1/8] parloops: Copy target and optimizations when creating a function clone

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
SVE simd clones need to be compiled with an SVE target enabled or the argument types will not be created properly. To achieve this we need to copy DECL_FUNCTION_SPECIFIC_TARGET from the original function declaration to the clones. I decided it was probably also a good idea to copy DECL_FUN

[Patch 2/8] parloops: Allow poly nit and bound

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
Teach parloops how to handle a poly nit and bound ahead of the changes to enable non-constant simdlen. gcc/ChangeLog: * tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept poly NIT and ALT_BOUND. diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc index a35

[Patch 3/8] vect: Fix vect_get_smallest_scalar_type for simd clones

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
The vect_get_smallest_scalar_type helper function was using any argument to a simd clone call when trying to determine the smallest scalar type that would be vectorized. This included the function pointer type in a MASK_CALL for instance, and would result in the wrong type being selected. Ins

[PATCH 4/8] vect: don't allow fully masked loops with non-masked simd clones [PR 110485]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
When analyzing a loop and choosing a simdclone to use it is possible to choose a simdclone that cannot be used 'inbranch' for a loop that can use partial vectors. This may lead to the vectorizer deciding to use partial vectors which are not supported for notinbranch simd clones. This patch fix

[PATCH 5/8] vect: Use inbranch simdclones in masked loops

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch enables the compiler to use inbranch simdclones when generating masked loops in autovectorization. gcc/ChangeLog: * omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function compatible with mask parameters in clone. * tree-vect-stmts.cc (vect_convert

[PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE hook to enable rejecting SVE modes when the target architecture does not support SVE. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode parameter and use it to reject SVE mod

Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
Forgot to CC this one to maintainers... On 30/08/2023 10:14, Andre Vieira (lists) via Gcc-patches wrote: This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE hook to enable rejecting SVE modes when the target architecture does not support SVE. gcc/ChangeLog

[PATCH7/8] vect: Add TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch adds a new target hook to enable us to adapt the types of return and parameters of simd clones. We use this in two ways, the first one is to make sure we can create valid SVE types, including the SVE type attribute, when creating a SVE simd clone, even when the target options do not

[PATCH 8/8] aarch64: Add SVE support for simd clones [PR 96342]

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
This patch finalizes adding support for the generation of SVE simd clones when no simdlen is provided, following the ABI rules where the widest data type determines the minimum amount of elements in a length agnostic vector. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (add_sve_ty

Re: [PATCH 6/8] vect: Add vector_mode paramater to simd_clone_usable

2023-08-30 Thread Andre Vieira (lists) via Gcc-patches
On 30/08/2023 14:01, Richard Biener wrote: On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches wrote: This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE hook to enable rejecting SVE modes when the target architecture does not support SVE. How does
