cost
to ensure the behaviour remains the same.
2022-03-16 Tamar Christina
Andre Vieira
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New
struct.
(struct tune_params): Change type of memmov_cost to use
cpu_memmov_cost
Hi,
This patch implements the costing function
determine_suggested_unroll_factor for aarch64.
It determines the unrolling factor by dividing the number of X
operations we can do per cycle by the number of X operations in the loop
body, taking this information from the vec_ops analysis during v
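The heuristic described above can be sketched in plain C as follows. This is an illustrative model with invented names, not the actual aarch64.cc implementation: it divides the per-cycle operation throughput by the operation count in the loop body and clamps to a limit (cf. the aarch64-vect-unroll-limit parameter added by the series).

```c
#include <assert.h>

/* Hypothetical sketch of the unroll-factor heuristic: divide the number
   of operations the target can issue per cycle by the number of such
   operations in the loop body, clamped to an unroll limit.  */
unsigned
suggested_unroll_factor (unsigned ops_per_cycle, unsigned ops_in_body,
                         unsigned unroll_limit)
{
  if (ops_in_body == 0)
    return 1;
  unsigned uf = ops_per_cycle / ops_in_body;
  if (uf < 1)
    uf = 1;
  return uf < unroll_limit ? uf : unroll_limit;
}
```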
Hi,
As requested, I updated the Neoverse N2 entry to use the
AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated the
ARCH_IDENT to 9A and moved it under the Armv9 cores.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def: Update Neoverse N2 core entry.
diff --git a/g
Ping.
On 16/03/2022 15:00, Andre Vieira (lists) via Gcc-patches wrote:
Hi,
As requested, I updated the Neoverse N2 entry to use the
AARCH64_FL_FOR_ARCH9 feature set, removed duplicate entries, updated
the ARCH_IDENT to 9A and moved it under the Armv9 cores.
gcc/ChangeLog
.
(aarch64_vector_costs::add_stmt_cost): Check for a qualifying
pattern
to set m_nosve_pattern.
(aarch64_vector_costs::finish_costs): Use
determine_suggested_unroll_factor.
* config/aarch64/aarch64.opt (aarch64-vect-unroll-limit): New.
On 16/03/2022 18:01, Richard Sandiford wrote:
"
On 28/03/2022 15:59, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
Hi,
Addressed all of your comments bar the pred ops one.
Is this OK?
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_vector_costs): Define
determine_suggested_unroll_factor and m_nos
Hi,
This addresses the compile-time increase seen in PR target/105157.
This was being caused by selecting the wrong core tuning, as when we
added the latest AArch64 the TARGET_CPU_generic tuning was pushed beyond
the 0x3f mask we used to encode both target cpu and attributes into
TARGET_C
On 08/04/2022 08:04, Richard Sandiford wrote:
I think this would be better as a static assert at the top level:
static_assert (TARGET_CPU_generic < TARGET_CPU_MASK,
"TARGET_CPU_NBITS is big enough");
The motivation being that you want this to be checked regardless of
wheth
On 14/01/2022 09:57, Richard Biener wrote:
The 'used_vector_modes' is also a heuristic by itself since it registers
every vector type we query, not only those that are used in the end ...
So it's really all heuristics that can eventually go bad.
IMHO remembering the VF that we ended up with (
On 19/01/2022 11:04, Richard Biener wrote:
On Tue, 18 Jan 2022, Andre Vieira (lists) wrote:
On 14/01/2022 09:57, Richard Biener wrote:
The 'used_vector_modes' is also a heuristic by itself since it registers
every vector type we query, not only those that are used in the end ..
Hi Christophe,
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS). The series
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p. No test fails without this patch, but
it seems it should be implemented.
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
iterator instead of HI in mve_vmvnq_n_.
2022-01-13 Christophe Lyon
gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.
On 20/01/2022 09:14, Christophe Lyon wrote:
On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches
wrote:
Hi Christophe,
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
> At some point during the development of this patch series, it
appea
On 20/01/2022 10:40, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
On 20/01/2022 09:14, Christophe Lyon wrote:
On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches
wrote:
Hi Christophe,
On 13/01/2022 14:56, Christophe Lyon via Gcc-pat
On 20/01/2022 10:45, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use
iterator instead of HI in mve_vmvnq_n_.
2022-01-13 Christophe Lyon
Hi Christophe,
On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote:
diff --git a/gcc/config/arm/arm-simd-builtin-types.def
b/gcc/config/arm/arm-simd-builtin-types.def
index 6ba6f211531..920c2a68e4c 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-built
Hi,
As reported on PR104498, the issue here is that when
compare_base_symbol_refs swaps x and y but doesn't take that into
account when computing the distance.
This patch makes sure that if x and y are swapped, we correct the
distance computation by multiplying it by -1 to end up with the corr
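The sign correction described above can be modelled with a hypothetical helper (this is not the actual compare_base_symbol_refs code): when a comparison routine canonicalizes its inputs by swapping them, any distance it computes is relative to the swapped order and must be negated before being returned to the caller.

```c
#include <assert.h>

/* Illustrative model of the PR104498 fix: if x and y are swapped to
   canonicalize the comparison, the computed distance is relative to the
   swapped order, so it must be multiplied by -1 for the caller.  */
int
compare_with_distance (int x, int y, long *distance)
{
  int swapped = 0;
  if (x > y)                     /* canonicalize: ensure x <= y */
    {
      int tmp = x; x = y; y = tmp;
      swapped = 1;
    }
  long d = (long) y - (long) x;  /* distance in canonical order */
  if (swapped)
    d = -d;                      /* undo the swap's effect on the sign */
  *distance = d;
  return swapped;
}
```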
On 16/11/2021 12:10, Richard Biener wrote:
On Fri, 12 Nov 2021, Andre Simoes Dias Vieira wrote:
On 12/11/2021 10:56, Richard Biener wrote:
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
Hi,
This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
optabs and mapping
On 18/11/2021 11:05, Richard Biener wrote:
@@ -3713,12 +3713,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
trapping behaviour, so require !flag_trapping_math. */
#if GIMPLE
(simplify
- (float (fix_trunc @0))
- (if (!flag_trapping_math
- && types_match (type, TREE_TYPE (@0))
-
On 12/11/2021 13:12, Richard Biener wrote:
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:
Hi,
This is the rebased and reworked version of the unroll patch. I wasn't
entirely sure whether I should compare the costs of the unrolled loop_vinfo
with the original loop_vinfo it was unroll
On 22/11/2021 12:39, Richard Biener wrote:
+ if (first_loop_vinfo->suggested_unroll_factor > 1)
+{
+ if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, vect_location,
+
On 24/11/2021 11:00, Richard Biener wrote:
On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
On 22/11/2021 12:39, Richard Biener wrote:
+ if (first_loop_vinfo->suggested_unroll_factor > 1)
+{
+ if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
+ {
+
On 22/11/2021 11:41, Richard Biener wrote:
On 18/11/2021 11:05, Richard Biener wrote:
This is a good shout and made me think about something I hadn't before... I
thought I could handle the vector forms later, but the problem is if I add
support for the scalar, it will stop the vectorizer. It
On 18/11/2021 11:05, Richard Biener wrote:
+ (if (!flag_trapping_math
+ && direct_internal_fn_supported_p (IFN_TRUNC, type,
+OPTIMIZE_FOR_BOTH))
+ (IFN_TRUNC @0)
#endif
does IFN_FTRUNC_INT preserve the same exceptions as doing
On 25/11/2021 12:46, Richard Biener wrote:
Oops, my fault, yes, it does. I would suggest to refactor things so
that the mode_i = first_loop_i case is there only once. I also wonder
if all the argument about starting at 0 doesn't apply to the
not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_
ree-vect-loop.c (vect_better_loop_vinfo_p): Round factors up
for epilogue costing.
(vect_analyze_loop): Re-analyze all modes for epilogues.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/masked_epilogue.c: New test.
On 30/11/2021 13:56, Richard Biener wrote:
On Tue, 30 Nov 2021, Andr
ping
On 25/11/2021 13:53, Andre Vieira (lists) via Gcc-patches wrote:
On 22/11/2021 11:41, Richard Biener wrote:
On 18/11/2021 11:05, Richard Biener wrote:
This is a good shout and made me think about something I hadn't
before... I
thought I could handle the vector forms later, bu
costs): Add new member
m_suggested_unroll_factor.
(vector_costs::suggested_unroll_factor): New getter function.
(finish_cost): Set return argument suggested_unroll_factor.
Regards,
Andre
On 07/12/2021 11:27, Andre Vieira (lists) via Gcc-patches wrote:
Hi,
I've split this
On 07/12/2021 11:45, Richard Biener wrote:
Can you check whether, give we know the main VF, the epilogue analysis
does not start with an autodetected vector mode that needs a too-large VF?
Hmm struggling to see how we could check this here. AFAIU before we
analyze the loop for a given vector
Hi,
The bitposition calculation for the bitfield lowering in loop if
conversion was not
taking DECL_FIELD_OFFSET into account, which meant that it would result in
wrong bitpositions for bitfields that did not end up having representations
starting at the beginning of the struct.
Bootstrapped
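An illustrative type of the shape that exposes the bug described above (hypothetical C, not the original test case): the bitfield's representative does not start at byte 0 of the struct, so a bit position computed without DECL_FIELD_OFFSET reads the wrong bits.

```c
#include <assert.h>

/* The bitfield 'b' is stored in a representative that does not start at
   byte 0 of the struct: 'pad' pushes it to a nonzero DECL_FIELD_OFFSET.
   A bit position computed only from the offset within the representative
   (DECL_FIELD_BIT_OFFSET) would therefore be wrong.  */
struct with_offset_bitfield
{
  long pad;             /* pushes the bitfield's storage off offset 0 */
  unsigned int b : 5;
};

unsigned
read_b (struct with_offset_bitfield *p)
{
  return p->b;
}
```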
Hi,
The original patch supported matching the
vect_recog_bitfield_ref_pattern for
BITFIELD_REF's where the first operand didn't have a INTEGRAL_TYPE_P type.
That means it would also match vectors, leading to regressions in
targets that
supported vectorization of those.
Bootstrapped and regr
Added some extra comments to describe what is going on there.
On 13/10/2022 09:14, Richard Biener wrote:
On Wed, 12 Oct 2022, Andre Vieira (lists) wrote:
Hi,
The bitposition calculation for the bitfield lowering in loop if conversion
was not
taking DECL_FIELD_OFFSET into account, which meant
Hi Rainer,
Thanks for reporting, I was actually expecting these! I thought about
pre-empting them by using a positive filter on the tests for aarch64 and
x86_64 as I knew those would pass, but I thought it would be better to
let other targets report failures since then you get a testsuite that
On 13/10/2022 15:15, Richard Biener wrote:
On Thu, 13 Oct 2022, Andre Vieira (lists) wrote:
Hi Rainer,
Thanks for reporting, I was actually expecting these! I thought about
pre-empting them by using a positive filter on the tests for aarch64 and
x86_64 as I knew those would pass, but I
The ifcvt dead code elimination code was not built to deal with inline
assembly, as loops containing it would never be if-converted in the past:
we can't do data-reference analysis on such loops, so vectorization would
eventually fail.
For this reason we now also do not lower bitfields if the data-ref
Hi,
The 'vect_recog_bitfield_ref_pattern' was not correctly adapting the
vectype when widening the container.
I thought the original tests covered that code-path but they didn't, so
I added a new run-test that covers it too.
Bootstrapped and regression tested on x86_64 and aarch64.
gcc/Cha
Hi,
The ada failure reported in the PR was being caused by
vect_check_gather_scatter failing to deal with bit offsets that weren't
multiples of BITS_PER_UNIT. This patch makes vect_check_gather_scatter
reject memory accesses with such offsets.
Bootstrapped and regression tested on aarch64 an
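An illustrative C type with such an offset (the original reproducer was Ada): the field starts at a bit position that is not a multiple of BITS_PER_UNIT, so a gather/scatter data reference to it cannot be described as a byte offset and must be rejected.

```c
#include <assert.h>

/* 'b' starts at bit 3 of its containing unit -- an offset that is not a
   multiple of BITS_PER_UNIT (8) -- the kind of access
   vect_check_gather_scatter must reject rather than mishandle.  */
struct odd_offset
{
  unsigned a : 3;
  unsigned b : 7;   /* starts at bit offset 3 */
};

unsigned
sum_b (struct odd_offset *p, int n)
{
  unsigned s = 0;
  for (int i = 0; i < n; i++)
    s += p[i].b;
  return s;
}
```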
On 24/10/2022 08:17, Richard Biener wrote:
Can you check why vect_find_stmt_data_reference doesn't trip on the
if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
&& DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
{
free_data_ref (dr);
return opt_result::failure_at (stmt
On 24/10/2022 13:46, Richard Biener wrote:
On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:
On 24/10/2022 08:17, Richard Biener wrote:
Can you check why vect_find_stmt_data_reference doesn't trip on the
if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
&& D
On 24/10/2022 14:29, Richard Biener wrote:
On Mon, 24 Oct 2022, Andre Vieira (lists) wrote:
Changing if-convert would merely change this testcase but we could still
trigger using a different structure type, changing the size of Int24 to 32
bits rather than 24:
package Loop_Optimization23_Pkg
Hi,
With Tamar's patch
(https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604880.html)
enabling the vectorization of early-breaks, I'd like to allow bitfield
lowering in such loops, which requires the relaxation of allowing
multiple exits when doing so. In order to avoid a similar issu
OK to backport this to gcc-12? It applies cleanly, and I did a bootstrap
and regression test on aarch64-linux-gnu.
Regards,
Andre
On 01/07/2022 12:26, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
On 29/06/2022 08:18, Richard Sandiford wrote:
+ break;
+case AA
Hi,
New version of the patch attached, but haven't recreated the ChangeLog
yet, just waiting to see if this is what you had in mind. See also some
replies to your comments in-line below:
On 09/08/2022 15:34, Richard Biener wrote:
@@ -2998,7 +3013,7 @@ ifcvt_split_critical_edges (class loop
On 17/08/2022 13:49, Richard Biener wrote:
Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET
of the representative from DECL_FIELD_BIT_OFFSET of the original bitfield
access - that's the offset within the representative (by construction
both fields share DECL_FIELD_OFFSET).
Ping.
On 25/08/2022 10:09, Andre Vieira (lists) via Gcc-patches wrote:
On 17/08/2022 13:49, Richard Biener wrote:
Yes, of course. What you need to do is subtract DECL_FIELD_BIT_OFFSET
of the representative from DECL_FIELD_BIT_OFFSET of the original
bitfield
access - that's the o
Hi,
This patch disables epilogue vectorization when we are peeling for
alignment in the prologue and we can't guarantee the main vectorized
loop is entered. This is to prevent executing vectorized code with an
unaligned access if the target has indicated it wants to peel for
alignment. We ta
On 26/04/2022 15:43, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
Hi,
This patch disables epilogue vectorization when we are peeling for
alignment in the prologue and we can't guarantee the main vectorized
loop is entered. This is to prevent executing vectoriz
On 26/04/2022 16:12, Jakub Jelinek wrote:
On Tue, Apr 26, 2022 at 03:43:13PM +0100, Richard Sandiford via Gcc-patches
wrote:
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr105219-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a -mtune=thunderx -fno-vect-c
On 27/04/2022 07:35, Richard Biener wrote:
On Tue, 26 Apr 2022, Richard Sandiford wrote:
"Andre Vieira (lists)" writes:
Hi,
This patch disables epilogue vectorization when we are peeling for
alignment in the prologue and we can't guarantee the main vectorized
loop is enter
On 27/04/2022 15:03, Richard Biener wrote:
On Wed, 27 Apr 2022, Richard Biener wrote:
The following makes sure to take into account prologue peeling
when trying to narrow down the maximum number of iterations
computed for the epilogue of a vectorized epilogue.
Bootstrap & regtest running on x
Hi,
This patch teaches the aarch64 backend to improve codegen when using dup
with NEON vectors with repeating patterns. It will attempt to use a
smaller NEON vector (or element) to limit the number of instructions
needed to construct the input vector.
Bootstrapped and regression tested aarc
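A plain-C model of the kind of value the patch targets (illustrative; the actual change is in the backend's constant-vector expansion): a vector built from a repeating two-element pattern can be materialized as a small 2-lane vector duplicated across the wider register, instead of inserting every lane individually.

```c
#include <assert.h>

/* Model of a vector constant with a repeating pattern {a, b, a, b, ...}.
   Rather than constructing all n lanes one by one, the backend can build
   a 2-lane vector {a, b} and dup it across the full NEON register.  */
void
fill_repeating (int *v, int n, int a, int b)
{
  for (int i = 0; i < n; i++)
    v[i] = (i & 1) ? b : a;
}
```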
Hi Prathamesh,
I am just looking at this as it interacts with a change I am trying to
make, but I'm not a reviewer so take my comments with a pinch of salt ;)
I copied in bits of your patch below to comment.
> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST
(machine_mode @var{
Hi all,
This patch series enables unrolling of an unpredicated main vectorized
loop based on a target hook. The epilogue loop will have (at least) half
the VF of the main loop and can be predicated.
Andre Vieira (3):
[vect] Add main vectorized loop unrolling
[vect] Consider outside costs
Hi all,
This patch adds the ability to define a target hook to unroll the main
vectorized loop. It also introduces --param's vect-unroll and
vect-unroll-reductions to control this through a command-line. I found
this useful to experiment and believe can help when tuning, so I decided
to leave
Hi,
This patch changes the order in which we check outside and inside costs
for epilogue loops; this is to ensure that a predicated epilogue is more
likely to be picked over an unpredicated one, since it saves having to
enter a scalar epilogue loop.
gcc/ChangeLog:
* tree-vect-loop.c
Hi Richi,
Thanks for the review, see below some questions.
On 21/09/2021 13:30, Richard Biener wrote:
On Fri, 17 Sep 2021, Andre Vieira (lists) wrote:
Hi all,
This patch adds the ability to define a target hook to unroll the main
vectorized loop. It also introduces --param's vect-unrol
Hi,
That just forces trying the vector modes we've tried before. Though I might
need to revisit this now I think about it. I'm afraid it might be possible for
this to generate an epilogue with a vf that is not lower than that of the main
loop, but I'd need to think about this again.
Either way
Hi,
This should address the ubsan bootstrap build and big-endian testisms
reported against the last NEON load/store gimple lowering patch. I also
fixed a follow-up issue where the alias information was leading to a bad
codegen transformation. The NEON intrinsics specifications do not forbid
t
Thank you both!
Here is a reworked version, this OK for trunk?
diff --git a/gcc/config/aarch64/aarch64-builtins.c
b/gcc/config/aarch64/aarch64-builtins.c
index
a815e4cfbccab692ca688ba87c71b06c304abbfb..e06131a7c61d31c1be3278dcdccc49c3053c78cb
100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+
Decided to split the patches up to make it clear that the testisms fixes
had nothing to do with the TBAA fix. I'll be committing these two separately.
First:
[AArch64] Fix big-endian testisms introduced by NEON gimple lowering patch
This patch reverts the tests for big-endian after the NEON gim
And second (also added a test):
[AArch64] Fix TBAA information when lowering NEON loads and stores to gimple
This patch fixes the wrong TBAA information when lowering NEON loads and
stores
to gimple that showed up when bootstrapping with UBSAN.
gcc/ChangeLog:
* config/aarch64/aarch64
Hi,
Committed this as obvious. My earlier patch removed the need for the GSI
to be used.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Mark argument as unused.
diff --git a/gcc/config/aarch64/aarch64-builtins.c
b/gcc/config/aarch64/
Hi,
This is the rebased and reworked version of the unroll patch. I wasn't
entirely sure whether I should compare the costs of the unrolled
loop_vinfo with the original loop_vinfo it was unrolled from. I did now,
but I wasn't too sure whether it was a good idea to... Any thoughts on
this?
Re
Hi,
This patch introduces two IFN's FTRUNC32 and FTRUNC64, the corresponding
optabs and mappings. It also creates a backend pattern to implement them
for aarch64 and a match.pd pattern to idiom recognize these.
These IFN's (and optabs) represent a truncation towards zero, as if
performed by fi
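The source-level idiom these IFNs are recognized from is a float-to-integer-to-float round trip, e.g. for the 32-bit variant on a double (illustrative sketch):

```c
#include <assert.h>

/* Truncation towards zero via a 32-bit integer round trip: the value is
   converted to int and back to double, discarding the fraction.  This
   float->int->float shape is what the match.pd pattern recognizes and
   maps to the FTRUNC32 internal function.  */
double
ftrunc32_idiom (double x)
{
  return (double) (int) x;
}
```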
Hi,
When vectorizing with --param vect-partial-vector-usage=1 the vectorizer
uses an unpredicated (all-true predicate for SVE) main loop and a
predicated tail loop. The way this was implemented seems to mean it
re-uses the same vector-mode for both loops, which means the tail loop
isn't an ac
Thank you Kewen!!
I will apply this now.
BR,
Andre
On 25/05/2021 09:42, Kewen.Lin wrote:
on 2021/5/24 3:21 PM, Kewen.Lin via Gcc-patches wrote:
Hi Andre,
on 2021/5/24 2:17 PM, Andre Vieira (lists) via Gcc-patches wrote:
Hi,
When vectorizing with --param vect-partial-vector-usage=1 the
Hi,
This RFC is motivated by the IV sharing RFC in
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the
need to have the IVOPTS pass be able to clean up IV's shared between
multiple loops. When creating a similar problem with C code I noticed
IVOPTs treated IV's with uses ou
Streams got crossed there and used the wrong subject ...
On 03/06/2021 17:34, Andre Vieira (lists) via Gcc-patches wrote:
Hi,
This RFC is motivated by the IV sharing RFC in
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the
need to have the IVOPTS pass be able to clean up
On 08/06/2021 16:00, Andre Simoes Dias Vieira via Gcc-patches wrote:
Hi Bin,
Thank you for the reply, I have some questions, see below.
On 07/06/2021 12:28, Bin.Cheng wrote:
On Fri, Jun 4, 2021 at 12:35 AM Andre Vieira (lists) via Gcc-patches
wrote:
Hi Andre,
I didn't look int
Hi,
On 20/05/2021 11:22, Richard Biener wrote:
On Mon, 17 May 2021, Andre Vieira (lists) wrote:
Hi,
So this is my second attempt at finding a way to improve how we generate the
vector IV's and teach the vectorizer to share them between main loop and
epilogues. On IRC we discussed my id
gle_defuse_cycle when unrolling.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust
call to finish_cost.
* tree-vectorizer.h (finish_cost): Change to pass new class
vec_info parameter.
On 01/10/2021 09:19, Richard Biener wrote:
On Thu, 30 Sep 2021, Andre Vieira (lists) wr
addressing
modes.
gcc/ChangeLog:
2021-10-12 Andre Vieira
* config/arm/arm.c (thumb2_legitimate_address_p): Use
VALID_MVE_MODE
when checking mve addressing modes.
(mve_vector_mem_operand): Fix the way we handle pre, post and
offset
addressing modes
On 13/10/2021 13:37, Kyrylo Tkachov wrote:
Hi Andre,
@@ -24276,7 +24271,7 @@ arm_print_operand (FILE *stream, rtx x, int code)
else if (code == POST_MODIFY || code == PRE_MODIFY)
{
asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
- postinc_reg = XEX
Hi,
I completely forgot I still had this patch out as well, I grouped it
together with the unrolling because it was what motivated the change,
but it is actually more widely applicable and can be reviewed separately.
On 17/09/2021 16:32, Andre Vieira (lists) via Gcc-patches wrote:
Hi,
This patch
On 27/09/2021 12:54, Richard Biener via Gcc-patches wrote:
On Mon, 27 Sep 2021, Jirui Wu wrote:
Hi all,
I now use the type based on the specification of the intrinsic
instead of type based on formal argument.
I use signed Int vector types because the outputs of the neon builtins
that I am low
On 19/10/2021 00:22, Joseph Myers wrote:
On Fri, 15 Oct 2021, Richard Biener via Gcc-patches wrote:
On Fri, Sep 24, 2021 at 2:59 PM Jirui Wu via Gcc-patches
wrote:
Hi,
Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html
The patch is attached as text for ease of use. Is
On 15/10/2021 09:48, Richard Biener wrote:
On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:
Hi Richi,
I think this is what you meant, I now hide all the unrolling cost calculations
in the existing target hooks for costs. I did need to adjust 'finish_cost' to
take the loop_vi
Hi,
This fixes the alignment on the memory access type for neon loads &
stores in the gimple lowering. Bootstrap ubsan on aarch64 builds again
with this change.
2021-10-25 Andre Vieira
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_bui
tion for which I haven't quite worked out a
solution yet and does cause some minor regressions due to unfortunate
spills.
Let me know what you think and if you have ideas of how we can better
achieve this.
Kind regards,
Andre Vieira
diff --git a/gcc/tree-vect-loop-manip.c
Hi Christophe,
On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
Since MVE has a different set of vector comparison operators from
Neon, we have to update the expansion to take into account the new
ones, for instance 'NE' for which MVE does not require to use 'EQ'
with the inverted con
It would be good to also add tests for NEON as you also enable auto-vec
for it. I checked and I do think the necessary 'neon_vc' patterns exist
for 'VH', so we should be OK there.
On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
This patch adds __fp16 support to the previous patch t
Hi Christophe,
The series LGTM but you'll need the approval of an arm port maintainer
before committing. I only did code-review, did not try to build/run tests.
Kind regards,
Andre
On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
This patch enables MVE vld4/vst4 instructions for a
th vector and scalar!) and then
teach it to merge IV's if one ends where the other begins?
On 04/05/2021 10:56, Richard Biener wrote:
On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:
Hi,
The aim of this RFC is to explore a way of cleaning up the codegen around
data_references. To be s
On 05/05/2021 13:34, Richard Biener wrote:
On Wed, 5 May 2021, Andre Vieira (lists) wrote:
I tried to see what IVOPTs would make of this and it is able to analyze the
IVs but it doesn't realize (not even sure it tries) that one IV's end (loop 1)
could be used as the base for the o
PEC_PRED_X.
If there is a firm belief the UNSPEC_LD1_SVE will not be used for
anything I am also happy to refactor it out.
Bootstrapped and regression tested aarch64-none-linux-gnu.
Is this OK for trunk?
Kind regards,
Andre Vieira
gcc/ChangeLog:
2021-05-14 Andre Vieira
* config/aarch
Hi,
So this is my second attempt at finding a way to improve how we generate
the vector IV's and teach the vectorizer to share them between main loop
and epilogues. On IRC we discussed my idea to use the loop's control_iv,
but that was a terrible idea and I quickly threw it in the bin. The mai
the
extending aarch64_load_* patterns accept both UNSPEC_LD1_SVE and
UNSPEC_PRED_X.
Is this OK for trunk?
Kind regards,
Andre Vieira
gcc/ChangeLog:
2021-05-18 Andre Vieira
* config/aarch64/iterators.md (SVE_PRED_LOAD): New iterator.
(pred_load): New int attribute.
* con
Hi,
This patch enables the use of mixed-types for simd clones for AArch64,
adds aarch64 as a target_vect_simd_clones and corrects the way the
simdlen is chosen for non-specified simdlen clauses according to the
'Vector Function Application Binary Interface Specification for AArch64'.
Additio
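A mixed-type simd clone of the kind the patch enables (illustrative function, not from the patch): with no simdlen clause, the 'Vector Function Application Binary Interface Specification for AArch64' derives the lane count from the widest data type used.

```c
#include <assert.h>

/* A simd clone with mixed parameter/return types.  With no simdlen
   clause, the widest type (double here) determines the number of lanes
   per the AArch64 vector-function ABI.  The pragma is ignored when
   OpenMP SIMD support is off, so this remains valid plain C.  */
#pragma omp declare simd
double
scale (double x, int k)
{
  return x * k;
}
```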
on
aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu. I also tried building
the patches separately, but that was before some further clean-up
restructuring, so will do that again prior to pushing.
Andre Vieira (8):
parloops: Copy target and optimizations when creating a function clone
parloops
SVE simd clones need to be compiled with an SVE target enabled or the
argument types will not be created properly. To achieve this we need to
copy DECL_FUNCTION_SPECIFIC_TARGET from the original function
declaration to the clones. I decided it was probably also a good idea
to copy DECL_FUN
Teach parloops how to handle a poly NIT and bound ahead of the changes
to enable non-constant simdlen.
gcc/ChangeLog:
* tree-parloops.cc (try_to_transform_to_exit_first_loop_alt): Accept
poly NIT and ALT_BOUND.
diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index
a35
The vect_get_smallest_scalar_type helper function was using any argument
to a simd clone call when trying to determine the smallest scalar type
that would be vectorized. This included the function pointer type in a
MASK_CALL for instance, and would result in the wrong type being
selected. Ins
When analyzing a loop and choosing a simdclone to use it is possible to
choose a simdclone that cannot be used 'inbranch' for a loop that can
use partial vectors. This may lead to the vectorizer deciding to use
partial vectors which are not supported for notinbranch simd clones.
This patch fix
This patch enables the compiler to use inbranch simdclones when
generating masked loops in autovectorization.
gcc/ChangeLog:
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Make function
compatible with mask parameters in clone.
* tree-vect-stmts.cc (vect_convert
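The shape of code this enables can be sketched as follows (illustrative names): an `inbranch` clone is called under a condition inside a vectorizable loop, and when the loop is vectorized with masking, the condition becomes the mask argument of the clone instead of blocking vectorization.

```c
#include <assert.h>

/* An inbranch simd clone: callable under a condition in a simd loop.  */
#pragma omp declare simd inbranch
int
inc_if (int x)
{
  return x + 1;
}

/* When this loop is vectorized with masking, the 'if (c[i])' condition
   feeds the mask parameter of the inbranch clone of inc_if.  */
void
apply (int *restrict a, const int *restrict c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])
      a[i] = inc_if (a[i]);
}
```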
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
hook to enable rejecting SVE modes when the target architecture does not
support SVE.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add mode
parameter and use it to reject SVE mod
Forgot to CC this one to maintainers...
On 30/08/2023 10:14, Andre Vieira (lists) via Gcc-patches wrote:
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
hook to enable rejecting SVE modes when the target architecture does not
support SVE.
gcc/ChangeLog
This patch adds a new target hook to enable us to adapt the types of
return and parameters of simd clones. We use this in two ways, the
first one is to make sure we can create valid SVE types, including the
SVE type attribute, when creating a SVE simd clone, even when the target
options do not
This patch finalizes adding support for the generation of SVE simd
clones when no simdlen is provided, following the ABI rules where the
widest data type determines the minimum amount of elements in a length
agnostic vector.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (add_sve_ty
On 30/08/2023 14:01, Richard Biener wrote:
On Wed, Aug 30, 2023 at 11:15 AM Andre Vieira (lists) via Gcc-patches
wrote:
This patch adds a machine_mode parameter to the TARGET_SIMD_CLONE_USABLE
hook to enable rejecting SVE modes when the target architecture does not
support SVE.
How does