[PATCH] vect: Don't apply masks to operations on invariants [PR108979]

2023-03-02 Thread Richard Sandiford via Gcc-patches
The loop body in the testcase contains an operation on invariants. SLP detects this and can hoist/schedule the operation outside of the loop. However, after the fix for PR96373, we would try to apply a loop mask to this operation, even though the mask is defined in the loop. The patch does what R

Re: [PATCH v2] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-02 Thread Richard Sandiford via Gcc-patches
"Li, Pan2" writes: > Oops, looks I missed that part for assertion. Thank you for coaching. > Added and tested the below changes at the end of emit_mode_adjustments > already but looks we may have other problems about the size, the precision > and the C types. > > Looks like I need to hold this P

[PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote: >> >> The regression was reported during stage-1. A patch was provided during >> stage 1 and the discussions around combine stalled. >> >> The regression for AArch64 needs to be fixed in GCC 13. The hit i

Re: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > From: Pan Li > > Fix the bug of the rvv bool mode precision with the adjustment. > The bits size of vbool*_t will be adjusted to > [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The > adjusted mode precision of vbool*_t will help unde

Re: [PATCH v4] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > From: Pan Li > > Fix the bug of the rvv bool mode precision with the adjustment. > The bits size of vbool*_t will be adjusted to > [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The > adjusted mode precision of vbool*_t will help unde

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote: >> How about the patch below? > > What about it? What would make it any better than the previous? It does what Jeff suggested in the quoted message: work within the existing extract/make_compound_oper

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Mon, Mar 06, 2023 at 03:08:00PM +0000, Richard Sandiford via Gcc-patches > wrote: >> Segher Boessenkool writes: >> > On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote: >> >> How about the patch below? >> >

Re: [PATCH]AArch64: Fix codegen regressions around tbz.

2023-03-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, January 27, 2023 12:26 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH]AArch64: Fix codegen regressi

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > On Mon, Mar 06, 2023 at 04:34:59PM +, Richard Sandiford wrote: >> Jakub Jelinek writes: >> > Could we have a target hook to canonicalize memory addresses for combiner, >> > like we have that targetm.canonicalize_comparison ? >> >> I don't think a hook makes sense

Re: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Ping, > > And updated the hook to allow to differentiate between ISAs. > > As Andy said before initializing a ranger instance is cheap but not free, and > if > the intention is to call it often during a pass it should be instantiated at > pass startup and passed along to

Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Ping, > > And updating the hook. > > There are no new test as new correctness tests were added to the mid-end and > the existing codegen tests for this already exist. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tama

Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> > + (match_operand:VQN 4 "register_operand" "w")))] >> >"TARGET_SIMD" >> > + "#" >> > + "&& true" >> > + [(const_int 0)] >> > { >> > - unsigned HOST_WIDE_INT size >> > -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1; >> >

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-08 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > Hi! > > On Mon, Mar 06, 2023 at 07:13:08PM +, Richard Sandiford wrote: >> Segher Boessenkool writes: >> > Most importantly, what makes you think this is a problem for aarch64 >> > only? If it actually is, you can fix it in the aarch64 config! Either >> > with or

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-09 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > On Wed, Mar 08, 2023 at 11:58:51AM +, Richard Sandiford wrote: >> Segher Boessenkool writes: >> > An #ifdef is a way of making a change that is not finished yet not hurt >> > the other targets. It still hurts generic development, which indirectly >> > hurts all t

Re: [PATCH] Avoid unnecessary epilogues from tree_unroll_loop

2023-03-09 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > The following fixes the condition determining whether we need an > epilogue. > > When r12-2429-g62acc72a957b56 introduced this check I didn't notice > the odd condition on review. Richard - do you remember if this > was on purpose? Oops, no, looks like a

[PATCH v2 0/2] Series of patch to fix PR106594

2023-03-09 Thread Richard Sandiford via Gcc-patches
This series of patches fixes PR106594, an aarch64 regression in which we fail to combine an extension into an address. The first patch just refactors code. The second patch contains the actual fix. The cover note for the second patch describes the problem and the fix. Tested on aarch64-linux-gn

[PATCH v2 1/2] combine: Split code out of make_compound_operation_int

2023-03-09 Thread Richard Sandiford via Gcc-patches
This patch just splits some code out of make_compound_operation_int into a new function called make_compound_operation_and. It is a prerequisite for the fix for PR106594. It might (or might not) make sense to put more of the existing "and" handling into the new function, so that the subreg+lshift

[PATCH v2 2/2] combine: Try harder to form zero_extends [PR106594]

2023-03-09 Thread Richard Sandiford via Gcc-patches
g:c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f uses nonzero_bits information to convert sign_extends into zero_extends. That change is semantically correct in itself, but for the testcase in the PR, it leads to a series of unfortunate events, as described below. We try to combine: Trying 24 -> 25:

Re: AArch64 bfloat16 mangling

2023-03-09 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response. Jakub Jelinek writes: > Hi! > > On Mon, Jan 30, 2023 at 11:07:23PM +, Richard Sandiford wrote: >> Jakub Jelinek writes: >> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html >> > - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling i

Re: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi, > > Here's the respun patch. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/108583 > * target.def (preferred_div_as_shifts_over_mult): New. > * doc/tm.texi.i

Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Thu, Mar 09, 2023 at 05:14:11PM +, Richard Sandiford wrote: >> We decided to keep the current mangling of __bf16 and use it for >> std::bfloat16_t too. __bf16 will become a non-standard arithmetic type. >> This will be an explicit diversion from the Itanium ABI. >>

Re: [PATCH] RISC-V: Add fault first load C/C++ support

2023-03-10 Thread Richard Sandiford via Gcc-patches
Bernhard Reutner-Fischer via Gcc-patches writes: > On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote: >>From: Ju-Zhe Zhong >> > >>+class vleff : public function_base >>+{ >>+public: >>+ unsigned int call_properties (const function_instance &) const override >>+ { >>+return CP_READ_ME

Re: [PATCH] Fix PR 108874: aarch64 code regression with shift and ands

2023-03-10 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches writes: > After r6-2044-g98e30e515f184b, code like "((x & 0xff00ff00U) >> 8)" > would be optimized like (x >> 8) & 0xff00ffU which is normally better > except on aarch64, the shift right could be combined with another > operation in some cases. So we need to add a few
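
The two forms quoted in this message can be compared directly. The following is a minimal sketch in C, assuming 32-bit unsigned operands; the function names and the sample values are hypothetical and are not taken from the patch or its tests.

```c
#include <assert.h>
#include <stdint.h>

/* Form as written in the source: mask first, then shift right. */
static uint32_t mask_then_shift(uint32_t x)
{
    return (x & 0xff00ff00U) >> 8;
}

/* Form after the folding described in the message: shift first,
   then mask with the original mask shifted right by 8. */
static uint32_t shift_then_mask(uint32_t x)
{
    return (x >> 8) & 0x00ff00ffU;
}

/* Hypothetical helper: spot-check that both forms agree on a few inputs. */
static void check_forms_agree(void)
{
    static const uint32_t samples[] = {
        0x00000000U, 0x000000ffU, 0x0000ff00U,
        0x12345678U, 0xdeadbeefU, 0xffffffffU
    };
    for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(mask_then_shift(samples[i]) == shift_then_mask(samples[i]));
}
```

Both functions compute the same value for every input, so the choice between them only affects which instruction sequence the compiler ends up with.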

Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Fri, Mar 10, 2023 at 08:43:02AM +, Richard Sandiford wrote: >> > So, either __bf16 should be also extended floating-point type >> > like decltype (0.0bf16) and std::bfloat16_t and in that case >> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct >> >

Re: AArch64 bfloat16 mangling

2023-03-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Fri, Mar 10, 2023 at 11:50:39AM +, Richard Sandiford wrote: >> > Will test it momentarily (including the patch it depends on): > > Note, testing still pending, I'm testing in a Fedora scratch build > and that is quite slow (lto bootstrap and the like). > >> A naive q

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-03-10 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply. Prathamesh Kulkarni writes: > Unfortunately it regresses code-gen for the following case: > > svint32_t f(int32x4_t x) > { > return svdupq_s32 (x[0], x[1], x[2], x[3]); > } > > -O2 code-gen with trunk: > f: > dup z0.q, z0.q[0] > ret > > -O2 code-gen

Re: [PATCH 2/2] aarch64: Improve register allocation for lane instructions

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, May 10, 2023 at 12:05 AM Richard Sandiford via Gcc-patches > wrote: >> >> Andrew Pinski writes: >> > On Tue, May 9, 2023 at 11:02 AM Richard Sandiford via Gcc-patches >> > wrote: >> >> >> >> REG_A

Re: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 10 May 2023, pan2...@intel.com wrote: > >> From: Pan Li >> >> The decl_or_value is defined as void * before this PATCH. It will take >> care of both the tree_node and rtx_def. Unfortunately, given a void >> pointer cannot tell the input is tree_node or rtx_def. >

Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Sandiford via Gcc-patches
Oluwatamilore Adebayo writes: > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001 > From: oluade01 > Date: Fri, 14 Apr 2023 10:24:43 +0100 > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD > > This adds a recognition pattern for the non-widening > absolute difference (
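
As an illustration of what a non-widening absolute difference computes, here is a minimal sketch in C. The function name and the choice of 8-bit unsigned elements are assumptions for illustration only; the exact source form that the patch's recognition pattern matches is not shown in this snippet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example loop: element-wise absolute difference whose
   result has the same width as the inputs (no widening). */
static void abd_u8(uint8_t *out, const uint8_t *a, const uint8_t *b, int n)
{
    for (int i = 0; i < n; i++)
        /* Subtract the smaller value from the larger one, so the
           result is non-negative and always fits in 8 bits. */
        out[i] = (uint8_t) (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}
```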

Re: [PATCH v2] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, May 10, 2023 at 07:57:05PM +0800, pan2...@intel.com wrote: >> --- a/gcc/var-tracking.cc >> +++ b/gcc/var-tracking.cc >> @@ -116,9 +116,14 @@ >> #include "fibonacci_heap.h" >> #include "print-rtl.h" >> #include "function-abi.h" >> +#include "mux-utils.h" >> >>

Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, May 10, 2023 at 11:49 AM Richard Biener > wrote: >> >> On Wed, May 10, 2023 at 11:01 AM Richard Sandiford >> wrote: >> > >> > Oluwatamilore Adebayo writes: >> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001 >> > > From: oluade01 >> >

Re: [PATCH v3] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Thanks, mostly looks good to me. Some minor comments below. pan2...@intel.com writes: > From: Pan Li > > The decl_or_value is defined as void * before this PATCH. It will take > care of both the tree_node and rtx_def. Unfortunately, given a void > pointer cannot tell the input is tree_node or rt

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-10 Thread Richard Sandiford via Gcc-patches
In addition to Jeff's comments: juzhe.zh...@rivai.ai writes: > [...] > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index cc4a93a8763..99cf0cdbdca 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++) >operand0[i] = operand0[i - 1] &

Re: [vxworks] [testsuite] [aarch64] use builtin in pred-not-gen-4.c

2023-05-10 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches writes: > On vxworks, isunordered is defined as a macro that ultimately calls a > _Fpcomp function, that GCC doesn't recognize as a builtin, so it > can't optimize accordingly. > > Use __builtin_isunordered instead to get the desired code for the > test. > > Regstra
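
The difference between the macro and the builtin can be sketched as follows. This is a minimal illustration in C, assuming a GCC-compatible compiler; the wrapper name is hypothetical and the code is not the contents of pred-not-gen-4.c.

```c
#include <assert.h>
#include <math.h>

/* Calls the compiler builtin directly, so the result does not depend on
   how the C library's <math.h> happens to define isunordered. */
static int unordered_builtin(double a, double b)
{
    return __builtin_isunordered(a, b);
}
```

The builtin returns nonzero when either argument is a NaN and zero otherwise.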

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-10 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: > Thanks Richard. > I am planning to separate a patch with only creat_iv stuff only. > > Are you suggesting that I remove "tree_code incr_op = code;" > Use the argument directly ? > > I saw the codes here: > > /* For easier readability of the created code, produce MINUS_EXPRs >

Re: [PATCH v3] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
"Li, Pan2" writes: > Thanks Richard Sandiford. Update PATCH v4 here -> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618099.html. > >> - if (dv_as_opaque (node->dv) != decl || node->offset != offset) >> + if (node->dv.first_or_null () != decl || node->offset != >> + offset) > >>

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-10 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: > I am sorry that I am still confused about that. > > Is this what you want ? > > bool use_minus_p = TREE_CODE (step) == INTEGER_CST && ((TYPE_UNSIGNED > (TREE_TYPE (step)) && tree_int_cst_lt (step1, step)) > || (!TYPE_UNSIGNED (TREE_TYPE (step)) && > !tree_exp

Re: [PATCH v5] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-11 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > From: Pan Li > > The decl_or_value is defined as void * before this PATCH. It will take > care of both the tree_node and rtx_def. Unfortunately, given a void > pointer cannot tell the input is tree_node or rtx_def. > > Then we have some implicit structure layout require

Re: [PATCH V5] VECT: Add tree_code into "creat_iv" and allow it can handle MINUS_EXPR IV.

2023-05-11 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes: > From: Juzhe-Zhong > > This patch is a separate patch preparing for supporting decrement IV. > > gcc/ChangeLog: > > * cfgloopmanip.cc (create_empty_loop_on_edge): Add PLUS_EXPR. > * gimple-loop-interchange.cc > (tree_loop_interchange::map_induction

Re: [PATCH 15/20] arm: [MVE intrinsics] add unary_acc shape

2023-05-11 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > On 5/10/23 16:52, Kyrylo Tkachov wrote: >> >> >>> -Original Message- >>> From: Christophe Lyon >>> Sent: Wednesday, May 10, 2023 2:31 PM >>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ; >>> Richard Earnshaw ; Richard Sandiford >>> >>> Cc: Christophe Lyon >>>

Re: [PATCH 15/20] arm: [MVE intrinsics] add unary_acc shape

2023-05-11 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > On 5/11/23 10:30, Richard Sandiford wrote: >> Christophe Lyon writes: >>> On 5/10/23 16:52, Kyrylo Tkachov wrote: > -Original Message- > From: Christophe Lyon > Sent: Wednesday, May 10, 2023 2:31 PM > To: gcc-patches@gcc.gnu.org; Kyrylo

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-11 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Richard. Since create_iv has been approved and soon will be commited > after > we bootstrap && regression. > > Now, I plan to send patch for "decrement IV". > > After reading your comments, I have several questions: > > 1. >>if (use_bias_adjusted_len) >>

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-11 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Oh, I see. But I saw there is a variable using_partial_vectors_p > in the loop data structure. > > Can I add a variable call using_select_vl_p ? Yeah. Please also add a wrapper macro like LOOP_VINFO_USING_PARTIAL_VECTORS_P. (I'm not really a fan of the wrappers,

[PATCH] aarch64: Remove alignment assertions [PR109661]

2023-05-11 Thread Richard Sandiford via Gcc-patches
The trunk patch for this PR corrected the ABI for enums that have a defined underlying type. We shouldn't change the ABI on the branches though, so this patch just removes the assertions that highlighted the problem. I think the same approach makes sense longer-term: keep the assertions at maximum

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-11 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Thanks. I have read rgroup descriptions again. > Still I am not fully understand it clearly, bear with me :) > > I don't known how to differentiate Case 2 and Case 3. > > Case 2 is multiple rgroup for SLP. > Case 3 is multiple rgroup for non-SLP (VEC_PACK_TRUNC) >

Re: [PATCH] Add RTX codes for BITREVERSE and COPYSIGN.

2023-05-11 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > An analysis of backend UNSPECs reveals that two of the most common UNSPECs > across target backends are for copysign and bit reversal. This patch > adds RTX codes for these expressions to allow their representation to > be standardized, and them to be optimized by the middle-

Re: [libgcc PATCH] Add bit reversal functions __bitrev[qhsd]i2.

2023-05-11 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch proposes adding run-time library support for bit reversal, > by adding a __bitrevsi2 function to libgcc. Thoughts/opinions? > > I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc, > to allow the RTL optimizers to perform narrowing operations,

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-05-11 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-18.c > b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c > new file mode 100644 > index 000..598a51f17c6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c > @@ -0,0 +1,20 @@ > +/* { dg

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-11 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 2 May 2023 at 18:22, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Tue, 2 May 2023 at 17:32, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > On Tue, 2 May 2023 at 14:56, Richard Sandiford >> >> > wrote

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-11 Thread Richard Sandiford via Gcc-patches
Tejas Belagod writes: > From: Tejas Belagod > > This PR optimizes an SVE intrinsics sequence where > svlasta (svptrue_pat_b8 (SV_VL1), x) > a scalar is selected based on a constant predicate and a variable vector. > This sequence is optimized to return the corresponding element of a NEON

Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-12 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > From: Pan Li > > We are running out of the machine_mode(8 bits) in RISC-V backend. Thus > we would like to extend the machine mode bit size from 8 to 16 bits. > However, it is sensitive to extend the memory size in common structure > like tree or rtx. This patch would l

Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer

2023-05-12 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes: > From: Ju-Zhe Zhong > > 1. Fix document description according Jeff && Richard. > 2. Add LOOP_VINFO_USING_SELECT_VL_P for single rgroup. > 3. Add LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P for SLP multiple rgroup. > > Fix bugs for V5 after testing: > https://gcc.gnu.org/piper

Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-12 Thread Richard Sandiford via Gcc-patches
"Li, Pan2 via Gcc-patches" writes: > Thanks Richard for comments. In previous, I am not sure it is reasonable to > let everywhere consume the same macro in rtl.h (As the includes you > mentioned). Thus, make a conservative change in PATCH v1. > > I will address the comments and try to align the

Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer

2023-05-12 Thread Richard Sandiford via Gcc-patches
"juzhe.zhong" writes: > Thanks Richard. > I will do that as you suggested. I have a question for the first patch. How > to > enable decrement IV? Should I add a target hook or something to let target > decide whether enable decrement IV? At the moment, the only other targets that use IFN_LOAD_L

Re: [PATCH V6] VECT: Add decrement IV support in Loop Vectorizer

2023-05-12 Thread Richard Sandiford via Gcc-patches
"juzhe.zhong" writes: > Hi, Richard. For "can iterate more than once", is it correct use the > condition > "LOOP_LENS ().length >1". No, that says whether any LOAD_LENs or STORE_LENs operate on multiple vectors, rather than just single vectors. I meant: whether the vector loop body might

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-12 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 12 May 2023, Andre Vieira (lists) wrote: > >> I have dealt with, I think..., most of your comments. There's quite a few >> changes, I think it's all a bit simpler now. I made some other changes to the >> costing in tree-inline.cc and gimple-range-op.cc in which I t

Re: [PATCH] aarch64: Add SVE instruction types

2023-05-15 Thread Richard Sandiford via Gcc-patches
Evandro Menezes via Gcc-patches writes: > This patch adds the attribute `type` to most SVE1 instructions, as in the > other > instructions. Thanks for doing this. Could you say what criteria you used for picking the granularity? Other maintainers might disagree, but personally I'd prefer to di

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 12 May 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote: >> > >> >> I have dealt with, I think..., most of your comments. There's quite a few >> >> changes, I think it's all a bit simpler now. I made s

Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 15 May 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > But I'm also not sure >> > how much of that is really needed (it seems to be tied around >> > optimizing optabs space?) >> >> Not sure what you mean by "this". Optabs space shouldn't be a pro

Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov writes: > Hi Richard, > >> -Original Message- >> From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard >> Sandiford via Gcc-patches >> Sent: Tuesday, May 9, 2023 7:48 AM >> To: gcc-patches@gcc.gn

Re: [PATCH 2/6] aarch64: Allow moves after tied-register intrinsics

2023-05-15 Thread Richard Sandiford via Gcc-patches
>> >> Kyrylo Tkachov writes: >> > Hi Richard, >> > >> >> -Original Message- >> >> From: Gcc-patches > >> bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard >> >> Sandiford via Gcc-patches >> >>

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-15 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > After committing the interleave+zip1 patch for vector initialization, > it seems to regress the s32 case for this patch: > > int32x4_t f_s32(int32_t x) > { > return (int32x4_t) { x, x, x, 1 }; > } > > code-gen: > f_s32: > movi v30.2s, 0x1 >

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes: > From: Juzhe-Zhong > > This patch implement decrement IV for length approach in loop control. > > Address comment from kewen that incorporate the implementation inside > "vect_set_loop_controls_directly" instead of a standalone function. > > Address comment from Richa

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-15 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: >>> The examples are good, but this one made me wonder: why is the >>> adjustment made to the limit (namely 16, the gap between _39 and _41) >>> different from the limits imposed by the MIN_EXPR (32)? And I think >>> the answer is that: > >>> - _47 counts the number

Re: [PATCH v3] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-16 Thread Richard Sandiford via Gcc-patches
"Li, Pan2" writes: > Kindly ping for this PATCH v3. The patch was sent on Saturday, so this is effectively pinging after one working day in most of Europe and America. That's too soon and comes across as aggressive. I realise you and others are working intensively on this. But in a sense that'

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-16 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Oh, > I am sorry for incorrect typos in the last email, fix typos : > > Hi, Richard. > For case 2, I come up with this idea: > + Case 2 (SLP multiple rgroup): > + ... > + _38 = (unsigned long) n_12(D); > + _39 = _38 * 2

Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-16 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Richard. > > RVV infrastructure in RISC-V backend status: > 1. All RVV instructions pattern related to intrinsics are all finished (They > will be called not only by intrinsics but also autovec in the future). > 2. In case of autovec, we finished len_load/len_

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-16 Thread Richard Sandiford via Gcc-patches
Tejas Belagod writes: >> + { >> +int i; >> +int nelts = vector_cst_encoded_nelts (v); >> +int first_el = 0; >> + >> +for (i = first_el; i < nelts; i += step) >> + if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, > first_el)) > > I think this should use !operand

Re: [PATCH v3] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-16 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h > index c5180b9308a..38b4d6160c2 100644 > --- a/gcc/rtl-ssa/accesses.h > +++ b/gcc/rtl-ssa/accesses.h > @@ -254,7 +254,7 @@ private: >unsigned int m_spare : 2; > >// The value returned by the accessor

[PATCH] aarch64: Allow moves after tied-register intrinsics (2nd edition)

2023-05-16 Thread Richard Sandiford via Gcc-patches
I missed these two in g:4ff89f10ca0d41f9cfa76 because I was testing on a system that didn't support big-endian compilation. Testing on aarch64_be-elf shows no other related failures (although the overall results are worse than for little-endian). Tested on aarch64_be-elf & pushed. Richard gcc/t

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-16 Thread Richard Sandiford via Gcc-patches
Tejas Belagod writes: >>> + { >>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val, >>> + bitsize_int (step * BITS_PER_UNIT), >>> + bitsize_int ((16 - step) * BITS_PER_UNIT)); >>> + >>> + return gimple_build_assign (f.lhs, b); >>> +

Re: [PATCH] rtl: AArch64: New RTL for ABD

2023-05-16 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply. Oluwatamilore Adebayo writes: > From afa416dab831795f7e1114da2fb9e94ea3b8c519 Mon Sep 17 00:00:00 2001 > From: oluade01 > Date: Fri, 14 Apr 2023 15:10:07 +0100 > Subject: [PATCH 2/4] AArch64: New RTL for ABD > > This patch adds new RTL and tests for sabd and uabd > > PR

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 16 May 2023 at 00:29, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> > After committing the interleave+zip1 patch for vector initialization, >> > it seems to regress the s32 case for this patch: >> > >> > int32x4_t f_s32(int

Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-18 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes: > diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h > index c5180b9308a..c2103a5cb5c 100644 > --- a/gcc/rtl-ssa/accesses.h > +++ b/gcc/rtl-ssa/accesses.h > @@ -215,7 +215,11 @@ private: > >// The values returned by the accessors above. >unsigned int m_

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Thu, 18 May 2023 at 13:37, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Tue, 16 May 2023 at 00:29, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > Hi Richard, >> >> > After committing the interleave+zip1 pat

Re: [PATCH 1/4] Missed opportunity to use [SU]ABD

2023-05-18 Thread Richard Sandiford via Gcc-patches
Thanks for the update. Some of these comments would have applied to the first version, so sorry for not catching them first time. writes: > From: oluade01 > > This adds a recognition pattern for the non-widening > absolute difference (ABD). > > gcc/ChangeLog: > > * doc/md.texi (sabd, uabd

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-19 Thread Richard Sandiford via Gcc-patches
Tejas Belagod writes: > Am I correct to understand that we still need to check for the case when > there's a repeating non-zero elements in the case of NELTS_PER_PATTERN == 2? > eg. { 0, 0, 1, 1, 1, 1,} which should be encoded as {0, 0, 1, 1} with > NPATTERNS = 2 ? Yeah, that's right. The cu

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches
Thanks for the update. I'll split this review into two pieces. Second piece to follow (not sure when, but hopefully soon). juzhe.zh...@rivai.ai writes: > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index ed0166fedab..6f49bdee009 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tr

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: >>> I don't think this is a property of decrementing IVs. IIUC it's really >>> a property of rgl->factor == 1 && factor == 1, where factor would need >>> to be passed in by the caller. Because of that, it should probably be >>> a separate patch. > Is it right that

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Richard. Thanks for the comments. > > Would you mind telling me whether it is possible that we can make decrement > IV support into GCC middle-end ? > > If yes, could you tell what I should do next for the patches since I am > confused that it seems the imple

Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is counting elements

2023-05-22 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes: > From: Ju-Zhe Zhong > > Address comments from Richard that splits the patch of fixing multiple-rgroup > handling of length counting elements. > > This patch is fixing issue of handling multiple-rgroup of length is counting > elements > > Before this patch, multiple r

Re: [PATCH V13] VECT: Fix bug of multiple-rgroup for length is counting elements

2023-05-22 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes: > From: Ju-Zhe Zhong > > Address comments from Richard that splits the patch of fixing multiple-rgroup > handling of length counting elements. > > This patch is fixing issue of handling multiple-rgroup of length is counting > elements > > Before this patch, multiple r

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-22 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > Thanks for the suggestions. Does the attached patch look OK ? > Bootstrap+test in progress on aarch64-linux-gnu. Like I say, please wait for the tests to complete before sending an RFA. It saves a review cycle if the tests don't in fact pass. > diff --g

[PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

2023-05-23 Thread Richard Sandiford via Gcc-patches
At -O2, and so with SLP vectorisation enabled: struct complx_t { float re, im; }; complx_t add(complx_t a, complx_t b) { return {a.re + b.re, a.im + b.im}; } generates: fmov w3, s1 fmov x0, d0 fmov x1, d2 fmov w2, s3 bfi x

[PATCH 1/2] md: Allow to refer to the value of int iterator FOO

2023-05-23 Thread Richard Sandiford via Gcc-patches
In a follow-up patch, I wanted to use an int iterator to iterate over various possible values of a const_int. But one problem with int iterators was that there was no way of referring to the current value of the iterator. This is unlike modes and codes, which provide automatic "mode", "MODE", "co

Re: [PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

2023-05-23 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches > wrote: >> >> At -O2, and so with SLP vectorisation enabled: >> >> struct complx_t { float re, im; }; >> complx_t add(complx_t a, complx_t b) { >&

Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Yeah. I know. > Like ARM does everywhere: > (define_expand "vcond" > [(set (match_operand:SVE_ALL 0 "register_operand") > (if_then_else:SVE_ALL > (match_operator 3 "comparison_operator" > [(match_operand:SVE_I 4 "register_operand") >(match_op

Re: [PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-23 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 5/17/23 03:03, Jin Ma wrote: >> For example: >> (define_insn "mov_lowpart_sidi2" >>[(set (match_operand:SI0 "register_operand" "=r") >> (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))] >>"TARGET_64BIT" >>"mov\t%0,%1

Re: [PATCH] tree-optimization/109747 - SLP cost of CTORs

2023-05-23 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The x86 backend looks at the SLP node passed to the add_stmt_cost > hook when costing vec_construct, looking for elements that require > a move from a GPR to a vector register and cost that. But since > vect_prologue_cost_for_slp decomposes the cost for an external > SLP

[PATCH] early-remat: Resync with new DF postorders [PR109940]

2023-05-23 Thread Richard Sandiford via Gcc-patches
When I wrote early-remat, the DF_FORWARD block order was a postorder of a reverse/backward walk (i.e. of the inverted cfg), rather than a reverse postorder of a forward walk. A postorder of a backward walk lacked the important property that dominators come before the blocks they dominate; instead
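
The ordering property mentioned above (dominators come before the blocks they dominate) can be illustrated with a small standalone sketch. This is not GCC's DF code: the Cfg type, the function names and the six-block example graph are all hypothetical, chosen only to show that a reverse postorder of a forward walk puts each dominator ahead of the blocks it dominates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG representation (not GCC's): succs[b] lists the
// successors of block b.
using Cfg = std::vector<std::vector<int>>;

// Depth-first walk that records a block only after all successors
// reachable from it have been recorded, i.e. a postorder.
static void postorder_walk(const Cfg &succs, int block,
                           std::vector<bool> &visited,
                           std::vector<int> &order) {
  visited[block] = true;
  for (int succ : succs[block])
    if (!visited[succ])
      postorder_walk(succs, succ, visited, order);
  order.push_back(block);
}

// Reverse postorder of a forward walk starting at ENTRY.
std::vector<int> forward_rpo(const Cfg &succs, int entry) {
  std::vector<bool> visited(succs.size(), false);
  std::vector<int> order;
  postorder_walk(succs, entry, visited, order);
  std::reverse(order.begin(), order.end());
  return order;
}

// Map each block to its index in ORDER (-1 if unreachable).
std::vector<int> positions(const std::vector<int> &order, std::size_t nblocks) {
  std::vector<int> pos(nblocks, -1);
  for (std::size_t i = 0; i < order.size(); ++i)
    pos[order[i]] = static_cast<int>(i);
  return pos;
}

// Example graph (an assumption for illustration): a diamond inside a loop.
//   0 -> 1, 1 -> {2, 3}, 2 -> 4, 3 -> 4, 4 -> {1 (back edge), 5}
Cfg example_cfg() {
  return {{1}, {2, 3}, {4}, {4}, {1, 5}, {}};
}

// Check the property on the example: block 1 dominates 2, 3, 4 and 5,
// and block 4 dominates 5, so each must appear earlier in the order.
void check_example() {
  Cfg cfg = example_cfg();
  std::vector<int> pos = positions(forward_rpo(cfg, 0), cfg.size());
  assert(pos[0] < pos[1]);
  assert(pos[1] < pos[2] && pos[1] < pos[3] && pos[1] < pos[4]);
  assert(pos[4] < pos[5]);
}
```

The back edge 4 -> 1 does not disturb the order, because block 1 has already been visited by the time the walk reaches block 4.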

Re: [PATCH 1/2] Missed opportunity to use [SU]ABD

2023-05-24 Thread Richard Sandiford via Gcc-patches
Thanks for the update. Mostly LGTM, just some minor things left below. Oluwatamilore Adebayo writes: > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index > a49b09539776c0056e77f99b10365d0a8747fbc5..3a2248263cf67834a1cb41167a1783a3b6400014 > 100644 > --- a/gcc/tree-vect-

Re: [PATCH V3] RISC-V: Add RVV comparison autovectorization

2023-05-24 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, May 23, 2023 at 5:05 PM wrote: >> >> From: Juzhe-Zhong >> >> This patch enable RVV auto-vectorization including floating-point >> unorder and order comparison. >> >> The testcases are leveraged from Richard. >> So include Richard as co-author. >> >> Co-Authored-B

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-24 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Mon, 22 May 2023 at 14:18, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> > Thanks for the suggestions. Does the attached patch look OK ? >> > Bootstrap+test in progress on aarch64-linux-gnu. >> >> Like I say, please wait for the

Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review. I needed some time to go through this patch and surrounding code to understand it, and to understand why it wasn't structured the way I was expecting. I've got some specific comments below, and then a general comment about how I think we should structure this. juzhe.zh

Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
Sorry, I realised later that I had an implicit assumption here: if there are multiple rgroups, it's better to have a single IV for the smallest rgroup and scale that up to bigger rgroups. E.g. if the loop control IV is taken from an N-control rgroup and has a step S, an N*M-control rgroup would be

Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: >>> In other words, why is this different from what >>>vect_set_loop_controls_directly would do? > Oh, I see. You are confused that why I do not make multiple-rgroup vec_trunk > handling inside "vect_set_loop_controls_directly". > > Well. Frankly, I just replicate the handling of ARM

Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: >>> Both approaches are fine. I'm not against one or the other. > >>> What I didn't understand was why your patch only reuses existing IVs >>> for max_nscalars_per_iter == 1. Was it to avoid having to do a >>> multiplication (well, really a shift left) when moving from one >>> rgroup

Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
Thanks for trying it. I'm still surprised that no multiplication is needed though. Does the patch work for: short x[100]; int y[200]; void f() { for (int i = 0, j = 0; i < 100; i += 2, j += 4) { x[i + 0] += 1; x[i + 1] += 2; y[j + 0] += 1; y[j + 1] += 2; y[j + 2] += 3;

Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: > Hi, the .optimized dump is like this: > >[local count: 21045336]: > ivtmp.26_36 = (unsigned long) &x; > ivtmp.27_3 = (unsigned long) &y; > ivtmp.30_6 = (unsigned long) &MEM [(void *)&y + 16B]; > ivtmp.31_10 = (unsigned long) &MEM [(void *)&y + 32B]; > ivtmp.32_14 = (u

Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: > Hi, Richard. I still don't understand it. Sorry about that. > >>> loop_len_48 = MIN_EXPR ; > >> _74 = loop_len_34 * 2 - loop_len_48; > > I have the tests already tested. > We have a MIN_EXPR to calculate the total elements: > loop_len_34 = MIN_EXPR ; > I think "8" is already mul
