The loop body in the testcase contains an operation on invariants.
SLP detects this and can hoist/schedule the operation outside of
the loop. However, after the fix for PR96373, we would try to
apply a loop mask to this operation, even though the mask is
defined in the loop.
The patch does what R
"Li, Pan2" writes:
> Oops, looks like I missed that part for the assertion. Thank you for coaching.
> Added and tested the below changes at the end of emit_mode_adjustments
> already but looks we may have other problems about the size, the precision
> and the C types.
>
> Looks like I need to hold this P
Jeff Law via Gcc-patches writes:
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>>
>> The regression was reported during stage-1. A patch was provided during
>> stage 1 and the discussions around combine stalled.
>>
>> The regression for AArch64 needs to be fixed in GCC 13. The hit i
pan2...@intel.com writes:
> From: Pan Li
>
> Fix the bug of the rvv bool mode precision with the adjustment.
> The bits size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
> adjusted mode precision of vbool*_t will help unde
Segher Boessenkool writes:
> On Mon, Mar 06, 2023 at 12:47:06PM +0000, Richard Sandiford wrote:
>> How about the patch below?
>
> What about it? What would make it any better than the previous?
It does what Jeff suggested in the quoted message: work within the existing
extract/make_compound_oper
Jakub Jelinek writes:
> On Mon, Mar 06, 2023 at 03:08:00PM +0000, Richard Sandiford via Gcc-patches
> wrote:
>> Segher Boessenkool writes:
>> > On Mon, Mar 06, 2023 at 12:47:06PM +0000, Richard Sandiford wrote:
>> >> How about the patch below?
>> >
Tamar Christina writes:
>> -----Original Message-----
>> From: Richard Sandiford
>> Sent: Friday, January 27, 2023 12:26 PM
>> To: Tamar Christina
>> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw;
>> Marcus Shawcroft; Kyrylo Tkachov
>> Subject: Re: [PATCH]AArch64: Fix codegen regressi
Segher Boessenkool writes:
> On Mon, Mar 06, 2023 at 04:34:59PM +0000, Richard Sandiford wrote:
>> Jakub Jelinek writes:
>> > Could we have a target hook to canonicalize memory addresses for combiner,
>> > like we have that targetm.canonicalize_comparison ?
>>
>> I don't think a hook makes sense
Tamar Christina writes:
> Ping,
>
> And updated the hook to allow to differentiate between ISAs.
>
> As Andy said before initializing a ranger instance is cheap but not free, and
> if the intention is to call it often during a pass it should be instantiated at
> pass startup and passed along to
Tamar Christina writes:
> Ping,
>
> And updating the hook.
>
> There are no new tests, as new correctness tests were added to the mid-end and
> the existing codegen tests for this already exist.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tama
Tamar Christina writes:
>> -----Original Message-----
>> > + (match_operand:VQN 4 "register_operand" "w")))]
>> >"TARGET_SIMD"
>> > + "#"
>> > + "&& true"
>> > + [(const_int 0)]
>> > {
>> > - unsigned HOST_WIDE_INT size
>> > -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
>> >
Segher Boessenkool writes:
> Hi!
>
> On Mon, Mar 06, 2023 at 07:13:08PM +0000, Richard Sandiford wrote:
>> Segher Boessenkool writes:
>> > Most importantly, what makes you think this is a problem for aarch64
>> > only? If it actually is, you can fix it in the aarch64 config! Either
>> > with or
Segher Boessenkool writes:
> On Wed, Mar 08, 2023 at 11:58:51AM +0000, Richard Sandiford wrote:
>> Segher Boessenkool writes:
>> > An #ifdef is a way of making a change that is not finished yet not hurt
>> > the other targets. It still hurts generic development, which indirectly
>> > hurts all t
Richard Biener via Gcc-patches writes:
> The following fixes the condition determining whether we need an
> epilogue.
>
> When r12-2429-g62acc72a957b56 introduced this check I didn't notice
> the odd condition on review. Richard - do you remember if this
> was on purpose?
Oops, no, looks like a
This series of patches fixes PR106594, an aarch64 regression in which
we fail to combine an extension into an address. The first patch just
refactors code. The second patch contains the actual fix.
The cover note for the second patch describes the problem and the fix.
Tested on aarch64-linux-gn
This patch just splits some code out of make_compound_operation_int
into a new function called make_compound_operation_and. It is a
prerequisite for the fix for PR106594.
It might (or might not) make sense to put more of the existing
"and" handling into the new function, so that the subreg+lshift
g:c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f uses nonzero_bits
information to convert sign_extends into zero_extends.
That change is semantically correct in itself, but for the
testcase in the PR, it leads to a series of unfortunate events,
as described below.
We try to combine:
Trying 24 -> 25:
Sorry for the slow response.
Jakub Jelinek writes:
> Hi!
>
> On Mon, Jan 30, 2023 at 11:07:23PM +0000, Richard Sandiford wrote:
>> Jakub Jelinek writes:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605965.html
>> > - ABI - aarch64: Add bfloat16_t support for aarch64 (enabling i
Tamar Christina writes:
> Hi,
>
> Here's the respun patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR target/108583
> * target.def (preferred_div_as_shifts_over_mult): New.
> * doc/tm.texi.i
Jakub Jelinek writes:
> On Thu, Mar 09, 2023 at 05:14:11PM +0000, Richard Sandiford wrote:
>> We decided to keep the current mangling of __bf16 and use it for
>> std::bfloat16_t too. __bf16 will become a non-standard arithmetic type.
>> This will be an explicit diversion from the Itanium ABI.
>>
Bernhard Reutner-Fischer via Gcc-patches writes:
> On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote:
>>From: Ju-Zhe Zhong
>>
>
>>+class vleff : public function_base
>>+{
>>+public:
>>+ unsigned int call_properties (const function_instance &) const override
>>+ {
>>+return CP_READ_ME
Andrew Pinski via Gcc-patches writes:
> After r6-2044-g98e30e515f184b, code like "((x & 0xff00ff00U) >> 8)"
> would be optimized like (x >> 8) & 0xff00ffU which is normally better
> except on aarch64, the shift right could be combined with another
> operation in some cases. So we need to add a few
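To make the rewrite quoted above concrete, here is a minimal C sketch of the two forms. The function names are made up for illustration and are not taken from the patch; the only claim is that both forms compute the same value for 32-bit unsigned inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Form as written in the source: mask first, then shift right.  */
static uint32_t mask_then_shift (uint32_t x)
{
  return (x & 0xff00ff00U) >> 8;
}

/* Form after the canonicalization described above: shift right first,
   then mask with the correspondingly shifted constant.  */
static uint32_t shift_then_mask (uint32_t x)
{
  return (x >> 8) & 0x00ff00ffU;
}

/* Hypothetical self-check: both forms agree on a sample input.  */
static void check_equivalence (void)
{
  assert (mask_then_shift (0x12345678U) == shift_then_mask (0x12345678U));
}
```

Which of the two is cheaper depends on the target; the quoted message only says that on aarch64 the right shift could sometimes be combined with another operation.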
Jakub Jelinek writes:
> On Fri, Mar 10, 2023 at 08:43:02AM +0000, Richard Sandiford wrote:
>> > So, either __bf16 should be also extended floating-point type
>> > like decltype (0.0bf16) and std::bfloat16_t and in that case
>> > it is fine if it mangles u6__bf16, or __bf16 will be a distinct
>> >
Jakub Jelinek writes:
> On Fri, Mar 10, 2023 at 11:50:39AM +0000, Richard Sandiford wrote:
>> > Will test it momentarily (including the patch it depends on):
>
> Note, testing still pending, I'm testing in a Fedora scratch build
> and that is quite slow (lto bootstrap and the like).
>
>> A naive q
Sorry for the slow reply.
Prathamesh Kulkarni writes:
> Unfortunately it regresses code-gen for the following case:
>
> svint32_t f(int32x4_t x)
> {
> return svdupq_s32 (x[0], x[1], x[2], x[3]);
> }
>
> -O2 code-gen with trunk:
> f:
> dup z0.q, z0.q[0]
> ret
>
> -O2 code-gen
Richard Biener writes:
> On Wed, May 10, 2023 at 12:05 AM Richard Sandiford via Gcc-patches
> wrote:
>>
>> Andrew Pinski writes:
>> > On Tue, May 9, 2023 at 11:02 AM Richard Sandiford via Gcc-patches
>> > wrote:
>> >>
>> >> REG_A
Richard Biener writes:
> On Wed, 10 May 2023, pan2...@intel.com wrote:
>
>> From: Pan Li
>>
>> The decl_or_value is defined as void * before this PATCH. It will take
>> care of both the tree_node and rtx_def. Unfortunately, given a void
>> pointer, one cannot tell whether the input is a tree_node or an rtx_def.
>
Oluwatamilore Adebayo writes:
> From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> From: oluade01
> Date: Fri, 14 Apr 2023 10:24:43 +0100
> Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
>
> This adds a recognition pattern for the non-widening
> absolute difference (
Jakub Jelinek writes:
> On Wed, May 10, 2023 at 07:57:05PM +0800, pan2...@intel.com wrote:
>> --- a/gcc/var-tracking.cc
>> +++ b/gcc/var-tracking.cc
>> @@ -116,9 +116,14 @@
>> #include "fibonacci_heap.h"
>> #include "print-rtl.h"
>> #include "function-abi.h"
>> +#include "mux-utils.h"
>>
>>
Richard Biener writes:
> On Wed, May 10, 2023 at 11:49 AM Richard Biener
> wrote:
>>
>> On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
>> wrote:
>> >
>> > Oluwatamilore Adebayo writes:
>> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
>> > > From: oluade01
>> >
Thanks, mostly looks good to me. Some minor comments below.
pan2...@intel.com writes:
> From: Pan Li
>
> The decl_or_value is defined as void * before this PATCH. It will take
> care of both the tree_node and rtx_def. Unfortunately, given a void
> pointer, one cannot tell whether the input is a tree_node or an rt
In addition to Jeff's comments:
juzhe.zh...@rivai.ai writes:
> [...]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] &
Alexandre Oliva via Gcc-patches writes:
> On vxworks, isunordered is defined as a macro that ultimately calls a
> _Fpcomp function, that GCC doesn't recognize as a builtin, so it
> can't optimize accordingly.
>
> Use __builtin_isunordered instead to get the desired code for the
> test.
>
> Regstra
钟居哲 writes:
> Thanks Richard.
> I am planning to separate out a patch with only the create_iv stuff.
>
> Are you suggesting that I remove "tree_code incr_op = code;"
> and use the argument directly?
>
> I saw the codes here:
>
> /* For easier readability of the created code, produce MINUS_EXPRs
>
"Li, Pan2" writes:
> Thanks Richard Sandiford. Update PATCH v4 here ->
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618099.html.
>
>> - if (dv_as_opaque (node->dv) != decl || node->offset != offset)
>> + if (node->dv.first_or_null () != decl || node->offset !=
>> + offset)
>
>>
钟居哲 writes:
> I am sorry that I am still confused about that.
>
> Is this what you want ?
>
> bool use_minus_p = TREE_CODE (step) == INTEGER_CST && ((TYPE_UNSIGNED
> (TREE_TYPE (step)) && tree_int_cst_lt (step1, step))
> || (!TYPE_UNSIGNED (TREE_TYPE (step)) &&
> !tree_exp
pan2...@intel.com writes:
> From: Pan Li
>
> The decl_or_value is defined as void * before this PATCH. It will take
> care of both the tree_node and rtx_def. Unfortunately, given a void
> pointer, one cannot tell whether the input is a tree_node or an rtx_def.
>
> Then we have some implicit structure layout require
juzhe.zh...@rivai.ai writes:
> From: Juzhe-Zhong
>
> This patch is a separate patch preparing for supporting decrement IV.
>
> gcc/ChangeLog:
>
> * cfgloopmanip.cc (create_empty_loop_on_edge): Add PLUS_EXPR.
> * gimple-loop-interchange.cc
> (tree_loop_interchange::map_induction
Christophe Lyon writes:
> On 5/10/23 16:52, Kyrylo Tkachov wrote:
>>
>>
>>> -----Original Message-----
>>> From: Christophe Lyon
>>> Sent: Wednesday, May 10, 2023 2:31 PM
>>> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov;
>>> Richard Earnshaw; Richard Sandiford
>>> Cc: Christophe Lyon
>>>
Christophe Lyon writes:
> On 5/11/23 10:30, Richard Sandiford wrote:
>> Christophe Lyon writes:
>>> On 5/10/23 16:52, Kyrylo Tkachov wrote:
> -----Original Message-----
> From: Christophe Lyon
> Sent: Wednesday, May 10, 2023 2:31 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. create_iv has been approved and will be committed soon,
> after we bootstrap and run regression tests.
>
> Now, I plan to send patch for "decrement IV".
>
> After reading your comments, I have several questions:
>
> 1.
>>if (use_bias_adjusted_len)
>>
"juzhe.zh...@rivai.ai" writes:
> Oh, I see. But I saw there is a variable using_partial_vectors_p
> in the loop data structure.
>
> Can I add a variable called using_select_vl_p?
Yeah. Please also add a wrapper macro like
LOOP_VINFO_USING_PARTIAL_VECTORS_P. (I'm not really a fan of the
wrappers,
The trunk patch for this PR corrected the ABI for enums that have
a defined underlying type. We shouldn't change the ABI on the branches
though, so this patch just removes the assertions that highlighted
the problem.
I think the same approach makes sense longer-term: keep the assertions
at maximum
"juzhe.zh...@rivai.ai" writes:
> Thanks. I have read rgroup descriptions again.
> I still do not fully understand it, so bear with me :)
>
> I don't know how to differentiate Case 2 and Case 3.
>
> Case 2 is multiple rgroup for SLP.
> Case 3 is multiple rgroup for non-SLP (VEC_PACK_TRUNC)
>
"Roger Sayle" writes:
> An analysis of backend UNSPECs reveals that two of the most common UNSPECs
> across target backends are for copysign and bit reversal. This patch
> adds RTX codes for these expressions to allow their representation to
> be standardized, and them to be optimized by the middle-
"Roger Sayle" writes:
> This patch proposes adding run-time library support for bit reversal,
> by adding a __bitrevsi2 function to libgcc. Thoughts/opinions?
>
> I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc,
> to allow the RTL optimizers to perform narrowing operations,
Prathamesh Kulkarni writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
> b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
> new file mode 100644
> index 000..598a51f17c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
> @@ -0,0 +1,20 @@
> +/* { dg
Prathamesh Kulkarni writes:
> On Tue, 2 May 2023 at 18:22, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Tue, 2 May 2023 at 17:32, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
>> >> > wrote
Tejas Belagod writes:
> From: Tejas Belagod
>
> This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
> a scalar is selected based on a constant predicate and a variable vector.
> This sequence is optimized to return the corresponding element of a NEON
pan2...@intel.com writes:
> From: Pan Li
>
> We are running out of the machine_mode(8 bits) in RISC-V backend. Thus
> we would like to extend the machine mode bit size from 8 to 16 bits.
> However, it is sensitive to extend the memory size in common structure
> like tree or rtx. This patch would l
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> 1. Fix document description according to Jeff and Richard.
> 2. Add LOOP_VINFO_USING_SELECT_VL_P for single rgroup.
> 3. Add LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P for SLP multiple rgroup.
>
> Fix bugs for V5 after testing:
> https://gcc.gnu.org/piper
"Li, Pan2 via Gcc-patches" writes:
> Thanks Richard for the comments. Previously, I was not sure whether it was
> reasonable to let everywhere consume the same macro in rtl.h (as with the
> includes you mentioned). Thus, I made a conservative change in PATCH v1.
>
> I will address the comments and try to align the
"juzhe.zhong" writes:
> Thanks Richard.
> I will do that as you suggested. I have a question about the first patch: how
> should decrement IV be enabled? Should I add a target hook or something to let
> the target decide whether to enable decrement IV?
At the moment, the only other targets that use IFN_LOAD_L
"juzhe.zhong" writes:
> Hi, Richard. For "can iterate more than once", is it correct to use the
> condition "LOOP_LENS ().length > 1"?
No, that says whether any LOAD_LENs or STORE_LENs operate on multiple
vectors, rather than just single vectors.
I meant: whether the vector loop body might
Richard Biener writes:
> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>
>> I have dealt with, I think..., most of your comments. There's quite a few
>> changes, I think it's all a bit simpler now. I made some other changes to the
>> costing in tree-inline.cc and gimple-range-op.cc in which I t
Evandro Menezes via Gcc-patches writes:
> This patch adds the attribute `type` to most SVE1 instructions, as in the
> other instructions.
Thanks for doing this.
Could you say what criteria you used for picking the granularity? Other
maintainers might disagree, but personally I'd prefer to di
Richard Biener writes:
> On Fri, 12 May 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>> >
>> >> I have dealt with, I think..., most of your comments. There's quite a few
>> >> changes, I think it's all a bit simpler now. I made s
Richard Biener writes:
> On Mon, 15 May 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > But I'm also not sure
>> > how much of that is really needed (it seems to be tied around
>> > optimizing optabs space?)
>>
>> Not sure what you mean by "this". Optabs space shouldn't be a pro
Kyrylo Tkachov writes:
> Hi Richard,
>
>> -----Original Message-----
>> From: Gcc-patches > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
>> Sandiford via Gcc-patches
>> Sent: Tuesday, May 9, 2023 7:48 AM
>> To: gcc-patches@gcc.gn
>>
>> Kyrylo Tkachov writes:
>> > Hi Richard,
>> >
>> >> -----Original Message-----
>> >> From: Gcc-patches > >> bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Richard
>> >> Sandiford via Gcc-patches
>> >>
Prathamesh Kulkarni writes:
> Hi Richard,
> After committing the interleave+zip1 patch for vector initialization,
> it seems to regress the s32 case for this patch:
>
> int32x4_t f_s32(int32_t x)
> {
> return (int32x4_t) { x, x, x, 1 };
> }
>
> code-gen:
> f_s32:
> movi v30.2s, 0x1
>
juzhe.zh...@rivai.ai writes:
> From: Juzhe-Zhong
>
> This patch implements decrement IV for the length approach in loop control.
>
> Address comment from Kewen to incorporate the implementation inside
> "vect_set_loop_controls_directly" instead of a standalone function.
>
> Address comment from Richa
"juzhe.zh...@rivai.ai" writes:
>>> The examples are good, but this one made me wonder: why is the
>>> adjustment made to the limit (namely 16, the gap between _39 and _41)
>>> different from the limits imposed by the MIN_EXPR (32)? And I think
>>> the answer is that:
>
>>> - _47 counts the number
"Li, Pan2" writes:
> Kindly ping for this PATCH v3.
The patch was sent on Saturday, so this is effectively pinging after
one working day in most of Europe and America. That's too soon and
comes across as aggressive.
I realise you and others are working intensively on this. But in a
sense that'
"juzhe.zh...@rivai.ai" writes:
> Oh,
> I am sorry for the typos in the last email; here it is again with the typos fixed:
>
> Hi, Richard.
> For case 2, I come up with this idea:
> + Case 2 (SLP multiple rgroup):
> + ...
> + _38 = (unsigned long) n_12(D);
> + _39 = _38 * 2
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard.
>
> RVV infrastructure in RISC-V backend status:
> 1. All RVV instruction patterns related to intrinsics are finished (they
> will be called not only by intrinsics but also by autovec in the future).
> 2. In case of autovec, we finished len_load/len_
Tejas Belagod writes:
>> + {
>> +int i;
>> +int nelts = vector_cst_encoded_nelts (v);
>> +int first_el = 0;
>> +
>> +for (i = first_el; i < nelts; i += step)
>> + if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v,
> first_el))
>
> I think this should use !operand
pan2...@intel.com writes:
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
> index c5180b9308a..38b4d6160c2 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -254,7 +254,7 @@ private:
>unsigned int m_spare : 2;
>
>// The value returned by the accessor
I missed these two in g:4ff89f10ca0d41f9cfa76 because I was
testing on a system that didn't support big-endian compilation.
Testing on aarch64_be-elf shows no other related failures
(although the overall results are worse than for little-endian).
Tested on aarch64_be-elf & pushed.
Richard
gcc/t
Tejas Belagod writes:
>>> + {
>>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>>> + bitsize_int (step * BITS_PER_UNIT),
>>> + bitsize_int ((16 - step) * BITS_PER_UNIT));
>>> +
>>> + return gimple_build_assign (f.lhs, b);
>>> +
Sorry for the slow reply.
Oluwatamilore Adebayo writes:
> From afa416dab831795f7e1114da2fb9e94ea3b8c519 Mon Sep 17 00:00:00 2001
> From: oluade01
> Date: Fri, 14 Apr 2023 15:10:07 +0100
> Subject: [PATCH 2/4] AArch64: New RTL for ABD
>
> This patch adds new RTL and tests for sabd and uabd
>
> PR
Prathamesh Kulkarni writes:
> On Tue, 16 May 2023 at 00:29, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> > After committing the interleave+zip1 patch for vector initialization,
>> > it seems to regress the s32 case for this patch:
>> >
>> > int32x4_t f_s32(int
pan2...@intel.com writes:
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
> index c5180b9308a..c2103a5cb5c 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -215,7 +215,11 @@ private:
>
>// The values returned by the accessors above.
>unsigned int m_
Prathamesh Kulkarni writes:
> On Thu, 18 May 2023 at 13:37, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Tue, 16 May 2023 at 00:29, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > Hi Richard,
>> >> > After committing the interleave+zip1 pat
Thanks for the update. Some of these comments would have applied
to the first version, so sorry for not catching them first time.
writes:
> From: oluade01
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
> * doc/md.texi (sabd, uabd
Tejas Belagod writes:
> Am I correct to understand that we still need to check for the case when
> there are repeating non-zero elements in the case of NELTS_PER_PATTERN == 2?
> e.g. { 0, 0, 1, 1, 1, 1 }, which should be encoded as { 0, 0, 1, 1 } with
> NPATTERNS = 2?
Yeah, that's right. The cu
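As an illustration of the encoding in the question above, here is a simplified C model of how such an encoding expands back into the full vector. It is a sketch rather than GCC code: decode_elt and its parameters are hypothetical names, and the model only covers one or two encoded elements per pattern (no stepped series).

```c
#include <assert.h>

/* Simplified model (an assumption, not GCC's implementation): the encoded
   array interleaves NPATTERNS patterns, each with NELTS_PER_PATTERN (1 or 2)
   encoded elements.  Element I of the full vector belongs to pattern
   I % NPATTERNS, and elements beyond the encoded ones repeat the last
   encoded element of their pattern.  */
static int decode_elt (const int *encoded, unsigned npatterns,
                       unsigned nelts_per_pattern, unsigned i)
{
  unsigned pattern = i % npatterns;
  unsigned index = i / npatterns;
  if (index >= nelts_per_pattern)
    index = nelts_per_pattern - 1;
  return encoded[index * npatterns + pattern];
}

/* Hypothetical self-check for the example in the question: { 0, 0, 1, 1 }
   with 2 patterns and 2 elements per pattern expands to
   { 0, 0, 1, 1, 1, 1 }.  */
static void check_example (void)
{
  static const int encoded[4] = { 0, 0, 1, 1 };
  static const int expected[6] = { 0, 0, 1, 1, 1, 1 };
  for (unsigned i = 0; i < 6; i++)
    assert (decode_elt (encoded, 2, 2, i) == expected[i]);
}
```

Under this model, pattern 0 is { 0, 1, 1, ... } and pattern 1 is { 0, 1, 1, ... }, which is why four encoded elements are enough to describe the six-element example.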
Thanks for the update. I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).
juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tr
"juzhe.zh...@rivai.ai" writes:
>>> I don't think this is a property of decrementing IVs. IIUC it's really
>>> a property of rgl->factor == 1 && factor == 1, where factor would need
>>> to be passed in by the caller. Because of that, it should probably be
>>> a separate patch.
> Is it right that
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. Thanks for the comments.
>
> Would you mind telling me whether it is possible that we can make decrement
> IV support into GCC middle-end ?
>
> If yes, could you tell what I should do next for the patches since I am
> confused that it seems the imple
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Address comments from Richard by splitting out the patch that fixes
> multiple-rgroup handling when the length counts elements.
>
> This patch fixes the handling of multiple rgroups when the length counts
> elements.
>
> Before this patch, multiple r
Prathamesh Kulkarni writes:
> Hi Richard,
> Thanks for the suggestions. Does the attached patch look OK ?
> Bootstrap+test in progress on aarch64-linux-gnu.
Like I say, please wait for the tests to complete before sending an RFA.
It saves a review cycle if the tests don't in fact pass.
> diff --g
At -O2, and so with SLP vectorisation enabled:
struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
return {a.re + b.re, a.im + b.im};
}
generates:
fmov w3, s1
fmov x0, d0
fmov x1, d2
fmov w2, s3
bfi x
In a follow-up patch, I wanted to use an int iterator to iterate
over various possible values of a const_int. But one problem
with int iterators was that there was no way of referring to the
current value of the iterator. This is unlike modes and codes,
which provide automatic "mode", "MODE", "co
Richard Biener writes:
> On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> At -O2, and so with SLP vectorisation enabled:
>>
>> struct complx_t { float re, im; };
>> complx_t add(complx_t a, complx_t b) {
>>
"juzhe.zh...@rivai.ai" writes:
> Yeah. I know.
> Like ARM does everywhere:
> (define_expand "vcond"
> [(set (match_operand:SVE_ALL 0 "register_operand")
> (if_then_else:SVE_ALL
> (match_operator 3 "comparison_operator"
> [(match_operand:SVE_I 4 "register_operand")
>(match_op
Jeff Law via Gcc-patches writes:
> On 5/17/23 03:03, Jin Ma wrote:
>> For example:
>> (define_insn "mov_lowpart_sidi2"
>>[(set (match_operand:SI0 "register_operand" "=r")
>> (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
>>"TARGET_64BIT"
>>"mov\t%0,%1
Richard Biener writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that. But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP
When I wrote early-remat, the DF_FORWARD block order was a postorder
of a reverse/backward walk (i.e. of the inverted cfg), rather than a
reverse postorder of a forward walk. A postorder of a backward walk
lacked the important property that dominators come before the blocks
they dominate; instead
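The ordering property mentioned above can be seen on a toy example. The following C sketch computes a reverse postorder of a forward walk over a small diamond-shaped CFG; the CFG, the array layout, and the function names are assumptions made for illustration and have nothing to do with the DF or early-remat code itself.

```c
#include <assert.h>

#define NBLOCKS 4

/* Hypothetical diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
   -1 marks an unused successor slot.  */
static const int succs[NBLOCKS][2] = {
  { 1, 2 }, { 3, -1 }, { 3, -1 }, { -1, -1 }
};

/* Depth-first walk along successor edges; a block is appended to
   POSTORDER once all of its unvisited successors have been walked.  */
static void dfs (int bb, int *visited, int *postorder, int *n)
{
  visited[bb] = 1;
  for (int i = 0; i < 2; i++)
    {
      int s = succs[bb][i];
      if (s >= 0 && !visited[s])
        dfs (s, visited, postorder, n);
    }
  postorder[(*n)++] = bb;
}

/* Fill RPO with a reverse postorder of a forward walk from block 0
   and return the number of blocks reached.  */
static int reverse_postorder (int *rpo)
{
  int visited[NBLOCKS] = { 0 };
  int postorder[NBLOCKS];
  int n = 0;
  dfs (0, visited, postorder, &n);
  for (int i = 0; i < n; i++)
    rpo[i] = postorder[n - 1 - i];
  return n;
}
```

In this example block 0 dominates every other block and is placed first, while the join block 3 is placed last, after both of its predecessors.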
Thanks for the update. Mostly LGTM, just some minor things left below.
Oluwatamilore Adebayo writes:
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index
> a49b09539776c0056e77f99b10365d0a8747fbc5..3a2248263cf67834a1cb41167a1783a3b6400014
> 100644
> --- a/gcc/tree-vect-
Richard Biener writes:
> On Tue, May 23, 2023 at 5:05 PM wrote:
>>
>> From: Juzhe-Zhong
>>
>> This patch enables RVV auto-vectorization, including floating-point
>> unordered and ordered comparisons.
>>
>> The testcases are leveraged from Richard.
>> So include Richard as co-author.
>>
>> Co-Authored-B
Prathamesh Kulkarni writes:
> On Mon, 22 May 2023 at 14:18, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> > Thanks for the suggestions. Does the attached patch look OK ?
>> > Bootstrap+test in progress on aarch64-linux-gnu.
>>
>> Like I say, please wait for the
Sorry for the slow review. I needed some time to go through this
patch and surrounding code to understand it, and to understand
why it wasn't structured the way I was expecting.
I've got some specific comments below, and then a general comment
about how I think we should structure this.
juzhe.zh
Sorry, I realised later that I had an implicit assumption here:
if there are multiple rgroups, it's better to have a single IV
for the smallest rgroup and scale that up to bigger rgroups.
E.g. if the loop control IV is taken from an N-control rgroup
and has a step S, an N*M-control rgroup would be
钟居哲 writes:
>>> In other words, why is this different from what
>>>vect_set_loop_controls_directly would do?
> Oh, I see. You are asking why I do not do the multiple-rgroup vec_trunk
> handling inside "vect_set_loop_controls_directly".
>
> Well. Frankly, I just replicate the handling of ARM
钟居哲 writes:
>>> Both approaches are fine. I'm not against one or the other.
>
>>> What I didn't understand was why your patch only reuses existing IVs
>>> for max_nscalars_per_iter == 1. Was it to avoid having to do a
>>> multiplication (well, really a shift left) when moving from one
>>> rgroup
Thanks for trying it. I'm still surprised that no multiplication
is needed though. Does the patch work for:
short x[100];
int y[200];
void f() {
for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
x[i + 0] += 1;
x[i + 1] += 2;
y[j + 0] += 1;
y[j + 1] += 2;
y[j + 2] += 3;
钟居哲 writes:
> Hi, the .optimized dump is like this:
>
>[local count: 21045336]:
> ivtmp.26_36 = (unsigned long) &x;
> ivtmp.27_3 = (unsigned long) &y;
> ivtmp.30_6 = (unsigned long) &MEM [(void *)&y + 16B];
> ivtmp.31_10 = (unsigned long) &MEM [(void *)&y + 32B];
> ivtmp.32_14 = (u
钟居哲 writes:
> Hi, Richard. I still don't understand it. Sorry about that.
>
>>> loop_len_48 = MIN_EXPR ;
> >> _74 = loop_len_34 * 2 - loop_len_48;
>
> I have already run the tests.
> We have a MIN_EXPR to calculate the total elements:
> loop_len_34 = MIN_EXPR ;
> I think "8" is already mul