钟居哲 writes:
> Oh. I see. Thank you so much for pointing this.
> Could you tell me what I should do in the codes?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> Multiply by some factor? Is it correct to multiply by
> max_nscalars_per_iter?
max_nscalars_per_iter * factor
Prathamesh Kulkarni writes:
> On Wed, 24 May 2023 at 15:40, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> > On Mon, 22 May 2023 at 14:18, Richard Sandiford
>> > wrote:
>> >>
>> >> Prathamesh Kulkarni writes:
>> >> > Hi Richard,
>> >> > Thanks for the suggestions. Does the att
I'll look at the samples tomorrow, but just to address one thing:
钟居哲 writes:
>>> What gives the best code in these cases? Is emitting a multiplication
>>> better? Or is using a new IV better?
> Could you give me more detail information about "new refresh IV" approach.
> I'd like to try that.
LGTM, just a couple of comment tweaks:
Prathamesh Kulkarni writes:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index d6fc94015fa..db7ca4c28c3 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22332,6 +22332,46 @@ aarch64_un
Thanks, this looks functionally correct to me. And I agree it handles
the cases that previously needed multiplication.
But I think it regresses code quality when no multiplication was needed.
We can now generate duplicate IVs. Perhaps ivopts would remove the
duplicates, but it might be hard, bec
"Jin Ma" writes:
>> > On 5/17/23 03:03, Jin Ma wrote:
>> >> For example:
>> >> (define_insn "mov_lowpart_sidi2"
>> >>   [(set (match_operand:SI 0 "register_operand" "=r")
>> >>         (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
>> >>   "TARGET_64BIT"
>> >>   "mov\t
Jin Ma writes:
> When the last insn1 of BB1 and the first insn2 of BB2 are fused, insn2 will
> clear all its dependencies in the function chain_to_prev_insn, so insn2
> may move to any BB and the program computes the wrong result.
>
> gcc/ChangeLog:
>
> * sched-deps.cc (sched_macro_
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard. Thanks for the comments.
>
>>> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>>> || !iv_rgc
>>> || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>> != rgc->max_nscalars_per_iter * rgc->factor))
>>> {
> >> /* See
This looks good to me. Just a couple of very minor cosmetic things:
juzhe.zh...@rivai.ai writes:
> @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop
> *loop,
> continue;
> }
>
> - /* See whether zero-based IV would ever generate all-false masks
>
Kyrylo Tkachov via Gcc-patches writes:
> Hi all,
>
> This patch expresses the intrinsics for the SRA and RSRA instructions with
> standard RTL codes rather than relying on UNSPECs.
> These instructions perform a vector shift right plus accumulate with an
> optional rounding constant addition for t
Joel Hutton via Gcc-patches writes:
> Hi all,
>
> This patch removes support for the widening subtract operation in the aarch64
> backend as it is causing a performance regression.
>
> In the following example:
>
> #include
> extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *re
Richard Biener writes:
> On Thu, 21 Jan 2021, Richard Sandiford wrote:
>
>> Joel Hutton via Gcc-patches writes:
>> > Hi all,
>> >
>> > This patch removes support for the widening subtract operation in the
>> > aarch64 backend as it is causing a performance regression.
>> >
>> > In the following
Ilya Leoshkevich writes:
> On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
>> Given what you said in the other message about combine, I agree this
>> is a reasonable workaround. I don't know whether it's suitable for
>> stage 4 or whether it would need to wait for stage 1.
>
> Thanks
Wilco Dijkstra writes:
> In aarch64_classify_symbol symbols are allowed large offsets on relocations.
> This means the offset can use all of the +/-4GB offset, leaving no offset
> available for the symbol itself. This results in relocation overflow and
> link-time errors for simple expressions li
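A minimal sketch of the kind of check this implies: decide whether a constant offset is small enough to keep with the symbol in a single relocation. The ±1 MiB bound below is an illustrative assumption, not the real AArch64 rule, and `offset_ok_for_symbol` is a hypothetical helper.

```cpp
#include <cstdint>

// Hedged sketch: if the offset is outside a small window, materialise
// the symbol address separately and add the offset afterwards, so the
// relocation keeps range for the symbol itself.  The +/-1 MiB limit is
// an assumed illustrative value.
static bool
offset_ok_for_symbol (int64_t offset)
{
  return offset >= -0x100000 && offset < 0x100000;
}
```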
Segher Boessenkool writes:
> Hi!
>
> What is holding up this patch still? Ke Wen has pinged it every month
> since May, and there has still not been a review.
FAOD (since I'm on cc:), I don't feel qualified to review this.
Tree-level loop stuff isn't really my area.
Thanks,
Richard
>
>
> Seghe
Kyrylo Tkachov via Gcc-patches writes:
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index
> d6201d3cb943e145720c18fbf3aadd853fd87b44..800815b855c759075b4326361cc4db7183f1c543
> 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -3252,8 +3252,8 @
Thanks for doing this. The patch looks good with one very minor nit fixed:
Jonathan Wright writes:
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index
> f7efee61de4c5268acf446555af4a93fece6b169..da696d9fee2ffbabc9d89f2e9299fbde086cfee1
> 100644
> --- a/gcc/conf
Richard Biener writes:
> On Fri, 22 Jan 2021, Segher Boessenkool wrote:
>
>> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote:
>> > On Thu, 21 Jan 2021, Segher Boessenkool wrote:
>> > > What is holding up this patch still? Ke Wen has pinged it every month
>> > > since May, and there
Richard Biener writes:
> This fixes VECTOR_CST element access with POLY_INT elements and
> allows to produce dump files of the PR98726 testcase without
> ICEing.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
Thanks for doing this. I could have sworn I'd written almost
exactly the
Jakub Jelinek via Gcc-patches writes:
> Hi!
>
> The testcase in the patch doesn't assemble, because the instruction requires
> that the penultimate operand (lsb) range is [0, 32] (or [0, 64]) and the last
> operand's range is [1, 32 - lsb] (or [1, 64 - lsb]).
> The INTVAL (shft_amnt) < GET_MODE_BI
Kyrylo Tkachov writes:
> Hi Jonathan,
>
>> -Original Message-
>> From: Jonathan Wright
>> Sent: 27 January 2021 16:03
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov
>> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n
>> intrinsics
>>
>> Hi,
>>
>> As subject, this
Jakub Jelinek via Gcc-patches writes:
> Hi!
>
> The following testcase ICEs, because after the veclower pass which is the
> last point which lowers unsupported vector operations to supported ones
> (or scalars) match.pd simplifies a supported vector operation into
> unsupported one (vec << 1 >> 1
Jakub Jelinek via Gcc-patches writes:
> On Tue, Feb 02, 2021 at 09:38:09AM +, Richard Sandiford wrote:
>> > +default:
>> > + if (!VECTOR_MODE_P (mode))
>> > + return true;
>> > + op = optab_for_tree_code (code, type, optab_default);
>> > + if (op == unknown_optab
>> > +
Richard Biener writes:
> On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote:
>>
>> Hi Richard(s),
>>
>> I'm just looking to see if I'm going about this the right way, based on the
>> discussion we had on IRC. I've managed to hack something together, I've
>> attached a (very) WIP patch which gives
Richard Biener writes:
> On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote:
>> >>
>> >> Hi Richard(s),
>> >>
>> >> I'm just looking to see if I'm going about this the right way, based on
>> >> the discus
Richard Biener writes:
> On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek
> wrote:
>>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote:
>>> OK, so I'd prefer we simply unset the flag after processing deferred
>>rescan. I clearly misread the function to do that.
>>
>>This wo
Richard Biener writes:
> On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Richard Biener writes:
>> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote:
>> >> >>
>> >> >> Hi R
Richard Biener writes:
> On Tue, 2 Feb 2021, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek
>> > wrote:
>> >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote:
>> >>> OK, so I'd prefer we simply unset the flag afte
Joel Hutton writes:
In practice this will only affect targets that choose to use mixed
vector sizes, and I think it's reasonable to optimise only for the
case in which such targets support widening conversions. So what
do you think about the idea of emitting separate conversio
Joel Hutton writes:
>>> So emit a v4qi->v8qi gimple conversion
>>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>>
>>I was thinking of using a v8qi->v8hi convert on each operand followed
>>by a normal v8hi subtraction. That's what we'd generate if the target
>>didn't def
Richard Biener writes:
> The following attempts to account for the fact that BB vectorization
> regions now can span multiple loop levels and that an unprofitable
> inner loop vectorization shouldn't be offset by a profitable
> outer loop vectorization to make it overall profitable.
>
> For now
"Andre Vieira (lists)" writes:
> Hi,
>
> As mentioned in the PR, this patch fixes up the nvectors parameter passed to
> vect_get_loop_mask in vectorizable_condition.
> Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy
> separately, now we gather them all at the same time and do
Richard Biener writes:
> On Fri, 5 Feb 2021, Richard Sandiford wrote:
>> Richard Biener writes:
>> > + /* First produce cost vectors sorted by loop index. */
>> > + auto_vec >
>> > +li_scalar_costs (scalar_costs.length ());
>> > + auto_vec >
>> > +li_vector_costs (vector_costs.length
"Andre Vieira (lists)" writes:
> On 05/02/2021 12:47, Richard Sandiford wrote:
>> "Andre Vieira (lists)" writes:
>>> Hi,
>>>
>>> As mentioned in the PR, this patch fixes up the nvectors parameter passed
>>> to vect_get_loop_mask in vectorizable_condition.
>>> Before the STMT_VINFO_VEC_STMTS rewo
Peter Bergner writes:
> Adding Richard since he's reviewed the generic opaque mode code in
> the past and this patch contains some more generic support.
>
> GCC handles pseudos that are used uninitialized by emitting a
> (set (reg: ) CONST0_RTX(regmode)) before their uninitialized
> pseudo usage.
Joel Hutton writes:
> Hi Richards,
>
> This patch adds support for the V8QI->V8HI case from widening vect patterns
> as discussed to target PR98772.
Thanks, the approach looks good to me. Mostly just minor comments below.
> Bootstrapped and regression tested on aarch64.
>
>
> [aarch64][vect] S
Joel Hutton writes:
> @@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree
> type,
> }
> }
>
> +/* Function supportable_half_widening_operation
> +
I realise existing (related) functions do have a “Function foo” line,
but it's not generally the GCC style, so I think
One more formatting nit, sorry:
Joel Hutton writes:
> +bool
> +supportable_half_widening_operation (enum tree_code code,
> +tree vectype_out, tree vectype_in,
> +enum tree_code *code1)
The arguments need reindenting for the new function nam
df_lr_bb_local_compute has:
FOR_EACH_INSN_INFO_DEF (def, insn_info)
/* If the def is to only part of the reg, it does
not kill the other defs that reach here. */
if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
However, as noted in the comment i
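The rule the quoted loop applies can be sketched as follows; the flag values here are made up for illustration, not DF's real encodings:

```cpp
// A partial or conditional definition reads the old value as well as
// writing part of it, so it must not "kill" earlier definitions the
// way a full definition does.  Flag bits are illustrative assumptions.
enum ref_flags { REF_PARTIAL = 1, REF_CONDITIONAL = 2 };

static bool
def_kills_reg (unsigned flags)
{
  return !(flags & (REF_PARTIAL | REF_CONDITIONAL));
}
```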
Richard Biener writes:
> On Thu, Feb 11, 2021 at 4:33 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> df_lr_bb_local_compute has:
>>
>> FOR_EACH_INSN_INFO_DEF (def, insn_info)
>> /* If the def is to only part of the reg, it does
>>
I noticed while working on PR98863 that we were using the main
obstack to allocate temporary uses. That was safe, but represents
a kind of local memory leak.
Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious.
Richard
gcc/
* rtl-ssa/accesses.cc (function_info::make_use
The rtl-ssa code uses an on-the-side IL and needs to build that IL
for each block and RTL insn. I'd originally not used the classical
dominance frontier method for placing phis on the basis that it seemed
like more work in this context: we're having to visit everything in
an RPO walk anyway, so fo
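For context, the classical dominance-frontier method mentioned above looks roughly like this: a generic sketch of the standard phi-placement fixed point, not the rtl-ssa code itself.

```cpp
#include <set>
#include <vector>

// Given each block's dominance frontier and the set of blocks that
// define a name, compute where phi nodes are needed.  A placed phi is
// itself a definition, so the walk iterates to a fixed point.
static std::set<int>
place_phis (const std::vector<std::set<int>> &df, const std::set<int> &defs)
{
  std::set<int> phis;
  std::vector<int> work (defs.begin (), defs.end ());
  while (!work.empty ())
    {
      int b = work.back ();
      work.pop_back ();
      for (int y : df[b])
        if (phis.insert (y).second)
          work.push_back (y);
    }
  return phis;
}
```

For a diamond CFG (entry dominating two branches that rejoin), defining a name in both branches places a single phi at the join block.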
Peter Bergner writes:
> 2021-02-12 Peter Bergner
>
> gcc/
> PR rtl-optimization/98872
> * init-regs.c (initialize_uninitialized_regs): Skip initialization
> if CONST0_RTX is NULL.
>
> gcc/testsuite/
> PR rtl-optimization/98872
> * gcc.target/powerpc/pr98872.c: New
Maya Rashish via Gcc-patches writes:
> Some subtargets don't provide the canonical function names as
> the symbol name in C libraries, and libcalls will only work if
> the builtins are patched to emit the correct library name.
>
> For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
> an
Jakub Jelinek writes:
> On Thu, Feb 11, 2021 at 03:03:38PM +0000, Richard Sandiford via Gcc-patches
> wrote:
>> gcc/
>> * df-problems.c (df_lr_bb_local_compute): Treat partial definitions
>> as read-modify operations.
>>
>> gcc/testsuite/
>>
Hi Victor,
Thanks for the patch. I have a couple of very minor comments below,
but otherwise it looks good to go. However, it will need to wait for
stage 1 to open, unless it fixes a regression.
Victor Do Nascimento via Gcc-patches writes:
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b
Maya Rashish via Gcc-patches writes:
> Some subtargets don't provide the canonical function names as
> the symbol name in C libraries, and libcalls will only work if
> the builtins are patched to emit the correct library name.
>
> For example, on NetBSD, cabsl has the symbol name __c99_cabsl,
> an
Jakub Jelinek via Gcc-patches writes:
> Hi!
>
> The following testcase started ICEing with my recent changes to enable
> split4 after sel-sched, but it seems the bug is more general.
> Some of the i386 splitter condition functions use and rely on df, but
> the split passes don't really df_analyze/
Richard Biener writes:
> On Tue, 16 Feb 2021, Richard Sandiford wrote:
>
>> Jakub Jelinek via Gcc-patches writes:
>> > Hi!
>> >
>> > The following testcase started ICEing with my recent changes to enable
>> > split4 after sel-sched, but it seems the bug is more general.
>> > Some of the i386 spli
Jakub Jelinek writes:
> On Tue, Feb 16, 2021 at 09:42:22AM +0100, Richard Biener wrote:
>> Just to get an idea whether it's worth doing the extra df_analyze.
>> Since we have possibly 5 split passes it's a lot of churn for things
>> like that WRF ltrans unit that already spends 40% of its time in
Jakub Jelinek writes:
> On Tue, Feb 16, 2021 at 09:16:40AM +, Richard Sandiford wrote:
>> But doing it on demand like this seems fragile. And the targets aren't
>> a fixed… target. I think we need to design the interface so that things
>> are unlikely to go wrong in future rather than design
Jakub Jelinek via Gcc-patches writes:
> On Tue, Feb 16, 2021 at 09:55:40AM +, Richard Sandiford wrote:
>> I assume that's because pass_df_initialize_no_opt is slightly after
>> the first pass_split_all_insns? Seems like it should just be a case
>> of moving it up.
>>
>> > And for noflow wher
Jakub Jelinek via Gcc-patches writes:
> @@ -14897,6 +14892,32 @@ distance_agu_use (unsigned int regno0, r
>return distance >> 1;
> }
>
> +/* Copy recog_data_d from SRC to DEST. */
> +
> +static void
> +copy_recog_data (recog_data_d *dest, recog_data_d *src)
> +{
> + dest->n_operands = src
Jakub Jelinek writes:
> On Tue, Feb 16, 2021 at 01:09:43PM +, Richard Sandiford wrote:
>> Can I put in a plea to put this in recog.[hc], and possibly also make
>> it a copy constructor for recog_data_d? I can't think of any legitimate
>> cases in which we'd want to copy the whole structure, i
Jakub Jelinek writes:
> On Tue, Feb 16, 2021 at 03:03:56PM +0000, Richard Sandiford via Gcc-patches
> wrote:
>> > On Tue, Feb 16, 2021 at 01:09:43PM +, Richard Sandiford wrote:
>> >> Can I put in a plea to put this in recog.[hc], and possibly also make
>
Xi Ruoyao via Gcc-patches writes:
>> > > > I can't understand the comment either. To me it looks like it's
>> > > > possible to
>> > > > remove this "if (MSA_SUPPORTED_P (mode)) return 0;"
I think the point is that the MSA loads and stores only have a 10-bit
offset field instead of the usual 16-
Jakub Jelinek writes:
> On Wed, Feb 17, 2021 at 10:30:06AM +, Richard Sandiford wrote:
>> Hmm. I think that just means that the optimisation performed by
>> the copy constructor isn't valid in practice (even if it should be
>> in principle). Guess this is the curse of manipulating data struc
Tamar Christina writes:
> Hi All,
>
> This patch disables the test for PR99149 on Big-endian
> where for standard AArch64 the patterns are disabled.
>
> Regtested on aarch64-none-linux-gnu and no issues.
>
> Committed under the obvious rule.
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>
Alexandre Oliva writes:
> On Feb 26, 2021, Segher Boessenkool wrote:
>
>> On Fri, Feb 26, 2021 at 12:31:16PM -0500, David Edelsohn wrote:
>>> On Fri, Feb 26, 2021 at 11:09 AM Alexandre Oliva wrote:
>>> >
>>> > This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2
>>> > tests. Teste
Alex Coplan writes:
> Hi!
>
> As the PR shows, we were missing a check in
> function_resolver::require_vector_type to see if the argument type was already
> invalid. This was causing us to attempt to emit a diagnostic and subsequently
> ICE in print_type. Fixed thusly.
>
> Bootstrapped and regtest
Alex Coplan writes:
> Hi all,
>
> As discussed in the PR, we currently have two different numbering
> schemes for SVE builtins: one for C, and one for C++. This is
> problematic for LTO, where we end up getting confused about which
> intrinsic we're talking about. This patch inserts placeholders i
Richard Biener writes:
> On Fri, 30 Jul 2021, Richard Sandiford wrote:
>> > @@ -9456,6 +9499,51 @@ vectorizable_load (vec_info *vinfo,
>> >data_ref = NULL_TREE;
>> >break;
>> > }
>> > + else if (memory_access_type == VMAT_GATHER
Richard Biener writes:
> On Mon, 2 Aug 2021, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Fri, 30 Jul 2021, Richard Sandiford wrote:
>> >> > @@ -9456,6 +9499,51 @@ vectorizable_load (vec_info *vinfo,
>> >> > data_ref = NULL_TREE;
>> >> >
Richard Biener via Gcc-patches writes:
> On Fri, Jul 30, 2021 at 5:59 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> This patch adds a simple class for holding A/B fractions.
>> As the comments in the patch say, the class isn't designed
>> to have nice
Richard Biener writes:
> On Mon, Aug 2, 2021 at 12:43 PM Richard Sandiford
> wrote:
>>
>> Richard Biener via Gcc-patches writes:
>> > On Fri, Jul 30, 2021 at 5:59 PM Richard Sandiford via Gcc-patches
>> > wrote:
>> >>
>> >> This patch
Richard Biener writes:
> On Mon, Aug 2, 2021 at 1:31 PM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Mon, Aug 2, 2021 at 12:43 PM Richard Sandiford
>> > wrote:
>> >>
>> >> Richard Biener via Gcc-patches writes:
&
This patch series:
(1) generalises the aarch64 vector costs to allow for the final patch.
This part should be a no-op for existing tuning code.
(2) tweaks the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS code. This currently
only affects neoverse-v1 and again helps with the final patch.
(3)
The tuning structures have an sve_width field that specifies the
number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not
applicable). This patch turns the field into a bitmask so that
it can specify multiple widths at the same time. For now we
always treat the minimum width as the likely w
This patch adds a simple fixed-point class for holding fractional
cost values. It can exactly represent the reciprocal of any
single-vector SVE element count (including the non-power-of-2 ones).
This means that it can also hold 1/N for all N in [1, 16], which should
be enough for the various *_per
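A toy version of such a class might scale everything by lcm(1..16) = 720720, which makes 1/N exact for every N in [1, 16]. This is a sketch of the idea only; the class the patch adds differs in detail.

```cpp
#include <cstdint>

// Fixed-point fraction with an implicit denominator of lcm(1..16),
// so the reciprocal of any element count in [1, 16] is exact.
// Illustrative only, not the patch's real class.
class frac_cost
{
  static constexpr int64_t SCALE = 720720; // lcm of 1..16
  int64_t units; // value * SCALE
  frac_cost (int64_t u, bool) : units (u) {}
public:
  frac_cost (int64_t whole = 0) : units (whole * SCALE) {}
  static frac_cost reciprocal (int64_t n) { return frac_cost (SCALE / n, true); }
  frac_cost operator+ (frac_cost o) const { return frac_cost (units + o.units, true); }
  frac_cost operator* (int64_t k) const { return frac_cost (units * k, true); }
  bool operator== (frac_cost o) const { return units == o.units; }
};
```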
This patch splits the SVE-specific part of aarch64_adjust_body_cost
out into its own subroutine, so that a future patch can call it
more than once. I wondered about using a lambda to avoid having
to pass all the arguments, but in the end this way seemed clearer.
gcc/
* config/aarch64/aarc
This patch adds tuning fields for the total cost of a gather load
instruction. Until now, we've costed them as one scalar load
per element instead. Those scalar_load-based values are also
what the patch uses to fill in the new fields for existing
cost structures.
gcc/
* config/aarch64/aa
When the vectoriser scalarises a strided store, it counts one
scalar_store for each element plus one vec_to_scalar extraction
for each element. However, extracting element 0 is free on AArch64,
so it should have zero cost.
I don't have a testcase that requires this for existing -mtune
options, bu
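The adjusted accounting described above amounts to something like the following hypothetical helper, assuming per-operation costs are plain integers:

```cpp
// One scalar store per element, plus one lane extraction per element,
// except that extracting lane 0 is free on AArch64.
static int
strided_store_cost (int nelts, int scalar_store, int vec_to_scalar)
{
  return nelts * scalar_store + (nelts - 1) * vec_to_scalar;
}
```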
The issue-based vector costs currently assume that a multiply-add
sequence can be implemented using a single instruction. This is
generally true for scalars (which have a 4-operand instruction)
and SVE (which allows the output to be tied to any input).
However, for Advanced SIMD, multiplying two v
The AArch64 vector costs try to take issue rates into account.
However, when vectorising an outer loop, we lumped the inner
and outer operations together, which is somewhat meaningless.
This patch restricts the heuristic to the inner loop.
gcc/
* config/aarch64/aarch64.c (aarch64_add_stmt_
This patch adds an option to tune for Neoverse cores that have
a total vector bandwidth of 512 bits (4x128 for Advanced SIMD
and a vector-length-dependent equivalent for SVE). This is intended
to be a compromise between tuning aggressively for a single core like
Neoverse V1 (which can be too narro
After vect_analyze_loop has successfully analysed a loop for
one base vector mode B1, it considers using following base vector
modes to vectorise an epilogue. However, for VECT_COMPARE_COSTS,
a later mode B2 might turn out to be better than B1 was. Initially
this comparison will be between an epi
This patch uses a more accurate scalar iteration estimate when
comparing the epilogue of a constant-iteration loop with a candidate
replacement epilogue.
In the testcase, the patch prevents a 1-to-3-element SVE epilogue
from seeming better than a 64-bit Advanced SIMD epilogue.
Tested on aarch64-l
"H.J. Lu via Gcc-patches" writes:
> @@ -1122,8 +1122,8 @@ class op_by_pieces_d
> and its associated FROM_CFN_DATA can be used to replace loads with
> constant values. LEN describes the length of the operation. */
>
> -op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
> -
Jonathan Wright via Gcc-patches writes:
> Hi,
>
> V2 of the patch addresses the initial review comments, factors out
> common code (as we discussed off-list) and adds a set of unit tests
> to verify the code generation benefit.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
Jonathan Wright via Gcc-patches writes:
> Hi,
>
> The Neon multiply/multiply-accumulate/multiply-subtract instructions
> can select the top or bottom half of the operand registers. This
> selection does not change the cost of the underlying instruction and
> this should be reflected by the RTL cos
Richard Biener writes:
> This adds a gather vectorization capability to the vectorizer
> without target support by decomposing the offset vector, doing
> scalar loads and then building a vector from the result. This
> is aimed mainly at cases where vectorizing the rest of the loop
> offsets the co
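The decomposition being described can be sketched generically; this is an illustration of the transform, not the vectoriser's code:

```cpp
#include <array>
#include <cstddef>

// Emulated gather: extract each offset, do one scalar load per lane,
// then rebuild a vector (modelled here as an array) from the results.
template <typename T, std::size_t N>
std::array<T, N>
emulated_gather (const T *base, const std::array<int, N> &offsets)
{
  std::array<T, N> result;
  for (std::size_t i = 0; i < N; ++i)
    result[i] = base[offsets[i]]; // scalar load per lane
  return result;
}
```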
Richard Biener writes:
> On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> The issue-based vector costs currently assume that a multiply-add
>> sequence can be implemented using a single instruction. This is
>> generally true for s
Richard Sandiford via Gcc-patches writes:
> Richard Biener writes:
>> On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches
>> wrote:
>>>
>>> The issue-based vector costs currently assume that a multiply-add
>>> sequence can be implem
Jonathan Wright writes:
> Hi,
>
> Changes suggested here and those discussed off-list have been
> implemented in V2 of the patch.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-19 Jon
Tested on aarch64-linux-gnu and pushed.
Richard
gcc/
* config/aarch64/aarch64.c: Fix a typo.
---
gcc/config/aarch64/aarch64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f80de2ca897..81c002ba0b0 10
As per $SUBJECT. OK to install?
Richard
gcc/
PR middle-end/101787
* doc/md.texi (cond_ashl, cond_ashr, cond_lshr): Document.
---
gcc/doc/md.texi | 11 +++
1 file changed, 11 insertions(+)
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6d1bc1ad0f..f8047aefccc 100
Jonathan Wright writes:
> Hi,
>
> V2 of this patch uses the same approach as that just implemented
> for the multiply high-half cost patch.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-
Jonathan Wright writes:
> Hi,
>
> V2 of this change implements the same approach as for the multiply
> and add-widen patches.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28 Jonatha
Richard Biener writes:
> On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> When the vectoriser scalarises a strided store, it counts one
>> scalar_store for each element plus one vec_to_scalar extraction
>> for each element. However, e
Richard Biener writes:
> On Thu, Aug 5, 2021 at 2:04 PM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches
>> > wrote:
>> >>
>> >> When the vectoriser scal
Richard Biener via Gcc-patches writes:
> On Fri, Aug 6, 2021 at 5:32 AM liuhongt wrote:
>>
>> Hi:
>> ---
>> OK, I think sth is amiss here upthread. insv/extv do look like they
>> are designed
>> to work on integer modes (but docs do not say anything about this here).
>> In fact the caller of ext
Jonathan Wright writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst4[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation te
Jonathan Wright writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst3[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation te
Jonathan Wright writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst2[q]_lane Neon intrinsics in
> arm_neon.h.
>
> It also adds new code generation te
Jonathan Wright writes:
> Hi,
>
> As subject, this patch uses __builtin_memcpy to copy vector structures
> instead of using a union - or constructing a new opaque structure one
> vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat
> Neon intrinsics in arm_neon.h.
>
> It also ad
liuhongt via Gcc-patches writes:
> Hi:
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> Ok for trunk?
I think if anything the canonicalisation should be the other way:
if the shift amount is an in-range constant, we know that it fits
within a vector element, and so the vector form sh
Sorry for the slow reply.
Robin Dapp via Gcc-patches writes:
>> Hmm, OK. Doesn't expanding both versions up-front create the same kind of
>> problem that the patch is fixing, in that we expand (and therefore cost)
>> both the reversed and unreversed comparison? Also…
>>
> [..]
>>
>> …for min/
Jonathan Wright writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c
> index
> 60c53bc27f8378c78b119576ed19fde0e5743894..a8e31ab85d6fd2a045c8efaf2cbc42b5f40d2411
> 100644
> --- a/gcc/testsuite/gc
Tejas Belagod via Gcc-patches writes:
> Hi,
>
> Loops containing long long shifts fail to vectorize due to the vectorizer
> not being able to recognize long long right shifts. This is due to a bug
> in the iterator used for the vashr and vlshr patterns in aarch64-simd.md.
>
> Tested and bootstrapp
Richard Biener via Gcc-patches writes:
> This removes the cost part of vect_worthwhile_without_simd_p, retaining
> only the correctness bits. The reason is that the cost heuristic
> do not properly account for SLP plus the check whether "without simd"
> applies misfires for AVX512 mask vectors at