Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
钟居哲 writes: > Oh. I see. Thank you so much for pointing this. > Could you tell me what I should do in the codes? > It seems that I should adjust it in > vect_adjust_loop_lens_control > > multiply by some factor ? Is this correct multiply by max_nscalars_per_iter > ? max_nscalars_per_iter * factor

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-24 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Wed, 24 May 2023 at 15:40, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > On Mon, 22 May 2023 at 14:18, Richard Sandiford >> > wrote: >> >> >> >> Prathamesh Kulkarni writes: >> >> > Hi Richard, >> >> > Thanks for the suggestions. Does the att

Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread Richard Sandiford via Gcc-patches
I'll look at the samples tomorrow, but just to address one thing: 钟居哲 writes: >>> What gives the best code in these cases? Is emitting a multiplication >>> better? Or is using a new IV better? > Could you give me more detail information about "new refresh IV" approach. > I'd like to try that.

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-25 Thread Richard Sandiford via Gcc-patches
LGTM, just a couple of comment tweaks: Prathamesh Kulkarni writes: > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index d6fc94015fa..db7ca4c28c3 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -22332,6 +22332,46 @@ aarch64_un

Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Richard Sandiford via Gcc-patches
Thanks, this looks functionally correct to me. And I agree it handles the cases that previously needed multiplication. But I think it regresses code quality when no multiplication was needed. We can now generate duplicate IVs. Perhaps ivopts would remove the duplicates, but it might be hard, bec

Re: [PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-25 Thread Richard Sandiford via Gcc-patches
"Jin Ma" writes: >> > On 5/17/23 03:03, Jin Ma wrote: >> >> For example: >> >> (define_insn "mov_lowpart_sidi2" >> >>[(set (match_operand:SI0 "register_operand" "=r") >> >> (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))] >> >>"TARGET_64BIT" >> >>"mov\t

Re: [PATCH] RISC-V: In pipeline scheduling, insns should not be fusion in different BB blocks.

2023-05-25 Thread Richard Sandiford via Gcc-patches
Jin Ma writes: > When the last insn1 of BB1 and the first insn2 of BB2 are fusion, insn2 will > clear all dependencies in the function chain_to_prev_insn, resulting in insn2 > may move to any BB, and the program calculation result is wrong. > > gcc/ChangeLog: > > * sched-deps.cc (sched_macro_

Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Richard. Thanks for the comments. > >>> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) >>> || !iv_rgc >>> || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor >>> != rgc->max_nscalars_per_iter * rgc->factor)) >>> { > >> /* See

Re: [PATCH V16] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread Richard Sandiford via Gcc-patches
This looks good to me. Just a couple of very minor cosmetic things: juzhe.zh...@rivai.ai writes: > @@ -753,17 +846,35 @@ vect_set_loop_condition_partial_vectors (class loop > *loop, > continue; > } > > - /* See whether zero-based IV would ever generate all-false masks >

Re: [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes

2023-05-25 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches writes: > Hi all, > > This patch expresses the intrinsics for the SRA and RSRA instructions with > standard RTL codes rather than relying on UNSPECs. > These instructions perform a vector shift right plus accumulate with an > optional rounding constant addition for t

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Sandiford via Gcc-patches
Joel Hutton via Gcc-patches writes: > Hi all, > > This patch removes support for the widening subtract operation in the aarch64 > backend as it is causing a performance regression. > > In the following example: > > #include <stdint.h> > extern void wdiff( int16_t d[16], uint8_t *restrict pix1, uint8_t *re

Re: [AArch64] Remove backend support for widen-sub

2021-01-21 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, 21 Jan 2021, Richard Sandiford wrote: > >> Joel Hutton via Gcc-patches writes: >> > Hi all, >> > >> > This patch removes support for the widening subtract operation in the >> > aarch64 backend as it is causing a performance regression. >> > >> > In the following

Re: [PATCH v3] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches
Ilya Leoshkevich writes: > On Thu, 2021-01-21 at 12:29 +0000, Richard Sandiford wrote: >> Given what you said in the other message about combine, I agree this >> is a reasonable workaround. I don't know whether it's suitable for >> stage 4 or whether it would need to wait for stage 1. > > Thanks

Re: [GCC8 backport] AArch64: Fix symbol offset limit (PR 98618)

2021-01-21 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > In aarch64_classify_symbol symbols are allowed large offsets on relocations. > This means the offset can use all of the +/-4GB offset, leaving no offset > available for the symbol itself. This results in relocation overflow and > link-time errors for simple expressions li

Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2021-01-22 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > Hi! > > What is holding up this patch still? Ke Wen has pinged it every month > since May, and there has still not been a review. FAOD (since I'm on cc:), I don't feel qualified to review this. Tree-level loop stuff isn't really my area. Thanks, Richard > > > Seghe

Re: [PATCH] tree-ssa-mathopts: Use proper poly_int64 comparison with param_avoid_fma_max_bits [PR 98766]

2021-01-22 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches writes: > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index > d6201d3cb943e145720c18fbf3aadd853fd87b44..800815b855c759075b4326361cc4db7183f1c543 > 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -3252,8 +3252,8 @

Re: [PATCH] aarch64: Use RTL builtins for integer mla intrinsics

2021-01-22 Thread Richard Sandiford via Gcc-patches
Thanks for doing this. The patch looks good with one very minor nit fixed: Jonathan Wright writes: > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > f7efee61de4c5268acf446555af4a93fece6b169..da696d9fee2ffbabc9d89f2e9299fbde086cfee1 > 100644 > --- a/gcc/conf

Re: [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2021-01-25 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 22 Jan 2021, Segher Boessenkool wrote: > >> On Fri, Jan 22, 2021 at 02:47:06PM +0100, Richard Biener wrote: >> > On Thu, 21 Jan 2021, Segher Boessenkool wrote: >> > > What is holding up this patch still? Ke Wen has pinged it every month >> > > since May, and there

Re: [PATCH] middle-end/98726 - fix VECTOR_CST element access

2021-01-26 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > This fixes VECTOR_CST element access with POLY_INT elements and > allows to produce dump files of the PR98726 testcase without > ICEing. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? Thanks for doing this. I could have sworn I'd written almost exactly the

Re: [PATCH] aarch64: Tighten up checks for ubfix [PR98681]

2021-01-26 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > Hi! > > The testcase in the patch doesn't assemble, because the instruction requires > that the penultimate operand (lsb) range is [0, 32] (or [0, 64]) and the last > operand's range is [1, 32 - lsb] (or [1, 64 - lsb]). > The INTVAL (shft_amnt) < GET_MODE_BI
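The operand ranges quoted above can be written as a small predicate. This is only a sketch of the stated constraint, not code from the patch; the function name `bitfield_operands_valid` and its parameters are hypothetical.

```cpp
// Hypothetical check of the ranges quoted above, for a register of
// BITS bits (32 or 64): lsb must lie in [0, BITS] and the last operand
// (the width) must lie in [1, BITS - lsb].
constexpr bool
bitfield_operands_valid (int lsb, int width, int bits)
{
  return lsb >= 0 && lsb <= bits && width >= 1 && width <= bits - lsb;
}
```

Note that with these ranges lsb == BITS leaves no valid width at all, since [1, 0] is empty.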

Re: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n intrinsics

2021-01-27 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov writes: > Hi Jonathan, > >> -Original Message- >> From: Jonathan Wright >> Sent: 27 January 2021 16:03 >> To: gcc-patches@gcc.gnu.org >> Cc: Kyrylo Tkachov >> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n >> intrinsics >> >> Hi, >> >> As subject, this

Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > Hi! > > The following testcase ICEs, because after the veclower pass which is the > last point which lowers unsupported vector operations to supported ones > (or scalars) match.pd simplifies a supported vector operation into > unsupported one (vec << 1 >> 1

Re: [PATCH] gimple-match, gimple-fold: After PROP_gimple_lvec is set, punt for vector stmts that veclower would need to lower [PR98287]

2021-02-02 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > On Tue, Feb 02, 2021 at 09:38:09AM +0000, Richard Sandiford wrote: >> > +default: >> > + if (!VECTOR_MODE_P (mode)) >> > + return true; >> > + op = optab_for_tree_code (code, type, optab_default); >> > + if (op == unknown_optab >> > +

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: >> >> Hi Richard(s), >> >> I'm just looking to see if I'm going about this the right way, based on the >> discussion we had on IRC. I've managed to hack something together, I've >> attached a (very) WIP patch which gives

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: >> >> >> >> Hi Richard(s), >> >> >> >> I'm just looking to see if I'm going about this the right way, based on >> >> the discus

Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek > wrote: >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote: >>> OK, so I'd prefer we simply unset the flag after processing deferred >>rescan. I clearly misread the function to do that. >> >>This wo

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford >> > wrote: >> >> >> >> Richard Biener writes: >> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton wrote: >> >> >> >> >> >> Hi R

Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake

2021-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 2 Feb 2021, Richard Sandiford wrote: > >> Richard Biener writes: >> > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek >> > wrote: >> >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote: >> >>> OK, so I'd prefer we simply unset the flag afte

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton writes: In practice this will only affect targets that choose to use mixed vector sizes, and I think it's reasonable to optimise only for the case in which such targets support widening conversions. So what do you think about the idea of emitting separate conversio

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton writes: >>> So emit a v4qi->v8qi gimple conversion >>> then a regular widen_lo/hi using the existing backend patterns/optabs? >> >>I was thinking of using a v8qi->v8hi convert on each operand followed >>by a normal v8hi subtraction. That's what we'd generate if the target >>didn't def

Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following attempts to account for the fact that BB vectorization > regions now can span multiple loop levels and that an unprofitable > inner loop vectorization shouldn't be offsetted by a profitable > outer loop vectorization to make it overall profitable. > > For now

Re: PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-05 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > As mentioned in the PR, this patch fixes up the nvectors parameter passed to > vect_get_loop_mask in vectorizable_condition. > Before the STMT_VINFO_VEC_STMTS rework we used to handle each ncopy > separately, now we gather them all at the same time and do

Re: [PATCH] tree-optimization/98855 - redo BB vectorization costing

2021-02-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 5 Feb 2021, Richard Sandiford wrote: >> Richard Biener writes: >> > + /* First produce cost vectors sorted by loop index. */ >> > + auto_vec > >> > +li_scalar_costs (scalar_costs.length ()); >> > + auto_vec > >> > +li_vector_costs (vector_costs.length

Re: PR98974: Fix vectorizable_condition after STMT_VINFO_VEC_STMTS

2021-02-08 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 05/02/2021 12:47, Richard Sandiford wrote: >> "Andre Vieira (lists)" writes: >>> Hi, >>> >>> As mentioned in the PR, this patch fixes up the nvectors parameter passed >>> to vect_get_loop_mask in vectorizable_condition. >>> Before the STMT_VINFO_VEC_STMTS rewo

Re: [PATCH, rs6000, expand, hooks]: Fix PR98872, handle uninitialized opaque mode variables

2021-02-08 Thread Richard Sandiford via Gcc-patches
Peter Bergner writes: > Adding Richard since he's reviewed the generic opaque mode code in > the past and this patch contains some more generic support. > > GCC handles pseudos that are used uninitialized, by emitting a > (set (reg: ) CONST0_RTX(regmode)) before their uninitialized > pseudo usage.

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-10 Thread Richard Sandiford via Gcc-patches
Joel Hutton writes: > Hi Richards, > > This patch adds support for the V8QI->V8HI case from widening vect patterns > as discussed to target PR98772. Thanks, the approach looks good to me. Mostly just minor comments below. > Bootstrapped and regression tested on aarch64. > > > [aarch64][vect] S

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-10 Thread Richard Sandiford via Gcc-patches
Joel Hutton writes: > @@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree > type, > } > } > > +/* Function supportable_half_widening_operation > + I realise existing (related) functions do have a “Function foo” line, but it's not generally the GCC style, so I think

Re: [aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

2021-02-11 Thread Richard Sandiford via Gcc-patches
One more formatting nit, sorry: Joel Hutton writes: > +bool > +supportable_half_widening_operation (enum tree_code code, > +tree vectype_out, tree vectype_in, > +enum tree_code *code1) The arguments need reindenting for the new function nam

[PATCH] df: Record all definitions in DF_LR_BB_INFO->def [PR98863]

2021-02-11 Thread Richard Sandiford via Gcc-patches
df_lr_bb_local_compute has: FOR_EACH_INSN_INFO_DEF (def, insn_info) /* If the def is to only part of the reg, it does not kill the other defs that reach here. */ if (!(DF_REF_FLAGS (def) & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))) However, as noted in the comment i

Re: [PATCH] df: Record all definitions in DF_LR_BB_INFO->def [PR98863]

2021-02-12 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, Feb 11, 2021 at 4:33 PM Richard Sandiford via Gcc-patches > wrote: >> >> df_lr_bb_local_compute has: >> >> FOR_EACH_INSN_INFO_DEF (def, insn_info) >> /* If the def is to only part of the reg, it does >>

[committed] rtl-ssa: Use right obstack for temporary allocation

2021-02-12 Thread Richard Sandiford via Gcc-patches
I noticed while working on PR98863 that we were using the main obstack to allocate temporary uses. That was safe, but represents a kind of local memory leak. Tested on aarch64-linux-gnu and x86_64-linux-gnu, pushed as obvious. Richard gcc/ * rtl-ssa/accesses.cc (function_info::make_use

[PATCH] rtl-ssa: Reduce the amount of temporary memory needed [PR98863]

2021-02-12 Thread Richard Sandiford via Gcc-patches
The rtl-ssa code uses an on-the-side IL and needs to build that IL for each block and RTL insn. I'd originally not used the classical dominance frontier method for placing phis on the basis that it seemed like more work in this context: we're having to visit everything in an RPO walk anyway, so fo

Re: rtl-optimization: Fix uninitialized use of opaque mode variable ICE [PR98872]

2021-02-15 Thread Richard Sandiford via Gcc-patches
Peter Bergner writes: > 2021-02-12 Peter Bergner > > gcc/ > PR rtl-optimization/98872 > * init-regs.c (initialize_uninitialized_regs): Skip initialization > if CONST0_RTX is NULL. > > gcc/testsuite/ > PR rtl-optimization/98872 > * gcc.target/powerpc/pr98872.c: New

Re: [PATCH 1/2] aarch64: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Richard Sandiford via Gcc-patches
Maya Rashish via Gcc-patches writes: > Some subtargets don't provide the canonical function names as > the symbol name in C libraries, and libcalls will only work if > the builtins are patched to emit the correct library name. > > For example, on NetBSD, cabsl has the symbol name __c99_cabsl, > an

Re: [PATCH] df: Record all definitions in DF_LR_BB_INFO->def [PR98863]

2021-02-15 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Thu, Feb 11, 2021 at 03:03:38PM +0000, Richard Sandiford via Gcc-patches > wrote: >> gcc/ >> * df-problems.c (df_lr_bb_local_compute): Treat partial definitions >> as read-modify operations. >> >> gcc/testsuite/ >>

Re: [PATCH][AArch64] Leveraging the use of STP instruction for vec_duplicate

2021-02-15 Thread Richard Sandiford via Gcc-patches
Hi Victor, Thanks for the patch. I have a couple of very minor comments below, but otherwise it looks good to go. However, it will need to wait for stage 1 to open, unless it fixes a regression. Victor Do Nascimento via Gcc-patches writes: > diff --git a/gcc/config/aarch64/aarch64-simd.md > b

Re: [PATCH] aarch64: Run SUBTARGET_INIT_BUILTINS if it exists

2021-02-15 Thread Richard Sandiford via Gcc-patches
Maya Rashish via Gcc-patches writes: > Some subtargets don't provide the canonical function names as > the symbol name in C libraries, and libcalls will only work if > the builtins are patched to emit the correct library name. > > For example, on NetBSD, cabsl has the symbol name __c99_cabsl, > an

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > Hi! > > The following testcase started ICEing with my recent changes to enable > split4 after sel-sched, but it seems the bug is more general. > Some of the i386 splitter condition functions use and rely on df, but > the split passes don't really df_analyze/

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 16 Feb 2021, Richard Sandiford wrote: > >> Jakub Jelinek via Gcc-patches writes: >> > Hi! >> > >> > The following testcase started ICEing with my recent changes to enable >> > split4 after sel-sched, but it seems the bug is more general. >> > Some of the i386 spli

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Tue, Feb 16, 2021 at 09:42:22AM +0100, Richard Biener wrote: >> Just to get an idea whether it's worth doing the extra df_analyze. >> Since we have possibly 5 split passes it's a lot of churn for things >> like that WRF ltrans unit that already spends 40% of its time in

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Tue, Feb 16, 2021 at 09:16:40AM +0000, Richard Sandiford wrote: >> But doing it on demand like this seems fragile. And the targets aren't >> a fixed… target. I think we need to design the interface so that things >> are unlikely to go wrong in future rather than design

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > On Tue, Feb 16, 2021 at 09:55:40AM +0000, Richard Sandiford wrote: >> I assume that's because pass_df_initialize_no_opt is slightly after >> the first pass_split_all_insns? Seems like it should just be a case >> of moving it up. >> >> > And for noflow wher

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > @@ -14897,6 +14892,32 @@ distance_agu_use (unsigned int regno0, r >return distance >> 1; > } > > +/* Copy recog_data_d from SRC to DEST. */ > + > +static void > +copy_recog_data (recog_data_d *dest, recog_data_d *src) > +{ > + dest->n_operands = src

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Tue, Feb 16, 2021 at 01:09:43PM +0000, Richard Sandiford wrote: >> Can I put in a plea to put this in recog.[hc], and possibly also make >> it a copy constructor for recog_data_d? I can't think of any legitimate >> cases in which we'd want to copy the whole structure, i

Re: [PATCH] split, i386, v5: Fix up df uses in i386 splitters [PR99104]

2021-02-17 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Tue, Feb 16, 2021 at 03:03:56PM +0000, Richard Sandiford via Gcc-patches > wrote: >> > On Tue, Feb 16, 2021 at 01:09:43PM +0000, Richard Sandiford wrote: >> >> Can I put in a plea to put this in recog.[hc], and possibly also make >

Re: [PATCH] MIPS: Fix PR target/98491 (ChangeLog)

2021-02-17 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao via Gcc-patches writes: >> > > > I can't understand the comment either.  To me it looks like it's >> > > > possible to >> > > > remove this "if (MSA_SUPPORTED_P (mode)) return 0;" I think the point is that the MSA loads and stores only have a 10-bit offset field instead of the usual 16-

Re: [PATCH] split, i386, v5: Fix up df uses in i386 splitters [PR99104]

2021-02-18 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Feb 17, 2021 at 10:30:06AM +0000, Richard Sandiford wrote: >> Hmm. I think that just means that the optimisation performed by >> the copy constructor isn't valid in practice (even if it should be >> in principle). Guess this is the curse of manipulating data struc

Re: [PATCH][comitted] Testsuite: Disable PR99149 test on big-endian

2021-03-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This patch disables the test for PR99149 on Big-endian > where for standard AArch64 the patterns are disabled. > > Regtested on aarch64-none-linux-gnu and no issues. > > Committed under the obvious rule. > > Thanks, > Tamar > > gcc/testsuite/ChangeLog: > >

Re: add -mpowerpc-gpopt to options for sqrt insn on PowerPC

2021-03-02 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > On Feb 26, 2021, Segher Boessenkool wrote: > >> On Fri, Feb 26, 2021 at 12:31:16PM -0500, David Edelsohn wrote: >>> On Fri, Feb 26, 2021 at 11:09 AM Alexandre Oliva wrote: >>> > >>> > This patch avoids an ICE in gimplefe-28.c, in our ppc64-vxworks7r2 >>> > tests. Teste

Re: [PATCH] aarch64: Add missing error_mark_node check [PR99381]

2021-03-04 Thread Richard Sandiford via Gcc-patches
Alex Coplan writes: > Hi! > > As the PR shows, we were missing a check in > function_resolver::require_vector_type to see if the argument type was already > invalid. This was causing us to attempt to emit a diagnostic and subsequently > ICE in print_type. Fixed thusly. > > Bootstrapped and regtest

Re: [PATCH] aarch64: Fix SVE ACLE builtins with LTO [PR99216]

2021-03-08 Thread Richard Sandiford via Gcc-patches
Alex Coplan writes: > Hi all, > > As discussed in the PR, we currently have two different numbering > schemes for SVE builtins: one for C, and one for C++. This is > problematic for LTO, where we end up getting confused about which > intrinsic we're talking about. This patch inserts placeholders i

Re: [PATCH] Add emulated gather capability to the vectorizer

2021-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 30 Jul 2021, Richard Sandiford wrote: >> > @@ -9456,6 +9499,51 @@ vectorizable_load (vec_info *vinfo, >> >data_ref = NULL_TREE; >> >break; >> > } >> > + else if (memory_access_type == VMAT_GATHER

Re: [PATCH] Add emulated gather capability to the vectorizer

2021-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 2 Aug 2021, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Fri, 30 Jul 2021, Richard Sandiford wrote: >> >> > @@ -9456,6 +9499,51 @@ vectorizable_load (vec_info *vinfo, >> >> > data_ref = NULL_TREE; >> >> >

Re: [PATCH] Add a simple fraction class

2021-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Fri, Jul 30, 2021 at 5:59 PM Richard Sandiford via Gcc-patches > wrote: >> >> This patch adds a simple class for holding A/B fractions. >> As the comments in the patch say, the class isn't designed >> to have nice

Re: [PATCH] Add a simple fraction class

2021-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, Aug 2, 2021 at 12:43 PM Richard Sandiford > wrote: >> >> Richard Biener via Gcc-patches writes: >> > On Fri, Jul 30, 2021 at 5:59 PM Richard Sandiford via Gcc-patches >> > wrote: >> >> >> >> This patch

Re: [PATCH] Add a simple fraction class

2021-08-03 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, Aug 2, 2021 at 1:31 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Mon, Aug 2, 2021 at 12:43 PM Richard Sandiford >> > wrote: >> >> >> >> Richard Biener via Gcc-patches writes:

[PATCH 0/8] aarch64 vector cost tweaks

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch series: (1) generalises the aarch64 vector costs to allow for the final patch. This part should be a no-op for existing tuning code. (2) tweaks the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS code. This currently only affects neoverse-v1 and again helps with the final patch. (3)

[PATCH 1/8] aarch64: Turn sve_width tuning field into a bitmask

2021-08-03 Thread Richard Sandiford via Gcc-patches
The tuning structures have an sve_width field that specifies the number of bits in an SVE vector (or SVE_NOT_IMPLEMENTED if not applicable). This patch turns the field into a bitmask so that it can specify multiple widths at the same time. For now we always treat the minimum width as the likely w
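A minimal sketch of the bitmask idea described above, not the patch's actual code. It assumes each width is encoded as its bit count, which is a power of two, so several widths can be ORed into one mask and the minimum recovered as the lowest set bit. The names `sve_128`, `sve_256`, `sve_512` and `min_sve_width` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical width constants: each value is the vector width in bits.
// Every width is a distinct power of two, so ORing them forms a bitmask.
constexpr std::uint32_t sve_128 = 128;
constexpr std::uint32_t sve_256 = 256;
constexpr std::uint32_t sve_512 = 512;

// Return the lowest set bit of MASK, i.e. the smallest width it contains
// (0 if MASK is empty).
constexpr std::uint32_t
min_sve_width (std::uint32_t mask)
{
  return mask & (~mask + 1);
}
```

With this encoding a tuning entry that supports both 256-bit and 512-bit vectors would store `sve_256 | sve_512`, and the minimum width falls out of a single bit operation.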

[PATCH 2/8] aarch64: Add a simple fixed-point class for costing

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch adds a simple fixed-point class for holding fractional cost values. It can exactly represent the reciprocal of any single-vector SVE element count (including the non-power-of-2 ones). This means that it can also hold 1/N for all N in [1, 16], which should be enough for the various *_per
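To illustrate why a fixed-point value can hold 1/N exactly for every N in [1, 16], here is a small sketch. It is not the class from the patch: the type name `fixed_cost` and the choice of scale are assumptions. The scale used is lcm(1, ..., 16) = 720720, which every N in that range divides evenly.

```cpp
#include <cstdint>

// Hypothetical fixed-point cost type: the represented value is raw / SCALE.
struct fixed_cost
{
  // lcm(1, 2, ..., 16) = 720720, so SCALE / N is an exact integer for
  // every N in [1, 16].  The scale is an assumption made for this sketch.
  static constexpr std::uint64_t SCALE = 720720;

  std::uint64_t raw;

  // Exact representation of 1/N, valid for N in [1, 16].
  static constexpr fixed_cost reciprocal (std::uint64_t n)
  {
    return fixed_cost{SCALE / n};
  }

  constexpr fixed_cost operator+ (fixed_cost other) const
  {
    return fixed_cost{raw + other.raw};
  }

  constexpr bool operator== (fixed_cost other) const
  {
    return raw == other.raw;
  }
};
```

Because no rounding occurs, adding 1/3 three times gives exactly 1, which a binary floating-point cost would not guarantee.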

[PATCH 3/8] aarch64: Split out aarch64_adjust_body_cost_sve

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch splits the SVE-specific part of aarch64_adjust_body_cost out into its own subroutine, so that a future patch can call it more than once. I wondered about using a lambda to avoid having to pass all the arguments, but in the end this way seemed clearer. gcc/ * config/aarch64/aarc

[PATCH 4/8] aarch64: Add gather_load_xNN_cost tuning fields

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch adds tuning fields for the total cost of a gather load instruction. Until now, we've costed them as one scalar load per element instead. Those scalar_load-based values are also what the patch uses to fill in the new fields for existing cost structures. gcc/ * config/aarch64/aa

[PATCH 5/8] aarch64: Tweak the cost of elementwise stores

2021-08-03 Thread Richard Sandiford via Gcc-patches
When the vectoriser scalarises a strided store, it counts one scalar_store for each element plus one vec_to_scalar extraction for each element. However, extracting element 0 is free on AArch64, so it should have zero cost. I don't have a testcase that requires this for existing -mtune options, bu
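The counting change described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `store_ops` and `elementwise_store_ops` are hypothetical names, and the flag simply models a target where extracting element 0 costs nothing.

```cpp
// Hypothetical tally of the operations costed for a scalarised store.
struct store_ops
{
  unsigned scalar_stores;   // one per element
  unsigned extractions;     // vec_to_scalar extractions that are costed
};

// Count the operations for storing NELTS elements one at a time.
// If ELT0_IS_FREE, the extraction of element 0 is not counted.
constexpr store_ops
elementwise_store_ops (unsigned nelts, bool elt0_is_free)
{
  return store_ops{nelts, (elt0_is_free && nelts > 0) ? nelts - 1 : nelts};
}
```

For a 4-element vector this gives 4 stores plus 3 costed extractions instead of 4.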

[PATCH 6/8] aarch64: Tweak MLA vector costs

2021-08-03 Thread Richard Sandiford via Gcc-patches
The issue-based vector costs currently assume that a multiply-add sequence can be implemented using a single instruction. This is generally true for scalars (which have a 4-operand instruction) and SVE (which allows the output to be tied to any input). However, for Advanced SIMD, multiplying two v

[PATCH 7/8] aarch64: Restrict issue heuristics to inner vector loop

2021-08-03 Thread Richard Sandiford via Gcc-patches
The AArch64 vector costs try to take issue rates into account. However, when vectorising an outer loop, we lumped the inner and outer operations together, which is somewhat meaningless. This patch restricts the heuristic to the inner loop. gcc/ * config/aarch64/aarch64.c (aarch64_add_stmt_

[PATCH 8/8] aarch64: Add -mtune=neoverse-512tvb

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch adds an option to tune for Neoverse cores that have a total vector bandwidth of 512 bits (4x128 for Advanced SIMD and a vector-length-dependent equivalent for SVE). This is intended to be a compromise between tuning aggressively for a single core like Neoverse V1 (which can be too narro

[PATCH] vect: Tweak dump messages for vector mode choice

2021-08-03 Thread Richard Sandiford via Gcc-patches
After vect_analyze_loop has successfully analysed a loop for one base vector mode B1, it considers using following base vector modes to vectorise an epilogue. However, for VECT_COMPARE_COSTS, a later mode B2 might turn out to be better than B1 was. Initially this comparison will be between an epi

[PATCH] vect: Tweak comparisons with existing epilogue loops

2021-08-03 Thread Richard Sandiford via Gcc-patches
This patch uses a more accurate scalar iteration estimate when comparing the epilogue of a constant-iteration loop with a candidate replacement epilogue. In the testcase, the patch prevents a 1-to-3-element SVE epilogue from seeming better than a 64-bit Advanced SIMD epilogue. Tested on aarch64-l

Re: [PATCH] by_pieces: Properly set m_max_size in op_by_pieces

2021-08-04 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches" writes: > @@ -1122,8 +1122,8 @@ class op_by_pieces_d > and its associated FROM_CFN_DATA can be used to replace loads with > constant values. LEN describes the length of the operation. */ > > -op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load, > -

Re: [PATCH V2] aarch64: Don't include vec_select in SIMD multiply cost

2021-08-04 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches writes: > Hi, > > V2 of the patch addresses the initial review comments, factors out > common code (as we discussed off-list) and adds a set of unit tests > to verify the code generation benefit. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no

Re: [PATCH] aarch64: Don't include vec_select high-half in SIMD multiply cost

2021-08-04 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches writes: > Hi, > > The Neon multiply/multiply-accumulate/multiply-subtract instructions > can select the top or bottom half of the operand registers. This > selection does not change the cost of the underlying instruction and > this should be reflected by the RTL cos

Re: [PATCH 1/2] Add emulated gather capability to the vectorizer

2021-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > This adds a gather vectorization capability to the vectorizer > without target support by decomposing the offset vector, doing > scalar loads and then building a vector from the result. This > is aimed mainly at cases where vectorizing the rest of the loop > offsets the co
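
For readers skimming the index, the emulated-gather idea quoted above can be sketched in plain C. This is only an illustration of the three steps named in the preview (decompose the offset vector, do scalar loads, build a vector from the result); it is not the vectorizer's implementation. The names `LANES` and `emulated_gather` and the fixed width of 4 are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* hypothetical vector width, chosen for illustration */

/* Emulate a gather: dst[i] = base[offsets[i]] for each lane.
   The offset "vector" is taken apart lane by lane, each element is
   fetched with an ordinary scalar load, and the loaded values are
   written into the destination "vector".  */
void
emulated_gather (int dst[LANES], const int *base,
                 const size_t offsets[LANES])
{
  assert (dst != NULL && base != NULL && offsets != NULL);
  for (size_t i = 0; i < LANES; ++i)
    {
      size_t off = offsets[i];  /* extract one offset lane */
      int elt = base[off];      /* scalar load */
      dst[i] = elt;             /* insert into the result */
    }
}
```

Whether this pays off depends on the rest of the loop: the sketch does `LANES` scalar loads plus the lane extractions and insertions, so it only helps if other vectorized work in the loop outweighs that overhead.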

Re: [PATCH 6/8] aarch64: Tweak MLA vector costs

2021-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches > wrote: >> >> The issue-based vector costs currently assume that a multiply-add >> sequence can be implemented using a single instruction. This is >> generally true for s

Re: [PATCH 6/8] aarch64: Tweak MLA vector costs

2021-08-04 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > Richard Biener writes: >> On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches >> wrote: >>> >>> The issue-based vector costs currently assume that a multiply-add >>> sequence can be implem

Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD multiply cost

2021-08-04 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > Changes suggested here and those discussed off-list have been > implemented in V2 of the patch. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-07-19 Jon

[committed] aarch64: Fix a typo

2021-08-04 Thread Richard Sandiford via Gcc-patches
Tested on aarch64-linux-gnu and pushed. Richard gcc/ * config/aarch64/aarch64.c: Fix a typo. --- gcc/config/aarch64/aarch64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f80de2ca897..81c002ba0b0 10

[PATCH] doc: Document cond_* shift optabs in md.texi

2021-08-05 Thread Richard Sandiford via Gcc-patches
As per $SUBJECT. OK to install? Richard gcc/ PR middle-end/101787 * doc/md.texi (cond_ashl, cond_ashr, cond_lshr): Document. --- gcc/doc/md.texi | 11 +++ 1 file changed, 11 insertions(+) diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index f6d1bc1ad0f..f8047aefccc 100

Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD add cost

2021-08-05 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > V2 of this patch uses the same approach as that just implemented > for the multiply high-half cost patch. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-

Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-08-05 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > V2 of this change implements the same approach as for the multiply > and add-widen patches. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-07-28 Jonatha

[PATCH] vect: Move costing helpers from aarch64 code

2021-08-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches > wrote: >> >> When the vectoriser scalarises a strided store, it counts one >> scalar_store for each element plus one vec_to_scalar extraction >> for each element. However, e

Re: [PATCH] vect: Move costing helpers from aarch64 code

2021-08-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, Aug 5, 2021 at 2:04 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches >> > wrote: >> >> >> >> When the vectoriser scal

Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Fri, Aug 6, 2021 at 5:32 AM liuhongt wrote: >> >> Hi: >> --- >> OK, I think sth is amiss here upthread. insv/extv do look like they >> are designed >> to work on integer modes (but docs do not say anything about this here). >> In fact the caller of ext

Re: [PATCH 1/4] aarch64: Use memcpy to copy structures in vst4[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > As subject, this patch uses __builtin_memcpy to copy vector structures > instead of using a union - or constructing a new opaque structure one > vector at a time - in each of the vst4[q]_lane Neon intrinsics in > arm_neon.h. > > It also adds new code generation te
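
The memcpy-based copy that this series describes can be sketched in portable C. The patches use `__builtin_memcpy` inside arm_neon.h; the sketch below uses standard `memcpy` and two hypothetical stand-in types (`vec_pair` and `opaque_pair`), so it shows the general technique and not the actual intrinsic code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a user-visible vector structure and the
   opaque type consumed by a builtin; both occupy the same 32 bytes.  */
typedef struct { int32_t val[2][4]; } vec_pair;
typedef struct { unsigned char bytes[sizeof (vec_pair)]; } opaque_pair;

/* Copy the whole structure in one go, without a union and without
   building the opaque value one vector at a time.  */
opaque_pair
to_opaque (vec_pair v)
{
  opaque_pair o;
  /* The copy is only meaningful if both types have the same size.  */
  assert (sizeof o == sizeof v);
  memcpy (&o, &v, sizeof v);
  return o;
}

/* The reverse copy, used here only to check the round trip.  */
vec_pair
from_opaque (opaque_pair o)
{
  vec_pair v;
  memcpy (&v, &o, sizeof o);
  return v;
}
```

A fixed-size `memcpy` like this is something compilers can typically turn into plain register or memory moves, which is presumably why the series pairs the change with new code generation tests.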

Re: [PATCH 2/4] aarch64: Use memcpy to copy structures in vst3[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > As subject, this patch uses __builtin_memcpy to copy vector structures > instead of using a union - or constructing a new opaque structure one > vector at a time - in each of the vst3[q]_lane Neon intrinsics in > arm_neon.h. > > It also adds new code generation te

Re: [PATCH 3/4] aarch64: Use memcpy to copy structures in vst2[q]_lane intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > As subject, this patch uses __builtin_memcpy to copy vector structures > instead of using a union - or constructing a new opaque structure one > vector at a time - in each of the vst2[q]_lane Neon intrinsics in > arm_neon.h. > > It also adds new code generation te

Re: [PATCH 4/4] aarch64: Use memcpy to copy structures in bfloat vst* intrinsics

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > As subject, this patch uses __builtin_memcpy to copy vector structures > instead of using a union - or constructing a new opaque structure one > vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat > Neon intrinsics in arm_neon.h. > > It also ad

Re: [PATCH] [rtl-optimization] Simplify vector shift/rotate with const_vec_duplicate to vector shift/rotate with const_int element.

2021-08-06 Thread Richard Sandiford via Gcc-patches
liuhongt via Gcc-patches writes: > Hi: > Bootstrapped and regtested on x86_64-linux-gnu{-m32,} > Ok for trunk? I think if anything the canonicalisation should be the other way: if the shift amount is an in-range constant, we know that it fits within a vector element, and so the vector form sh

Re: [PATCH 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply. Robin Dapp via Gcc-patches writes: >> Hmm, OK. Doesn't expanding both versions up-front create the same kind of >> problem that the patch is fixing, in that we expand (and therefore cost) >> both the reversed and unreversed comparison? Also… >> > [..] >> >> …for min/

Re: [PATCH] testsuite: aarch64: Fix failing vector structure tests on big-endian

2021-08-06 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > diff --git a/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > b/gcc/testsuite/gcc.target/aarch64/vector_structure_intrinsics.c > index > 60c53bc27f8378c78b119576ed19fde0e5743894..a8e31ab85d6fd2a045c8efaf2cbc42b5f40d2411 > 100644 > --- a/gcc/testsuite/gc

Re: [PATCH, AArch64] PR target/101609 - Use the correct iterator for AArch64 vector right shift pattern.

2021-08-06 Thread Richard Sandiford via Gcc-patches
Tejas Belagod via Gcc-patches writes: > Hi, > > Loops containing long long shifts fail to vectorize due to the vectorizer > not being able to recognize long long right shifts. This is due to a bug > in the iterator used for the vashr and vlshr patterns in aarch64-simd.md. > > Tested and bootstrapp

Re: [PATCH] tree-optimization/101801 - remove vect_worthwhile_without_simd_p

2021-08-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > This removes the cost part of vect_worthwhile_without_simd_p, retaining > only the correctness bits. The reason is that the cost heuristic > does not properly account for SLP plus the check whether "without simd" > applies misfires for AVX512 mask vectors at
