Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > When determining issue rates we currently discount non-constant MLA > accumulators > for Advanced SIMD but don't do it for the latency. > > This means the costs for Advanced SIMD with a constant accumulator are wrong > and > results in us costing SVE and Adv

Re: [PATCH]AArch64 update costing for combining vector conditionals

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > boolean comparisons have different cost depending on the mode. e.g. > a && b when predicated doesn't require an addition instruction, the AND is > free Nit (for the commit msg): additional Maybe: for SVE, a && b doesn't require an additional instruction

Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> Tamar Christina writes: >> > Hi All, >> > >> > When determining issue rates we currently discount non-constant MLA >> > accumulators for Advanced SIMD but don't do it for the latency. >> > >> > This means the costs for Advanced SIMD with a constant accumulator are >> >

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 1 Aug 2023, Richard Sandiford wrote: > >> Richard Sandiford writes: >> > Richard Biener via Gcc-patches writes: >> >> The following makes sure to limit the shift operand when vectorizing >> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift >>

Re: [PATCH] wide-int: Fix up wi::shifted_mask [PR106144]

2022-07-01 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > As the following self-test testcase shows, wi::shifted_mask sometimes > doesn't create canonicalized wide_ints, which then fail to compare equal > to canonicalized wide_ints with the same value. > In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and

Re: [PATCH] tree-optimization/106131 - wrong code with FRE rewriting

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > The following makes sure to not use the original TBAA type for > looking up a value across an aggregate copy when we had to offset > the read. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk. > > 2022-06-30 Richard Biener > >

Re: [PATCH] tree-optimization/106131 - wrong code with FRE rewriting

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 1 Jul 2022, Richard Sandiford wrote: > >> Richard Biener via Gcc-patches writes: >> > The following makes sure to not use the original TBAA type for >> > looking up a value across an aggregate copy when we had to offset >> > the read. >> > >> > Bootstrapped and te

Re: [PATCH][AArch64] Implement ACLE Data Intrinsics

2022-07-01 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 29/06/2022 08:18, Richard Sandiford wrote: >>> + break; >>> +case AARCH64_RBIT: >>> +case AARCH64_RBITL: >>> +case AARCH64_RBITLL: >>> + if (mode == SImode) >>> + icode = CODE_FOR_aarch64_rbitsi; >>> + else >>> + icode = CODE_FOR_a

Re: [PATCH 2/2] Revert maybe_ne -> known_ne change in vn_reference_lookup_3

2022-07-01 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > This reverts the change as discussed. Thanks! > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. > > 2022-07-01 Richard Biener > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Revert > back to using maybe_ne (off, -1). > --- >

Re: [RFC] trailing_wide_ints with runtime variable lengths

2022-07-01 Thread Richard Sandiford via Gcc-patches
Aldy Hernandez via Gcc-patches writes: > Currently global ranges are stored in SSA_NAME_RANGE_INFO as a pair of > wide_int-like objects along with the nonzero bits. We frequently lose > precision when streaming out our higher resolution iranges. The plan > was always to store the full irange bet

Re: [PATCH] Mips: Resolve build issues for the n32 ABI

2022-07-04 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao via Gcc-patches writes: > On Fri, 2022-07-01 at 12:40 +, Dimitrije Milosevic wrote: >> Building the ASAN for the n32 MIPS ABI currently fails, due to a few reasons: >> - defined(__mips64), which is set solely based on the architecture type >> (32-bit/64-bit), >> was still used in s

Re: [PATCH 1/2]AArch64 Add fallback case using sdot for usdot

2022-07-04 Thread Richard Sandiford via Gcc-patches
>> > > > -Original Message- >> >> > > > From: Richard Sandiford >> >> > > > Sent: Thursday, June 16, 2022 7:54 PM >> >> > > > To: Tamar Christina >> >> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard

Re: [PATCH] aarch64: Move vreinterpret definitions into the compiler

2022-07-04 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review. Andrew Carlotti via Gcc-patches writes: > Hi, > > This removes a significant number of intrinsic definitions from the arm_neon.h > header file, and reduces the amount of code duplication. The new macros and > data structures are intended to also facilitate moving other

Re: [PATCH] Maintain LC SSA when doing SVE vectorization

2022-07-05 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The final loop IV use after the loop has that not in LC SSA > (and inserts not simplified _2 = _3 - 0 stmts). In particular > since it splits the exit edge when there's a virtual PHI in the > destination it breaks virtual LC SSA form (but likely also > non-virtual). > > T

Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart.

2022-07-05 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> > so that the multiple_p test is skipped if the structure is undefined. >> >> Actually, we should probably skip the constant_multiple_p test as well. >> Keeping it would only be meaningful for little-endian. >> >> simplify_gen_subreg should already do the necessary chec

Re: [PATCH] Move reload_completed and other rtl.h globals to crtl structure.

2022-07-11 Thread Richard Sandiford via Gcc-patches
I know it'll seem like make-work, but could you put the combine flag in a separate follow-on patch? Reorganising the existing flags (very welcome!) and adding new ones seem like different things. TBH I'm a bit suspicious of the combine flag. What fundamental property holds true after combine tha

[committed] vect: Restore optab_vector argument [PR106250]

2022-07-11 Thread Richard Sandiford via Gcc-patches
In g:76c3041b856cb0 I'd removed a "C ? optab_vector : optab_mixed_sign" argument from a call to directly_supported_p, thinking that the argument only existed because of the condition (which I was removing). But the difference between the scalar and vector forms matters for shifts, so we do still n

[PATCH] aarch64: Remove redundant builtins code

2022-07-12 Thread Richard Sandiford via Gcc-patches
aarch64_builtin_vectorized_function handles some built-in functions that already have equivalent internal functions. This seems to be redundant now, since the target builtins that it chooses are mapped to the same optab patterns as the internal functions. Tested on aarch64-linux-gnu & pushed. Ri

[PATCH] Add internal functions for iround etc. [PR106253]

2022-07-12 Thread Richard Sandiford via Gcc-patches
The PR is about the aarch64 port using an ACLE built-in function to vectorise a scalar function call, even though the ECF_* flags for the ACLE function didn't match the ECF_* flags for the scalar call. To some extent that kind of difference is inevitable, since the ACLE intrinsics are supposed to

Ping^2: [RFA configure parts] aarch64: Make cc1 &co handle --with options

2022-07-12 Thread Richard Sandiford via Gcc-patches
Ping^2 for the configure bits. Richard Sandiford via Gcc-patches writes: > On aarch64, --with-arch, --with-cpu and --with-tune only have an > effect on the driver, so “./xgcc -B./ -O3” can give significantly > different results from “./cc1 -O3”. --with-arch did have a limited > eff

Re: [PATCH v2 1/2] aarch64: Don't return invalid GIMPLE assign statements

2022-07-13 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Tue, Jul 12, 2022 at 4:38 PM Andrew Carlotti > wrote: >> >> aarch64_general_gimple_fold_builtin doesn't check whether the LHS of a >> function call is null before converting it to an assign statement. To avoid >> returning an invalid GIMPLE statement i

[PATCH] arm: Replace arm_builtin_vectorized_function [PR106253]

2022-07-13 Thread Richard Sandiford via Gcc-patches
This patch extends the fix for PR106253 to AArch32. As with AArch64, we were using ACLE intrinsics to vectorise scalar built-ins, even though the two sometimes have different ECF_* flags. (That in turn is because the ACLE intrinsics should follow the instruction semantics as closely as possible,

Re: [PATCH v2 2/2] aarch64: Lower vcombine to GIMPLE

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This lowers vcombine intrinsics to a GIMPLE vector constructor, which enables > better optimisation during GIMPLE passes. > > gcc/ > > * config/aarch64/aarch64-builtins.c > (aarch64_general_gimple_fold_builtin): Add combine. > > gcc/testsuite/ > > * gcc.

Re: [PATCH v2 1/4] aarch64: Add V1DI mode

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > We already have a V1DF mode, so this makes the vector modes more consistent. > > Additionally, this allows us to recognise uint64x1_t and int64x1_t types given > only the mode and type qualifiers (e.g. in aarch64_lookup_simd_builtin_type). > > gcc/ChangeLog: > > * c

Re: [PATCH v2 2/4] aarch64: Remove qualifier_internal

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This has been unused since 2014, so there's no reason to retain it. > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (enum aarch64_type_qualifiers): Remove qualifier_internal. > (aarch64_init_simd_builtin_functions): Remove qualifier_interna

Re: [PATCH v2 3/4] aarch64: Consolidate simd type lookup functions

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > There were several similarly-named functions, which each built or looked up a > type using a different subset of valid modes or qualifiers. > > This change combines these all into a single function, which can additionally > handle const and pointer qualifiers. I like the

Re: [PATCH v2 4/4] aarch64: Move vreinterpret definitions into the compiler

2022-07-13 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > This removes a significant number of intrinsic definitions from the arm_neon.h > header file, and reduces the amount of code duplication. The new macros and > data structures are intended to also facilitate moving other intrinsic > definitions out of the header file in fu

Re: [aarch64] Use op_mode instead of vmode for op0, op1 in aarch64_vectorize_vec_perm_const

2022-07-14 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi, > For following test case: > > svint32_t foo() > { > int32x4_t v = (int32x4_t) { 1, 2, 3, 4 }; > svint32_t v2 = svld1rq_s32 (svptrue_b8(), &v[0]); > return v2; > } > > After applying workaround in forwprop to not simplify VEC_PERM_EXPR in > simplify_permutat

Re: ICE after folding svld1rq to vec_perm_expr during forwprop

2022-07-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, Jul 14, 2022 at 9:55 AM Prathamesh Kulkarni > wrote: >> >> On Wed, 13 Jul 2022 at 12:22, Richard Biener >> wrote: >> > >> > On Tue, Jul 12, 2022 at 9:12 PM Prathamesh Kulkarni via Gcc-patches >> > wrote: >> > > >> > > Hi Richard, >> > > For the following test:

Re: [PATCH] aarch64: Replace manual swapping idiom with std::swap in aarch64.cc

2022-07-18 Thread Richard Sandiford via Gcc-patches
Richard Ball writes: > Replace manual swapping idiom with std::swap in aarch64.cc > > gcc/config/aarch64/aarch64.cc has a few manual swapping idioms of the form: > > x = in0, in0 = in1, in1 = x; > > The preferred way is using the standard: > > std::swap (in0, in1); > > We should just fix these to
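
The idiom change quoted above can be sketched as follows. This is a minimal illustration, not code from aarch64.cc: the names `in0`, `in1` and `x` come from the quoted message, while the two wrapper functions and the `int` element type are assumptions made for the example.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper around the manual three-assignment idiom
// quoted in the message.
void swap_manually (int &in0, int &in1)
{
  int x;
  x = in0, in0 = in1, in1 = x;
}

// Hypothetical wrapper around the preferred standard replacement.
void swap_with_std (int &in0, int &in1)
{
  std::swap (in0, in1);
}

// Both forms leave the operands exchanged.
void check_swaps ()
{
  int a = 1, b = 2;
  swap_manually (a, b);
  assert (a == 2 && b == 1);
  swap_with_std (a, b);
  assert (a == 1 && b == 2);
}
```

The std::swap form needs no temporary at the call site and reads as a single operation, which is presumably why the message calls it the preferred way.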

Re: [PATCH v2.1 3/4] aarch64: Consolidate simd type lookup functions

2022-07-20 Thread Richard Sandiford via Gcc-patches
Andrew Carlotti writes: > On Wed, Jul 13, 2022 at 05:36:04PM +0100, Richard Sandiford wrote: >> I like the part about getting rid of: >> >> static tree >> aarch64_simd_builtin_type (machine_mode mode, >> bool unsigned_p, bool poly_p) >> >> and the flow of the new function

[PATCH] graphds: Fix description of SCC algorithm

2022-07-22 Thread Richard Sandiford via Gcc-patches
graphds_scc says that it uses Tarjan's algorithm, but it looks like it uses Kosaraju's algorithm instead (dfs one way followed by dfs the other way). OK to install? Richard gcc/ * graphds.cc (graphds_scc): Fix algorithm attribution. --- gcc/graphds.cc | 2 +- 1 file changed, 1 insertio
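
For readers unfamiliar with the distinction, the "dfs one way followed by dfs the other way" shape of Kosaraju's algorithm can be sketched as below. This is a generic textbook sketch, not the code in graphds.cc: the adjacency-list `Graph` type and all function names are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adjacency-list graph: g[v] lists the successors of v.
using Graph = std::vector<std::vector<int>>;

// First pass: record vertices in DFS post-order on the forward graph.
static void dfs_order (const Graph &g, int v, std::vector<bool> &seen,
                       std::vector<int> &order)
{
  seen[v] = true;
  for (int w : g[v])
    if (!seen[w])
      dfs_order (g, w, seen, order);
  order.push_back (v);
}

// Second pass: label everything reachable in the reversed graph.
static void dfs_label (const Graph &rev, int v, int id,
                       std::vector<int> &comp)
{
  comp[v] = id;
  for (int w : rev[v])
    if (comp[w] < 0)
      dfs_label (rev, w, id, comp);
}

// Returns a component number for each vertex.
std::vector<int> kosaraju_scc (const Graph &g)
{
  int n = static_cast<int> (g.size ());
  Graph rev (n);
  for (int v = 0; v < n; ++v)
    for (int w : g[v])
      {
        assert (w >= 0 && w < n);
        rev[w].push_back (v);
      }

  std::vector<bool> seen (n, false);
  std::vector<int> order;
  for (int v = 0; v < n; ++v)
    if (!seen[v])
      dfs_order (g, v, seen, order);

  // Visit vertices in decreasing post-order on the reversed graph.
  std::vector<int> comp (n, -1);
  int id = 0;
  for (int i = n - 1; i >= 0; --i)
    if (comp[order[i]] < 0)
      dfs_label (rev, order[i], id++, comp);
  return comp;
}
```

Tarjan's algorithm, by contrast, finds the components in a single DFS using low-link values, so the presence of two separate traversals is what identifies the approach as Kosaraju's.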

Re: [PATCH 1/1] Fix bit-position comparison

2022-07-27 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 27 Jul 2022, juzhe.zh...@rivai.ai wrote: > >> From: zhongjuzhe >> >> gcc/ChangeLog: >> >> * expr.cc (expand_assignment): Change GET_MODE_PRECISION to >> GET_MODE_BITSIZE >> >> --- >> gcc/expr.cc | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-)

Re: PING: [PATCH] libsanitizer: Cherry-pick 2bfb0fcb51510f22723c8cdfefe from upstream

2022-07-27 Thread Richard Sandiford via Gcc-patches
Dimitrije Milosevic writes: >> Do you know someone very familiar with MIPS and GCC and capable as a >> port maintainer? An active MIPS port maintainer will make the situation >> better. > Sadly, no. I agree it would make things easier. Yeah, I agree that's what we need. I stepped down from bein

Re: [PATCH] Add new target hook: simplify_modecc_const.

2022-07-28 Thread Richard Sandiford via Gcc-patches
Seems this thread has become a bit heated, so I'll try to proceed with caution :-) In the below, I'll use "X-mode const_int" to mean "a const_int that is known from context to represent an X-mode value". Of course, the const_int itself always stores VOIDmode. "Roger Sayle" writes: > Hi Segher,

[PATCH] RFC: Extend SLP permutation optimisations

2022-08-02 Thread Richard Sandiford via Gcc-patches
Currently SLP tries to force permute operations "down" the graph from loads in the hope of reducing the total number of permutes needed or (in the best case) removing the need for the permutes entirely. This patch tries to extend it as follows: - Allow loads to take a different permutation from t

Ping^3: [RFA configure parts] aarch64: Make cc1 &co handle --with options

2022-08-02 Thread Richard Sandiford via Gcc-patches
Ping^3 for the configure bits. Richard Sandiford via Gcc-patches writes: > On aarch64, --with-arch, --with-cpu and --with-tune only have an > effect on the driver, so “./xgcc -B./ -O3” can give significantly > different results from “./cc1 -O3”. --with-arch did have a limited > eff

Re: [PATCH] Teach VN about masked/len stores

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following teaches VN to handle reads from .MASK_STORE and > .LEN_STORE. For this push_partial_def is extended first for > convenience so we don't have to handle the full def case in the > caller (possibly other paths can be simplified then). Also > the partial defini

Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch implements some additional zero-extension and sign-extension > related optimizations in simplify-rtx.cc. The original motivation comes > from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees: > > Failed to match this instruction: > (set (reg:DI

Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > Many thanks to Segher and Richard for pointing out that my removal > of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version > of this patch was incorrect, and my assumption that these would be > subsumed by val_signbit_known_clear_p was mistaken. That the

Re: [PATCH take #2] Some additional zero-extension related optimizations in simplify-rtx.

2022-08-02 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > "Roger Sayle" writes: >> Many thanks to Segher and Richard for pointing out that my removal >> of optimizations of ABS(ABS(x)) and ABS(FFS(x)) in the original version >> of this patch was incorrect, and my assumption that

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-03 Thread Richard Sandiford via Gcc-patches
Takayuki 'January June' Suwa via Gcc-patches writes: > Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps > data flow consistent, but it also increases register allocation pressure > and thus often creates many unwanted register-to-register moves that > cannot be optimized aw

Re: [09/23] Add a cut-down version of std::span (array_slice)

2022-08-03 Thread Richard Sandiford via Gcc-patches
Martin Jambor writes: > Hi Richard, > > On Fri, Nov 13 2020, Richard Sandiford via Gcc-patches wrote: >> A later patch wants to be able to pass around subarray views of an >> existing array. The standard class to do that is std::span, but it's >> a C++20 thing.

Re: [PATCH] RFC: Extend SLP permutation optimisations

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 2 Aug 2022, Richard Sandiford wrote: > >> Currently SLP tries to force permute operations "down" the graph >> from loads in the hope of reducing the total number of permutes >> needed or (in the best case) removing the need for the permutes >> entirely. This patch

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote: >> Takayuki 'January June' Suwa via Gcc-patches >> writes: >>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps

Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

2022-08-04 Thread Richard Sandiford via Gcc-patches
Takayuki 'January June' Suwa writes: > Thanks for your response. > > On 2022/08/03 16:52, Richard Sandiford wrote: >> Takayuki 'January June' Suwa via Gcc-patches >> writes: >>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps >>> data flow consistent, but it also increas

Re: [RFC: PATCH] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.

2022-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: >> +/* Create vector init for vectorized iv. */ >> +static tree >> +vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr, >> + tree step_expr, poly_uint64 nunits, >> + tree vectype, >> +

Re: Missed lowering to ld1rq from svld1rq for memory operand

2022-08-05 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > Following from off-list discussion, in the attached patch, I wrote pattern > similar to vec_duplicate_reg, which seems to work for the svld1rq tests. > Does it look OK ? > > Sorry, I didn't fully understand your suggestion on integrating with > vec_dupli

Re: [PATCH] middle-end: Allow backend to expand/split double word compare to 0/-1.

2022-08-05 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > This patch to the middle-end's RTL expansion reorders the code in > emit_store_flag_1 so that the backend has more control over how best > to expand/split double word equality/inequality comparisons against > zero or minus one. With the current implementation, the middle-e

Re: [RFA configure parts] aarch64: Make cc1 &co handle --with options

2022-08-05 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw writes: > On 13/06/2022 15:33, Richard Sandiford via Gcc-patches wrote: >> On aarch64, --with-arch, --with-cpu and --with-tune only have an >> effect on the driver, so “./xgcc -B./ -O3” can give significantly >> different results from “./cc1 -O3”. --with-ar

Re: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > In GCC 11 we implemented the vectorizer optab for widening left shifts, > however this optab is only supported for uniform shift constants. > > At the moment GCC still has two loop vectorization strategies (classical loop > and > SLP based loop vec) and the opt

Re: [PATCH][gensupport]: Don't segfault on empty attrs list

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > Currently we segfault when len == 0 for an attribute list. > > essentially [cons: =0, 1, 2, 3; attrs: ] segfaults but should be equivalent to > [cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:]. This fixes it by just > returning early and leaving it to the

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > [...] >> >> in vect_determine_precisions_from_range. Maybe we should drop >> >> the shift handling from there and instead rely on >> >> vect_determine_precisions_from_users, extending: >> >> >> >> if (TREE_CODE (shift) != INTEGER_CST >> >> || !wi::ltu_p (wi::to_w

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-08-03 Thread Richard Sandiford via Gcc-patches
Hao Liu OS writes: > Hi Richard, > > Update the patch with a simple case (see below case and comments). It shows > a live stmt may not have reduction def, which introduce the ICE. > > Is it OK for trunk? OK, thanks. Richard > > Fix the assertion failure on empty reduction define in info_

[PATCH] poly_int: Handle more can_div_trunc_p cases

2023-08-03 Thread Richard Sandiford via Gcc-patches
can_div_trunc_p (a, b, &Q, &r) tries to compute a Q and r that satisfy the usual conditions for truncating division: (1) a = b * Q + r (2) |b * Q| <= |a| (3) |r| < |b| We can compute Q using the constant component (the case when all indeterminates are zero). Since |r| < |b| for th
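
The three conditions listed above can be checked directly for ordinary integers, which corresponds to the constant-component case mentioned in the message. The sketch below is only a scalar illustration and does not model poly_int; the function name `satisfies_trunc_div` and the use of `long` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true if (q, r) satisfies the truncating-division conditions
// for a / b on plain integers:
//   (1) a = b * q + r
//   (2) |b * q| <= |a|
//   (3) |r| < |b|
bool satisfies_trunc_div (long a, long b, long q, long r)
{
  return a == b * q + r
         && std::labs (b * q) <= std::labs (a)
         && std::labs (r) < std::labs (b);
}
```

C++'s built-in `/` and `%` truncate towards zero, so `satisfies_trunc_div (a, b, a / b, a % b)` holds for nonzero `b` (barring overflow), whereas a floor-division pair such as q = -4, r = 1 for a = -7, b = 2 satisfies (1) and (3) but fails (2).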

Re: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-03 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> > + >> > +(define_constraint "D3" >> > + "@internal >> > + A constraint that matches vector of immediates that is with 0 to >> > +(bits(mode)/2)-1." >> > + (and (match_code "const,const_vector") >> > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0, >> >

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-03 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 25 Jul 2023 at 18:25, Richard Sandiford > wrote: >> >> Hi, >> >> Thanks for the rework and sorry for the slow review. > Hi Richard, > Thanks for the suggestions! Please find my responses inline below. >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> >

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-03 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Prathamesh Kulkarni writes: >> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford >> wrote: >>> >>> Hi, >>> >>> Thanks for the rework and sorry for the slow review. >> Hi Richard, >> Thanks for the suggestions! Please find my responses inline below. >>> >>> Prathamesh K

Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-03 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> >> Do you see vect_constant_defs in practice, or is this just for >> >> completeness? >> >> I would expect any constants to appear as direct operands. I don't >> >> mind keeping it if it's just a belt-and-braces thing though. >> > >> > In the latency case where I had a

Re: [RFC] Combine zero_extract and sign_extend for TARGET_TRULY_NOOP_TRUNCATION

2023-08-04 Thread Richard Sandiford via Gcc-patches
YunQiang Su writes: > PR #104914 > > On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms, > zero_extract (SI, SI) can be sign-extended. So, if a zero_extract (DI, > DI) following with an sign_extend(SI, DI) can be merged to a single > zero_extract (SI, SI). > > gcc/ChangeLog: >

Re: [PATCH] tree-optimization/110838 - vectorization of widened right shifts

2023-08-04 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following fixes a problem with my last attempt of avoiding > out-of-bound shift values for vectorized right shifts of widened > operands. Instead of truncating the shift amount with a bitwise > and we actually need to saturate it to the target precision. > > The follo
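
The difference between masking and saturating the shift amount can be seen on scalars. The sketch below is an illustration under stated assumptions, not the vectorizer's code: it uses a 16-bit element narrowed from 32 bits, all function names are hypothetical, and it assumes that right shifts of negative values are arithmetic (guaranteed from C++20, and the usual behaviour before that).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference semantics: widen to 32 bits, shift, then narrow.
int16_t shift_wide (int16_t x, int amount)
{
  assert (amount >= 0 && amount < 32);
  return static_cast<int16_t> (static_cast<int32_t> (x) >> amount);
}

// Models a shift carried out in 16 bits, valid only for amounts 0..15.
int16_t shift_narrow (int16_t x, int amount)
{
  assert (amount >= 0 && amount < 16);
  return static_cast<int16_t> (x >> amount);
}

// Truncating the amount with a bitwise AND: wrong for amounts >= 16.
int16_t shift_masked (int16_t x, int amount)
{
  return shift_narrow (x, amount & 15);
}

// Saturating the amount to precision - 1: matches the wide shift.
int16_t shift_saturated (int16_t x, int amount)
{
  return shift_narrow (x, std::min (amount, 15));
}
```

For example, with x = -2 and a shift of 16, the wide form gives -1; masking turns the amount into 0 and returns -2, while saturating turns it into 15 and returns -1.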

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-04 Thread Richard Sandiford via Gcc-patches
Full review this time, sorry for skipping the tests earlier. Prathamesh Kulkarni writes: > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 7e5494dfd39..680d0e54fd4 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-08 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Fri, 4 Aug 2023 at 20:36, Richard Sandiford > wrote: >> >> Full review this time, sorry for skipping the tests earlier. > Thanks for the detailed review! Please find my responses inline below. >> >> Prathamesh Kulkarni writes: >> > diff --git a/gcc/fold-const

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-08 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch enables the use of mixed-types for simd clones for AArch64 > and adds aarch64 as a target_vect_simd_clones. > > Bootstrapped and regression tested on aarch64-unknown-linux-gnu > > gcc/ChangeLog: > > * config/aarch64/aarch64.cc (currentl

Re: [PATCH][GCC] aarch64: Add support for Cortex-A520 CPU

2023-08-08 Thread Richard Sandiford via Gcc-patches
Richard Ball writes: > This patch adds support for the Cortex-A520 CPU to GCC. > > No regressions on aarch64-none-elf. > > Ok for master? > > > gcc/ChangeLog: > > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add > Cortex-A520 CPU. > * config/aarch64/aarch64-tune.md: Regene

Re: [PATCH] aarch64: SVE/NEON Bridging intrinsics

2023-08-09 Thread Richard Sandiford via Gcc-patches
Richard Ball writes: > ACLE has added intrinsics to bridge between SVE and Neon. > > The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and > SVE vectors. > > This patch adds support to GCC for the following 3 intrinsics: > svset_neonq, svget_neonq and svdup_neonq > > gcc/Chan

Re: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-09 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Richi. > >>> that should be > >>> || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) >>> && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) > >>> I think. It seems to imply that SLP isn't supported with >>> masking/lengthing. > > Oh, yes. At first glance, the

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Here is my new version, see inline response to your comments. > > New cover letter: > > This patch enables the use of mixed-types for simd clones for AArch64, > adds aarch64 as a target_vect_simd_clones and corrects the way the > simdlen is chosen for non-specifi

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote: >> Jakub: do you remember what the reason was? I don't mind dropping >> "function", but it feels weird to drop the quotes around "simd". >> Seems like, if we do that, there'll one day be a patch to add >> t

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote: >> Jakub Jelinek writes: >> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote: >> >> Jakub: do you remember what the reason was? I don't mind dropping >> >> "function", but it feels weird

Re: [PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-10 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches > wrote: >> >> If `A` has a range of `[0,0][100,INF]` and the comparison >> of `A < 50`. This should be optimized to `A <= 0` (which then >> will be optimized to just `A == 0`). >> This patch imp

Re: [PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-10 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, Aug 10, 2023 at 3:44 PM Richard Sandiford > wrote: >> >> Richard Biener via Gcc-patches writes: >> > On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches >> > wrote: >> >> >> >> If `A` has a range of `[0,0][100,INF]` and the comparison >> >> of `A < 50`.

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-10 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: >> static bool >> is_simple_vla_size (poly_uint64 size) >> { >> if (size.is_constant ()) >> return false; >> for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i) >> if (size[i] != (i <= 1 ? size[0] : 0)) > Just wondering if this should be (i == 1 ? size[0] : 0

Re: [RFC] GCC Security policy

2023-08-10 Thread Richard Sandiford via Gcc-patches
Siddhesh Poyarekar writes: > On 2023-08-08 10:30, Siddhesh Poyarekar wrote: >>> Do you have a suggestion for the language to address libgcc, >>> libstdc++, etc. and libiberty, libbacktrace, etc.? >> >> I'll work on this a bit and share a draft. > > Hi David, > > Here's what I came up with for di

Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > When we vectorize fold-left reductions with partial vectors but > no target operation available we use a vector conditional to force > excess elements to zero. But that doesn't correctly preserve > the sign of zero. The following patch disables partial vector > support i
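
The sign-of-zero problem can be reproduced with scalars. The sketch below is only an illustration and assumes IEEE 754 arithmetic in the default rounding mode without fast-math options; the function name and the `padding` parameter are hypothetical. Note that the quoted message says the patch disables partial vector support in this situation; the -0.0 padding variant here is shown purely for contrast and is not claimed to be what the patch does.

```cpp
#include <cassert>
#include <cmath>

// In-order (fold-left) sum over LANES elements, where only the first
// ACTIVE elements are real data and the remaining lanes are replaced
// by PADDING, mimicking a vector conditional on a partial vector.
double fold_left_sum (double init, const double *data, int active,
                      int lanes, double padding)
{
  assert (active >= 0 && active <= lanes);
  double acc = init;
  for (int i = 0; i < lanes; ++i)
    acc += (i < active ? data[i] : padding);
  return acc;
}
```

Starting from -0.0 with one active element of -0.0, padding the excess lanes with +0.0 turns the result into +0.0, because (-0.0) + (+0.0) is +0.0 under round-to-nearest; padding with -0.0 leaves the sign intact.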

Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote: > >> Hi, Richi. >> >> > 1. Target is using loop MASK as the partial vector loop control. >> >> I don't think it checks for this? >> >> I am not sure whether I understand EXTRACT_LAST correctly. >> But if target doesn't use

Re: [PATCH] genrecog: Add SUBREG_BYTE.to_constant check to the genrecog

2023-08-14 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong writes: > Hi, there is a genrecog issue in the RISC-V backend. > > This is the ICE info: > > 0xfa3ba4 poly_int_pod<2u, unsigned short>::to_constant() const > ../../../riscv-gcc/gcc/poly-int.h:504 > 0x28eaa91 recog_5 > ../../../riscv-gcc/gcc/config/riscv/bitmanip.md:31

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-14 Thread Richard Sandiford via Gcc-patches
Thanks for the clean-ups. But... "Kewen.Lin" writes: > Hi, > > Following Richi's suggestion [1], this patch is to move the > handlings on VMAT_GATHER_SCATTER in the final loop nest > of function vectorizable_load to its own loop. Basically > it duplicates the final loop nest, clean up some usel

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-14 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Thu, 10 Aug 2023 at 21:27, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> >> static bool >> >> is_simple_vla_size (poly_uint64 size) >> >> { >> >> if (size.is_constant ()) >> >> return false; >> >> for (int i = 1; i < ARRAY_SIZE (size.coe

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-14 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > Hi Richard, > > on 2023/8/14 20:20, Richard Sandiford wrote: >> Thanks for the clean-ups. But... >> >> "Kewen.Lin" writes: >>> Hi, >>> >>> Following Richi's suggestion [1], this patch is to move the >>> handlings on VMAT_GATHER_SCATTER in the final loop nest >>> of functio

Re: [RFC] GCC Security policy

2023-08-14 Thread Richard Sandiford via Gcc-patches
I think it would help to clarify what the aim of the security policy is. Specifically: (1) What service do we want to provide to users by classifying one thing as a security bug and another thing as not a security bug? (2) What service do we want to provide to the GNU community by the same

Re: [PATCH] Add support for vector conditional not

2023-08-14 Thread Richard Sandiford via Gcc-patches
Andrew Pinski via Gcc-patches writes: > Like the support conditional neg (r12-4470-g20dcda98ed376cb61c74b2c71), > this just adds conditional not too. > Also we should be able to turn `(a ? -1 : 0) ^ b` into a conditional > not. > > OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-
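
The identity mentioned in the preview, that `(a ? -1 : 0) ^ b` is a conditional not, can be checked with a small scalar sketch. The function names below are hypothetical and this is plain C, not the vector pattern or match rule from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* The form quoted in the message: XOR with an all-ones or all-zeros mask. */
int32_t xor_form(int a, int32_t b)
{
    return (a ? -1 : 0) ^ b;
}

/* The equivalent conditional not: invert b only when a is nonzero. */
int32_t cond_not_form(int a, int32_t b)
{
    return a ? ~b : b;
}

/* Hypothetical self-check over a few representative values. */
void check_cond_not_identity(void)
{
    const int32_t values[] = { 0, 1, -1, 42, INT32_MIN, INT32_MAX };
    for (unsigned i = 0; i < sizeof values / sizeof values[0]; i++) {
        assert(xor_form(0, values[i]) == cond_not_form(0, values[i]));
        assert(xor_form(1, values[i]) == cond_not_form(1, values[i]));
    }
}
```

The equivalence holds because XOR with all-ones flips every bit, which is bitwise not, while XOR with zero leaves the value unchanged.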

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin wrote: >> >> on 2023/8/14 22:16, Richard Sandiford wrote: >> > No, it was more that 219-142=77, so it seems like a lot of lines >> > are being duplicated rather than simply being moved. (Unlike for >> > VMAT_LOAD_STORE_LANES, whi

Re: [PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 15 Aug 2023, Kewen.Lin wrote: > >> Hi Stefan, >> >> on 2023/8/15 02:51, Stefan Schulze Frielinghaus wrote: >> > Hi everyone, >> > >> > I have bootstrapped and regtested the patch below on s390. For the >> > 64-bit target I do not see any changes regarding the te

Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote: >> On Mon, 7 Aug 2023 at 13:19, Richard Biener >> wrote: >> > It doesn't seem to make a difference for x86. That said, the "fix" is >> > probably sticking the correct target on the dump-check, it seems >> > that vect_fold_

Re: [PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, 15 Aug 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Tue, 15 Aug 2023, Kewen.Lin wrote: >> > >> >> Hi Stefan, >> >> >> >> on 2023/8/15 02:51, Stefan Schulze Frielinghaus wrote: >> >> > Hi everyone, >> >> > >> >> > I have bootstrapped and reg

Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: >> OK, fair enough. So the idea is: see where we end up and then try to >> improve/factor the APIs in a less peephole way? > > Yeah, I think that's the only good way forward. OK, no objection from me. Sorry for holding the patch up. Richard

Re: [PATCH] Handle TYPE_OVERFLOW_UNDEFINED vectorized BB reductions

2023-08-15 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following changes the gate to perform vectorization of BB reductions > to use needs_fold_left_reduction_p which in turn requires handling > TYPE_OVERFLOW_UNDEFINED types in the epilogue code generation by > promoting any operations generated there to use unsigned arith
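
As a rough illustration of why reduction code for a type with undefined overflow might be carried out in unsigned arithmetic, here is a minimal scalar sketch. It is an assumption-laden model, not GCC's generated code: the two "lanes" stand in for vector elements, the function name is hypothetical, and the mathematical sum is assumed to fit in int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reassociated sum using two partial accumulators.  The partial sums can
   exceed INT_MAX even when the in-order signed sum never overflows, so
   they are kept in unsigned, where wraparound is well defined.  */
int sum_two_lanes_unsigned(const int *x, size_t n)
{
    assert(n % 2 == 0);   /* sketch only handles an even element count */
    unsigned lane0 = 0, lane1 = 0;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += (unsigned)x[i];
        lane1 += (unsigned)x[i + 1];
    }
    /* Assumption: the true sum fits in int, so this conversion is exact. */
    return (int)(lane0 + lane1);
}
```

For the input { INT_MAX, -1, 1, -1 } the in-order signed sum stays in range at every step, but lane 0 computes INT_MAX + 1, which would be undefined in signed arithmetic. In unsigned it wraps, and the final total is still INT_MAX - 1.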

Re: [PATCH] IFN: Fix vector extraction into promoted subreg.

2023-08-15 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai" writes: > Hi, Robin, Richard and Richi. > > I am wondering whether we can just simply replace the VEC_EXTRACT expander > with binary? > > Like this :? > > DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT, ECF_CONST | ECF_NOTHROW, > - vec_extract, vec_extract) > + vec_ex

Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

2023-08-16 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: >> Unfortunately, the patch regressed following tests on ppc64le and >> armhf respectively: >> gcc.target/powerpc/vec-perm-ctor.c scan-tree-dump-not optimized >> "VIEW_CONVERT_EXPR" >> gcc.dg/tree-ssa/forwprop-20.c scan-tree-dump-not forwprop1 "VEC_PERM_EXPR" >> >> This

Re: [PATCH] IFN: Fix vector extraction into promoted subreg.

2023-08-16 Thread Richard Sandiford via Gcc-patches
Robin Dapp writes: >> However: >> >> | #define vec_extract_direct { 3, 3, false } >> >> This looks wrong. The numbers are argument numbers (or -1 for a return >> value). vec_extract only takes 2 arguments, so 3 looks to be out-of-range. >> >> | #define direct_vec_extract_optab_supported_p dir

Re: [PATCH v2][GCC] aarch64: Add support for Cortex-A720 CPU

2023-08-16 Thread Richard Sandiford via Gcc-patches
Richard Ball writes: > v2: Add missing PROFILE feature flag. > > This patch adds support for the Cortex-A720 CPU to GCC. > > No regressions on aarch64-none-elf. > > Ok for master? > > gcc/ChangeLog: > > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex- > A720 CPU. >

Re: [WIP RFC] Add support for keyword-based attributes

2023-08-16 Thread Richard Sandiford via Gcc-patches
Joseph Myers writes: > On Mon, 17 Jul 2023, Michael Matz via Gcc-patches wrote: > >> So, essentially you want unignorable attributes, right? Then implement >> exactly that: add one new keyword "__known_attribute__" (invent a better >> name, maybe :) ), semantics exactly as with __attribute__ (i

Re: [PATCH] doc: Fixes to RTL-SSA sample code

2023-08-17 Thread Richard Sandiford via Gcc-patches
Alex Coplan writes: > Hi, > > This patch fixes up the code examples in the RTL-SSA documentation (the > sections on making insn changes) to reflect the current API. > > The main issues are as follows: > - rtl_ssa::recog takes an obstack_watermark & as the first parameter. >Presumably this is

[PATCH] c: Add support for [[__extension__ ...]]

2023-08-17 Thread Richard Sandiford via Gcc-patches
Joseph Myers writes: > On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote: > >> Would it be OK to add support for: >> >> [[__extension__ ...]] >> >> to suppress the pedwarn about using [[]] prior to C2X? Then we can > > That seems lik

Re: [PATCH] c: Add support for [[__extension__ ...]]

2023-08-17 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: >> Am 17.08.2023 um 13:25 schrieb Richard Sandiford via Gcc-patches >> : >> >> Joseph Myers writes: >>>> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote: >>>> >>>> Would it be OK to add support f

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-18 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following avoids running into somehow flawed logic in fold_vec_perm > for non-VLA vectors. > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > Richard. > > PR tree-optimization/111048 > * fold-const.cc (fold_vec_perm_cst): Check for non-VLA >

Re: [PATCH] c: Add support for [[__extension__ ...]]

2023-08-18 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Joseph Myers writes: >> On Wed, 16 Aug 2023, Richard Sandiford via Gcc-patches wrote: >> >>> Would it be OK to add support for: >>> >>> [[__extension__ ...]] >>> >>> to suppress the pedwarn about u

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-21 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Mon, 21 Aug 2023 at 12:26, Richard Biener wrote: >> >> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote: >> >> > On Fri, 18 Aug 2023 at 14:52, Richard Biener wrote: >> > > >> > > On Fri, 18 Aug 2023, Richard Sandiford wrote: >> > > >> > > > Richard Biener writes:

Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-21 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong writes: > Hi, Richard and Richi. > > Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math. > It's supported in tree-ssa-math-opts.cc. However, GCC failed to support > COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS. > > Consider this following case: > #define TEST_TYPE(T
