Re: [PATCH][GCC][PR target/98177] aarch64: SVE: ICE in expand_direct_optab_fn

2020-12-16 Thread Richard Sandiford via Gcc-patches
Przemyslaw Wirkus writes: > > This is a bug in the vectoriser: the vectoriser shouldn't generate > > IFN_REDUC_MAX calls that the target doesn't support. > > > > I think the problem comes from using the wrong interface to get the index > > type for a COND_REDUCTION. vectorizable_reduction has: >

Re: [07/23] Add a class that multiplexes two pointer types

2020-12-16 Thread Richard Sandiford via Gcc-patches
Martin Sebor writes: > On 11/26/20 10:06 AM, Richard Sandiford wrote: >> Martin Sebor writes: >>> I do have one concern: the tendency to prioritize efficiency >>> over safety (this can be said about most GCC code). Specifically >>> in this class, the address bit twiddling makes me uneasy. I don'

Re: [16/23] recog: Add a way of temporarily undoing changes

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 11/13/20 1:19 AM, Richard Sandiford via Gcc-patches wrote: >> In some cases, it can be convenient to roll back the changes that >> have been made by validate_change to see how things looked before, >> then reroll the changes. For e
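
The roll-back-then-reroll idea described above can be sketched in C as a log of swappable changes. This is only an illustration under stated assumptions: `struct change`, `record_change`, `undo_changes` and `redo_changes` are hypothetical names for this sketch and are not taken from recog.c.

```c
#include <stddef.h>

/* Hypothetical change record: where the change was made and the value
   that is not currently stored there.  */
struct change { int *loc; int other; };

/* Exchange the stored value with the saved one.  Applying this twice
   restores the starting state.  */
static void
swap_change (struct change *c)
{
  int tmp = *c->loc;
  *c->loc = c->other;
  c->other = tmp;
}

/* Record a change to *LOC in LOG and apply it.  */
void
record_change (struct change *log, size_t *n, int *loc, int new_value)
{
  log[*n].loc = loc;
  log[*n].other = *loc;
  *loc = new_value;
  ++*n;
}

/* Roll back the N logged changes, newest first.  */
void
undo_changes (struct change *log, size_t n)
{
  while (n-- > 0)
    swap_change (&log[n]);
}

/* Reapply the N logged changes, oldest first.  */
void
redo_changes (struct change *log, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    swap_change (&log[i]);
}
```

Because each record keeps whichever value is not currently in place, the log itself does not need to be copied in order to look at the old state and then return to the new one.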

Re: [08/23] Add an alternative splay tree implementation

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 11/13/20 1:15 AM, Richard Sandiford via Gcc-patches wrote: >> We already have two splay tree implementations: the old C one in >> libiberty and a templated reimplementation of it in typed-splay-tree.h. >> However, they have some drawbacks: >&

Re: [PATCH 22/23] Add rtl-ssa

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 11/13/20 1:23 AM, Richard Sandiford via Gcc-patches wrote: >> This patch adds the RTL SSA infrastructure itself. The following >> fwprop.c patch will make use of it. >> >> gcc/ >> * configure.ac: Add rtl-ssa to t

Re: [PATCH 23/23] fwprop: Rewrite to use RTL SSA

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches writes: > On 11/13/20 1:24 AM, Richard Sandiford via Gcc-patches wrote: >> This patch rewrites fwprop.c to use the RTL SSA framework. It tries >> as far as possible to mimic the old behaviour, even in cases where >> that doesn't fit naturally wit

Re: [20/23] rtlanal: Add simple_regno_set

2020-12-16 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 11/13/20 1:21 AM, Richard Sandiford via Gcc-patches wrote: >> This patch adds a routine for finding a “simple” SET for a register >> definition. See the comment in the patch for details. >> >> gcc/ >> * rtl.h (simple_regno_s

Re: [PATCH] rtl-ssa: Include memmodel.h before tm_p.h

2020-12-17 Thread Richard Sandiford via Gcc-patches
Rainer Orth writes: > Hi Kyryll, > >>> Fixed by moving the memmodel.h include in rtl-ssa.h before tm_p.h. >>> >>> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11 (both into stage >>> 3 now, so the compilation error is gone). >>> >>> Ok for master? >> >> AFAIK simple patches like this that

Re: [PATCH] vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p

2020-12-17 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches writes: > Hi all, > > While experimenting with some backend costs for Advanced SIMD and SVE I hit > many cases where GCC would pick SVE for VLA auto-vectorisation even when the > backend very clearly presented cheaper costs for Advanced SIMD. > For a simple float add

Re: [07/23] Add a class that multiplexes two pointer types

2020-12-17 Thread Richard Sandiford via Gcc-patches
Tom Tromey writes: >>>>>> "Richard" == Richard Sandiford via Gcc-patches >>>>>> writes: > > Richard> +// A class that stores a choice "A or B", where A has type T1 * and > B has > Richard> +// type T2 *. Both T1 and T2
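
The "A or B" storage and the address bit twiddling discussed in this thread can be illustrated with a small C analogue. It is a sketch under stated assumptions, not the class from the patch: the type `int_or_double_ptr` and its helpers are hypothetical, and the code assumes that both pointee types are at least 2-byte aligned and that converting a pointer to `uintptr_t` and back preserves it, as on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C analogue: one word holds either an int * or a double *.
   Bit 0 of a suitably aligned pointer is always clear, so it is used as
   the discriminator (an assumption of this sketch).  */
typedef struct { uintptr_t bits; } int_or_double_ptr;

int_or_double_ptr
from_int_ptr (int *p)
{
  assert (((uintptr_t) p & 1) == 0);
  return (int_or_double_ptr) { (uintptr_t) p };
}

int_or_double_ptr
from_double_ptr (double *p)
{
  assert (((uintptr_t) p & 1) == 0);
  return (int_or_double_ptr) { (uintptr_t) p | 1 };
}

/* Nonzero if V currently stores a double *.  */
int
holds_double (int_or_double_ptr v)
{
  return (v.bits & 1) != 0;
}

/* Return the int * stored in V, or a null pointer if V holds a double *.  */
int *
as_int_ptr (int_or_double_ptr v)
{
  return holds_double (v) ? 0 : (int *) v.bits;
}

/* Return the double * stored in V, or a null pointer if V holds an int *.  */
double *
as_double_ptr (int_or_double_ptr v)
{
  return holds_double (v) ? (double *) (v.bits & ~(uintptr_t) 1) : 0;
}
```

The assertions in the constructors are one way to address the safety concern raised in the review: a misaligned pointer is rejected rather than silently misread as the other type.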

[committed] rtl-ssa: Fix reg_raw_mode thinko [PR98347]

2020-12-17 Thread Richard Sandiford via Gcc-patches
I'd used reg_raw_mode[regno] for general registers, even though the array is only valid for hard registers. This patch uses regno_reg_rtx instead. Tested on i686-linux-gnu, committed as obvious. Richard gcc/ PR rtl-optimization/98347 * rtl-ssa/access-utils.h (full_register): Us

[PATCH] vect: Fix missing alias checks for 128-bit SVE [PR98371]

2020-12-18 Thread Richard Sandiford via Gcc-patches
On AArch64, the vectoriser tries various ways of vectorising with both SVE and Advanced SIMD and picks the best one. All other things being equal, it prefers earlier attempts over later attempts. The way this works currently is that, once it has a successful vectorisation attempt A, it analyses a

[committed] aarch64: Extend aarch64-autovec-preference==2 to 128-bit SVE

2020-12-18 Thread Richard Sandiford via Gcc-patches
When compiling with -msve-vector-bits=128, aarch64_preferred_simd_mode would pass the same vector width to aarch64_simd_container_mode for both SVE and Advanced SIMD, and so Advanced SIMD would always “win”. This patch instead makes it choose directly between SVE and Advanced SIMD modes, so that aa

[PATCH] rtl-ssa: Fix updates to call clobbers [PR98403]

2020-12-21 Thread Richard Sandiford via Gcc-patches
In the PR, fwprop was changing a call instruction and tripped an assert when trying to update a list of call clobbers. There are two ways we could handle this: remove the call clobber and then add it back, or assume that the clobber will stay in its current place. At the moment we don't have enoug

Re: [PATCH v3] arm&aarch64: subdivide the type attribute "alu_shift_imm"

2020-12-22 Thread Richard Sandiford via Gcc-patches
"Qian, Jianhua" writes: > Hi Richard > > Thanks for reviewing again. > I have updated the patch to v3. Thanks, pushed to master now that the copyright assignment is on file. Richard

Re: -mno-long-calls for expected regalloc in arm/fp16-aapcs-2.c test

2020-12-29 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > The implicit -mlong-calls used in our arm-vxworks configurations > changes the register allocation patterns in the arm/fp16-aapcs-2.c > test: r3 ends up used in the long-call sequence, and we end up using > ip as a temporary, which doesn't match the expected mov patterns.

Re: -mno-long-calls for arm/headmerge tests

2020-12-29 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > The headmerge tests pass a constant to conditional calls, so that the > same constant is always passed to a function, though it's a different > function depending on which path is taken. > > The test checks that the constant appears only once in the assembly > output, as

Re: -mno-long-calls for arm/no_unique_address tests

2020-12-29 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > The implicit -mlong-calls from our vxworks configurations makes the > tail-call instructions differ from those expected by the > no_unique_address tests in gcc.target/arm. > > This patch adds -mno-long-calls to the compilation commands, so that > we generate the expected

Re: -mno-long-calls for mve_libcall tests

2020-12-29 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > The implicit -mlong-calls used in our vxworks configurations changes > the call sequences from those expected in the mve_libcall testcases. > > This patch brings the test output in line with the expectations, with > an explicit -mno-long-calls. > > Regstrapped on x86_64-l

Re: Add missing vxworks filters to lib/target-supports.exp functions

2020-12-29 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > Explicitly disable some vxworks-missing features in the testsuite, that > the current feature tests detect as present. > > Regstrapped on x86_64-linux-gnu, and tested with -x-arm-wrs-vxworks7r2. > Ok to install? > > > from Olivier Hainque > for gcc/testsuite/ChangeLog >

[PATCH] vect: Fix bogus alignment assumption in alias checks [PR94994]

2020-12-31 Thread Richard Sandiford via Gcc-patches
This PR is about a case in which the vectoriser was feeding incorrect alignment information to tree-data-ref.c, leading to incorrect runtime alias checks. The alignment was taken from the TREE_TYPE of the DR_REF, which in this case was a COMPONENT_REF with a normally-aligned type. However, the un

[PATCH] recog: Fix a constrain_operands corner case [PR97144]

2020-12-31 Thread Richard Sandiford via Gcc-patches
aarch64's *add3_poly_1 has a pattern with the constraints: "=...,r,&r" "...,0,rk" "...,Uai,Uat" i.e. the penultimate alternative requires operands 0 and 1 to match, but the final alternative does not allow them to match. The register allocators dealt with this correctly, and so used differ

[PATCH] vect: Avoid generating out-of-range shifts [PR98302]

2020-12-31 Thread Richard Sandiford via Gcc-patches
In this testcase we end up with: unsigned long long x = ...; char y = (char) (x << 37); The overwidening pattern realised that only the low 8 bits of x << 37 are needed, but then tried to turn that into: unsigned long long x = ...; char y = (char) x << 37; which gives an out-of-range sh
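
A short C sketch may help show why the narrowed form is a problem. It uses `unsigned char` instead of `char` to keep the conversions well defined, and the function names are hypothetical. The second function is one possible way of narrowing safely, not necessarily what the patch does.

```c
#include <stdint.h>

/* What the source asks for: shift in the full 64-bit type, then truncate.
   Bits 0..36 of the shifted value are zero, so the low 8 bits are always 0.  */
unsigned char
shift_then_truncate (uint64_t x)
{
  return (unsigned char) (x << 37);
}

/* A safe narrowed form: when the shift amount is at least the narrow
   precision, no bit of the input can reach the result, so return 0
   instead of emitting an 8-bit shift by an out-of-range amount.  */
unsigned char
narrowed_shift (uint64_t x, unsigned int amount)
{
  const unsigned int narrow_bits = 8;
  if (amount >= narrow_bits)
    return 0;
  return (unsigned char) ((unsigned char) x << amount);
}
```

For in-range amounts the two approaches agree on the low 8 bits, which is the property the overwidening pattern relies on.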

[pushed] genmodes: Update GET_MODE_MASK when changing NUNITS [PR98214]

2020-12-31 Thread Richard Sandiford via Gcc-patches
The static GET_MODE_MASKs for SVE vectors are based on the static precisions, which in turn are based on 128-bit SVE. The precisions are later updated based on -msve-vector-bits (usually to become variable length), but the GET_MODE_MASK stayed the same. This caused combine to fold: (*_extract:D

Re: [PATCH] recog: Fix a constrain_operands corner case [PR97144]

2020-12-31 Thread Richard Sandiford via Gcc-patches
"H.J. Lu" writes: > On Thu, Dec 31, 2020 at 7:57 AM Richard Sandiford via Gcc-patches > wrote: >> >> aarch64's *add3_poly_1 has a pattern with the constraints: >> >> "=...,r,&r" >> "...,0,rk" >> "...,Uai,

Re: [PATCH] i386: Remove unnecessary clobbers from combine splitters.

2020-12-31 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool writes: > It isn't likely that any other pass would try to create this pattern, > but this isn't guaranteed, and such other passes do not necessarily do > the add-the-clobber (that is specific to combine, even!) Maybe fwprop > could create this insn, or something like Richard's

[PATCH] explow, aarch64: Fix force-Pmode-to-mem for ILP32 [PR97269]

2021-01-04 Thread Richard Sandiford via Gcc-patches
This patch fixes a mode/rtx mismatch for ILP32 targets in: mem = force_const_mem (ptr_mode, imm); where imm can be Pmode rather than ptr_mode. The patch uses convert_memory_address to convert the Pmode address to ptr_mode before the call. However, immediate addresses can in general co

[PATCH] vect, aarch64: Fix alignment units for IFN_MASK* [PR95401]

2021-01-04 Thread Richard Sandiford via Gcc-patches
The IFN_MASK* functions take two leading arguments: a load or store pointer and a “cookie”. The type of the cookie is the type of the access for TBAA purposes (like for MEM_REFs) while the value of the cookie is the alignment of the access. This PR was caused by a disagreement about whether the al

[committed] aarch64: Use the MUL VL form of SVE PRF[BHWD]

2021-01-04 Thread Richard Sandiford via Gcc-patches
The expansions of the svprf[bhwd] instructions weren't taking advantage of the immediate addressing mode. Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk so far. Will backport to GCC 10 “soon”. Richard gcc/ * config/aarch64/aarch64.c (offset_6bit_signed_scaled_p): New f

[committed] aarch64: Improve vcombine codegen [PR89057]

2021-01-04 Thread Richard Sandiford via Gcc-patches
This patch fixes a codegen regression in the handling of things like: __temp.val[0] \ = vcombine_##funcsuffix (__b.val[0], \ vcreate_##funcsuffix (__AARCH64_UINT64_C

Re: [RFC] AArch64: Have RTL patterns recognize DI extracts from vectors at offset 0 as no-op.

2021-01-04 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > I have been looking into a class of problems where GCC is not recognizing that > a subreg of lane 0 (using little-endian as example) of a vector register and > passing that to an instruction. > > As an example consider > > poly64_t > testcase (uint8x16_t input

Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Richard Sandiford via Gcc-patches
Andreas Schwab writes: > That doesn't build with gcc 4.8: Which subversion are you using? It works for me with stock gcc 4.8.5, which is what I'd used to test the series for C++ compatibility. Richard > > In file included from ../../gcc/splay-tree-utils.h:491:0, > from ../../gc

Re: [08/23] Add an alternative splay tree implementation

2021-01-04 Thread Richard Sandiford via Gcc-patches
Andreas Schwab writes: > On Jan 04 2021, Richard Sandiford wrote: > >> Andreas Schwab writes: >>> That doesn't build with gcc 4.8: >> >> Which subversion are you using? > > This is 4.8.1. Hmm, OK. I guess that raises the question whether “supporting GCC 4.8” means supporting every patchlevel, o

[PATCH] gimple-isel: Fall back to using vcond_mask [PR98560]

2021-01-06 Thread Richard Sandiford via Gcc-patches
PR98560 is about a case in which the vectoriser initially generates: mask_1 = a < 0; mask_2 = mask_1 & ...; res = VEC_COND_EXPR ; The vectoriser thus expects res to be calculated using vcond_mask. However, we later manage to fold mask_2 to mask_1, leaving: mask_1 = a < 0; res = VEC_CON

[PATCH] gimple-isel: Check whether IFN_VCONDEQ is supported [PR98560]

2021-01-06 Thread Richard Sandiford via Gcc-patches
This patch follows on from the previous one for the PR and makes sure that we can handle == as well as <. Previously we assumed without checking that IFN_VCONDEQ was available if IFN_VCOND or IFN_VCONDU wasn't. The patch also fixes the definition of the IFN_VCOND* functions. The optabs are conver

Re: [PATCH] genemit: Handle `const_double_zero' rtx

2021-01-06 Thread Richard Sandiford via Gcc-patches
"Maciej W. Rozycki" writes: > On Wed, 16 Dec 2020, Maciej W. Rozycki wrote: > >> > CONST_DOUBLE_ATOF ("0", VOIDmode) seems malformed though, and I'd expect >> > it to assert in REAL_MODE_FORMAT (via the format_helper constructor). >> > I'm not sure the patch is strictly safer than the status quo.

Re: [committed] patch to fix PR97978

2021-01-07 Thread Richard Sandiford via Gcc-patches
Vladimir Makarov via Gcc-patches writes: > The following fixes > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97978 > > The patch was successfully bootstrapped on x86-64. Can you explain this a bit more? The assert fires if the register allocation is inconsistent with the conflict information.

Re: [PATCH] gimple-isel: Fall back to using vcond_mask [PR98560]

2021-01-07 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 6 Jan 2021, Richard Sandiford wrote: > >> PR98560 is about a case in which the vectoriser initially generates: >> >> mask_1 = a < 0; >> mask_2 = mask_1 & ...; >> res = VEC_COND_EXPR ; >> >> The vectoriser thus expects res to be calculated using vcond_mask.

[committed] aarch64: Support conditional unpacked integer unary arithmetic on SVE

2021-01-07 Thread Richard Sandiford via Gcc-patches
This patch extends the conditional unary integer operations from SVE_FULL_I to SVE_I. In each case the type suffix is taken from the element size rather than the container size: this matters for ABS and NEG, but doesn't matter for NOT. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to tru

Re: [PATCH]AArch64 SVE2: Fix aarch64-sve2-acle-asm tests.

2021-01-08 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This fixes a logical inconsistency with the SVE2 ACLE tests where the SVE2 > tests > are checking for SVE support in the assembler instead of SVE2. > > This makes all these tests fail when the user has an SVE enabled assembler but > not an SVE2 one. > > Ok fo

[pushed] aarch64: Support conditional unpacked UXT on SVE

2021-01-08 Thread Richard Sandiford via Gcc-patches
This patch extends the conditional UXT patterns from SVE_FULL_I to SVE_I. It doesn't matter in this case whether the type suffix is taken from the element size or the container size. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk. Richard gcc/ * config/aarch64/aarch64-

[pushed] aarch64: Support unpacked CNOT on SVE

2021-01-08 Thread Richard Sandiford via Gcc-patches
This patch adds unpacked support for unconditional and conditional CNOT. The type suffix has to be taken from the element size rather than the container size. Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk. Richard gcc/ * config/aarch64/aarch64-sve.md (*cnot): Extend

Re: [PATCH v2] aarch64: Add cpu cost tables for A64FX

2021-01-08 Thread Richard Sandiford via Gcc-patches
Qian Jianhua writes: > This patch adds cost tables for A64FX. > > ChangeLog: > 2021-01-08 Qian jianhua > > gcc/ > * config/aarch64/aarch64-cost-tables.h (a64fx_extra_costs): New. > * config/aarch64/aarch64.c (a64fx_addrcost_table): New. > (a64fx_regmove_cost, a64fx_vector_cost):

Re: [PATCH v2] aarch64: Add cpu cost tables for A64FX

2021-01-11 Thread Richard Sandiford via Gcc-patches
"Qian, Jianhua" writes: > Hi Richard > >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, January 8, 2021 7:04 PM >> To: Qian, Jianhua/钱 建华 >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH v2] aarch64: Add cpu cost tables for A64FX >> >> Qian Jianhua writes: >> > Th

[pushed] aarch64: Add support for unpacked SVE shifts

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for unpacked SVE LSL, ASR and LSR. For right shifts, the type suffix needs to be taken from the element size rather than the container size. Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk. Richard gcc/ * config/aarch64/aarch64-sve.md (3)

[pushed] aarch64: Add support for unpacked SVE mult, max and min

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch makes the SVE_INT_BINARY_IMM patterns support unpacked arithmetic, covering MUL, SMAX, SMIN, UMAX and UMIN. For min and max, the type suffix must be taken from the element size rather than the container size. The XFAILs are due to PR98602. Tested on aarch64-linux-gnu and aarch64_be-elf

[pushed] aarch64: Add general unpacked SVE conditional binary arithmetic

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for conditional binary ADD, SUB, MUL, SMAX, UMAX, SMIN, UMIN, LSL, LSR, ASR, AND, ORR and EOR. It's not really possible to split it up further given how the patterns are written. Min, max and right-shift need the element size rather than the container size. The others wou

[pushed] aarch64: Add support for unpacked SVE ADR

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch extends the ADR patterns to handle unpacked vectors. They would work with both elements and containers, but since the instructions only support .s and .d, we get more coverage by using containers. Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed to trunk. Richard gcc/

[pushed] aarch64: Add support for unpacked SVE ABD

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for unpacked SVE SABD and UABD. It also rewrites the patterns so that they match as combine patterns without the need for REG_EQUAL notes. Finally, there was no pattern for merging with the second input, which can be handled by reversing the operands. The type suffix needs

[pushed] aarch64: Add support for unpacked SVE MULH

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch extends the SMULH and UMULH support to unpacked vectors. The type suffix must be taken from the element size rather than the container size. The main use of these patterns is to support division and modulus by a constant. The conditional forms would be hard to trigger from non-ACLE cod

[pushed] aarch64: Add support for unpacked SVE conditional BIC

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for unpacked conditional BIC. The type suffix could be taken from the element size or the container size, so the patch continues to use the element size. This is consistent with the existing support for unconditional BIC. Tested on aarch64-linux-gnu and aarch64_be-elf. P

[pushed] aarch64: Add support for unpacked SVE ASRD

2021-01-11 Thread Richard Sandiford via Gcc-patches
This patch adds support for both conditional and unconditional unpacked ASRD. This meant adding a new define_insn for the unconditional form, instead of reusing the conditional instructions. It also meant extending the current conditional patterns to support merging with any independent value, no

[PATCH] alias: Fix offset checks involving section anchors [PR92294]

2021-01-12 Thread Richard Sandiford via Gcc-patches
This is a repost of: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html which was initially posted during stage 4. (And yeah, I only just missed stage 4 again.) IMO it would be better to fix the bug directly (as the patch tries to do) instead of waiting for a more thorough redes

Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.

2021-01-12 Thread Richard Sandiford via Gcc-patches
Iain Sandoe writes: > Hi, > > The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front > of the immediate rotation quantity. > > Although, it seems, GAS is able to infer the # (or is lenient about > its absence) assemblers based on the LLVM back end expect it and error out. > > tested o

Re: [PATCH] aarch64 : Mark rotate immediates with '#' as per DDI0487iFc.

2021-01-12 Thread Richard Sandiford via Gcc-patches
Iain Sandoe writes: > Hi Richard, > > Richard Sandiford wrote: > >> Iain Sandoe writes: > >>> The armv8_arm manual [C6.2.226, ROR (immediate)] uses a # in front >>> of the immediate rotation quantity. >>> >>> Although, it seems, GAS is able to infer the # (or is lenient about >>> its absence) a

[PATCH] sh: Remove match_scratch operand test

2021-01-12 Thread Richard Sandiford via Gcc-patches
This patch fixes a regression on sh4 introduced by the rtl-ssa stuff. The port had a pattern: (define_insn "movsf_ie" [(set (match_operand:SF 0 "general_movdst_operand" "=f,r,f,f,fy, f,m, r, r,m,f,y,y,rf,r,y,<,y,y") (match_operand:SF 1 "general_movsrc_oper

[pushed] rtl-ssa: Fix reversed comparisons in accesses.h comment

2021-01-13 Thread Richard Sandiford via Gcc-patches
Noticed while looking at something else that the comment above def_lookup got the description of the comparisons the wrong way round. Tested on aarch64-linux-gnu and pushed as obvious. Richard gcc/ * rtl-ssa/accesses.h (def_lookup): Fix order of comparison results. --- gcc/rtl-ssa/acce

[pushed] aarch64: Tighten condition on sve/sel* tests

2021-01-13 Thread Richard Sandiford via Gcc-patches
Noticed while testing on a different machine that the sve/sel_*.c tests require .variant_pcs support but don't test for it. .variant_pcs post-dates SVE so there shouldn't be a need to test for both. Tested on aarch64-linux-gnu & pushed. Richard gcc/testsuite/ * gcc.target/aarch64/sve/se

[pushed] aarch64: Add support for unpacked SVE MLA and MAD

2021-01-13 Thread Richard Sandiford via Gcc-patches
This patch extends the MLA/MAD patterns to support unpacked integer vectors. The type suffix could be either the element size or the container size, but using the element size should be more efficient. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk. Richard gcc/ * conf

[pushed] aarch64: Add support for unpacked SVE MLS and MSB

2021-01-13 Thread Richard Sandiford via Gcc-patches
This patch extends the MLS/MSB patterns to support unpacked integer vectors. The type suffix could be either the element size or the container size, but using the element size should be more efficient. Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk. Richard gcc/ * conf

[PATCH] vect: Account for unused IFN_LOAD_LANES results

2021-01-14 Thread Richard Sandiford via Gcc-patches
At the moment, if we use only one vector of an LD4 result, we'll treat the LD4 as having the cost of a single load. But all 4 loads and any associated permutes take place regardless of which results are actually used. This patch therefore counts the cost of unused LOAD_LANES results against the fi

[pushed] rtl-ssa: Fix a silly typo

2021-01-15 Thread Richard Sandiford via Gcc-patches
s/ref/reg/ on a previously unused function name. Sorry for the blunder. Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu, pushed as obvious. Richard gcc/ * rtl-ssa/functions.h (function_info::ref_defs): Rename to... (function_info::reg_defs): ...this. *

[pushed] recog: Fix insn_change_watermark destructor

2021-01-15 Thread Richard Sandiford via Gcc-patches
Noticed while working on something else that the insn_change_watermark destructor could call cancel_changes for changes that no longer exist. The loop in cancel_changes is a nop in that case, but: num_changes = num; can mess things up. I think this would only affect nested uses of insn_change_

[pushed] aarch64: Add a minipass for fusing CC insns [PR88836]

2021-01-15 Thread Richard Sandiford via Gcc-patches
This patch adds a small target-specific pass to remove redundant SVE PTEST instructions. There are two important uses of this: - Removing PTESTs after WHILELOs (PR88836). The original testcase no longer exhibits the problem due to more recent optimisations, but it can still be seen in simple

Re: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Multiply, FMS and FMA.

2021-01-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This adds implementation for the optabs for complex operations. With this the > following C code: > > void g (float complex a[restrict N], float complex b[restrict N], > float complex c[restrict N]) > { > for (int i=0; i < N; i++) > c[i]

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

2021-01-18 Thread Richard Sandiford via Gcc-patches
Hongtao Liu via Gcc-patches writes: > Hi: > If SRC had been assigned a mode narrower than the copy, we can't link > DEST into the chain even if they have the same > hard_regno_nregs (i.e. HImode/SImode in i386 backend). In general, changes between modes within the same hard register are OK. Could you e

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

2021-01-18 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford > wrote: >> >> Hongtao Liu via Gcc-patches writes: >> > Hi: >> > If SRC had been assigned a mode narrower than the copy, we can't link >> > DEST into the chain even if they have the same >> > hard_regno_nregs (i.e. HImode/SImode i

Re: [PATCH] alias: Fix offset checks involving section anchors [PR92294]

2021-01-18 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 18 Jan 2021, Jan Hubicka wrote: > >> > This is a repost of: >> > >> > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539763.html >> > >> > which was initially posted during stage 4. (And yeah, I only just >> > missed stage 4 again.) >> > >> > IMO it

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-18 Thread Richard Sandiford via Gcc-patches
Qing Zhao writes: D will keep all initialized aggregates as aggregates and live which means stack will be allocated for it. With A the usual optimizations to reduce stack usage can be applied. >>> >>> I checked the routine “poverties::bump_map” in 511.povray_r since it >>> has a l

Re: [PATCH] alias: Fix offset checks involving section anchors [PR92294]

2021-01-18 Thread Richard Sandiford via Gcc-patches
Jan Hubicka writes: >> >> >> >> Well, in tree-ssa code we do assume these to be either disjoint objects >> >> or equal (in decl_refs_may_alias_p that continues in case >> >> compare_base_decls is -1). I am not sure if we win much by treating >> >> them differently on RTL level. I would prefer

Re: [PATCH] aarch64: reimplement vqmovn_high* intrinsics using builtins

2021-01-18 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov via Gcc-patches writes: > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 6efc7706a41e02d947753a4cda984159b68bd39f..27e9026d9e8b7ff980c5b8d9ff1b00490e3a18cb > 100644 > --- a/gcc/config/aarch64/aarch64-simd-built

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

2021-01-19 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Mon, Jan 18, 2021 at 7:10 PM Richard Sandiford > wrote: >> >> Hongtao Liu writes: >> > On Mon, Jan 18, 2021 at 6:18 PM Richard Sandiford >> > wrote: >> >> >> >> Hongtao Liu via Gcc-patches writes: >> >> > Hi: >> >> > If SRC had been assigned a mode narrower than the

Re: [committed] Skip asm goto test fails on hppa

2021-01-19 Thread Richard Sandiford via Gcc-patches
Hans-Peter Nilsson writes: > On Tue, 19 Jan 2021, Jakub Jelinek wrote: > >> On Mon, Jan 18, 2021 at 11:50:56PM -0500, Hans-Peter Nilsson wrote: >> > On Mon, 18 Jan 2021, John David Anglin wrote: >> > > The hppa target is a reload target and asm goto is not supported on >> > > reload targets. >> >

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

2021-01-19 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches writes: > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches > wrote: >> > actually only the lower 16bits are needed, the original insn is like >> > >> > .294.r.ira >> > (insn 69 68 70 13 (set (

Re: [PATCH] aarch64: Use GCC vector extensions for integer mls intrinsics

2021-01-19 Thread Richard Sandiford via Gcc-patches
Jonathan Wright writes: > Hi, > > As subject, this patch rewrites integer mls Neon intrinsics to use > a - b * c rather than inline assembly code, allowing for better > scheduling and optimization. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > If ok, please co
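
As a rough illustration of the `a - b * c` form, here is a minimal sketch using the GCC vector extension. The type `v4si` and the function `mls_v4si` are hypothetical names for this example and are not the arm_neon.h definitions from the patch.

```c
#include <stdint.h>

/* Four 32-bit lanes in a 16-byte vector, using the GCC vector extension.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for an integer MLS intrinsic: a - b * c per lane.
   Written as plain arithmetic, the compiler can schedule and combine it
   with surrounding code, which inline assembly would prevent.  */
v4si
mls_v4si (v4si a, v4si b, v4si c)
{
  return a - b * c;
}
```

With lanes `a = {10, 20, 30, 40}`, `b = {1, 2, 3, 4}` and `c = {2, 2, 2, 2}`, each result lane is `a[i] - b[i] * c[i]`.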

Re: [PATCH] alias: Fix offset checks involving section anchors [PR92294]

2021-01-19 Thread Richard Sandiford via Gcc-patches
Jan Hubicka writes: >> On Mon, 18 Jan 2021, Richard Sandiford wrote: >> >> > Jan Hubicka writes: >> > >> >> >> > >> >> Well, in tree-ssa code we do assume these to be either disjoint >> > >> >> objects >> > >> >> or equal (in decl_refs_may_alias_p that continues in case >> > >> >> compare_base

[PATCH] vect: Fix VLA SLP invariant optimisation [PR98535]

2021-01-20 Thread Richard Sandiford via Gcc-patches
duplicate_and_interleave is the main fallback way of loading a repeating sequence of elements into variable-length vectors. The code handles cases in which the number of elements in the sequence is potentially several times greater than the number of elements in a vector. Let: - NE be the (compil

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

2021-01-20 Thread Richard Sandiford via Gcc-patches
Hongtao Liu writes: > On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford > wrote: >> >> Jakub Jelinek via Gcc-patches writes: >> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via >> > Gcc-patches wrote: >> >> > actually onl

Re: [PATCH] Handle overflow in dependence analysis lambda ops gracefully

2021-01-20 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > diff --git a/gcc/hwint.h b/gcc/hwint.h > index 127b0130c66..8812bc7150f 100644 > --- a/gcc/hwint.h > +++ b/gcc/hwint.h > @@ -333,4 +333,46 @@ absu_hwi (HOST_WIDE_INT x) >return x >= 0 ? (unsigned HOST_WIDE_INT)x : -(unsigned HOST_WIDE_INT)x; > } > > +/* Compute the
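
The `absu_hwi` context line quoted in the diff shows a common way of taking an absolute value without signed overflow. A minimal sketch of the same idea, with `int64_t` standing in for `HOST_WIDE_INT` (an assumption made for this example, as is the name `absu_i64`):

```c
#include <stdint.h>

/* int64_t stands in for HOST_WIDE_INT here (an assumption of this sketch).  */
uint64_t
absu_i64 (int64_t x)
{
  /* Negating in the unsigned type wraps modulo 2^64, so INT64_MIN maps to
     2^63 without invoking signed overflow.  */
  return x >= 0 ? (uint64_t) x : -(uint64_t) x;
}
```

Writing `-x` in the signed type instead would be undefined for `INT64_MIN`, which is the kind of overflow the thread's subject is concerned with.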

Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches
Ilya Leoshkevich via Gcc-patches writes: > On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote: >> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches >> wrote: >> > >> Suppose we have: >> > >> > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62))) >> > (set (reg:FPRX2 66) (su

Re: [GCC9 backport] AArch64: Fix symbol offset limit (PR 98618)

2021-01-21 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches writes: > In aarch64_classify_symbol symbols are allowed large offsets on relocations. > This means the offset can use all of the +/-4GB offset, leaving no offset > available for the symbol itself. This results in relocation overflow and > link-time errors for simpl

Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches
Ilya Leoshkevich writes: > On Thu, 2021-01-21 at 10:49 +, Richard Sandiford wrote: >> What prevents combine from handling this? Are the instructions in >> different blocks? > > I wanted to do this before combine, because in __ieee754_sqrtl case > fwprop turns this (example from the commit mes

Re: [PATCH v2] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Richard Sandiford via Gcc-patches
Ilya Leoshkevich via Gcc-patches writes: > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html > > v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses. > > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux > and s390x-redhat-linux. Ok for mas

[committed] testsuite: Extend vector() regexp

2020-11-17 Thread Richard Sandiford via Gcc-patches
For variable-length vectors, the N inside “vector(N) T” can contain the characters ‘[’, ‘]’ and ‘,’. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard gcc/testsuite/ * gcc.dg/vect/pr91750.c: Allow "[]," inside a vec
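To illustrate the kind of pattern change described, here is a rough sketch using POSIX extended regular expressions. It is not the Tcl pattern from the testsuite: the function name matches_vector_type, the fixed element type "int" and the exact pattern are all assumptions made for the example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if TEXT contains "vector(N) int",
   where N may consist of digits plus the characters '[', ']' and ','
   that appear for variable-length vectors.  */
static bool
matches_vector_type (const char *text)
{
  regex_t re;
  /* In a POSIX bracket expression a literal ']' must come first,
     hence "[][0-9,]".  */
  if (regcomp (&re, "vector\\([][0-9,]+\\) int",
               REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool ok = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return ok;
}
```

A digits-only pattern would accept "vector(4) int" but reject a variable-length spelling such as "vector([4,4]) int"; the widened character class accepts both.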

[committed] testsuite: Remove XFAIL for variable-length vectors

2020-11-17 Thread Richard Sandiford via Gcc-patches
The XFAIL for variable-length vectors is no longer needed since we can't build the required constant vector and so fall back to fixed-length alternatives. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard gcc/testsuite/

[committed] testsuite: XFAIL some SLP reduction tests for VLA SVE

2020-11-17 Thread Richard Sandiford via Gcc-patches
For variable-length SVE, we can only use SLP for N scalars of type T if the number of Ts in a vector is a multiple of N. For ints this means that N must be 4 or 2, so this patch XFAILs two tests for N==8. The exact limit seems inherently target-specific -- variable-length vectors with a 256-bit g
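The divisibility rule can be sketched as a small check. This assumes that the only element count known at compile time is the number of elements in one vector-length granule (128 bits for SVE, whose vector lengths are multiples of 128 bits). The helper vla_slp_group_ok is hypothetical and is not a GCC function; the granule size is a parameter because the limit is described as target-specific.

```c
#include <stdbool.h>

/* Hypothetical helper: can a group of GROUP_SIZE scalars, each
   ELEMENT_BITS wide, be handled when vector lengths are only known to
   be a multiple of GRANULE_BITS?  The number of elements per granule
   must be a multiple of the group size.  */
static bool
vla_slp_group_ok (unsigned int granule_bits, unsigned int element_bits,
                  unsigned int group_size)
{
  if (element_bits == 0 || group_size == 0 || element_bits > granule_bits)
    return false;
  unsigned int elems_per_granule = granule_bits / element_bits;
  return elems_per_granule % group_size == 0;
}
```

With a 128-bit granule and 32-bit ints there are 4 elements per granule, so groups of 4 or 2 pass the check and a group of 8 does not. Under this sketch a 256-bit granule would also admit a group of 8.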

[committed] testsuite: XFAIL SLP induction tests for VL vectors

2020-11-17 Thread Richard Sandiford via Gcc-patches
We don't yet support SLP inductions for variable-length vectors, so this patch XFAILs some associated tests. (Inductions aren't inherently difficult to support. It just hasn't been done yet.) Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as

[committed] testsuite: Adjust vect/pr65947-8.c for SVE

2020-11-17 Thread Richard Sandiford via Gcc-patches
We can vectorise vect/pr65947-8.c for SVE, as we can for GCN. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard gcc/testsuite/ * gcc.dg/vect/pr65947-8.c: Expect the loop to be vectorized for SVE. --- gcc/testsuite/

[committed] testsuite: Adjust vect/bb-slp-subgroups-3.c for VL vectors

2020-11-17 Thread Richard Sandiford via Gcc-patches
Because we disable the cost model, targets with variable-length vectors can end up vectorising the store to a[0..7] on its own. With the cost model we do something sensible. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard

[committed] testsuite: Add a vect_element_align_preferred guard

2020-11-17 Thread Richard Sandiford via Gcc-patches
We don't try to increase the alignment of decls if vect_element_align_preferred. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard gcc/testsuite/ * gcc.dg/vect/aligned-section-anchors-nest-1.c: XFAIL alignment

[committed] testsuite: Add a vect_load_lanes guard

2020-11-17 Thread Richard Sandiford via Gcc-patches
We still fall back to load/store-lanes for slp-46.c, if the target supports it. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. Pushed as obvious. Richard gcc/testsuite/ * gcc.dg/vect/slp-46.c: XFAIL test for SLP on vect_load_lanes targets.

[PATCH 1/5] testsuite: Fix vect/vect-sdiv-pow2-1.c

2020-11-17 Thread Richard Sandiford via Gcc-patches
We're now able to vectorise the set-up loop: int p = power2 (fns[i].po2); for (int j = 0; j < N; j++) a[j] = ((p << 4) * j) / (N - 1) - (p << 5); Rather than adjust the expected output for that, it seemed better to disable optimisation for the testing code. Tested on aarch64-

[PATCH 2/5] testsuite: Add a vect_partial_vectors_usage_2 guard

2020-11-17 Thread Richard Sandiford via Gcc-patches
We don't need an epilogue loop if the main loop can operate on partial vectors, so this patch disables an associated test. The alternative would be to force partial-vectors-usage=1 on the command line. Tested on aarch64-linux-gnu (with and without SVE), arm-linux-gnueabihf and x86_64-linux-gnu. O

[PATCH 3/5] testsuite: Add vect_perm3_int guards

2020-11-17 Thread Richard Sandiford via Gcc-patches
SLP vectorisation of gcc.dg/vect/fast-math-vect-call-1.c involves a group of 3 floats, which requires the same permutation as vect_perm3_int. The load/store_lanes XFAILs in gcc.dg/vect/slp-perm-6.c implicitly assumed vect_perm3_int, which is true for Advanced SIMD but not for VLA SVE. Whether it'

[PATCH 4/5] testsuite: Adjust gcc.dg/vect/slp-21.c for Arm targets

2020-11-17 Thread Richard Sandiford via Gcc-patches
On arm* and aarch64* targets, we can vectorise the second of the main loops using SLP, not just the third. As the comments say, whether this is supported depends on a very specific permutation, so it seemed better to use direct target selectors. Tested on aarch64-linux-gnu (with and without SVE),

[PATCH 5/5] testsuite: Adjust bb-slp-pr68892.c for AArch64

2020-11-17 Thread Richard Sandiford via Gcc-patches
AArch64 passes the "not profitable" test because it treats vec_construct as having a high-enough cost. This means that we can try other vector modes, which in turn causes "BB vectorization with gaps at the end of a load is not supported" to be printed more than once. The number of times that we p

[committed] PR97693: Specify required vectype in vectorizable_call

2020-11-17 Thread Richard Sandiford via Gcc-patches
The vectorizable_call part of r11-1143 dropped the required vectype when moving from vect_get_vec_def_for_operand to vect_get_vec_defs_for_operand. This caused an ICE on the testcase for SVE, because we ended up with a non-predicate value being passed to a predicate input. AFAICT this was the onl

[committed] aarch64: Remove XFAILs for two SVE tests

2020-11-17 Thread Richard Sandiford via Gcc-patches
These tests started passing a while ago, so remove the XFAILs. Tested on aarch64-linux-gnu, pushed to trunk. Richard gcc/testsuite/ * gcc.target/aarch64/sve/cond_cnot_1.c: Remove XFAIL. * gcc.target/aarch64/sve/cond_unary_1.c: Likewise. --- gcc/testsuite/gcc.target/aarch64/sve/

Re: [PATCH]AArch64[GCC-8] Fix overflow in memcopy expansion on aarch64.

2020-11-17 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > This is a partial backport for 0f801e0b6cc9f67c9a8983127e23161f6025c5b6 which > fixes > a truncation error for the inline memcopy on AArch64 on GCC-8. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for GCC-8? OK, thanks. Richard >

Re: [PATCH] Add MODE_OPAQUE

2020-11-17 Thread Richard Sandiford via Gcc-patches
acsaw...@linux.ibm.com writes: > From: Aaron Sawdey > > Richard, > Thanks for the review. I think I have resolved everything, as follows: > > * I was able to remove the const_tiny_rtx initialization for > MODE_OPAQUE. If that becomes a problem it's a pretty simple matter to > use an UNSPEC to a
