[PATCH] aarch64: Define __ARM_FEATURE_RCPC

2022-10-04 Thread Richard Sandiford via Gcc-patches
https://github.com/ARM-software/acle/pull/199 adds a new feature macro for RCPC, for use in things like inline assembly. This patch adds the associated support to GCC. Also, RCPC is required for Armv8.3-A and later, but the armv8.3-a entry didn't include it. This was probably harmless in practic
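
For readers wondering how such a macro might be used from inline assembly, here is a minimal sketch. The helper name load_acquire_u32 is hypothetical and is not part of the patch or of ACLE. It assumes GCC-style inline asm and falls back to the generic __atomic builtin when the macro is not defined.

```c
#include <stdint.h>

/* Hypothetical helper: load *P with acquire semantics.  When the compiler
   defines __ARM_FEATURE_RCPC, use the RCpc LDAPR instruction through
   inline assembly; otherwise fall back to the portable builtin.  */
static inline uint32_t
load_acquire_u32 (const uint32_t *p)
{
#if defined (__aarch64__) && defined (__ARM_FEATURE_RCPC)
  uint32_t value;
  /* "Q" is the AArch64 constraint for a memory operand addressed by a
     single base register with no offset, which is what LDAPR accepts.  */
  __asm__ volatile ("ldapr %w0, %1" : "=r" (value) : "Q" (*p) : "memory");
  return value;
#else
  return __atomic_load_n (p, __ATOMIC_ACQUIRE);
#endif
}
```

On targets without the macro the function compiles to an ordinary acquire load, so callers do not need their own feature test.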

Re: [PATCH] aarch64: update Ampere-1 core definition

2022-10-04 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > This brings the extensions detected by -mcpu=native on Ampere-1 systems > in sync with the defaults generated for -mcpu=ampere1. > > Note that some kernel versions may misreport the presence of PAUTH and > PREDRES (i.e., -mcpu=native will add 'nopauth' and 'nopredres'). >

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Improve immediate expansion of immediates which can be created from a > bitmask immediate and 2 MOVKs. This reduces the number of 4-instruction > immediates in SPECINT/FP by 10-15%. > > Passes regress, OK for commit? > > gcc/ChangeLog: > > PR target/106583 >
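
As background for the "4-instruction immediates" mentioned above, here is a rough sketch of how many instructions a naive MOVZ/MOVK sequence needs for a 64-bit constant. This is not the patch's algorithm: it ignores MOVN and bitmask immediates, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Count the instructions in a naive MOVZ + MOVK sequence for IMM:
   one instruction per non-zero 16-bit chunk, and at least one
   instruction (MOVZ #0) for zero itself.  */
static int
movz_movk_count (uint64_t imm)
{
  int count = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((imm >> shift) & 0xffff)
      ++count;
  return count ? count : 1;
}
```

A constant with all four chunks non-zero costs four instructions under this scheme, which is the case the patch is trying to make less common.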

Re: [PATCH][AArch64] Improve bit tests [PR105773]

2022-10-05 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Since AArch64 sets all flags on logical operations, comparisons with zero > can be combined into an AND even if the condition is LE or GT. > > Passes regress, OK for commit? > > gcc: > PR target/105773 > * config/aarch64/aarch64.cc (aarch64_select_cc_mode):

Re: [PATCH v2] aarch64: fix off-by-one in reading cpuinfo

2022-10-06 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > Fixes: 341573406b39 > > Don't subtract one from the result of strnlen() when trying to point > to the first character after the current string. This issue would > cause individual characters (where the 128 byte buffers are stitched > together) to be lost. > > gcc/ChangeL
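
The off-by-one described above can be illustrated with a small sketch. This is not the code from the patch; append_chunk and its parameters are hypothetical. The point is that the write position for the next piece is buf + strnlen (buf, cap), i.e. the terminating NUL, and subtracting one would overwrite the last character already stored.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Append CHUNK to the NUL-terminated string in BUF, which has room for
   CAP bytes.  Returns 0 on success, or -1 if BUF is not terminated
   within CAP bytes or CHUNK does not fit.  */
static int
append_chunk (char *buf, size_t cap, const char *chunk)
{
  size_t used = strnlen (buf, cap);   /* index of the terminating NUL */
  size_t len = strlen (chunk);
  if (used >= cap || len >= cap - used)
    return -1;
  /* Copy CHUNK including its NUL, starting exactly at the old NUL.  */
  memcpy (buf + used, chunk, len + 1);
  return 0;
}
```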

Re: [PATCH v2] aarch64: update Ampere-1 core definition

2022-10-06 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich writes: > This brings the extensions detected by -mcpu=native on Ampere-1 systems > in sync with the defaults generated for -mcpu=ampere1. > > Note that some early kernel versions on Ampere1 may misreport the > presence of PAUTH and PREDRES (i.e., -mcpu=native will add 'nopauth' >

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-07 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra via Gcc-patches writes: > Hi Richard, > >> Did you consider handling the case where the movks aren't for >> consecutive bitranges?  E.g. the patch handles: > >> but it looks like it would be fairly easy to extend it to: >> >>  0x12345678 > > Yes, with a more general search l

Re: [PATCH][AArch64] Improve immediate expansion [PR106583]

2022-10-07 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > >>> Yes, with a more general search loop we can get that case too - >>> it doesn't trigger much though. The code that checks for this is >>> now refactored into a new function. Given there are now many >>> more calls to aarch64_bitmask_imm, I added a streamli

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 10 Oct 2022, Andrew Stubbs wrote: >> On 10/10/2022 12:03, Richard Biener wrote: >> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing >> > first order recurrences. That solves two TSVC missed optimization PRs. >> > >> > There's a new scalar cy

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-12 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > + /* First-order recurrence autovectorization needs to handle permutation > + with indices = [nunits-1, nunits, nunits+1, ...]. */ > + vec_perm_builder sel (nunits, 1, 3); > + for (int i = 0; i < 3; ++i) > +sel.quick_push (nunits - dist + i); > + vec_perm_indi
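
A scalar model may help in reading the index pattern quoted above. This is only a sketch with a hypothetical function name, not vectorizer code. It treats the permute as selecting from the concatenation of the previous and current vectors, and assumes 1 <= dist <= nunits.

```c
#include <stddef.h>

/* Scalar model of a two-input permute: element I of OUT is element
   (NUNITS - DIST + I) of the concatenation PREV:CUR.  With DIST == 1
   this gives indices [nunits-1, nunits, nunits+1, ...].  */
static void
recurrence_permute (const int *prev, const int *cur, int *out,
                    size_t nunits, size_t dist)
{
  for (size_t i = 0; i < nunits; ++i)
    {
      size_t idx = nunits - dist + i;
      out[i] = idx < nunits ? prev[idx] : cur[idx - nunits];
    }
}
```

With dist equal to 1, the result is the current vector shifted up by one lane, with the last lane of the previous vector shifted in at the bottom.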

Re: [PATCH] machmode: Introduce GET_MODE_NEXT_MODE with previous GET_MODE_WIDER_MODE meaning, add new GET_MODE_WIDER_MODE

2022-10-12 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Oct 05, 2022 at 04:02:25PM -0400, Jason Merrill wrote: >> > > > @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_ >> > > >{ >> > > > machine_mode optab_mode = mclass == MODE_CC ? CCmode : >> > > > compare_mode; >> > > > icode =

[PATCH] gomp: Various fixes for SVE types [PR101018]

2022-03-02 Thread Richard Sandiford via Gcc-patches
Various parts of the omp code checked whether the size of a decl was an INTEGER_CST in order to determine whether the decl was variable-sized or not. If it was variable-sized, it was expected to have a DECL_VALUE_EXPR replacement, as for VLAs. This patch uses poly_int_tree_p instead, so that vari

Re: [PATCH v8 02/12] LoongArch Port: gcc build

2022-03-06 Thread Richard Sandiford via Gcc-patches
Hi, Thanks for the submission. Some comments below on this patch, but otherwise it looks good. I hope to get to the other patches in the series soon. I haven't followed all of the previous discussion, so some of these points might already have been discussed. Sorry in advance if so. xucheng..

Re: [PATCH v8 04/12] LoongArch Port: Machine description files.

2022-03-06 Thread Richard Sandiford via Gcc-patches
Hi, Some comments below, but otherwise it looks good to me. xucheng...@loongson.cn writes: > […] > +(define_memory_constraint "k" > + "A memory operand whose address is formed by a base register and > (optionally scaled) > + index register." > + (and (match_code "mem") > + (not (match_

Re: [PATCH v8 05/12] LoongArch Port: Machine description C files and .h files.

2022-03-07 Thread Richard Sandiford via Gcc-patches
Hi, Some comments below, but otherwise it looks good to me. A few of the comments are about removing hook or macro definitions that are the same as the default. Doing that helps people who want to update a hook interface in future, since there are then fewer places to adjust. xucheng...@loongso

Re: [PATCH][RFC] tree-optimization/84201 - add --param vect-induction-float

2022-03-08 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > This adds a --param to allow disabling of vectorization of > floating point inductions. Ontop of -Ofast this should allow > 549.fotonik3d_r to not miscompare. > > While I thought of a more elaborate way of disabling certain > vectorization kinds (reductions also came to m

Re: [PATCH] mips: avoid signed overflow in LUI_OPERAND [PR104842]

2022-03-08 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: > I think this one obvious. Ok for trunk? OK, thanks. Richard > > gcc/ > > PR target/104842 > * config/mips/mips.h (LUI_OPERAND): Cast the input to an unsigned > value before adding an offset. > --- > gcc/config/mips/mips.h | 2 +- > 1 file changed, 1 inser
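
The idea of casting to unsigned before adding an offset can be sketched as follows. This is not the LUI_OPERAND macro from mips.h; is_lui_operand is a hypothetical stand-alone predicate that assumes a value is LUI-loadable when it is a sign-extended 32-bit quantity with its low 16 bits clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if VALUE could be loaded by a single LUI under the
   assumption stated above.  The range check adds an offset in unsigned
   arithmetic, which wraps rather than overflowing, so it stays well
   defined even for values near INT64_MAX.  */
static bool
is_lui_operand (int64_t value)
{
  uint64_t u = (uint64_t) value;
  return (u & 0xffff) == 0
         && u + UINT64_C (0x80000000) < UINT64_C (0x100000000);
}
```

Doing the same addition on the signed value would be undefined behaviour for large inputs, which is the kind of problem bootstrap-ubsan reports.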

Re: [PATCH v8 06/12] LoongArch Port: Builtin functions.

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes: > +#ifndef _GCC_LOONGARCH_BASE_INTRIN_H > +#define _GCC_LOONGARCH_BASE_INTRIN_H > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +typedef struct drdtime > +{ > + unsigned long dvalue; > + unsigned long dtimeid; > +} __drdtime_t; > + > +typedef struct rdtime

Re: [PATCH v8 08/12] LoongArch Port: libgcc

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes: > diff --git a/libgcc/config/loongarch/crti.S b/libgcc/config/loongarch/crti.S > new file mode 100644 > index 000..27b7eab3626 > --- /dev/null > +++ b/libgcc/config/loongarch/crti.S > @@ -0,0 +1,43 @@ > +/* Copyright (C) 2021-2022 Free Software Foundation, Inc

Re: [PATCH v8 11/12] LoongArch Port: gcc/testsuite

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes: > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 737e1a8913b..843b508b010 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -286,6 +286,10 @@ proc check_config

Re: [PATCH v8 12/12] LoongArch Port: Add doc.

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes: > From: chenglulu > > 2022-03-04 Chenghua Xu > Lulu Cheng > > * contrib/config-list.mk: Add LoongArch triplet. > * gcc/doc/install.texi: Add LoongArch options section. > * gcc/doc/invoke.texi: Add LoongArch options section. > *

Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-08 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao via Gcc-patches writes: > On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote: > >> The binutils has been merged into trunk: >> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307 >> >> Note: We split -mabi= into -mabi=lp64d/f/s

Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-09 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu wrote: >> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote: >> > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches >> > wrote: >> > > >> > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number o

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-09 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: > Bootstrapped and regtested on mips64el-linux-gnuabi64. > > I'm not sure if it's "correct" to clobber other registers during the > zeroing of scratch registers. But I can't really come up with a better > idea: on MIPS there is no simple way to clear one bit in FCSR (i. e. > FCC

Re: PING**4 - [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-03-14 Thread Richard Sandiford via Gcc-patches
"Roger Sayle" writes: > Hi Richard, >> Yes, which is why I think the target should claim argument passing happens > in reg:HI. > > Unfortunately, this hits another "feature" of the nvptx backend; it's a > > /* Implement TARGET_MODES_TIEABLE_P. */ > bool nvptx_modes_tieable_p (machine_mode, machi

Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, Mar 9, 2022 at 7:04 PM Richard Sandiford > wrote: >> >> Richard Biener via Gcc-patches writes: >> > On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu wrote: >> >> >> >> On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote: >> >> > On Tue, Mar 1, 2022 at 11:41 PM

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-14 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response, was out for a few days. Xi Ruoyao writes: > On Sat, 2022-03-12 at 18:48 +0800, Xi Ruoyao via Gcc-patches wrote: >> On Fri, 2022-03-11 at 21:26 +, Qing Zhao wrote: >> > Hi, Ruoyao, >> > >> > (I might not be able to reply to this thread till next Wed due to a >> >

Re: [PATCH 1/2] libsanitizer: cherry-pick db7bca28638e from upstream

2022-03-14 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: > libsanitizer/ > > * sanitizer_common/sanitizer_atomic_clang.h: Ensures to only > include sanitizer_atomic_clang_mips.h for O32. OK, thanks. Richard > --- > libsanitizer/sanitizer_common/sanitizer_atomic_clang.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 dele

Re: [PATCH 2/2] Enable libsanitizer build on mips64

2022-03-14 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: > Bootstrapped and regtested on mips64el-linux-gnuabi64. > > bootstrap-ubsan revealed 3 bugs (PR 104842, 104843, 104851). > bootstrap-asan did not reveal any new bug. > > gcc/ > > * config/mips/mips.h (SUBTARGET_SHADOW_OFFSET): Define. > * config/mips/mips.cc (mips_op

Re: [PATCH] aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]

2022-03-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > We unshare all RTL created during expansion, but when > aarch64_load_symref_appropriately is called after expansion like in the > following testcases, we use imm in both HIGH and LO_SUM operands. > If imm is some RTL that shouldn't be shared like a non-sharable CONS

Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-16 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle > wrote: >> I've been wondering about the possible performance/missed-optimization >> impact of my patch for PR middle-end/98420 and similar IEEE correctness >> fixes that disable constant folding optimizations

Re: [aarch64] Add Neoverse N2 tuning structs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch adds tuning structures for Neoverse N2. > > 2022-03-16  Tamar Christina  >                Andre Vieira > >     * config/aarch64/aarch64.cc (neoversen2_addrcost_table, > neoversen2_regmove_cost, >     neoversen2_advsimd_vector_cost,

Re: [aarch64] Add Demeter tuning structs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch adds tuning structs for -mcpu/-mtune=demeter. > > > 2022-03-16  Tamar Christina  >    Andre Vieira > >     * config/aarch64/aarch64.cc (demeter_addrcost_table, > demeter_regmove_cost, >     demeter_advsimd_vector

Re: [aarch64] Update reg-costs to differentiate between memmove costs

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > This patch introduces a struct to differentiate between different > memmove costs to enable a better modeling of memory operations. These > have been modelled for > -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for > all other tunings all en

Re: [aarch64] Update regmove costs for neoverse-v1 and neoverse-512tvb tunings

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch updates the register move tunings for > -mcpu/-mtune={neoverse-v1,neoverse-512tvb}. > > 2022-03-16  Tamar Christina  >    Andre Vieira > >     * config/aarch64/aarch64.cc (neoversev1_regmove_cost): New > tuning struc

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-16 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This patch implements the costing function > determine_suggested_unroll_factor for aarch64. > It determines the unrolling factor by dividing the number of X > operations we can do per cycle by the number of X operations in the loop > body, taking this in

Re: [PATCH RFC] mips: add TARGET_ZERO_CALL_USED_REGS hook [PR104817, PR104820]

2022-03-18 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: >> >> If we have to go this way, I think it’s better to make the change you >> suggested above, >> and then also update the documentation, both internal documentation on >> how to define >>  the hook and the user level documentation on what the user might >> expect when using

Re: [PATCH] [GIMPLE FE] allow to unit test loop passes

2022-03-18 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following arranges for the GIMPLE frontend to parse an > additional loops(...) specification, currently consisting of > 'normal' and 'lcssa'. The intent is to allow unit testing > of passes inside the GIMPLE loop optimization pipeline where > we keep the IL in loop-cl

Re: [PATCH v9 04/12] LoongArch Port: Machine description files.

2022-03-20 Thread Richard Sandiford via Gcc-patches
Thanks, this addresses most of my comments from the v8 review. There were a couple left over though: chenglulu writes: > +(define_attr "compression" "none,all" > + (const_string "none")) I still don't understand the purpose of keeping this for LoongArch. > +(define_insn "truncdisi2_extended" >

[PATCH] rtl-ssa: Fix prev/next_def confusion [PR104869]

2022-03-20 Thread Richard Sandiford via Gcc-patches
rtl-ssa chains definitions into an RPO list. It also groups sequences of clobbers together into a single node, so that it's possible to skip over the clobbers in constant time in order to get the next or previous set. When adding a clobber to an insn, the main DF barriers for that clobber are the

Re: [PATCH v9 06/12] LoongArch Port: Builtin functions.

2022-03-22 Thread Richard Sandiford via Gcc-patches
Hi, Thanks for the update. It looks like there are some unaddressed comments from the v8 review: chenglulu writes: > gcc/ > > * config/loongarch/larchintrin.h: New file. > * config/loongarch/loongarch-builtins.cc: New file. > --- > gcc/config/loongarch/larchintrin.h | 409 +

Re: [PATCH v9 11/12] LoongArch Port: gcc/testsuite

2022-03-22 Thread Richard Sandiford via Gcc-patches
chenglulu writes: > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 737e1a8913b..843b508b010 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -286,6 +286,10 @@ proc check_configured_with {

Re: [PATCH v9 12/12] LoongArch Port: Add doc.

2022-03-22 Thread Richard Sandiford via Gcc-patches
chenglulu writes: > +@item -msmall-data-limit=@var{number} > +@opindex -msmall-data-limit > +Put global and static data smaller than @code{number} bytes into a special > +section (on some targets). The default value is 0. One minor left-over from v8: this should be @var{number} rather than @code

Re: [PATCH v9 00/12] Add LoongArch support.

2022-03-22 Thread Richard Sandiford via Gcc-patches
chenglulu writes: > Hi, all: > > This is the v9 version of LoongArch Port based on > 9fc8f278ebebc57537dc0cb9d33e36d932be0bc3. > Please review. Thanks for the update. I've sent follow-ups for parts 4, 6, 11 and 12, but otherwise v9 addresses all the comments I had. The series LGTM with those i

Re: [PATCH v10 00/12] Add LoongArch support.

2022-03-28 Thread Richard Sandiford via Gcc-patches
chenglulu writes: > Hi, all: > > This is the v10 version of LoongArch Port based on > d1ca63a1b7d5986913b14567a4950b055a5a3f07. OK for trunk. Thanks for the updates. Richard > Please review. > > We know it is stage4, I think it is ok for a new port. > The kernel side upstream waiting for a a

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-28 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > Since we're now vectorizing by default at -O2 issues like PR101908 > become more important where we apply basic-block vectorization to > parts of the function covering loads from function parameters passed > on the stack. Since we have no good idea how the stack pushing >

Re: [PATCH][RFC] tree-optimization/101908 - avoid STLF fails when vectorizing

2022-03-28 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Mon, 28 Mar 2022, Richard Sandiford wrote: > >> Richard Biener writes: >> > Since we're now vectorizing by default at -O2 issues like PR101908 >> > become more important where we apply basic-block vectorization to >> > parts of the function covering loads from function

Re: [aarch64] Implement determine_suggested_unroll_factor

2022-03-28 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > Addressed all of your comments bar the pred ops one. > > Is this OK? > > > gcc/ChangeLog: > >     * config/aarch64/aarch64.cc (aarch64_vector_costs): Define > determine_suggested_unroll_factor and m_nosve_pattern. >     (determine_suggested_unrol

Re: Test for linking for arm/size-optimization-ieee-[123].c

2022-03-31 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches writes: > These tests require a target that supports arm soft-float. The > problem is that the test checks for compile-time soft-float support, > but they may hit a problem when the linker complains that it can't > combine the testcase's object file with hard-float

Re: try multi dest registers in default_zero_call_used_regs

2022-03-31 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches writes: > When the mode of regno_reg_rtx is not hard_regno_mode_ok for the > target, try grouping the register with subsequent ones. This enables > s16 to s31 and their hidden pairs to be zeroed with the default logic > on some arm variants. > > Regstrapped on x86_

Re: [PATCH] tree-optimization/104912 - ensure cost model is checked first

2022-03-31 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following makes sure that when we build the versioning condition > for vectorization including the cost model check, we check for the > cost model and branch over other versioning checks. That is what > the cost modeling assumes, since the cost model check is the only

Re: [patch]update the documentation for TARGET_ZERO_CALL_USED_REGS hook and add an assertion

2022-03-31 Thread Richard Sandiford via Gcc-patches
Qing Zhao writes: > Hi, > > Per our discussion on: > https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592002.html > > I come up with the following patch to: > > 1. Update the documentation for TARGET_ZERO_CALL_USED_REGS hook; > 2. Add an assertion in function.cc to make sure the actually zer

Re: [PATCH] aarch64: Fix aarch64-tune.md (re)generation [PR105144]

2022-04-04 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > As I wrote in the PR, our Fedora trunk gcc builds likely after r12-7842 > change are now failing (lto1 crashes). > What happens is that when one bootstraps into an empty build directory > (or set of them), mddeps.mk doesn't exist yet and so Makefile doesn't > includ

Re: [PATCH] aarch64: Restrict aarch64-tune.md regeneration to --enable-maintainer-mode [PR105144]

2022-04-04 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > Normally updates to the source directory files are guarded with > --enable-maintainer-mode, e.g. we don't regenerate configure, config.h, > Makefile.in in directories that use automake etc. unless gcc is configured > that way. Otherwise the source tree can't be e.g

Re: try multi dest registers in default_zero_call_used_regs

2022-04-04 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva writes: > Hello, Richard, > > Thanks for the review! > > On Mar 31, 2022, Richard Sandiford wrote: > >>> + /* If the natural mode doesn't work, try some wider mode. */ >>> + if (!targetm.hard_regno_mode_ok (regno, mode)) >>> + { >>> + for (int nregs = 2; >>> +

[PATCH] vect: Fix mask handling for SLP gathers [PR103761]

2022-04-05 Thread Richard Sandiford via Gcc-patches
check_load_store_for_partial_vectors predates the support for SLP gathers and so had a hard-coded assumption that gathers/scatters (and load/stores lanes) would be non-SLP operations. This patch passes down the slp_node so that the routine can work out how many vectors are needed in both the SLP a

[pushed] aarch64: Use error_n for plural text [PR104897]

2022-04-05 Thread Richard Sandiford via Gcc-patches
Use error_n rather than error_at for “%d vectors”, so that translators can pick different translations based on the number (2 vs more than 2, etc.) Tested on aarch64-linux-gnu & pushed. Richard gcc/ PR target/104897 * config/aarch64/aarch64-sve-builtins.cc (function_reso

[pushed] aarch64: Fix -fpack-struct + <arm_neon.h> [PR103147]

2022-04-05 Thread Richard Sandiford via Gcc-patches
This PR is about -fpack-struct causing a crash when <arm_neon.h> is included. The new register_tuple_type code was expecting a normal unpacked structure layout instead of a packed one. For SVE we got around this by temporarily suppressing -fpack-struct, so that the tuple types always have their normal ABI.

[pushed] aarch64: Stop +mops clobbering variable values

2022-04-05 Thread Richard Sandiford via Gcc-patches
The mops cpy* patterns take three registers: a destination address, a source address, and a size. The patterns clobber all three registers as part of the operation. The set* patterns take a destination address, a size, and a store value, and they clobber the first two registers as part of the ope

Re: [PATCH] gimple.cc: Adjust gimple_call_builtin_p and gimple_call_combined_fn [PR105150]

2022-04-06 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > On Wed, 6 Apr 2022, Jakub Jelinek wrote: > >> On Wed, Apr 06, 2022 at 08:13:24AM +0200, Richard Biener wrote: >> > On Tue, 5 Apr 2022, Jakub Jelinek wrote: >> > >> > > On Tue, Apr 05, 2022 at 11:28:53AM +0200, Richard Biener wrote: >> > > > > In GIMPLE, we

Re: [PATCH] gimple.cc: Adjust gimple_call_builtin_p and gimple_call_combined_fn [PR105150]

2022-04-06 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > On Wed, Apr 06, 2022 at 11:52:23AM +0200, Richard Biener wrote: >> On Wed, 6 Apr 2022, Jakub Jelinek wrote: >> >> > On Wed, Apr 06, 2022 at 09:41:44AM +0100, Richard Sandiford wrote: >> > > But it seems like the magic incantation to detect “real” built-in >> > > function c

Re: [PATCH]AArch64 fix ls64 intrinsics expansion [PR104409]

2022-04-07 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > The LS64 intrinsics used a machinery that's not safe to use unless being > called from a pragma instantiation. > > This moves the initialization code to a new pragma for arm_acle.h. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > I didn

Re: [AArch64] PR target/105157 Increase number of cores TARGET_CPU_DEFAULT can encode

2022-04-08 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > Hi, > > This addresses the compile-time increase seen in the PR target/105157. > This was being caused by selecting the wrong core tuning, as when we > added the latest AArch64 the TARGET_CPU_generic tuning was pushed beyond > the 0x3f mask we used to encode bot

Re: [AArch64] PR target/105157 Increase number of cores TARGET_CPU_DEFAULT can encode

2022-04-08 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 08/04/2022 08:04, Richard Sandiford wrote: >> I think this would be better as a static assert at the top level: >> >>static_assert (TARGET_CPU_generic < TARGET_CPU_MASK, >> "TARGET_CPU_NBITS is big enough"); > The motivation being that you want

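The suggested top-level assert can be sketched in isolation as follows. The enum values, the 8-bit width and the encode/decode helpers are hypothetical stand-ins rather than the real aarch64 definitions. The sketch only shows how a compile-time check catches a CPU index that no longer fits in the bits reserved for it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real definitions.  */
enum target_cpu { TARGET_CPU_first, TARGET_CPU_second, TARGET_CPU_generic };
#define TARGET_CPU_NBITS 8
#define TARGET_CPU_MASK ((1 << TARGET_CPU_NBITS) - 1)

/* Fails at compile time if a new core pushes the last enumerator past
   the number of bits reserved for the CPU index.  */
static_assert (TARGET_CPU_generic < TARGET_CPU_MASK,
               "TARGET_CPU_NBITS is big enough");

/* Assumed encoding: CPU index in the low bits, flag bits above it.  */
static unsigned long
encode_cpu_default (enum target_cpu cpu, unsigned long flags)
{
  return (flags << TARGET_CPU_NBITS) | (unsigned long) cpu;
}

static enum target_cpu
decode_cpu (unsigned long encoded)
{
  return (enum target_cpu) (encoded & TARGET_CPU_MASK);
}
```

Because the check is at file scope, a too-small width breaks the build immediately instead of silently selecting the wrong core at run time.
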
Re: [PATCH] mips: testsuite: enforce -ffat-lto-objects for pr102024-4.c

2022-04-11 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao writes: > Another brown paper bag fix for MIPS :(. > > This failure was not detected running mips.exp=pr102024-* with a cross > compiler, so I just spotted it now running the test natively. > > --- > > The body of func is optimized away with -flto -fno-fat-lto-objects, so > the psABI inf

Re: [PING] AArch64: add R30_REGNUM into shrink-wrapping separate

2022-04-12 Thread Richard Sandiford via Gcc-patches
Dan Li writes: > Gentle ping for this :), thanks. > > Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590906.html Sorry, I should have realised this at the time, but I don't think we can do this after all. The ABI requires us to set up the frame chain before assigning to the frame

Re: [PATCH] tree-optimization/105250 - adjust fold_convertible_p PR105140 fix

2022-04-13 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > The following reverts the original PR105140 fix and goes for instead > applying the additional fold_convert constraint for VECTOR_TYPE > conversions also to fold_convertible_p. I did not try sanitizing > all of this at this point. > > Bootstrapped on x86_6

Re: [PATCH] tree-optimization/105250 - adjust fold_convertible_p PR105140 fix

2022-04-13 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Wed, 13 Apr 2022, Richard Sandiford wrote: > >> Richard Biener via Gcc-patches writes: >> > The following reverts the original PR105140 fix and goes for instead >> > applying the additional fold_convert constraint for VECTOR_TYPE >> > conversions also to fold_convertib

[pushed] aarch64: Make sure the UF divides the VF [PR105254]

2022-04-13 Thread Richard Sandiford via Gcc-patches
In this PR, we were trying to set the unroll factor to a value higher than the minimum VF (or more specifically, to a value that doesn't divide the VF). I guess there are two approaches to this: let the target pick any value it likes and make target-independent code pare it back to something that

Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns

2022-04-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > When doing BB vectorization the scalar cost compute is derailed > by patterns, causing lanes to be considered live and thus not > costed on the scalar side. For the testcase in PR104010 this > prevents vectorization which was done by GCC 11. PR103941 > shows similar case

Re: [PATCH] tree-optimization/104010 - fix SLP scalar costing with patterns

2022-04-14 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Thu, 14 Apr 2022, Richard Sandiford wrote: > >> Richard Biener writes: >> > When doing BB vectorization the scalar cost compute is derailed >> > by patterns, causing lanes to be considered live and thus not >> > costed on the scalar side. For the testcase in PR104010

Re: [PATCH] testsuite: Adjust possibly fragile slp-perm-9.c [PR104015]

2022-01-18 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin" writes: > on 2022/1/18 11:06, Kewen.Lin via Gcc-patches wrote: >> Hi, >> >> As discussed in PR104015, the test case slp-perm-9.c can be >> fragile when vectorizer tries to use different vectorisation >> strategies. >> >> As Richard suggested, this patch tries to make the check

[pushed] aarch64: Fix overly optimistic LDP/STP matching [PR104005]

2022-01-18 Thread Richard Sandiford via Gcc-patches
In g:526e1639aa76b0a8496b0dc3a3ff2c450229544e I'd added support for finding more consecutive MEMs. However, the check was too eager, in that it matched MEM_REFs with the same base address even if that base address was an arbitrary SSA name. This can give wrong results if a MEM_REF from one loop i

[PATCH] waccess: Look at calls when tracking clobbers [PR104092]

2022-01-18 Thread Richard Sandiford via Gcc-patches
In this PR the waccess pass was fed: D.10779 ={v} {CLOBBER}; VIEW_CONVERT_EXPR(D.10779) = .MASK_LOAD_LANES (addr_5(D), 64B, _2); _7 = D.10779.__val[0]; However, the tracking of m_clobbers only looked at gassigns, so it missed that the clobber on the first line was overwritten by the call o

Re: [PATCH] waccess: Look at calls when tracking clobbers [PR104092]

2022-01-19 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Tue, Jan 18, 2022 at 2:40 PM Richard Sandiford via Gcc-patches > wrote: >> >> In this PR the waccess pass was fed: >> >> D.10779 ={v} {CLOBBER}; >> VIEW_CONVERT_EXPR(D.10779) = .MASK_LOAD_LANES (addr_5(D), >> 64B, _2);

Re: [PATCH] tree-optimization/104112 - add check for vect epilogue reduc reuse

2022-01-19 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > This adds a missing check for the availability of intermediate vector > types required to re-use the accumulator of a vectorized reduction > in the vectorized epilogue. For SVE and VNx2DF vs V2DF with > -msve-vector-bits=512 for example V4DF is not available. > > In addit

Re: [PATCH] waccess: Look at calls when tracking clobbers [PR104092]

2022-01-19 Thread Richard Sandiford via Gcc-patches
Martin Sebor writes: > On 1/19/22 03:09, Richard Sandiford wrote: >> Richard Biener writes: >>> On Tue, Jan 18, 2022 at 2:40 PM Richard Sandiford via Gcc-patches >>> wrote: >>>> >>>> In this PR the waccess pass was fed: >>>>

Re: [PATCH] tree-optimization/104114 - avoid diagnosing V1mode lowering

2022-01-19 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches writes: > Currently we diagnose vector lowering of V1mode operations that > are not natively supported into V_C_E, scalar op plus CTOR with > -Wvector-operation-performance but that's hardly useful behavior > even though the way we lower things can be improved. > > T

Re: [PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 20/01/2022 09:14, Christophe Lyon wrote: >> >> >> On Wed, Jan 19, 2022 at 7:18 PM Andre Vieira (lists) via Gcc-patches >> wrote: >> >> Hi Christophe, >> >> On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: >> > At some point during the de

Re: [PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)" writes: > On 13/01/2022 14:56, Christophe Lyon via Gcc-patches wrote: >> The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use >> iterator instead of HI in mve_vmvnq_n_. >> >> 2022-01-13 Christophe Lyon >> >> gcc/ >> * config/arm/mve.md (mve_vmvnq_

Re: [PING^3][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-20 Thread Richard Sandiford via Gcc-patches
Thanks for the patch and sorry for the (very) slow review. Dan Li writes: > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c > index 007b928c54b..9b3a35c06bf 100644 > --- a/gcc/c-family/c-attribs.c > +++ b/gcc/c-family/c-attribs.c > @@ -56,6 +56,8 @@ static tree handle_cold_attrib

Re: [PATCH] Fix alignment of stack slots for overaligned types [PR103500]

2022-01-20 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response. Alex Coplan writes: > On 20/12/2021 13:19, Richard Sandiford wrote: >> Alex Coplan via Gcc-patches writes: >> > Hi, >> > >> > This fixes PR103500 i.e. ensuring that stack slots for >> > passed-by-reference overaligned types are appropriately aligned. For the >> > tes

Re: [PATCH v2] Disable -fsplit-stack support on non-glibc targets

2022-01-20 Thread Richard Sandiford via Gcc-patches
cc:ing the x86 and s390 maintainers soeren--- via Gcc-patches writes: > From: Sören Tempel > > The -fsplit-stack option requires the pthread_t TCB definition in the > libc to provide certain struct fields at specific hardcoded offsets. As > far as I know, only glibc provides these fields at the

Re: [PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-20 Thread Richard Sandiford via Gcc-patches
Andreas Krebbel via Gcc-patches writes: > On 1/14/22 20:41, Andreas Krebbel via Gcc-patches wrote: >> On 1/14/22 08:37, Richard Biener wrote: >> ... >>> Can the gist of this bug be put into the GCC bugzilla so the rev can >>> refer to it? >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034 >>

Re: [PATCH] Fix alignment of stack slots for overaligned types [PR103500]

2022-01-21 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches writes: > How about instead: > > (1) Define a new ASLK_* flag for assign_stack_local_1. > > (2) When the flag is set, make: > > if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT) > { > alignment_in_bits =

Re: [PATCH] tree-optimization/100089 - BB vectorization of if-converted loop bodies

2022-01-21 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The PR complains that when we only partially BB vectorize an > if-converted loop body that this can leave unvectorized code > unconditionally executed and thus effectively slow down code. > For -O2 we already mitigated the issue by not doing BB vectorization > when not all

Re: [PATCH v3] Disable -fsplit-stack support on non-glibc targets

2022-01-21 Thread Richard Sandiford via Gcc-patches
soe...@soeren-tempel.net writes: > From: Sören Tempel > > The -fsplit-stack option requires the pthread_t TCB definition in the > libc to provide certain struct fields at specific hardcoded offsets. As > far as I know, only glibc provides these fields at the required offsets. > Most notably, musl

Re: [PING^3][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-25 Thread Richard Sandiford via Gcc-patches
Dan Li writes: >>> + >>> if (flag_stack_usage_info) >>>current_function_static_stack_size = constant_lower_bound >>> (frame_size); >>> >>> @@ -9066,6 +9089,10 @@ aarch64_expand_epilogue (bool for_sibcall) >>> RTX_FRAME_RELATED_P (insn) = 1; >>>} >>> >>> + /*

Re: [PING^3][PATCH, v2, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-31 Thread Richard Sandiford via Gcc-patches
Thanks for the discussion and sorry for the slow reply, was out most of last week. Dan Li writes: > Thanks, Ard, > > On 1/26/22 00:10, Ard Biesheuvel wrote: >> On Wed, 26 Jan 2022 at 08:53, Dan Li wrote: >>> >>> Hi, all, >>> >>> Sorry for bothering. >>> >>> I'm trying to commit aarch64 scs code

Re: [PATCH] [PATCH, v3, 1/1, AARCH64][PR102768] aarch64: Add compiler support for Shadow Call Stack

2022-01-31 Thread Richard Sandiford via Gcc-patches
Dan Li writes: > Shadow Call Stack can be used to protect the return address of a > function at runtime, and clang already supports this feature[1]. > > To enable SCS in user mode, in addition to compiler, other support > is also required (as discussed in [2]). This patch only adds basic > support

Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-31 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response, was out last week. Christophe Lyon via Gcc-patches writes: > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c > index f16d320..5f559f8fd93 100644 > --- a/gcc/emit-rtl.c > +++ b/gcc/emit-rtl.c > @@ -6239,9 +6239,14 @@ init_emit_once (void) > >/* For BImode, 1 and

Re: [PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-31 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches writes: > On Mon, Jan 31, 2022 at 7:01 PM Richard Sandiford via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > >> Sorry for the slow response, was out last week. >> >> Christophe Lyon via Gcc-patches writes: >> > di

Re: [PATCH] reload: Adjust find_reloads to comment; test intersection, not subset

2022-01-31 Thread Richard Sandiford via Gcc-patches
Hans-Peter Nilsson via Gcc-patches writes: > I'm not seriously submitting this patch for approval. I just thought > it'd be interesting to some people, at least those maintaining ports > still using reload; I know it's reload and major ports don't really > care about that anymore. TL;DR: scroll

Re: [PATCH][1/4][committed] aarch64: Add support for Armv8.8-a memory operations and memcpy expansion

2022-02-01 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov writes: > Hi Richard, > > Sorry for the delay in getting back to this. I'm now working on a patch to > adjust this. > >> -Original Message- >> From: Richard Sandiford >> Sent: Tuesday, December 14, 2021 10:48 AM >> To: Kyrylo Tkachov via Gcc-patches >> Cc: Kyrylo Tkachov

Re: [PATCH v3] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2022-02-01 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches writes: > From: Andrew Pinski > > The problem here is that aarch64_expand_setmem does not change the alignment > for strict alignment case. This is version 3 of this patch, it is based on > version 2 and moves the check for the number of instructions from the > optimizi

Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms

2022-02-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Friday, December 17, 2021 4:49 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [2/3 PATCH]AArch64 use canonical ord

Re: [PATCH] reload: Adjust comment in find_reloads about subset, not intersection

2022-02-02 Thread Richard Sandiford via Gcc-patches
Hans-Peter Nilsson writes: >> From: Richard Sandiford >> Hans-Peter Nilsson via Gcc-patches writes: >> > The mystery isn't so much that there's code mismatching comments or >> > intent, but that this code has been there "forever". There has been a >> > function reg_classes_intersect_p, in gcc s

Re: [PATCH] reload: Adjust comment in find_reloads about subset, not intersection

2022-02-02 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Hans-Peter Nilsson writes: >>> From: Richard Sandiford >>> Hans-Peter Nilsson via Gcc-patches writes: >>> > The mystery isn't so much that there's code mismatching comments or >>> > intent, but that this code has been there "forever". There has been a >>> > function

[pushed] testsuite: Update guality xfails for aarch64*-*-*

2022-02-03 Thread Richard Sandiford via Gcc-patches
Following on from GCC 11 patch g:f31ddad8ac8, this one gives clean guality.exp test results for aarch64-linux-gnu with modern gdb (this time gdb 11.2). The justification is the same as previously: -- For people using older gdbs, it will trade one set of noisy results for another set. I still

[pushed] testsuite: Remove TSVC XFAILs for SVE

2022-02-03 Thread Richard Sandiford via Gcc-patches
Many of the XFAILed TSVC tests pass for SVE. This patch updates the markup accordingly. Tested on aarch64-linux-gnu & pushed. Richard gcc/testsuite/ * gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Don't XFAIL for SVE. * gcc.dg/vect/tsvc/vect-tsvc-s114.c: Likewise. * gcc.dg/vect/t
