Re: [PATCH] sched-deps: do not schedule pseudos across calls [PR108117]

2023-01-13 Thread Richard Sandiford via Gcc-patches
Alexander Monakov writes: > On Fri, 23 Dec 2022, Jose E. Marchesi wrote: > >> > +1 for trying this FWIW. There's still plenty of time to try an >> > alternative solution if there are unexpected performance problems. >> >> Let me see if Alexander's patch fixes the issue at hand (it must) and >> w

Re: [aarch64] Use wzr/xzr for assigning vector element to 0

2023-01-17 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > For the following (contrived) test: > > void foo(int32x4_t v) > { > v[3] = 0; > return v; > } > > -O2 code-gen: > foo: > fmov s1, wzr > ins v0.s[3], v1.s[0] > ret > > I suppose we can instead emit the following code-gen

Re: [PATCH v5] LoongArch: Fixed a compilation failure with '%c' in inline assembly [PR107731].

2023-01-17 Thread Richard Sandiford via Gcc-patches
Lulu Cheng writes: > Co-authored-by: Yang Yujie > > gcc/ChangeLog: > > * config/loongarch/loongarch.cc (loongarch_classify_address): > Add precessint for CONST_INT. > (loongarch_print_operand_reloc): Operand modifier 'c' is supported. > (loongarch_print_operand): Increase

Re: [PATCH 1/1] [fwprop]: Add the support of forwarding the vec_duplicate rtx

2023-01-17 Thread Richard Sandiford via Gcc-patches
lehua.d...@rivai.ai writes: > From: Lehua Ding > > ps: Resend for adjusting the width of each line of text. > > Hi, > > When I was adding the new RISC-V auto-vectorization function, I found that > converting `vector-reg1 vop vector-vreg2` to `scalar-reg3 vop vectorreg2` > is not very easy to handl

Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2023-01-17 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi, > >> @Wilco, can you please send the rebased patch for patch review? We would >> need in out openSUSE package soon. > > Here is an updated and rebased version: > > Cheers, > Wilco > > v4: rebase and add REG_UNSAVED_ARCHEXT. > > A recent change only initializes the regs

Re: [PATCH 1/1] [fwprop]: Add the support of forwarding the vec_duplicate rtx

2023-01-18 Thread Richard Sandiford via Gcc-patches
"丁乐华" writes: > > I don't think this pattern is correct, because SEL isn't commutative > > in the vector operands. > > Indeed, I think I should invert PRED operand or the comparison > operator which produce the PRED operand first. That would work, but it would no longer be a win. The vectoriser

Re: [PATCH v6] LoongArch: Fixed a compilation failure with '%c' in inline assembly [PR107731].

2023-01-18 Thread Richard Sandiford via Gcc-patches
Lulu Cheng writes: > Co-authored-by: Yang Yujie > > gcc/ChangeLog: > > * config/loongarch/loongarch.cc (loongarch_classify_address): > Add precessint for CONST_INT. > (loongarch_print_operand_reloc): Operand modifier 'c' is supported. > (loongarch_print_operand): Increase

Re: [aarch64] Use wzr/xzr for assigning vector element to 0

2023-01-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > On Tue, 17 Jan 2023 at 18:29, Richard Sandiford > wrote: >> >> Prathamesh Kulkarni writes: >> > Hi Richard, >> > For the following (contrived) test: >> > >> > void foo(int32x4_t v) >> > { >> > v[3] = 0; >> > return v; >> > } >> > >> > -O2 code-gen: >> > foo: >>

Re: [aarch64] Use exact_log2 (INTVAL (operands[2])) >= 0 to gate for vec_merge patterns.

2023-01-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi Richard, > Based on your suggestion in the other thread, the patch uses > exact_log2 (INTVAL (operands[2])) >= 0 to gate for vec_merge patterns. > Bootstrap+test in progress on aarch64-linux-gnu. > Does it look OK ? Yeah, this is OK, thanks. IMO it's a latent bug
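
The gate quoted above relies on exact_log2 returning a non-negative value only when its argument is an exact power of two, i.e. when the vec_merge mask has a single bit set. A minimal standalone sketch of that check follows, assuming a 64-bit mask; exact_log2_sketch and mask_selects_single_lane are hypothetical names used for illustration and are not GCC's own implementation.

```c
#include <stdint.h>

/* Return log2(x) if x is an exact power of two, otherwise -1.
   Illustrative stand-in for the behaviour the gate depends on.  */
int exact_log2_sketch(uint64_t x)
{
    /* Zero and values with more than one bit set are rejected.  */
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;

    int log = 0;
    while ((x >>= 1) != 0)
        log++;
    return log;
}

/* Hypothetical gate: accept a merge mask only if it selects exactly
   one lane, mirroring "exact_log2 (...) >= 0".  */
int mask_selects_single_lane(uint64_t mask)
{
    return exact_log2_sketch(mask) >= 0;
}
```

With this sketch a mask of 8 maps to lane 3, while masks such as 0 or 6 are rejected.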

Re: [PATCH 1/2] aarch64: fix ICE in aarch64_layout_arg [PR108411]

2023-01-19 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > The previous patch added an assert which should not be applied to PST > types (Pure Scalable Types) because alignment does not matter in this > case. This patch moves the assert after the PST case is handled to > avoid the ICE. > > PR target/108411 > gcc/ >

Re: [PATCH 2/2] aarch64: add -fno-stack-protector to some tests [PR108411]

2023-01-19 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > As discussed in the PR, these recently added tests fail when the > testsuite is executed with -fstack-protector-strong. To avoid this, > this patch adds -fno-stack-protector to dg-options. > > PR target/108411 > gcc/testsuite > * g++.target/aarch64/bitf

Re: [PATCH 03/11] aarch64: Use br instead of ret for eh_return

2023-08-23 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > The expected way to handle eh_return is to pass the stack adjustment > offset and landing pad address via > > EH_RETURN_STACKADJ_RTX > EH_RETURN_HANDLER_RTX > > to the epilogue that is shared between normal return paths and the > eh_return paths. EH_RETURN_HANDLER_RTX

Re: [PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > A MOPS memmove may corrupt registers since there is no copy of the input > operands to temporary > registers. Fix this by calling aarch64_expand_cpymem which does this. Also > fix an issue with > STRICT_ALIGNMENT being ignored if TARGET_MOPS is true, and avoid crashing

Re: [PATCH] rtl: Forward declare rtx_code

2023-08-23 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches writes: > Now that we require C++ 11, we can safely forward declare rtx_code > so that we can use it in target hooks. > > gcc/ChangeLog > * coretypes.h (rtx_code): Add forward declaration. > * rtl.h (rtx_code): Make compatible with forward declaration.

Re: [PATCH] rtl: use rtx_code for gen_ccmp_first and gen_ccmp_next

2023-08-23 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches writes: > Note, this patch is dependent on the patch I posted yesterday to > forward declare rtx_code in coretypes.h. > > -- > Now that we have a forward declaration of rtx_code in coretypes.h, we > can adjust these hooks to take rtx_code arguments rather than

Re: [PATCH] rtl: Forward declare rtx_code

2023-08-23 Thread Richard Sandiford via Gcc-patches
"Richard Earnshaw (lists)" writes: > On 23/08/2023 16:49, Richard Sandiford via Gcc-patches wrote: >> Richard Earnshaw via Gcc-patches writes: >>> Now that we require C++ 11, we can safely forward declare rtx_code >>> so that we can use it i

Re: [PATCH] AArch64: Fix MOPS memmove operand corruption [PR111121]

2023-08-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Hi Richard, > > (that's quick!) > >> + if (size > max_copy_size || size > max_mops_size) >> +return aarch64_expand_cpymem_mops (operands, is_memmove); >> >> Could you explain this a bit more? If I've followed the logic correctly, >> max_copy_size will always be 0 for

[PATCH] aarch64: Account for different Advanced SIMD fusing options

2023-08-24 Thread Richard Sandiford via Gcc-patches
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean that either side of a subtraction can start an accumulator chain. However, Advanced SIMD doesn't have an equivalent instruction. This means that, for Advanced SIMD, a subtraction can only be fused if the second operand is a multiplicati

Re: [PATCH] tree-optimization/111115 - SLP of masked stores

2023-08-24 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > The following adds the capability to do SLP on .MASK_STORE, I do not > plan to add interleaving support. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK? LGTM, thanks. Richard > Thanks, > Richard. > > PR tree-optimization/111115 > gcc/ > * tree-v

Re: [PATCH 03/11] aarch64: Use br instead of ret for eh_return

2023-08-24 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Rather than hiding this in target code, perhaps we should add a > target-independent concept of an "eh_return taken" flag, say > EH_RETURN_TAKEN_RTX. > > We could define it so that, on targets that define EH_RETURN_TAKEN_RTX, > a register EH_RETURN_STACKADJ_RTX and a re

Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-24 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote: >> Yes, I agree long-term we want every-thing be optimized as early as >> possible. >> >> However, IMHO, it's impossible we can support every conditional patterns >> in the middle-end (match.pd). >> It's a really big number. >> >

Re: [PATCH V2] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-24 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong writes: > Hi, Richard and Richi. > > Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math. > It's supported in tree-ssa-math-opts.cc. However, GCC failed to support > COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS. > > Consider this following case: > #define TEST_TYPE(T

Re: [RFC] > WIDE_INT_MAX_PREC support in wide-int

2023-08-29 Thread Richard Sandiford via Gcc-patches
Just some off-the-cuff thoughts. Might think differently when I've had more time... Richard Biener writes: > On Mon, 28 Aug 2023, Jakub Jelinek wrote: > >> Hi! >> >> While the _BitInt series isn't committed yet, I had a quick look at >> lifting the current lowest limitation on maximum _BitInt p

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-08-29 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 8/24/23 08:06, Robin Dapp via Gcc-patches wrote: >> Ping. I refined the code and some comments a bit and added a test >> case. >> >> My question in general would still be: Is this something we want >> given that we potentially move some of combine's work a bit towards >> t

[PATCH] attribs: Use existing traits for excl_hash_traits

2023-08-29 Thread Richard Sandiford via Gcc-patches
excl_hash_traits can be defined more simply by reusing existing traits. Tested on aarch64-linux-gnu. OK to install? Richard gcc/ * attribs.cc (excl_hash_traits): Delete. (test_attribute_exclusions): Use pair_hash and nofree_string_hash instead. --- gcc/attribs.cc | 45

RE: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Richard Sandiford via Gcc-patches
[Sorry for any weird MUA issues, don't have access to my usual set-up.] > when looking at a riscv ICE in vect-live-6.c I noticed that we > assume that the variable part (coeffs[1] * x1) of the to-be-extracted > bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT. > > This means that b

Re: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-08-30 Thread Richard Sandiford via Gcc-patches
Robin Dapp writes: >> But in the VLA case, doesn't it instead have precision 4+4X? >> The problem then is that we can't tell at compile time which >> byte that corresponds to. So... > > Yes 4 + 4x. I keep getting confused with poly modes :) > In this case we want to extract the bitnum [3 4] = 3

[PATCH] aarch64: Fix return register handling in untyped_call

2023-08-31 Thread Richard Sandiford via Gcc-patches
While working on another patch, I hit a problem with the aarch64 expansion of untyped_call. The expander emits the usual: (set (mem ...) (reg resN)) instructions to store the result registers to memory, but it didn't say in RTL where those resN results came from. This eventually led to a fail

[PATCH] lra: Avoid unfolded plus-0

2023-08-31 Thread Richard Sandiford via Gcc-patches
While backporting another patch to an earlier release, I hit a situation in which lra_eliminate_regs_1 would eliminate an address to: (plus (reg:P R) (const_int 0)) This address compared not-equal to plain: (reg:P R) which caused an ICE in a later peephole2. (The ICE showed up in gfort
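
The mismatch described above can be illustrated with a toy expression type: a structural comparison treats (plus R 0) and plain R as different, so folding the zero addend away first makes them compare equal. This is only a sketch under assumed names (struct expr, expr_equal, fold_plus_zero); it is not RTL and not the actual LRA change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node, loosely modelled on the shapes quoted above.  */
enum expr_kind { EXPR_REG, EXPR_CONST_INT, EXPR_PLUS };

struct expr {
    enum expr_kind kind;
    long value;                 /* register number or constant value */
    const struct expr *op0;     /* operands, used only by EXPR_PLUS */
    const struct expr *op1;
};

/* Structural equality: (plus R 0) and R have different kinds,
   so they compare not-equal.  */
bool expr_equal(const struct expr *a, const struct expr *b)
{
    if (a->kind != b->kind)
        return false;
    if (a->kind == EXPR_PLUS)
        return expr_equal(a->op0, b->op0) && expr_equal(a->op1, b->op1);
    return a->value == b->value;
}

/* Fold (plus X 0) to X; leave every other expression untouched.  */
const struct expr *fold_plus_zero(const struct expr *e)
{
    if (e->kind == EXPR_PLUS
        && e->op1->kind == EXPR_CONST_INT
        && e->op1->value == 0)
        return e->op0;
    return e;
}
```

After folding, a later pass that compares addresses structurally sees the same shape on both sides.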

Re: [PATCH] expmed: Allow extract_bit_field via mem for low-precision modes.

2023-09-01 Thread Richard Sandiford via Gcc-patches
Robin Dapp via Gcc-patches writes: >> It's not just a question of which byte though. It's also a question >> of which bit. >> >> One option would be to code-generate for even X and for odd X, and select >> between them at runtime. But that doesn't scale well to 2+2X and 1+1X. >> >> Otherwise I

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Richard Sandiford via Gcc-patches
Uros Bizjak via Gcc-patches writes: > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches > wrote: >> >> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: >> > From: Kong Lingling >> > >> > In inline asm, we do not know if the insn can use EGPR, so disable E

Re: [PATCH]AArch64 xorsign: Fix scalar xorsign lowering

2023-09-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi All, > > In GCC-9 our scalar xorsign pattern broke and we didn't notice it because the > testcase was not strong enough. With this commit > > 8d2d39587d941a40f25ea0144cceb677df115040 is the first bad commit > commit 8d2d39587d941a40f25ea0144cceb677df115040 > Author: S

Re: [PATCH]AArch64 xorsign: Fix scalar xorsign lowering

2023-09-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: >> -----Original Message----- >> From: Richard Sandiford >> Sent: Friday, September 1, 2023 2:36 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH]AArch64 xorsign: Fix scalar x

Re: [PATCH] Bug 111071: fix the subr with -1 to not due to the simplify.

2023-09-04 Thread Richard Sandiford via Gcc-patches
"yanzhang.wang--- via Gcc-patches" writes: > From: Yanzhang Wang > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1 > to not. > > Signed-off-by: Yanzhang Wang > --- > > Tested on my local arm environment and passed. Thanks Andrew Pinski's

Re: [PATCH] testsuite: Remove unwanted 'dg-do run' from gcc.dg/vect tests

2023-09-04 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches writes: > Tests under gcc.dg/vect use check_vect_support_and_set_flags to set > compilation flags as appropriate for the target, but they also set > dg-do-what-default to 'run' or 'compile', depending on the actual > target hardware (or simulator) capabilities. > >

Re: [PATCH v3] mklog: handle Signed-off-by, minor cleanup

2023-09-04 Thread Richard Sandiford via Gcc-patches
Marc Poulhiès via Gcc-patches writes: > Richard Sandiford via Gcc-patches writes: >>> +# this regex matches the first line of the "end" in the initial commit >>> message >>> +FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by|co-authored-by|#):

Re: [PATCH] testsuite: aarch64: Adjust SVE ACLE tests to new generated code

2023-09-04 Thread Richard Sandiford via Gcc-patches
Thiago Jung Bauermann via Gcc-patches writes: > Since commit e7a36e4715c7 "[PATCH] RISC-V: Support simplify (-1-x) for > vector." these tests fail on aarch64-linux: > > === g++ tests === > > Running g++:g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ... > FAIL: gcc.target/aarch

Re: [PATCH] Bug 111071: fix the subr with -1 to not due to the simplify.

2023-09-04 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > "yanzhang.wang--- via Gcc-patches" writes: >> From: Yanzhang Wang >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/acle/asm/subr_s8.c: Modify subr with -1 >> to not. >> >> Signed-off-by: Yanzhang Wang >> --- >> >> Tested on my local arm environm

Re: RFC: Introduce -fhardened to enable security-related flags

2023-09-04 Thread Richard Sandiford via Gcc-patches
Qing Zhao via Gcc-patches writes: >> On Aug 29, 2023, at 3:42 PM, Marek Polacek via Gcc-patches >> wrote: >> >> Improving the security of software has been a major trend in the recent >> years. Fortunately, GCC offers a wide variety of flags that enable extra >> hardening. These flags aren't

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-09-05 Thread Richard Sandiford via Gcc-patches
Robin Dapp writes: >> So I don't think I have a good feel for the advantages and disadvantages >> of doing this. Robin's analysis of the aarch64 changes was nice and >> detailed though. I think the one that worries me most is the addressing >> mode one. fwprop is probably the first chance we ge

Re: testsuite: Port 'check-function-bodies' to nvptx

2023-09-05 Thread Richard Sandiford via Gcc-patches
Thomas Schwinge writes: > Hi! > > On 2023-09-04T23:05:05+0200, I wrote: >> On 2019-07-16T15:04:49+0100, Richard Sandiford >> wrote: >>> This patch therefore adds a new check-function-bodies dg-final test > >>> The regexps in parse_function_bodies are fairly general, but might >>> still need to b

Re: [PATCH 01/11] aarch64: AARCH64_ISA_RCPC was defined twice

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > gcc/ChangeLog: > > * config/aarch64/aarch64.h (AARCH64_ISA_RCPC): Remove dup. OK, thanks. Richard > --- > gcc/config/aarch64/aarch64.h | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 2b0fc

Re: [PATCH 04/11] aarch64: Do not force a stack frame for EH returns

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > EH returns no longer rely on clobbering the return address on the stack > so forcing a stack frame is not necessary. > > This does not actually change the code gen for the unwinder since there > are calls before the EH return. > > gcc/ChangeLog: > > * config/aarch64/a

Re: [PATCH 05/11] aarch64: Add eh_return compile tests

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/eh_return-2.c: New test. > * gcc.target/aarch64/eh_return-3.c: New test. OK. I wonder if it's worth using check-function-bodies for -3.c though. It would then be easy to verify that the autiasp only occurs on t

Re: [PATCH 06/11] aarch64: Fix pac-ret eh_return tests

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > This is needed since eh_return no longer prevents pac-ret in the > normal return path. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/return_address_sign_1.c: Move func4 to ... > * gcc.target/aarch64/return_address_sign_2.c: ... here and fix the > s

Re: [PATCH 07/11] aarch64: Disable branch-protection for pcs tests

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > The tests manipulate the return address in abitest-2.h and thus not > compatible with -mbranch-protection=pac-ret+leaf or > -mbranch-protection=gcs. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/aapcs64/func-ret-1.c: Disable branch-protection. > * gcc.ta

Re: [PATCH 10/11] aarch64: Fix branch-protection error message tests

2023-09-05 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy writes: > Update tests for the new branch-protection parser errors. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/branch-protection-attr.c: Update. > * gcc.target/aarch64/branch-protection-option.c: Update. OK, thanks. (And I agree these are better messages. :))

Re: [PATCH v1 2/6] LoongArch: improved target configuration interface

2023-09-05 Thread Richard Sandiford via Gcc-patches
Yang Yujie writes: > @@ -5171,25 +5213,21 @@ case "${target}" in > # ${with_multilib_list} should not contain whitespaces, > # consecutive commas or slashes. > if echo "${with_multilib_list}" \ > - | grep -E -e "[[:space:]]" -e '[,/][,/]' -e '[

Re: [PATCH] LoongArch: Fix unintentional bash-ism in r14-3665.

2023-09-06 Thread Richard Sandiford via Gcc-patches
Yang Yujie writes: > gcc/ChangeLog: > > * config.gcc: remove non-POSIX syntax "<<<". OK. Thanks for the quick fix. Richard. > --- > gcc/config.gcc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config.gcc b/gcc/config.gcc > index b2fe7c7ceef..6d4c8becd28 10

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-09-06 Thread Richard Sandiford via Gcc-patches
Robin Dapp writes: > Hi Richard, > > I did some testing with the attached v2 that does not restrict to UNARY > anymore. As feared ;) there is some more fallout that I'm detailing below. > > On Power there is one guality fail (pr43051-1.c) that I would take > the liberty of ignoring for now. > > O

[PATCH] Tweak language choice in config-list.mk

2023-09-07 Thread Richard Sandiford via Gcc-patches
When I tried to use config-list.mk, the build for every triple except the build machine's failed for m2. This is because, unlike other languages, m2 builds target objects during all-gcc. The build will therefore fail unless you have access to an appropriate binutils (or an equivalent). That's qu

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-09-07 Thread Richard Sandiford via Gcc-patches
Robin Dapp writes: > Hi Richard, > > I did some testing with the attached v2 that does not restrict to UNARY > anymore. As feared ;) there is some more fallout that I'm detailing below. > > On Power there is one guality fail (pr43051-1.c) that I would take > the liberty of ignoring for now. > > O

Re: [PATCH] Support folding min(poly,poly) to const

2023-09-07 Thread Richard Sandiford via Gcc-patches
Lehua Ding writes: > Hi, > > This patch adds support that tries to fold `MIN (poly, poly)` to > a constant. Consider the following C Code: > > ``` > void foo2 (int* restrict a, int* restrict b, int n) > { > for (int i = 0; i < 3; i += 1) > a[i] += b[i]; > } > ``` > > Before this patch: >

Re: [PATCH V2] Support folding min(poly,poly) to const

2023-09-08 Thread Richard Sandiford via Gcc-patches
Lehua Ding writes: > Hi, > > This patch adds support that tries to fold `MIN (poly, poly)` to > a constant. Consider the following C Code: > > ``` > void foo2 (int* restrict a, int* restrict b, int n) > { > for (int i = 0; i < 3; i += 1) > a[i] += b[i]; > } > ``` > > Before this patch: >

Re: [PATCH V2] Support folding min(poly,poly) to const

2023-09-08 Thread Richard Sandiford via Gcc-patches
Richard Sandiford writes: > Lehua Ding writes: >> Hi, >> >> This patch adds support that tries to fold `MIN (poly, poly)` to >> a constant. Consider the following C Code: >> >> ``` >> void foo2 (int* restrict a, int* restrict b, int n) >> { >> for (int i = 0; i < 3; i += 1) >> a[i] += b

Re: [PATCH V3] Support folding min(poly,poly) to const

2023-09-08 Thread Richard Sandiford via Gcc-patches
Lehua Ding writes: > V3 change: Address Richard's comments. > > Hi, > > This patch adds support that tries to fold `MIN (poly, poly)` to > a constant. Consider the following C Code: > > ``` > void foo2 (int* restrict a, int* restrict b, int n) > { > for (int i = 0; i < 3; i += 1) > a[i]

[PATCH] Allow target attributes in non-gnu namespaces

2023-09-08 Thread Richard Sandiford via Gcc-patches
Currently there are four static sources of attributes: - LANG_HOOKS_ATTRIBUTE_TABLE - LANG_HOOKS_COMMON_ATTRIBUTE_TABLE - LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE - TARGET_ATTRIBUTE_TABLE All of the attributes in these tables go in the "gnu" namespace. This means that they can use the traditional GNU __

Re: [PATCH] pretty-print: Fix up pp_wide_int [PR111329]

2023-09-11 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek writes: > Hi! > > The recent pp_wide_int changes for _BitInt support (because not all > wide_ints fit into the small fixed size digit_buffer anymore) apparently > broke > +FAIL: gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c (test for excess > errors) > +FAIL: gcc.dg/analyzer/out-o

[PATCH 00/19] aarch64: Fix -fstack-protector issue

2023-09-12 Thread Richard Sandiford via Gcc-patches
This series of patches fixes deficiencies in GCC's -fstack-protector implementation for AArch64 when using dynamically allocated stack space. This is CVE-2023-4039. See: https://developer.arm.com/Arm%20Security%20Center/GCC%20Stack%20Protector%20Vulnerability%20AArch64 https://github.com/metaredt

[PATCH 01/19] aarch64: Use local frame vars in shrink-wrapping code

2023-09-12 Thread Richard Sandiford via Gcc-patches
aarch64_layout_frame uses a shorthand for referring to cfun->machine->frame: aarch64_frame &frame = cfun->machine->frame; This patch does the same for some other heavy users of the structure. No functional change intended. gcc/ * config/aarch64/aarch64.cc (aarch64_save_callee_saves): U

[PATCH 05/19] aarch64: Add bytes_below_hard_fp to frame info

2023-09-12 Thread Richard Sandiford via Gcc-patches
Following on from the previous bytes_below_saved_regs patch, this one records the number of bytes that are below the hard frame pointer. This eventually replaces below_hard_fp_saved_regs_size. If a frame pointer is not needed, the epilogue adds final_adjust to the stack pointer before restoring re

[PATCH 07/19] aarch64: Only calculate chain_offset if there is a chain

2023-09-12 Thread Richard Sandiford via Gcc-patches
After previous patches, it is no longer necessary to calculate a chain_offset in cases where there is no chain record. gcc/ * config/aarch64/aarch64.cc (aarch64_expand_prologue): Move the calculation of chain_offset into the emit_frame_chain block. --- gcc/config/aarch64/aarch64.c

[PATCH 12/19] aarch64: Simplify top of frame allocation

2023-09-12 Thread Richard Sandiford via Gcc-patches
After previous patches, it no longer really makes sense to allocate the top of the frame in terms of varargs_and_saved_regs_size and saved_regs_and_above. gcc/ * config/aarch64/aarch64.cc (aarch64_layout_frame): Simplify the allocation of the top of the frame. --- gcc/config/aarch

[PATCH 02/19] aarch64: Avoid a use of callee_offset

2023-09-12 Thread Richard Sandiford via Gcc-patches
When we emit the frame chain, i.e. when we reach Here in this statement of aarch64_expand_prologue: if (emit_frame_chain) { // Here ... } the stack is in one of two states: - We've allocated up to the frame chain, but no more. - We've allocated the whole frame, and the fra

[PATCH 06/19] aarch64: Tweak aarch64_save/restore_callee_saves

2023-09-12 Thread Richard Sandiford via Gcc-patches
aarch64_save_callee_saves and aarch64_restore_callee_saves took a parameter called start_offset that gives the offset of the bottom of the saved register area from the current stack pointer. However, it's more convenient for later patches if we use the bottom of the entire frame as the reference po

[PATCH 09/19] aarch64: Rename hard_fp_offset to bytes_above_hard_fp

2023-09-12 Thread Richard Sandiford via Gcc-patches
Similarly to the previous locals_offset patch, hard_fp_offset was described as: /* Offset from the base of the frame (incomming SP) to the hard_frame_pointer. This value is always a multiple of STACK_BOUNDARY. */ poly_int64 hard_fp_offset; which again took an “upside-down” view: h

[PATCH 11/19] aarch64: Measure reg_offset from the bottom of the frame

2023-09-12 Thread Richard Sandiford via Gcc-patches
reg_offset was measured from the bottom of the saved register area. This made perfect sense with the original layout, since the bottom of the saved register area was also the hard frame pointer address. It became slightly less obvious with SVE, since we save SVE registers below the hard frame point

[PATCH 03/19] aarch64: Explicitly handle frames with no saved registers

2023-09-12 Thread Richard Sandiford via Gcc-patches
If a frame has no saved registers, it can be allocated in one go. There is no need to treat the areas below and above the saved registers as separate. And if we allocate the frame in one go, it should be allocated as the initial_adjust rather than the final_adjust. This allows the frame size to g

[PATCH 10/19] aarch64: Tweak frame_size comment

2023-09-12 Thread Richard Sandiford via Gcc-patches
This patch fixes another case in which a value was described with an “upside-down” view. gcc/ * config/aarch64/aarch64.h (aarch64_frame::frame_size): Tweak comment. --- gcc/config/aarch64/aarch64.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/

[PATCH 13/19] aarch64: Minor initial adjustment tweak

2023-09-12 Thread Richard Sandiford via Gcc-patches
This patch just changes a calculation of initial_adjust to one that makes it slightly more obvious that the total adjustment is frame.frame_size. gcc/ * config/aarch64/aarch64.cc (aarch64_layout_frame): Tweak calculation of initial_adjust for frames in which all saves are S

[PATCH 15/19] aarch64: Put LR save probe in first 16 bytes

2023-09-12 Thread Richard Sandiford via Gcc-patches
-fstack-clash-protection uses the save of LR as a probe for the next allocation. The next allocation could be: * another part of the static frame, e.g. when allocating SVE save slots or outgoing arguments * an alloca in the same function * an allocation made by a callee function However, whe

[PATCH 04/19] aarch64: Add bytes_below_saved_regs to frame info

2023-09-12 Thread Richard Sandiford via Gcc-patches
The frame layout code currently hard-codes the assumption that the number of bytes below the saved registers is equal to the size of the outgoing arguments. This patch abstracts that value into a new field of aarch64_frame. gcc/ * config/aarch64/aarch64.h (aarch64_frame::bytes_below_saved

[PATCH 14/19] aarch64: Tweak stack clash boundary condition

2023-09-12 Thread Richard Sandiford via Gcc-patches
The AArch64 ABI says that, when stack clash protection is used, there can be a maximum of 1KiB of unprobed space at sp on entry to a function. Therefore, we need to probe when allocating >= guard_size - 1KiB of data (>= rather than >). This is what GCC does. If an allocation is exactly guard_siz
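
The boundary condition stated above (probe when allocating >= guard_size - 1KiB, not just >) can be written as a one-line predicate. The sketch below assumes sizes in bytes; allocation_needs_probe and UNPROBED_ON_ENTRY are hypothetical names chosen for illustration, not identifiers from the aarch64 backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Up to 1KiB at sp may be unprobed on entry to a function,
   per the ABI rule quoted above.  */
#define UNPROBED_ON_ENTRY 1024u

/* Return true if allocating SIZE bytes requires a probe when the
   guard region is GUARD_SIZE bytes.  Note the >= rather than >.  */
bool allocation_needs_probe(uint64_t size, uint64_t guard_size)
{
    /* The subtraction below would wrap for tiny guard sizes.  */
    assert(guard_size > UNPROBED_ON_ENTRY);
    return size >= guard_size - UNPROBED_ON_ENTRY;
}
```

For a 64KiB guard the threshold is 64512 bytes: an allocation of exactly that size needs a probe, while one byte less does not.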

[PATCH 18/19] aarch64: Remove below_hard_fp_saved_regs_size

2023-09-12 Thread Richard Sandiford via Gcc-patches
After previous patches, it's no longer necessary to store saved_regs_size and below_hard_fp_saved_regs_size in the frame info. All measurements instead use the top or bottom of the frame as reference points. gcc/ * config/aarch64/aarch64.h (aarch64_frame::saved_regs_size) (aarch64_

[PATCH 08/19] aarch64: Rename locals_offset to bytes_above_locals

2023-09-12 Thread Richard Sandiford via Gcc-patches
locals_offset was described as: /* Offset from the base of the frame (incomming SP) to the top of the locals area. This value is always a multiple of STACK_BOUNDARY. */ This is implicitly an “upside down” view of the frame: the incoming SP is at offset 0, and anything N bytes below

[PATCH 16/19] aarch64: Simplify probe of final frame allocation

2023-09-12 Thread Richard Sandiford via Gcc-patches
Previous patches ensured that the final frame allocation only needs a probe when the size is strictly greater than 1KiB. It's therefore safe to use the normal 1024 probe offset in all cases. The main motivation for doing this is to simplify the code and reduce the number of special cases. gcc/

[PATCH 19/19] aarch64: Make stack smash canary protect saved registers

2023-09-12 Thread Richard Sandiford via Gcc-patches
AArch64 normally puts the saved registers near the bottom of the frame, immediately above any dynamic allocations. But this means that a stack-smash attack on those dynamic allocations could overwrite the saved registers without needing to reach as far as the stack smash canary. The same thing co

[PATCH 17/19] aarch64: Explicitly record probe registers in frame info

2023-09-12 Thread Richard Sandiford via Gcc-patches
The stack frame is currently divided into three areas: A: the area above the hard frame pointer B: the SVE saves below the hard frame pointer C: the outgoing arguments If the stack frame is allocated in one chunk, the allocation needs a probe if the frame size is >= guard_size - 1KiB. In additio

Re: [PATCH] AArch64: List official cores before codenames

2023-09-13 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > List official cores first so that -cpu=native does not show a codename with -v > or in errors/warnings. Nice spot. > Passes regress, OK for commit? > > gcc/ChangeLog: > * config/aarch64/aarch64-cores.def (neoverse-n1): Place before ares. > (neoverse-v1):

[PATCH] aarch64: Coerce addresses to be suitable for LD1RQ

2023-09-14 Thread Richard Sandiford via Gcc-patches
In the following test: svuint8_t ld(uint8_t *ptr) { return svld1rq(svptrue_b8(), ptr + 2); } ptr + 2 is a valid address for an Advanced SIMD load, but not for an SVE load. We therefore ended up generating: ldr q0, [x0, 2] dup z0.q, z0.q[0] This patch makes us generate

[PATCH] aarch64: Restore SVE WHILE costing

2023-09-14 Thread Richard Sandiford via Gcc-patches
AArch64 previously costed WHILELO instructions on the first call to add_stmt_cost. This was because, at the time, only add_stmt_cost had access to the loop_vec_info. However, after the AVX512 changes, we only calculate the masks later. This patch moves the WHILELO costing to finish_cost, which is

[PATCH] aarch64: Fix loose ldpstp check [PR111411]

2023-09-15 Thread Richard Sandiford via Gcc-patches
aarch64_operands_ok_for_ldpstp contained the code: /* One of the memory accesses must be a mempair operand. If it is not the first one, they need to be swapped by the peephole. */ if (!aarch64_mem_pair_operand (mem_1, GET_MODE (mem_1)) && !aarch64_mem_pair_operand (mem_2, GET

Re: [PATCH] internal-fn: Convert uninitialized SSA_NAME into SCRATCH rtx[PR110751]

2023-09-17 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong writes: > According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751 > > As Richard and Richi suggested, we recognize uninitialized SSA_NAME and > convert it > into SCRATCH rtx if the target predicate allows SCRATCH. > > It can help to reduce redundant data move instructions

Re: [PATCH] AArch64: Improve immediate expansion [PR105928]

2023-09-17 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra writes: > Support immediate expansion of immediates which can be created from 2 MOVKs > and a shifted ORR or BIC instruction. Change aarch64_split_dimode_const_store > to apply if we save one instruction. > > This reduces the number of 4-instruction immediates in SPECINT/FP by 5%.

Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen

2023-09-17 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni writes: > Hi, > After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's a following > regression: > FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times > ins\\tv0.s\\[1\\], v1.s\\[0\\] 3 > > This happens because for the following function from vect_copy_lane_1.c:

Re: [PATCH V2] internal-fn: Support undefined rtx for uninitialized SSA_NAME

2023-09-17 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong writes: > According to PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110751 > > As Richard and Richi suggested, we recognize uninitialized SSA_NAME and > convert it > into SCRATCH rtx if the target predicate allows SCRATCH. > > It can help to reduce redundant data move instructions

Re: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

2023-09-18 Thread Richard Sandiford via Gcc-patches
Kewen Lin writes: > This costing adjustment patch series exposes one issue in > aarch64 specific costing adjustment for STP sequence. It > causes the below test cases to fail: > > - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c > - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c > - gcc/tests

Re: [PATCH] data-ref: Rework integer handling in split_constant_offset [PR98069]

2020-12-10 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: >> @@ -812,33 +997,80 @@ split_constant_offset_1 (tree type, tree op0, enum >> tree_code code, tree op1, >> } >> } >> >> -/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF >> - will be ssizetype. */ >> +/* If EXP has pointer type, try to ex

Re: RFC: ARM MVE and Neon auto-vectorization

2020-12-10 Thread Richard Sandiford via Gcc-patches
Christophe Lyon writes: > On Wed, 9 Dec 2020 at 17:47, Richard Sandiford > wrote: >> >> Christophe Lyon via Gcc-patches writes: >> > Hi, >> > >> > I've been working for a while on enabling auto-vectorization for ARM >> > MVE, and I find it a bit awkward to keep things common with Neon as >> > mu

Re: [PATCH] tree-optimization/98211 - fix bogus vectorization of conversion

2020-12-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > Pattern recog incompletely handles some bool cases but we shouldn't > miscompile as a result but not vectorize. Unfortunately > vectorizable_assignment lets invalid conversions (that > vectorizable_conversion rejects) slip through. The following > rectifies that. > > Boo

Re: [PATCH] tree-optimization/98211 - fix bogus vectorization of conversion

2020-12-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 11 Dec 2020, Richard Sandiford wrote: >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c >> > index a4980a931a9..d3ab8aa1c29 100644 >> > --- a/gcc/tree-vect-stmts.c >> > +++ b/gcc/tree-vect-stmts.c >> > @@ -5123,6 +5123,17 @@ vectorizable_assignment (v

Re: [PATCH] tree-optimization/98211 - fix bogus vectorization of conversion

2020-12-11 Thread Richard Sandiford via Gcc-patches
Richard Biener writes: > On Fri, 11 Dec 2020, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Fri, 11 Dec 2020, Richard Sandiford wrote: >> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c >> >> > index a4980a931a9..d3ab8aa1c29 100644 >> >> > --- a/gcc/tree-vect-stmts.

Re: [PATCH][GCC] aarch64: Add support for Cortex-A78C

2020-12-11 Thread Richard Sandiford via Gcc-patches
Przemyslaw Wirkus writes: > This patch adds support for -mcpu=cortex-a78c command line option. > For more information about this processor, see [0]: > > [0] https://developer.arm.com/ip-products/processors/cortex-a/cortex-a78c > > OK for master ? > > gcc/ChangeLog: > > * config/aarch64/aarch

Re: [PATCH][GCC][PR target/98177] aarch64: SVE: ICE in expand_direct_optab_fn

2020-12-14 Thread Richard Sandiford via Gcc-patches
Przemyslaw Wirkus writes: > Hi, > > Recent 'support SVE comparisons for unpacked integers' patch extends > operands of define_expands from SVE_FULL to SVE_ALL. This causes an ICE > hence this PR patch. > > This patch adds this relaxation for: > + reduc__scal_ and > + arch64_pred_reduc__ > in order

Re: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Complex Addition, Multiply and FMA.

2020-12-14 Thread Richard Sandiford via Gcc-patches
Rearranging slightly… > @@ -708,6 +713,10 @@ (define_c_enum "unspec" > UNSPEC_FCMLA90 ; Used in aarch64-simd.md. > UNSPEC_FCMLA180 ; Used in aarch64-simd.md. > UNSPEC_FCMLA270 ; Used in aarch64-simd.md. > +UNSPEC_FCMUL ; Used in aarch64-simd.md. > +UNSPEC_FCMUL180 ;

Re: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Complex Addition, Multiply and FMA.

2020-12-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi Richard, > > Do you object to me splitting off complex add and addressing your remaining > feedback later when the rewrite of mul and fma are done. No, sounds good to me. Thanks, Richard

Re: [19/23] rtlanal: Add some new helper classes

2020-12-14 Thread Richard Sandiford via Gcc-patches
Jeff Law writes: > On 11/13/20 1:20 AM, Richard Sandiford via Gcc-patches wrote: >> This patch adds some classes for gathering the list of registers >> and memory that are read and written by an instruction, along >> with various properties about the accesses. In some ways i

Re: [PATCH]AArch64: Add NEON, SVE and SVE2 RTL patterns for Complex Addition, Multiply and FMA.

2020-12-16 Thread Richard Sandiford via Gcc-patches
Tamar Christina writes: > Hi Richard, > > Here's the split off complex add. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > Checked with armv8-a+sve2+fp16 and no issues. Note that due to a mid-end > limitation SLP for SVE currently fails for some permutes. The tests have >

Re: libcody: Fix for dash

2020-12-16 Thread Richard Sandiford via Gcc-patches
Nathan Sidwell writes: > Apparently 'var+=...' is not a dash thing. Fixed thusly. > > * config.m4: Avoid non-dash idiom > * configure: Rebuilt. > > pushed (2 patches, because I didn't look carefully enough the first time) Thanks. I think the other uses of += need the same treatme

Re: [PATCH] genemit: Handle `const_double_zero' rtx

2020-12-16 Thread Richard Sandiford via Gcc-patches
"Maciej W. Rozycki" writes: > On Tue, 15 Dec 2020, Jeff Law wrote: > >> > @@ -1942,7 +1942,7 @@ gen_divdf3_cc (rtx operand0 ATTRIBUTE_UN >> >gen_rtx_DIV (DFmode, >> >operand1, >> >operand2), >> > - const0_rtx)), >> > + CONST_DOUBLE_ATOF ("0", VOIDmode))), >> >gen_rtx_SET
