Re: [PATCH] aarch64: Use LDR for first-element loads for Advanced SIMD

2025-05-29 Thread Richard Sandiford
Dhruv Chawla writes: > On 08/05/25 18:43, Richard Sandiford wrote: >> Otherwise it looks good. But I think we should think about how we >> plan to integrate the related optimisation for register inputs. E.g.: >> >> int32x4_t foo(int32_t x) { >> return vs

Re: [PATCH v2 2/2] emit-rtl: Validate mode for paradoxical hardware subregs [PR119966]

2025-05-29 Thread Richard Sandiford
Sorry for the slow reply. Dimitar Dimitrov writes: > On Fri, May 16, 2025 at 06:14:30PM +0100, Richard Sandiford wrote: >> Dimitar Dimitrov writes: >> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the >> > following RTL for pru-unknown-elf: >>

Re: [PATCH v2] ext-dce: Don't refine live width with SUBREG mode if !TRULY_NOOP_TRUNCATION_MODES_P [PR 120050]

2025-05-28 Thread Richard Sandiford
Sorry for the slow reply, had a few days off. Xi Ruoyao writes: > If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the > truncation is not a noop, then all bits of the inner reg are live. We > cannot reduce the live mask to that of the mode of the subreg. > > gcc/ChangeLog: > > P

Re: [PATCH] rtl-ssa: Reject non-address uses of autoinc regs [PR120347]

2025-05-28 Thread Richard Sandiford
Richard Biener writes: > On Thu, May 22, 2025 at 12:19 PM Richard Sandiford > wrote: >> >> As the rtl.texi documentation of RTX_AUTOINC expressions says: >> >> If a register used as the operand of these expressions is used in >> another address in an insn

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-22 Thread Richard Sandiford
Richard Biener writes: > On Mon, 19 May 2025, Richard Sandiford wrote: > >> Richard Biener writes: >>> On Fri, 16 May 2025, Richard Sandiford wrote: >>>>> The simple prototype below uses a separate flag from the epilogue >>>>> mode, but I

Re: [PATCH 2/2] aarch64: Improve rtx_cost for constants in COMPARE [PR120372]

2025-05-22 Thread Richard Sandiford
Andrew Pinski writes: > The middle-end uses rtx_cost on constants with the outer code being COMPARE > to find out the cost of a constant formation for a comparison instruction. > So for aarch64 backend, we would just return the cost of constant formation > in general. We can improve this by seeing i

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-22 Thread Richard Sandiford
Kugan Vivekanandarajah writes: > Add support for autoprofiledbootstrap in aarch64. > This is similar to what is done for i386. Added > gcc/config/aarch64/gcc-auto-profile for aarch64 profile > creation. > > How to run: > configure --with-build-config=bootstrap-lto > make autoprofiledbootstrap > >

Re: [PATCH 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-22 Thread Richard Sandiford
Dhruv Chawla writes: > On 20/05/25 16:35, Richard Sandiford wrote: >> Dhruv Chawla writes: >>> [...] >>> Would it be a good idea to add tests for the bad codegen as well? I have >>> added tests for lsl/usra in the next round of patches. >> >> Nah

Re: [PATCH v2] [aarch64] [vxworks] mark x18 as fixed, adjust tests

2025-05-22 Thread Richard Sandiford
Alexandre Oliva writes: > On May 21, 2025, Richard Sandiford wrote: > >> I think this one shows a deeper issue, though. -fsanitize=shadow-call-stack >> is currently hardcoded to use x18: > > Oh, indeed! > >> and I assume this usage will be incompatible wi

Re: [PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-22 Thread Richard Sandiford
writes: > [...] > +;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a > "bswap". > +;; Match that as well. > +(define_insn_and_split "*v_revvnx8hi" > + [(parallel > +[(set (match_operand:VNx8HI 0 "register_operand") > + (bswap:VNx8HI (match_operand 1 "register_opera

Re: [PATCH v4 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-22 Thread Richard Sandiford
these instructions. > > Bootstrapped and regtested on aarch64-linux-gnu. > > Signed-off-by: Dhruv Chawla > Co-authored-by: Richard Sandiford > > gcc/ChangeLog: > > * gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift): > Match lowered form o

[PATCH] rtl-ssa: Reject non-address uses of autoinc regs [PR120347]

2025-05-22 Thread Richard Sandiford
As the rtl.texi documentation of RTX_AUTOINC expressions says: If a register used as the operand of these expressions is used in another address in an insn, the original value of the register is used. Uses of the register outside of an address are not permitted within the same insn as a u

Re: [PATCH 3/3] genemit: Use a byte encoding to generate insns

2025-05-21 Thread Richard Sandiford
Jeff Law writes: > Given you know the RTL gen* related thingies better than anyone, I'd say > go forward and if there's any fallout, we can certainly cope with it. Thanks. I've now pushed the series and the earlier genemit tweaks, with the discussed change to mark the operandN arguments as cons

Re: [PATCH] aarch64: Carry over zeroness in aarch64_evpc_reencode

2025-05-21 Thread Richard Sandiford
Pengxuan Zheng writes: > There was a bug in aarch64_evpc_reencode which could leave zero_op0_p and > zero_op1_p of the struct "newd" uninitialized. r16-701-gd77c3bc1c35e303 fixed > the issue by zero initializing "newd." This patch provides an alternative fix > a

Re: [PATCH] testsuite: aarch64: arm: Fix -mcpu=unset support in shared effective targets

2025-05-21 Thread Richard Sandiford
Christophe Lyon writes: > Many tests became unsupported on aarch64 when -mcpu=unset was added to > several arm_* effective targets, because this flag is only supported > on arm. > > Since these effective targets are used on arm and aarch64, the patch > adds -mcpu=unset on arm only, and restores ""

Re: [PATCH] [aarch64] [vxworks] mark x18 as fixed, adjust tests

2025-05-21 Thread Richard Sandiford
Alexandre Oliva writes: > VxWorks uses x18 as the TCB, so STATIC_CHAIN_REGNUM has long been set > (in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead. > > This patch marks x18 as fixed on TARGET_VXWORKS, so that it is not > chosen by the register allocator, and adjusts tests that depend on

[PATCH] sparc: Avoid operandN variables in .md files

2025-05-20 Thread Richard Sandiford
The automatically-generated gen_* routines take their operands as individual arguments, named "operand0" upwards. These arguments are stored into an "operands" array before invoking the expander's C++ code, which can then modify the operands by writing to the array. However, the SPARC sign-extend
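
The calling convention described in this message can be pictured with a small toy model. This is only a sketch under stated assumptions: `rtx` is modelled as a string, and `gen_example` and `expander_fragment` are hypothetical names, not output of genemit or code from the SPARC port. It shows why a fragment should write to `operands[N]` rather than to an `operandN` variable: only the array is read back when the pattern is built.

```cpp
#include <array>
#include <string>

// Hypothetical stand-in for GCC's rtx; a string is enough for illustration.
using rtx = std::string;

// Hypothetical expander fragment: it rewrites operand 1 through the array,
// which is the only place the caller looks afterwards.
static void expander_fragment (std::array<rtx, 2> &operands)
{
  operands[1] = "(subreg " + operands[1] + ")";
}

// Toy gen_* routine: individual arguments are copied into an "operands"
// array before the fragment runs, and the pattern is built from the array.
rtx gen_example (rtx operand0, rtx operand1)
{
  std::array<rtx, 2> operands{operand0, operand1};
  expander_fragment (operands);
  // Reading operands[] (not operand0/operand1) picks up the fragment's edit.
  return "(set " + operands[0] + " (sign_extend " + operands[1] + "))";
}
```

In this model, an assignment to `operand1` inside the fragment would be invisible to the final pattern, which is the kind of mismatch the subject line suggests the patch avoids.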

Re: [PATCH v3 3/3] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-20 Thread Richard Sandiford
Richard Sandiford writes: > Konstantinos Eleftheriou writes: >> This patch uses `lowpart_subreg` for the base register initialization, >> instead of zero-extending it. We had tried this solution before, but >> we were leaving undefined bytes in the upper part of the regist

Re: [PATCH 2/2]AArch64: propose -mmax-vectorization as an option to override vector costing

2025-05-20 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > With the middle-end providing a way to make vectorization more profitable by > scaling vect-scalar-cost-multiplier this makes a more user friendly option > to make it easier to use. > > I propose making it an actual -m option that we document and retain vs usi

Re: [PATCH v3 2/3] sbitmap: Add bitmap_is_range_set_p function

2025-05-20 Thread Richard Sandiford
Konstantinos Eleftheriou writes: > Hi Richard, thanks for your response. > > On Tue, May 20, 2025 at 8:05 AM Richard Biener > wrote: >> >> On Mon, May 19, 2025 at 4:14 PM Konstantinos Eleftheriou >> wrote: >> > >> > This patch adds the `bitmap_is_range_set_p` function in sbitmap, >> > which chec

Re: [PATCH v3 3/3] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-20 Thread Richard Sandiford
Konstantinos Eleftheriou writes: > This patch uses `lowpart_subreg` for the base register initialization, > instead of zero-extending it. We had tried this solution before, but > we were leaving undefined bytes in the upper part of the register. > This shouldn't be happening as we are supposed to

Re: [PATCH v3 1/3] sbitmap: Add bitmap_bit_in_range_p_1 helper function

2025-05-20 Thread Richard Sandiford
Konstantinos Eleftheriou writes: > This patch adds the `bitmap_bit_in_range_p_1` helper function, > in order to be used by `bitmap_bit_in_range_p`. The helper function > contains the previous implementation of `bitmap_bit_in_range_p` and > `bitmap_bit_in_range_p` has been updated to call the helpe

Re: [PATCH 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-20 Thread Richard Sandiford
Dhruv Chawla writes: > On 06/05/25 21:57, Richard Sandiford wrote: >> Dhruv Chawla writes: >>> This patch modifies the intrinsic expanders to expand svlsl and svlsr to >>> unpredi

Re: [PATCH v3 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions

2025-05-20 Thread Richard Sandiford
writes: > @@ -4899,7 +4876,9 @@ > if (CONST_INT_P (operands[2])) >{ > amount = gen_const_vec_duplicate (mode, operands[2]); > - if (!aarch64_sve_shift_operand (operands[2], mode)) > + if (!aarch64_sve_shift_operand (operands[2], mode) > + && !aarch64_simd_shift_i

Re: [PATCH] [testsuite] [aarch64] match alt cache clear names in sme nonlocal_goto tests

2025-05-20 Thread Richard Sandiford
Alexandre Oliva writes: > vxworks calls cacheTextUpdate instead of __clear_cache. > > Adjust the sme/nonlocal_goto_*.c tests for inexact matches. > > Regstrapped on x86_64-linux-gnu. Also tested with gcc-14 on aarch64-, > arm-, x86-, and x86_64-vxworks7r2. Ok to install? > > > for gcc/testsuite

Re: [PATCH] [testsuite] [aarch64] use uint64_t in rwsr tests

2025-05-20 Thread Richard Sandiford
Alexandre Oliva writes: > stdint.h defines uint64_t instead of __uint64_t, so use the former. > __uint64_t is not available on e.g. vxworks. > > Regstrapped on x86_64-linux-gnu. Also tested with gcc-14 on aarch64-, > arm-, x86-, and x86_64-vxworks7r2. Ok to install? > > > for gcc/testsuite/Chan

Re: [PUSHED] aarch64: Fix an oversight in aarch64_evpc_reencode

2025-05-20 Thread Richard Sandiford
Pengxuan Zheng writes: > Some fields (e.g., zero_op0_p and zero_op1_p) of the struct "newd" may be left > uninitialized in aarch64_evpc_reencode. This can cause reading of > uninitialized > data. I found this oversight when testing my patches on and/fmov > optimizations. This patch fixes the bug

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-19 Thread Richard Sandiford
Kugan Vivekanandarajah writes: > diff --git a/Makefile.in b/Makefile.in > index b1ed67d3d4f..b5e3e520791 100644 > --- a/Makefile.in > +++ b/Makefile.in > @@ -4271,7 +4271,7 @@ all-stageautoprofile-bfd: configure-stageautoprofile-bfd > $(HOST_EXPORTS) \ > $(POSTSTAGE1_HOST_EXPORTS) \ >

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-19 Thread Richard Sandiford
Richard Sandiford writes: > Jeff Law writes: >> So two questions. Is there any meaningful performance impact expected >> here using the array form rather than locals? And does this impact how >> folks write their C/C++ fragments in the expanders and such? > > I

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-19 Thread Richard Sandiford
Richard Biener writes: > On Fri, 16 May 2025, Richard Sandiford wrote: >> > The simple prototype below uses a separate flag from the epilogue >> > mode, but I wonder how we want to more generally handle >> > whether to use masking or not when iterating

Re: [PATCH 3/3] genemit: Use a byte encoding to generate insns

2025-05-18 Thread Richard Sandiford
Richard Biener writes: >> Am 16.05.2025 um 19:37 schrieb Richard Sandiford : >> >> genemit has traditionally used open-coded gen_rtx_FOO sequences >> to build up the instruction pattern. This is now the source of >> quite a bit of bloat in the binary, and also a s

Re: [PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-18 Thread Richard Sandiford
Jeff Law writes: > On 5/16/25 11:21 AM, Richard Sandiford wrote: >> One slightly awkward part about emitting the generator function >> bodies is that: >> >> * define_insn and define_expand routines have a separate argument for >>each operand, named "op

Re: [PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-18 Thread Richard Sandiford
Jeff Law writes: > On 5/16/25 11:32 AM, Jeff Law wrote: >> >> >> On 5/16/25 11:21 AM, Richard Sandiford wrote: >>> This pattern used operands[2] to hold the shift amount, even though >>> the pattern doesn't have an operand 2 (not even as a matc

Re: [PATCH 1/3] genemit: Remove support for string operands

2025-05-18 Thread Richard Sandiford
Jeff Law writes: > On 5/16/25 11:32 AM, Richard Sandiford wrote: >> gen_exp currently supports the 's' (string) operand type. It would >> certainly be possible to make the upcoming bytecode patch support >> that too. However, the rtx codes that have string operand

[PATCH 3/3] genemit: Use a byte encoding to generate insns

2025-05-16 Thread Richard Sandiford
genemit has traditionally used open-coded gen_rtx_FOO sequences to build up the instruction pattern. This is now the source of quite a bit of bloat in the binary, and also a source of slow compile times. Two obvious ways of trying to deal with this are: (1) Try to identify rtxes that have a simi

[PATCH 2/3] genemit: Avoid using gen_exp in output_add_clobbers

2025-05-16 Thread Richard Sandiford
output_add_clobbers emits code to add: (clobber (scratch:M)) and/or: (clobber (reg:M R)) expressions to the end of a PARALLEL. At the moment, it does this using the general gen_exp function. That makes sense with the code in its current form, but with later patches it's more convenient to

[PATCH 1/3] genemit: Remove support for string operands

2025-05-16 Thread Richard Sandiford
gen_exp currently supports the 's' (string) operand type. It would certainly be possible to make the upcoming bytecode patch support that too. However, the rtx codes that have string operands should be very rarely used in hard-coded define_insn/expand/split/peephole2 rtx templates (as opposed to

[PATCH 0/3] Make genemit.cc use a byte encoding of the rtx pattern

2025-05-16 Thread Richard Sandiford
t.mk. OK to install? Richard Richard Sandiford (3): genemit: Remove support for string operands genemit: Avoid using gen_exp in output_add_clobbers genemit: Use a byte encoding to generate insns gcc/emit-rtl.cc | 292 gcc/genemit.cc|

[PATCH 8/9] genemit: Always track multiple uses of operands

2025-05-16 Thread Richard Sandiford
gen_exp has code to detect when the same operand is used multiple times. It ensures that second and subsequent uses call copy_rtx, to enforce correct unsharing. However, for historical reasons that aren't clear to me, this was skipped for a define_insn unless the define_insn was a parallel. It wa

[PATCH 7/9] genemit: Add a generator struct

2025-05-16 Thread Richard Sandiford
gen_exp now has quite a few arguments that need to be passed to each recursive call. This patch turns it and related routines into member functions of a new generator class, so that the shared information can be stored in member variables. This also helps to make later patches less noisy. gcc/

[PATCH 3/9] genemit: Use references rather than pointers

2025-05-16 Thread Richard Sandiford
This patch makes genemit.cc pass the md_rtx_info around by constant reference rather than pointer. It's somewhat of a cosmetic change on its own, but it makes later changes less noisy. gcc/ * genemit.cc (gen_exp): Make the info argument a constant reference. (gen_emit_seq, gen_ins

[PATCH 5/9] genemit: Factor out code common to insns and expands

2025-05-16 Thread Richard Sandiford
Mostly to reduce cut-&-paste. gcc/ * genemit.cc (start_gen_insn): New function, split out from... (gen_insn, gen_expand): ...here. --- gcc/genemit.cc | 45 ++--- 1 file changed, 22 insertions(+), 23 deletions(-) diff --git a/gcc/genemit.cc

[PATCH 6/9] genemit: Consistently use operand arrays in gen_* functions

2025-05-16 Thread Richard Sandiford
One slightly awkward part about emitting the generator function bodies is that: * define_insn and define_expand routines have a separate argument for each operand, named "operand0" upwards. * define_split and define_peephole2 routines take a pointer to an array, named "operands". * the C++ p

[PATCH 9/9] genemit: Remove purported handling of location_ts

2025-05-16 Thread Richard Sandiford
gen_exp had code to handle the 'L' operand format. But this format is specifically for location_ts, which are only used in RTX_INSNs. Those should never occur in this context, where the input is always an md file rather than an __RTL function. Any hard-coded raw location value would be meaningles

[PATCH 4/9] genemit: Add an internal queue

2025-05-16 Thread Richard Sandiford
An earlier version of this series wanted to collect information about all the gen_* functions that are going to be generated. The current version no longer does that, but the queue seemed worth keeping anyway, since it gives a more consistent structure. gcc/ * genemit.cc (queue): New stati

[PATCH 2/9] xstormy16: Avoid accessing beyond the operands[] array

2025-05-16 Thread Richard Sandiford
The negsi2 C++ code writes to operands[2] even though the pattern has no operand 2. gcc/ * config/stormy16/stormy16.md (negsi2): Remove unused assignment. --- gcc/config/stormy16/stormy16.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/config/stormy16/stormy16

[PATCH 0/9] Some tweaks to genemit.cc

2025-05-16 Thread Richard Sandiford
rget changes. Otherwise it's stuff that I could self-approve, but I'll leave a few days for comments. Richard Sandiford (9): nds32: Avoid accessing beyond the operands[] array xstormy16: Avoid accessing beyond the operands[] array genemit: Use references rather than pointers g

[PATCH 1/9] nds32: Avoid accessing beyond the operands[] array

2025-05-16 Thread Richard Sandiford
This pattern used operands[2] to hold the shift amount, even though the pattern doesn't have an operand 2 (not even as a match_dup). This caused a build failure with -Werror: array subscript 2 is above array bounds of ‘rtx_def* [2]’ gcc/ * config/nds32/nds32-intrinsic.md (unspec_get_pen

Re: [PATCH v2 2/2] emit-rtl: Validate mode for paradoxical hardware subregs [PR119966]

2025-05-16 Thread Richard Sandiford
Dimitar Dimitrov writes: > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the > following RTL for pru-unknown-elf: > > (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856]) > (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855]) > (const

Re: [PATCH v2 1/2] emit-rtl: Allow extra checks for paradoxical subregs [PR119966]

2025-05-16 Thread Richard Sandiford
PR target/119966 > > gcc/ChangeLog: > > * emit-rtl.cc (validate_subreg): Do not exit immediately for > paradoxical subregs. Filter subsequent tests which are > not valid for paradoxical subregs. > > Co-authored-by: Richard Sandiford > Signed-off-by:

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Richard Sandiford
Jennifer Schmitz writes: > [PATCH] [PR120276] regcprop: Return from copy_value for unordered modes > > The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using > partial_subreg_p in the function copy_value during the RTL pass > regcprop, failing the assertion in > > inline bool > pa

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Richard Sandiford
Richard Biener writes: > Targets recently got the ability to request the vector mode to be > used for a vector epilogue (or the epilogue of a vector epilogue). The > following adds the ability for it to indicate the epilogue should use > loop masking, irrespective of the --param vect-partial-vect

Re: [PATCH] [PR120276] regcprop: Replace partial_subreg_p by ordered_p && maybe_lt

2025-05-16 Thread Richard Sandiford
Jennifer Schmitz writes: > The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using > partial_subreg_p in the function copy_value during the RTL pass > regcprop, failing the assertion in > > inline bool > partial_subreg_p (machine_mode outermode, machine_mode innermode) > { > /* M

[PATCH 2/4] Automatic replacement of get_insns/end_sequence pairs

2025-05-15 Thread Richard Sandiford
This is the result of using a regexp to replace instances of: = get_insns (); end_sequence (); with: = end_sequence (); where the indentation is the same for both lines, and where there might be blank lines in between. gcc/ * asan.cc (asan_clear_shadow): Use the return value of
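
The kind of replacement described here can be sketched with `std::regex`. This is an illustrative reconstruction, not the regexp used for the patch (the snippet does not show it): the pattern, the function name `fold_get_insns` and the sample input are all assumptions. The backreference `\2` enforces that both lines carry the same indentation, and the middle group allows blank lines between them.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite
//     <lhs> = get_insns ();
//     end_sequence ();
// into
//     <lhs> = end_sequence ();
// when both lines share the same indentation, allowing blank lines between.
std::string fold_get_insns (const std::string &src)
{
  // Group 1: start of input or the preceding newline.
  // Group 2: the indentation.  Group 3: the left-hand side, which must
  // start with a non-blank character so group 2 is the full indentation.
  static const std::regex pattern (
    R"re((^|\n)([ \t]*)([^ \t\n][^\n]*) = get_insns \(\);\n(?:[ \t]*\n)*\2end_sequence \(\);)re");
  return std::regex_replace (src, pattern, "$1$2$3 = end_sequence ();");
}
```

A call pair with mismatched indentation is left untouched by this pattern, which mirrors the restriction stated in the message.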

[PATCH 4/4] Manual tweak of some end_sequence callers

2025-05-15 Thread Richard Sandiford
This patch mops up obvious redundancies that weren't caught by the automatic regexp replacements in earlier patches. It doesn't do anything with genemit.cc, since that will be part of a later series. gcc/ * config/arm/arm.cc (arm_gen_load_multiple_1): Simplify use of end_sequence.

[PATCH 1/4] Make end_sequence return the insn sequence

2025-05-15 Thread Richard Sandiford
The start_sequence/end_sequence interface was a big improvement over the previous state, but one slightly awkward thing about it is that you have to call get_insns before end_sequence in order to get the insn sequence itself: To get the contents of the sequence just made, you must call `get_
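
The interface change in this patch can be illustrated with a toy model. The sketch below is not GCC's emit-rtl.cc: instructions are plain strings, and `insn_seq`, `seq_stack` and `build_example` are hypothetical names. It only shows the shape of the idea, namely that `end_sequence` hands back the sequence it closes, so callers no longer need a separate `get_insns` call first.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an "insn" is a string, a sequence is a vector.
using insn_seq = std::vector<std::string>;

// Stack of open sequences; the bottom entry models the outermost chain.
static std::vector<insn_seq> seq_stack{insn_seq{}};

void start_sequence () { seq_stack.emplace_back (); }

void emit_insn (const std::string &insn) { seq_stack.back ().push_back (insn); }

// Old-style accessor: peek at the innermost sequence without closing it.
insn_seq get_insns () { return seq_stack.back (); }

// New-style interface: close the innermost sequence and return its contents.
insn_seq end_sequence ()
{
  assert (seq_stack.size () > 1 && "no sequence to end");
  insn_seq seq = std::move (seq_stack.back ());
  seq_stack.pop_back ();
  return seq;
}

// Typical caller after the change: build and return in one step.
insn_seq build_example ()
{
  start_sequence ();
  emit_insn ("set r0, 1");
  emit_insn ("add r0, r0, r1");
  return end_sequence ();
}
```

With the older interface, `build_example` would need a local variable assigned from `get_insns ()` before calling `end_sequence ()`; returning the sequence directly removes that two-step pattern.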

[PATCH 3/4] Automatic replacement of end_sequence/return pairs

2025-05-15 Thread Richard Sandiford
This is the result of using a regexp to replace: rtx( |_insn *) = end_sequence (); return ; with: return end_sequence (); gcc/ * asan.cc (asan_emit_allocas_unpoison): Directly return the result of end_sequence. (hwasan_emit_untag_frame): Likewise. * config/

[PATCH 0/4] Make end_sequence return the insn sequence

2025-05-15 Thread Richard Sandiford
nemit series that I'm hoping to post tomorrow. Bootstrapped & regression-tested on aarch64-linux-gnu & x86_64-linux-gnu. Also tested against "all" config.gcc cases using config/config-list.mk. OK to install? Richard Richard Sandiford (4): Make end_sequence return t

Re: [PATCH] Enhance -fopt-info-vec vectorized loop diagnostic

2025-05-14 Thread Richard Sandiford
Richard Biener writes: > The following includes whether we vectorize an epilogue, whether > we use loop masking and what vectorization factor (unroll factor) > we use. So it's now > > t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor > 32 > t.c:4:21: optimized: epilogu

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Richard Sandiford
Karl Meakin writes: > On 07/05/2025 14:32, Richard Sandiford wrote: >> Karl Meakin writes: >>> Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR >>> extension is enabled. >>> >>> gcc/ChangeLog: >>> >>> * config/aa

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-14 Thread Richard Sandiford
Karl Meakin writes: >>> + else >>> +{ >>> + operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), >>> +operands[1], operands[2]); >>> + operands[2] = const0_rtx; >>> +} >>> + } >>> +) >>> + >>> @@ -758,6 +781,58 @@ (define_expand

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-14 Thread Richard Sandiford
Kugan Vivekanandarajah writes: > Adding Eugene and Andi to CC as Sam suggested. > >> On 13 May 2025, at 12:57 am, Richard Sandiford >> wrote: >> >> Kugan Vivekanandarajah writ

Re: [PATCH][RFC] Add vector_costs::add_vector_cost vector stmt grouping hook

2025-05-13 Thread Richard Sandiford
Richard Biener writes: > The following refactors the vectorizer vector_costs target API > to add a new vector_costs::add_vector_cost entry which groups > all individual sub-stmts we create per "vector stmt", aka SLP > node. This allows for the targets to more easily match on > complex cases like

Re: [PATCH] aarch64: Remove cmov6 patterns

2025-05-12 Thread Richard Sandiford
Andrew Pinski writes: > Since the cmov optab is not used and is being removed, > the `cmov6` patterns from the aarch64 backend can > also be removed. > > gcc/ChangeLog: > * config/aarch64/aarch64.md (cmov6): Remove. OK, thanks. Richard > > Signed-off-by: Andrew Pinski > --- > gcc/config

Re: [PATCH v2 2/3] aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]

2025-05-12 Thread Richard Sandiford
Pengxuan Zheng writes: > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 15f08cebeb1..98ce85dfdae 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -23621,6 +23621,36 @@ aarch64_simd_valid_and_imm (rtx op) >return aarch64

Re: [PATCH] optabs: Remove cmov optab [PR120230]

2025-05-12 Thread Richard Sandiford
Andrew Pinski writes: > cmov optab was added back in r0-24110-g1c0290eaac4094 > (https://gcc.gnu.org/pipermail/gcc-patches/1999-September/018596.html) > but it was never used. movcc is used instead and since > r0-93453-gf90b7a5a7913cc (cond-optab), > movcc becomes what cmov_optab was going to be;

Re: [PATCH v2 3/3] aarch64: Add more vector permute tests for the FMOV optimization [PR100165]

2025-05-12 Thread Richard Sandiford
Pengxuan Zheng writes: > diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c > b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c > new file mode 100644 > index 000..adbf87243f6 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c > @@ -0,0 +1,130 @@ > +/* { dg-do compil

Re: [PATCH v2 1/3] aarch64: Recognize vector permute patterns which can be interpreted as AND [PR100165]

2025-05-12 Thread Richard Sandiford
Pengxuan Zheng writes: > +/* Recognize patterns suitable for the AND instructions. */ > +static bool > +aarch64_evpc_and (struct expand_vec_perm_d *d) > +{ > + /* Either d->op0 or d->op1 should be a vector of all zeros. */ > + if (d->one_vector_p || (!d->zero_op0_p && !d->zero_op1_p)) > +r

Re: [PATCH 2/2] Move vector lowering to before vectorization

2025-05-12 Thread Richard Sandiford
Richard Biener writes: > The following moves vector lowering to before vectorization - in fact > to before DCE/forwprop and CSE. This gets us the chance to re-vectorize > the lowered form eventually. Note that when the vectorizer learns to > handle vector code vector lowering should be likely in

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-12 Thread Richard Sandiford
Kugan Vivekanandarajah writes: > diff --git a/configure.ac b/configure.ac > index 730db3c1402..701284e38f2 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -621,6 +621,14 @@ case "${target}" in > ;; > esac > > +autofdo_target="i386" > +case "${target}" in > + aarch64-*-*) > +auto

Re: [PATCH] ext-dce: Only transform extend to subreg if TRULY_NOOP_TRUNCATION [PR 120050]

2025-05-12 Thread Richard Sandiford
Xi Ruoyao writes: > The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these > machines the hardware may look at bits outside of the given mode. > > gcc/ChangeLog: > > PR rtl-optimization/120050 > * ext-dce.cc (ext_dce_try_optimize_insn): Only transform the > insn

Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Biener >> Sent: Friday, May 9, 2025 11:08 AM >> To: Richard Sandiford >> Cc: Pengfei Li ; gcc-patches@gcc.gnu.org; >> ktkac...@nvidia.com >> Subject: Re: [PATCH] vect: Improv

Re: [PATCH] vect: Improve vectorization for small-trip-count loops using subvectors

2025-05-09 Thread Richard Sandiford
Richard Biener writes: > On Thu, 8 May 2025, Pengfei Li wrote: > >> This patch improves the auto-vectorization for loops with known small >> trip counts by enabling the use of subvectors - bit fields of original >> wider vectors. A subvector must have the same vector element type as the >> origina

Re: [PATCH] aarch64: Use LDR for first-element loads for Advanced SIMD

2025-05-08 Thread Richard Sandiford
Dhruv Chawla writes: > This patch modifies Advanced SIMD assembly generation to emit an LDR > instruction when a vector is created using a load to the first element with > the > other elements being zero. > > This is similar to what *aarch64_combinez already does. > > Example: > > uint8x16_t foo(

Re: [PATCH 2/2] gensupport: validate compact constraint modifiers

2025-05-08 Thread Richard Sandiford
Richard Earnshaw writes: > For constraints there are operand modifiers and constraint qualifiers. > Operand modifiers apply to all alternatives and must appear, in > traditional syntax before the first alternative. Constraint > qualifiers, on the other hand must appear in each alternative to whic

Re: [PATCH] emit-rtl: Add extra checks for paradoxical hardware subregs [PR119966]

2025-05-08 Thread Richard Sandiford
Dimitar Dimitrov writes: > On Tue, May 06, 2025 at 01:17:40PM +0100, Richard Sandiford wrote: >> Dimitar Dimitrov writes: >> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the >> > following RTL for pru-unknown-elf: >> > >> > (i

Re: [PATCH v2] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-08 Thread Richard Sandiford
Konstantinos Eleftheriou writes: > During the base register initialization, when we are eliminating the load > instruction, we were calling `emit_move_insn` on registers of the same > size but of different mode in some cases, causing an ICE. > > This patch uses `lowpart_subreg` for the base regist

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-08 Thread Richard Sandiford
Kyrylo Tkachov writes: > In Hi Richard, > >> On 6 May 2025, at 12:34, Richard Sandiford wrote: >> >> writes: >>> From: Soumya AR >>> >>> Hi, >>> >>> This RFC and subsequent patch series introduces support for printing

Re: [RFC PATCH 0/2] Add target_clones profile option support

2025-05-08 Thread Richard Sandiford
Yangyu Chen writes: >> On 6 May 2025, at 17:49, Alfie Richards wrote: >> >> On 06/05/2025 09:36, Yangyu Chen wrote: On 6 May 2025, at 16:01, Alfie Richards wrote: Hello, I like this idea. I have a couple thoughts to add. On 05/05/2025 09:46, Yangyu Chen wro

Re: [PATCH] AArch64: Optimize SVE loads/stores with ptrue predicates to unpredicated instructions.

2025-05-08 Thread Richard Sandiford
Sorry for the slow review. Jennifer Schmitz writes: > SVE loads and stores where the predicate is all-true can be optimized to > unpredicated instructions. For example, > svuint8_t foo (uint8_t *x) > { > return svld1 (svptrue_b8 (), x); > } > was compiled to: > foo: > ptrue p3.b, all >

Re: [GCC16, RFC, V2 06/14] opts: doc: aarch64: add new memtag sanitizer

2025-05-08 Thread Richard Sandiford
Indu Bhagat writes: >>> [...] >>> diff --git a/gcc/opts.cc b/gcc/opts.cc >>> index 86c6691ecec4..00db662c32ef 100644 >>> --- a/gcc/opts.cc >>> +++ b/gcc/opts.cc >>> [...] >>> @@ -2780,6 +2788,13 @@ common_handle_option (struct gcc_options *opts, >>> SET_OPTION_IF_UNSET (opts, opts_set, >>>

Re: [GCC16,RFC,V2 03/14] aarch64: add new insn definition for st2g

2025-05-07 Thread Richard Sandiford
Indu Bhagat writes: >>> starting bb 3 >>> 33: {cc:CC=cmp(r121:DI,0x10);r121:DI=r121:DI-0x10;} >>> 32: r122:DI=r122:DI+0x10 >>> 31: [r122:DI+0]=unspec/v[[r122:DI+0],r120:DI] 17 >>> mem count failure >>> mem count failure >> >> Yeah, we'd need to update auto-inc-dec for this case. >>

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Richard Sandiford
Richard Earnshaw writes: > On 07/05/2025 17:28, Richard Earnshaw (lists) wrote: >> On 07/05/2025 16:54, Richard Sandiford wrote: >>> Richard Earnshaw writes: >>>> On 07/05/2025 13:57, Richard Sandiford wrote: >>>>> Kyrylo Tkachov writes: >>

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-07 Thread Richard Sandiford
Richard Sandiford writes: >> @@ -758,6 +781,58 @@ (define_expand "cbranchcc4" >>"" >> ) >> >> +;; Emit a `CB (register)` or `CB (immediate)` instruction. >> +(define_insn "aarch64_cb" >> +

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Richard Sandiford
Richard Earnshaw writes: > On 07/05/2025 13:57, Richard Sandiford wrote: >> Kyrylo Tkachov writes: >>>> On 7 May 2025, at 12:27, Karl Meakin wrote: >>>> >>>> Commit the test file `cmpbr.c` before rules for generating the new >>>> instru

Re: [PATCH 8/8] AArch64: rules for CMPBR instructions

2025-05-07 Thread Richard Sandiford
Karl Meakin writes: > Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR > extension is enabled. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (cbranch4): emit CMPBR > instructions if possible. > (cbranch4): new expand rule. > (aarch64_cb): likewise. > (a

Re: [PATCH 7/8] AArch64: precommit test for CMPBR instructions

2025-05-07 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 7 May 2025, at 12:27, Karl Meakin wrote: >> >> Commit the test file `cmpbr.c` before rules for generating the new >> instructions are added, so that the changes in codegen are more obvious >> in the next commit. > > I guess that’s an LLVM best practice. > In GCC sinc

Re: [PATCH 4/8] AArch64: add constants for branch displacements

2025-05-07 Thread Richard Sandiford
Karl Meakin writes: > Extract the hardcoded values for the minimum PC-relative displacements > into named constants and document them. > > gcc/ChangeLog: > > * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant. > (BRANCH_LEN_N_128MiB): likewise. > (BRANCH_LEN_P_1MiB):

Re: [PATCH 3/8] AArch64: rename branch instruction rules

2025-05-07 Thread Richard Sandiford
Kyrylo Tkachov writes: >> On 7 May 2025, at 12:27, Karl Meakin wrote: >> >> Give the `define_insn` rules used in lowering `cbranch4` to RTL >> more descriptive and consistent names: from now on, each rule is named >> after the AArch64 instruction that it generates. Also add comments to >> docume

Re: [PATCH 2/8] AArch64: reformat branch instruction rules

2025-05-07 Thread Richard Sandiford
Karl Meakin writes: > Make the formatting of the RTL templates in the rules for branch > instructions more consistent with each other. One source of variation is the 80-character limit. It's a bit of a soft limit for rtl, but it is still good to keep to it where that's easy. So... > > gcc/Chang

Re: [PATCH] [PR117978] AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

2025-05-07 Thread Richard Sandiford
Jennifer Schmitz writes: > @@ -3698,6 +3706,24 @@ aarch64_partial_ptrue_length (rtx_vector_builder > &builder, >return vl; > } > > +/* Return: > + > + * -1 if all bits of PRED are set > + * N if PRED has N leading set bits followed by all clear bits > + * 0 if PRED does not have any of

Re: [PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-05-07 Thread Richard Sandiford
by zero-extending it. For big-endian targets, the least significant byte should come from address X+3 rather than address X. The byte at address X (i.e. the byte with the equal offset) should instead go in the most significant byte, typically using a shift left. Richard > > Konstanti
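As an illustration of the byte-placement point in the preview above, here is a minimal sketch in portable C. It is not code from the patch or from GCC; the function names (load32_le, load32_be, place_byte0_le, place_byte0_be) are hypothetical, and it assumes a 4-byte value at address X whose first byte is the one being forwarded.

```c
#include <stdint.h>

/* Interpret the 4 bytes at p as a little-endian 32-bit value:
   the byte at offset 0 is the least significant byte.  */
uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret the 4 bytes at p as a big-endian 32-bit value:
   the byte at offset 0 is the most significant byte and the
   byte at offset 3 is the least significant byte.  */
uint32_t load32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}

/* Place a single byte that lives at offset 0 into a 32-bit register
   (other bytes taken as zero).  Little-endian: plain zero-extension.  */
uint32_t place_byte0_le(uint8_t b)
{
    return (uint32_t)b;
}

/* Big-endian: the same byte belongs in the most significant position,
   so it has to be shifted left by 24 bits.  */
uint32_t place_byte0_be(uint8_t b)
{
    return (uint32_t)b << 24;
}
```

With memory contents {0xAA, 0, 0, 0}, the little-endian view equals the zero-extended byte, while the big-endian view equals the byte shifted into the top position.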

Re: [PATCH][www] Mark reload as to be removed for GCC 16

2025-05-06 Thread Richard Sandiford
Richard Biener writes: > The following amends gcc-15/changes.html with a note that reload > is going to be removed for GCC 16. > > OK for www? > > * htdocs/gcc-15/changes.html: Mark GCC 15 as last release > supporting reload. My reading of the threads was that no-one is objecting to t

Re: [PATCH] [PR117978] AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

2025-05-06 Thread Richard Sandiford
Jennifer Schmitz writes: > About the tests: Non-power-of-2 patterns are already being tested in > gcc.target/aarch64/sve/acle/general/whilelt_5.c. OK > For the case of svptrue_b16 () > with 8-bit load, I added a test case for it. Currently, it has a single test > case, > but if necessary I can

Re: [PATCH 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2025-05-06 Thread Richard Sandiford
Dhruv Chawla writes: > This patch modifies the intrinsic expanders to expand svlsl and svlsr to > unpredicated forms when the predicate is a ptrue. It also folds the > following pattern: > > lsl , , > lsr , , > orr , , > > to: > > revb/h/w , > > when the shift amount is equal to half t
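The identity behind the fold described in the preview above can be sketched in scalar C. This is only an illustration under stated assumptions, not the patch's code: the function names are hypothetical, and the mapping to the SVE REVB/REVH/REVW instructions is as the preview describes it rather than something this sketch verifies.

```c
#include <stdint.h>

/* Shift-and-OR form on a 32-bit element: shift left and right by half
   the element width (16 bits) and OR the results together.  */
uint32_t swap_halves_shift(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* Reference form: swap the two 16-bit halves explicitly.  */
uint32_t swap_halves_ref(uint32_t x)
{
    uint32_t lo = x & 0xFFFFu;
    uint32_t hi = x >> 16;
    return (lo << 16) | hi;
}

/* Same idea on a 16-bit element with a shift of 8: the result is a
   byte swap within the element.  */
uint16_t swap_bytes_shift(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

Because the two shift amounts add up to the element width, no bits overlap in the OR, so the combination is a pure permutation of the element's halves. That is what makes a single reverse instruction a candidate replacement.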

Re: [PATCH 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA, and ADDHNB instructions

2025-05-06 Thread Richard Sandiford
Hi, Thanks for the update. The patch mostly looks good, but one minor and one more substantial comment below. BTW, the patch seems to have been corrupted en route, in that unchanged lines have too much space. Attaching is fine if that's easier. Dhruv Chawla writes: > diff --git a/gcc/config/a

Re: [PATCH] emit-rtl: Add extra checks for paradoxical hardware subregs [PR119966]

2025-05-06 Thread Richard Sandiford
Dimitar Dimitrov writes: > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the > following RTL for pru-unknown-elf: > > (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856]) > (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855]) > (const

Re: [GCC16,RFC,V2 03/14] aarch64: add new insn definition for st2g

2025-05-06 Thread Richard Sandiford
Indu Bhagat writes: > On 4/15/25 9:21 AM, Richard Sandiford wrote: >> Indu Bhagat writes: >>> Store Allocation Tags (st2g) is an Armv8.5-A memory tagging (MTE) >>> instruction. It stores an allocation tag to two tag granules of memory. >>> >>> TBD:

Re: [RFC PATCH 0/5] aarch64: Support for user-defined aarch64 tuning parameters in JSON

2025-05-06 Thread Richard Sandiford
writes: > From: Soumya AR > > Hi, > > This RFC and subsequent patch series introduces support for printing and > parsing > of aarch64 tuning parameters in the form of JSON. Thanks for doing this. It looks really useful. My main question is: rather than write the parsing and printing routines

Re: [PATCH] [PR117978] AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

2025-05-02 Thread Richard Sandiford
Jennifer Schmitz writes: > SVE loads/stores using predicates that select the bottom 8, 16, 32, 64, > or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the > predicate. > For example, > svuint8_t foo (uint8_t *x) { > return svld1 (svwhilelt_b8 (0, 16), x); > } > was previous