Dhruv Chawla writes:
> On 08/05/25 18:43, Richard Sandiford wrote:
>> Otherwise it looks good. But I think we should think about how we
>> plan to integrate the related optimisation for register inputs. E.g.:
>>
>> int32x4_t foo(int32_t x) {
>> return vs
Sorry for the slow reply.
Dimitar Dimitrov writes:
> On Fri, May 16, 2025 at 06:14:30PM +0100, Richard Sandiford wrote:
>> Dimitar Dimitrov writes:
>> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
>> > following RTL for pru-unknown-elf:
>>
Sorry for the slow reply, had a few days off.
Xi Ruoyao writes:
> If we see a promoted subreg and TRULY_NOOP_TRUNCATION says the
> truncation is not a noop, then all bits of the inner reg are live. We
> cannot reduce the live mask to that of the mode of the subreg.
>
> gcc/ChangeLog:
>
> P
Richard Biener writes:
> On Thu, May 22, 2025 at 12:19 PM Richard Sandiford
> wrote:
>>
>> As the rtl.texi documentation of RTX_AUTOINC expressions says:
>>
>> If a register used as the operand of these expressions is used in
>> another address in an insn
Richard Biener writes:
> On Mon, 19 May 2025, Richard Sandiford wrote:
>
>> Richard Biener writes:
>>> On Fri, 16 May 2025, Richard Sandiford wrote:
>>>>> The simple prototype below uses a separate flag from the epilogue
>>>>> mode, but I
Andrew Pinski writes:
> The middle-end uses rtx_cost on constants with the outer of being COMPARE
> to find out the cost of a constant formation for a comparison instruction.
> So for aarch64 backend, we would just return the cost of constant formation
> in general. We can improve this by seeing i
Kugan Vivekanandarajah writes:
> Add support for autoprofiledbootstrap in aarch64.
> This is similar to what is done for i386. Added
> gcc/config/aarch64/gcc-auto-profile for aarch64 profile
> creation.
>
> How to run:
> configure --with-build-config=bootstrap-lto
> make autoprofiledbootstrap
>
>
Dhruv Chawla writes:
> On 20/05/25 16:35, Richard Sandiford wrote:
>> Dhruv Chawla writes:
>>> [...]
>>> Would it be a good idea to add tests for the bad codegen as well? I have
>>> added tests for lsl/usra in the next round of patches.
>>
>> Nah
Alexandre Oliva writes:
> On May 21, 2025, Richard Sandiford wrote:
>
>> I think this one shows a deeper issue, though. -fsanitize=shadow-call-stack
>> is currently hardcoded to use x18:
>
> Oh, indeed!
>
>> and I assume this usage will be incompatible wi
writes:
> [...]
> +;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a "bswap".
> +;; Match that as well.
> +(define_insn_and_split "*v_revvnx8hi"
> + [(parallel
> +[(set (match_operand:VNx8HI 0 "register_operand")
> + (bswap:VNx8HI (match_operand 1 "register_opera
these instructions.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Signed-off-by: Dhruv Chawla
> Co-authored-by: Richard Sandiford
>
> gcc/ChangeLog:
>
> * gcc/config/aarch64/aarch64-sve.md (@aarch64_adr_shift):
> Match lowered form o
As the rtl.texi documentation of RTX_AUTOINC expressions says:
If a register used as the operand of these expressions is used in
another address in an insn, the original value of the register is
used. Uses of the register outside of an address are not permitted
within the same insn as a u
Jeff Law writes:
> Given you know the RTL gen* related thingies better than anyone, I'd say
> go forward and if there's any fallout, we can certainly cope with it.
Thanks. I've now pushed the series and the earlier genemit tweaks,
with the discussed change to mark the operandN arguments as cons
Pengxuan Zheng writes:
> There was a bug in aarch64_evpc_reencode which could leave zero_op0_p and
> zero_op1_p of the struct "newd" uninitialized. r16-701-gd77c3bc1c35e303 fixed
> the issue by zero initializing "newd." This patch provides an alternative fix
> a
Christophe Lyon writes:
> Many tests became unsupported on aarch64 when -mcpu=unset was added to
> several arm_* effective targets, because this flag is only supported
> on arm.
>
> Since these effective targets are used on arm and aarch64, the patch
> adds -mcpu=unset on arm only, and restores ""
Alexandre Oliva writes:
> VxWorks uses x18 as the TCB, so STATIC_CHAIN_REGNUM has long been set
> (in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead.
>
> This patch marks x18 as fixed on TARGET_VXWORKS, so that it is not
> chosen by the register allocator, and adjusts tests that depend on
The automatically-generated gen_* routines take their operands as
individual arguments, named "operand0" upwards. These arguments are
stored into an "operands" array before invoking the expander's C++
code, which can then modify the operands by writing to the array.
However, the SPARC sign-extend
Richard Sandiford writes:
> Konstantinos Eleftheriou writes:
>> This patch uses `lowpart_subreg` for the base register initialization,
>> instead of zero-extending it. We had tried this solution before, but
>> we were leaving undefined bytes in the upper part of the regist
Tamar Christina writes:
> Hi All,
>
> With the middle-end providing a way to make vectorization more profitable by
> scaling vect-scalar-cost-multiplier this makes a more user friendly option
> to make it easier to use.
>
> I propose making it an actual -m option that we document and retain vs usi
Konstantinos Eleftheriou writes:
> Hi Richard, thanks for your response.
>
> On Tue, May 20, 2025 at 8:05 AM Richard Biener
> wrote:
>>
>> On Mon, May 19, 2025 at 4:14 PM Konstantinos Eleftheriou
>> wrote:
>> >
>> > This patch adds the `bitmap_is_range_set_p` function in sbitmap,
>> > which chec
Konstantinos Eleftheriou writes:
> This patch uses `lowpart_subreg` for the base register initialization,
> instead of zero-extending it. We had tried this solution before, but
> we were leaving undefined bytes in the upper part of the register.
> This shouldn't be happening as we are supposed to
Konstantinos Eleftheriou writes:
> This patch adds the `bitmap_bit_in_range_p_1` helper function,
> in order to be used by `bitmap_bit_in_range_p`. The helper function
> contains the previous implementation of `bitmap_bit_in_range_p` and
> `bitmap_bit_in_range_p` has been updated to call the helpe
Dhruv Chawla writes:
> On 06/05/25 21:57, Richard Sandiford wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Dhruv Chawla writes:
>>> This patch modifies the intrinsic expanders to expand svlsl and svlsr to
>>> unpredi
writes:
> @@ -4899,7 +4876,9 @@
> if (CONST_INT_P (operands[2]))
>{
> amount = gen_const_vec_duplicate (mode, operands[2]);
> - if (!aarch64_sve_shift_operand (operands[2], mode))
> + if (!aarch64_sve_shift_operand (operands[2], mode)
> + && !aarch64_simd_shift_i
Alexandre Oliva writes:
> vxworks calls cacheTextUpdate instead of __clear_cache.
>
> Adjust the sme/nonlocal_goto_*.c tests for inexact matches.
>
> Regstrapped on x86_64-linux-gnu. Also tested with gcc-14 on aarch64-,
> arm-, x86-, and x86_64-vxworks7r2. Ok to install?
>
>
> for gcc/testsuite
Alexandre Oliva writes:
> stdint.h defines uint64_t instead of __uint64_t, so use the former.
> __uint64_t is not available on e.g. vxworks.
>
> Regstrapped on x86_64-linux-gnu. Also tested with gcc-14 on aarch64-,
> arm-, x86-, and x86_64-vxworks7r2. Ok to install?
>
>
> for gcc/testsuite/Chan
Pengxuan Zheng writes:
> Some fields (e.g., zero_op0_p and zero_op1_p) of the struct "newd" may be left
> uninitialized in aarch64_evpc_reencode. This can cause reading of
> uninitialized data. I found this oversight when testing my patches on and/fmov
> optimizations. This patch fixes the bug
Kugan Vivekanandarajah writes:
> diff --git a/Makefile.in b/Makefile.in
> index b1ed67d3d4f..b5e3e520791 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -4271,7 +4271,7 @@ all-stageautoprofile-bfd: configure-stageautoprofile-bfd
> $(HOST_EXPORTS) \
> $(POSTSTAGE1_HOST_EXPORTS) \
>
Richard Sandiford writes:
> Jeff Law writes:
>> So two questions. Is there any meaningful performance impact expected
>> here using the array form rather than locals? And does this impact how
>> folks write their C/C++ fragments in the expanders and such?
>
> I
Richard Biener writes:
> On Fri, 16 May 2025, Richard Sandiford wrote:
>> > The simple prototype below uses a separate flag from the epilogue
>> > mode, but I wonder how we want to handle more generally
>> > whether to use masking or not when iterating
Richard Biener writes:
>> Am 16.05.2025 um 19:37 schrieb Richard Sandiford :
>>
>> genemit has traditionally used open-coded gen_rtx_FOO sequences
>> to build up the instruction pattern. This is now the source of
>> quite a bit of bloat in the binary, and also a s
Jeff Law writes:
> On 5/16/25 11:21 AM, Richard Sandiford wrote:
>> One slightly awkward part about emitting the generator function
>> bodies is that:
>>
>> * define_insn and define_expand routines have a separate argument for
>>each operand, named "op
Jeff Law writes:
> On 5/16/25 11:32 AM, Jeff Law wrote:
>>
>>
>> On 5/16/25 11:21 AM, Richard Sandiford wrote:
>>> This pattern used operands[2] to hold the shift amount, even though
>>> the pattern doesn't have an operand 2 (not even as a matc
Jeff Law writes:
> On 5/16/25 11:32 AM, Richard Sandiford wrote:
>> gen_exp currently supports the 's' (string) operand type. It would
>> certainly be possible to make the upcoming bytecode patch support
>> that too. However, the rtx codes that have string operand
genemit has traditionally used open-coded gen_rtx_FOO sequences
to build up the instruction pattern. This is now the source of
quite a bit of bloat in the binary, and also a source of slow
compile times.
Two obvious ways of trying to deal with this are:
(1) Try to identify rtxes that have a simi
output_add_clobbers emits code to add:
(clobber (scratch:M))
and/or:
(clobber (reg:M R))
expressions to the end of a PARALLEL. At the moment, it does this
using the general gen_exp function. That makes sense with the code
in its current form, but with later patches it's more convenient to
gen_exp currently supports the 's' (string) operand type. It would
certainly be possible to make the upcoming bytecode patch support
that too. However, the rtx codes that have string operands should
be very rarely used in hard-coded define_insn/expand/split/peephole2
rtx templates (as opposed to
t.mk. OK to install?
Richard
Richard Sandiford (3):
genemit: Remove support for string operands
genemit: Avoid using gen_exp in output_add_clobbers
genemit: Use a byte encoding to generate insns
gcc/emit-rtl.cc | 292
gcc/genemit.cc|
gen_exp has code to detect when the same operand is used multiple
times. It ensures that second and subsequent uses call copy_rtx,
to enforce correct unsharing.
However, for historical reasons that aren't clear to me, this was
skipped for a define_insn unless the define_insn was a parallel.
It wa
gen_exp now has quite a few arguments that need to be passed
to each recursive call. This patch turns it and related routines
into member functions of a new generator class, so that the shared
information can be stored in member variables.
This also helps to make later patches less noisy.
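The refactoring described above, together with the copy_rtx rule for repeated operands, can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_exp, toy_generator and the emitted "toy_build_*" text are hypothetical names invented for illustration and are not taken from genemit.cc. The point is that the "operand already used" state lives in a member variable rather than being threaded through every recursive call.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical miniature expression tree; this is not GCC's rtx.
struct toy_exp
{
  std::string code;            // e.g. "plus"; empty for an operand leaf
  int opno = -1;               // operand number, used for leaves only
  std::vector<toy_exp> args;   // sub-expressions for non-leaf nodes
};

// Hypothetical generator class: shared state is a member variable
// instead of an extra argument on each recursive call.
class toy_generator
{
public:
  std::string gen_exp (const toy_exp &x);

private:
  std::set<int> m_used;        // operands that have been emitted once
};

std::string
toy_generator::gen_exp (const toy_exp &x)
{
  if (x.code.empty ())
    {
      assert (x.opno >= 0);
      std::string ref = "operands[" + std::to_string (x.opno) + "]";
      // The first use is emitted directly; later uses are wrapped in
      // copy_rtx so that the generated code would not share the operand.
      if (!m_used.insert (x.opno).second)
        return "copy_rtx (" + ref + ")";
      return ref;
    }

  // "toy_build_<code>" is a made-up spelling for the emitted constructor.
  std::string out = "toy_build_" + x.code + " (";
  for (std::size_t i = 0; i < x.args.size (); ++i)
    {
      if (i != 0)
        out += ", ";
      out += gen_exp (x.args[i]);
    }
  return out + ")";
}
```

Calling gen_exp on a "plus" node whose two arguments are both operand 0 emits the operand directly for the first use and wraps the second use in copy_rtx.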
gcc/
This patch makes genemit.cc pass the md_rtx_info around by constant
reference rather than pointer. It's somewhat of a cosmetic change
on its own, but it makes later changes less noisy.
gcc/
* genemit.cc (gen_exp): Make the info argument a constant reference.
(gen_emit_seq, gen_ins
Mostly to reduce cut-&-paste.
gcc/
* genemit.cc (start_gen_insn): New function, split out from...
(gen_insn, gen_expand): ...here.
---
gcc/genemit.cc | 45 ++---
1 file changed, 22 insertions(+), 23 deletions(-)
diff --git a/gcc/genemit.cc
One slightly awkward part about emitting the generator function
bodies is that:
* define_insn and define_expand routines have a separate argument for
each operand, named "operand0" upwards.
* define_split and define_peephole2 routines take a pointer to an array,
named "operands".
* the C++ p
gen_exp had code to handle the 'L' operand format. But this format
is specifically for location_ts, which are only used in RTX_INSNs.
Those should never occur in this context, where the input is always
an md file rather than an __RTL function. Any hard-coded raw
location value would be meaningles
An earlier version of this series wanted to collect information
about all the gen_* functions that are going to be generated.
The current version no longer does that, but the queue seemed
worth keeping anyway, since it gives a more consistent structure.
gcc/
* genemit.cc (queue): New stati
The negsi2 C++ code writes to operands[2] even though the pattern
has no operand 2.
gcc/
* config/stormy16/stormy16.md (negsi2): Remove unused assignment.
---
gcc/config/stormy16/stormy16.md | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/gcc/config/stormy16/stormy16
rget changes. Otherwise it's
stuff that I could self-approve, but I'll leave a few days for comments.
Richard Sandiford (9):
nds32: Avoid accessing beyond the operands[] array
xstormy16: Avoid accessing beyond the operands[] array
genemit: Use references rather than pointers
g
This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:
array subscript 2 is above array bounds of ‘rtx_def* [2]’
gcc/
* config/nds32/nds32-intrinsic.md (unspec_get_pen
Dimitar Dimitrov writes:
> After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
> following RTL for pru-unknown-elf:
>
> (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])
> (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855])
> (const
PR target/119966
>
> gcc/ChangeLog:
>
> * emit-rtl.cc (validate_subreg): Do not exit immediately for
> paradoxical subregs. Filter subsequent tests which are
> not valid for paradoxical subregs.
>
> Co-authored-by: Richard Sandiford
> Signed-off-by:
Jennifer Schmitz writes:
> [PATCH] [PR120276] regcprop: Return from copy_value for unordered modes
>
> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
> partial_subreg_p in the function copy_value during the RTL pass
> regcprop, failing the assertion in
>
> inline bool
> pa
Richard Biener writes:
> Targets recently got the ability to request the vector mode to be
> used for a vector epilogue (or the epilogue of a vector epilogue). The
> following adds the ability for it to indicate the epilogue should use
> loop masking, irrespective of the --param vect-partial-vect
Jennifer Schmitz writes:
> The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
> partial_subreg_p in the function copy_value during the RTL pass
> regcprop, failing the assertion in
>
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
> /* M
This is the result of using a regexp to replace instances of:
= get_insns ();
end_sequence ();
with:
= end_sequence ();
where the indentation is the same for both lines, and where there
might be blank lines in between.
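The exact regexp used for the patch is not shown here, so the following is only a sketch of what such a replacement could look like, written with std::regex. The function name fold_end_sequence and the pattern itself are assumptions made for illustration. A backreference to the captured indentation enforces the "same indentation for both lines" condition, and blank lines between the two statements are dropped.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: fold "X = get_insns ();" followed by an equally
// indented "end_sequence ();" into "X = end_sequence ();".
std::string
fold_end_sequence (const std::string &src)
{
  // Group 1: start of the text or the preceding newline, so that the
  //          match always begins at the start of a line.
  // Group 2: the indentation, reused via \2 so that both lines must be
  //          indented identically.
  // Group 3: the rest of the statement up to and including "= ".
  static const std::regex pattern (
    R"((^|\n)([ \t]*)(\S[^\n]*= )get_insns \(\);\n(?:[ \t]*\n)*\2end_sequence \(\);)");

  return std::regex_replace (src, pattern, "$1$2$3end_sequence ();");
}
```

Text in which the two lines are indented differently is left unchanged, since the backreference then fails to match.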
gcc/
* asan.cc (asan_clear_shadow): Use the return value of
This patch mops up obvious redundancies that weren't caught by the
automatic regexp replacements in earlier patches. It doesn't do
anything with genemit.cc, since that will be part of a later series.
gcc/
* config/arm/arm.cc (arm_gen_load_multiple_1): Simplify use of
end_sequence.
The start_sequence/end_sequence interface was a big improvement over
the previous state, but one slightly awkward thing about it is that
you have to call get_insns before end_sequence in order to get the
insn sequence itself:
To get the contents of the sequence just made, you must call
`get_
This is the result of using a regexp to replace:
rtx( |_insn *) = end_sequence ();
return ;
with:
return end_sequence ();
gcc/
* asan.cc (asan_emit_allocas_unpoison): Directly return the
result of end_sequence.
(hwasan_emit_untag_frame): Likewise.
* config/
nemit series that I'm
hoping to post tomorrow.
Bootstrapped & regression-tested on aarch64-linux-gnu & x86_64-linux-gnu.
Also tested against "all" config.gcc cases using config/config-list.mk.
OK to install?
Richard
Richard Sandiford (4):
Make end_sequence return t
Richard Biener writes:
> The following includes whether we vectorize an epilogue, whether
> we use loop masking and what vectorization factor (unroll factor)
> we use. So it's now
>
> t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 32
> t.c:4:21: optimized: epilogu
Karl Meakin writes:
> On 07/05/2025 14:32, Richard Sandiford wrote:
>> Karl Meakin writes:
>>> Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
>>> extension is enabled.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aa
Karl Meakin writes:
>>> + else
>>> +{
>>> + operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
>>> +operands[1], operands[2]);
>>> + operands[2] = const0_rtx;
>>> +}
>>> + }
>>> +)
>>> +
>>> @@ -758,6 +781,58 @@ (define_expand
Kugan Vivekanandarajah writes:
> Adding Eugene and Andi to CC as Sam suggested.
>
>> On 13 May 2025, at 12:57 am, Richard Sandiford
>> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> Kugan Vivekanandarajah writ
Richard Biener writes:
> The following refactors the vectorizer vector_costs target API
> to add a new vector_costs::add_vector_cost entry which groups
> all individual sub-stmts we create per "vector stmt", aka SLP
> node. This allows for the targets to more easily match on
> complex cases like
Andrew Pinski writes:
> Since the cmov optab is not used and is being removed,
> the `cmov6` patterns from the aarch64 backend can
> also be removed.
>
> gcc/ChangeLog:
> * config/aarch64/aarch64.md (cmov6): Remove.
OK, thanks.
Richard
>
> Signed-off-by: Andrew Pinski
> ---
> gcc/config
Pengxuan Zheng writes:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 15f08cebeb1..98ce85dfdae 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23621,6 +23621,36 @@ aarch64_simd_valid_and_imm (rtx op)
>return aarch64
Andrew Pinski writes:
> cmov optab was added back in r0-24110-g1c0290eaac4094
> (https://gcc.gnu.org/pipermail/gcc-patches/1999-September/018596.html)
> but it was never used. movcc is used instead and since
> r0-93453-gf90b7a5a7913cc (cond-optab),
> movcc becomes what cmov_optab was going to be;
Pengxuan Zheng writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c
> new file mode 100644
> index 000..adbf87243f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fmov-3-le.c
> @@ -0,0 +1,130 @@
> +/* { dg-do compil
Pengxuan Zheng writes:
> +/* Recognize patterns suitable for the AND instructions. */
> +static bool
> +aarch64_evpc_and (struct expand_vec_perm_d *d)
> +{
> + /* Either d->op0 or d->op1 should be a vector of all zeros. */
> + if (d->one_vector_p || (!d->zero_op0_p && !d->zero_op1_p))
> +r
Richard Biener writes:
> The following moves vector lowering to before vectorization - in fact
> to before DCE/forwprop and CSE. This gets us the chance to re-vectorize
> the lowered form eventually. Note that when the vectorizer learns to
> handle vector code vector lowering should be likely in
Kugan Vivekanandarajah writes:
> diff --git a/configure.ac b/configure.ac
> index 730db3c1402..701284e38f2 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -621,6 +621,14 @@ case "${target}" in
> ;;
> esac
>
> +autofdo_target="i386"
> +case "${target}" in
> + aarch64-*-*)
> +auto
Xi Ruoyao writes:
> The transform would be unsafe if !TRULY_NOOP_TRUNCATION because on these
> machines the hardware may look at bits outside of the given mode.
>
> gcc/ChangeLog:
>
> PR rtl-optimization/120050
> * ext-dce.cc (ext_dce_try_optimize_insn): Only transform the
> insn
Tamar Christina writes:
>> -----Original Message-----
>> From: Richard Biener
>> Sent: Friday, May 9, 2025 11:08 AM
>> To: Richard Sandiford
>> Cc: Pengfei Li ; gcc-patches@gcc.gnu.org;
>> ktkac...@nvidia.com
>> Subject: Re: [PATCH] vect: Improv
Richard Biener writes:
> On Thu, 8 May 2025, Pengfei Li wrote:
>
>> This patch improves the auto-vectorization for loops with known small
>> trip counts by enabling the use of subvectors - bit fields of original
>> wider vectors. A subvector must have the same vector element type as the
>> origina
Dhruv Chawla writes:
> This patch modifies Advanced SIMD assembly generation to emit an LDR
> instruction when a vector is created using a load to the first element with
> the other elements being zero.
>
> This is similar to what *aarch64_combinez already does.
>
> Example:
>
> uint8x16_t foo(
Richard Earnshaw writes:
> For constraints there are operand modifiers and constraint qualifiers.
> Operand modifiers apply to all alternatives and must appear, in
> traditional syntax before the first alternative. Constraint
> qualifiers, on the other hand must appear in each alternative to whic
Dimitar Dimitrov writes:
> On Tue, May 06, 2025 at 01:17:40PM +0100, Richard Sandiford wrote:
>> Dimitar Dimitrov writes:
>> > After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
>> > following RTL for pru-unknown-elf:
>> >
>> > (i
Konstantinos Eleftheriou writes:
> During the base register initialization, when we are eliminating the load
> instruction, we were calling `emit_move_insn` on registers of the same
> size but of different mode in some cases, causing an ICE.
>
> This patch uses `lowpart_subreg` for the base regist
Kyrylo Tkachov writes:
> Hi Richard,
>
>> On 6 May 2025, at 12:34, Richard Sandiford wrote:
>>
>> writes:
>>> From: Soumya AR
>>>
>>> Hi,
>>>
>>> This RFC and subsequent patch series introduces support for printing
Yangyu Chen writes:
>> On 6 May 2025, at 17:49, Alfie Richards wrote:
>>
>> On 06/05/2025 09:36, Yangyu Chen wrote:
On 6 May 2025, at 16:01, Alfie Richards wrote:
Hello,
I like this idea. I have a couple thoughts to add.
On 05/05/2025 09:46, Yangyu Chen wro
Sorry for the slow review.
Jennifer Schmitz writes:
> SVE loads and stores where the predicate is all-true can be optimized to
> unpredicated instructions. For example,
> svuint8_t foo (uint8_t *x)
> {
> return svld1 (svptrue_b8 (), x);
> }
> was compiled to:
> foo:
> ptrue p3.b, all
>
Indu Bhagat writes:
>>> [...]
>>> diff --git a/gcc/opts.cc b/gcc/opts.cc
>>> index 86c6691ecec4..00db662c32ef 100644
>>> --- a/gcc/opts.cc
>>> +++ b/gcc/opts.cc
>>> [...]
>>> @@ -2780,6 +2788,13 @@ common_handle_option (struct gcc_options *opts,
>>> SET_OPTION_IF_UNSET (opts, opts_set,
>>>
Indu Bhagat writes:
>>> starting bb 3
>>> 33: {cc:CC=cmp(r121:DI,0x10);r121:DI=r121:DI-0x10;}
>>> 32: r122:DI=r122:DI+0x10
>>> 31: [r122:DI+0]=unspec/v[[r122:DI+0],r120:DI] 17
>>> mem count failure
>>> mem count failure
>>
>> Yeah, we'd need to update auto-inc-dec for this case.
>>
Richard Earnshaw writes:
> On 07/05/2025 17:28, Richard Earnshaw (lists) wrote:
>> On 07/05/2025 16:54, Richard Sandiford wrote:
>>> Richard Earnshaw writes:
>>>> On 07/05/2025 13:57, Richard Sandiford wrote:
>>>>> Kyrylo Tkachov writes:
>>
Richard Sandiford writes:
>> @@ -758,6 +781,58 @@ (define_expand "cbranchcc4"
>>""
>> )
>>
>> +;; Emit a `CB (register)` or `CB (immediate)` instruction.
>> +(define_insn "aarch64_cb"
>> +
Richard Earnshaw writes:
> On 07/05/2025 13:57, Richard Sandiford wrote:
>> Kyrylo Tkachov writes:
>>>> On 7 May 2025, at 12:27, Karl Meakin wrote:
>>>>
>>>> Commit the test file `cmpbr.c` before rules for generating the new
>>>> instru
Karl Meakin writes:
> Add rules for lowering `cbranch4` to CBB/CBH/CB when CMPBR
> extension is enabled.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (cbranch4): emit CMPBR
> instructions if possible.
> (cbranch4): new expand rule.
> (aarch64_cb): likewise.
> (a
Kyrylo Tkachov writes:
>> On 7 May 2025, at 12:27, Karl Meakin wrote:
>>
>> Commit the test file `cmpbr.c` before rules for generating the new
>> instructions are added, so that the changes in codegen are more obvious
>> in the next commit.
>
> I guess that’s an LLVM best practice.
> In GCC sinc
Karl Meakin writes:
> Extract the hardcoded values for the minimum PC-relative displacements
> into named constants and document them.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
> (BRANCH_LEN_N_128MiB): likewise.
> (BRANCH_LEN_P_1MiB):
Kyrylo Tkachov writes:
>> On 7 May 2025, at 12:27, Karl Meakin wrote:
>>
>> Give the `define_insn` rules used in lowering `cbranch4` to RTL
>> more descriptive and consistent names: from now on, each rule is named
>> after the AArch64 instruction that it generates. Also add comments to
>> docume
Karl Meakin writes:
> Make the formatting of the RTL templates in the rules for branch
> instructions more consistent with each other.
One source of variation is the 80-character limit. It's a bit of a soft
limit for rtl, but it is still good to keep to it where that's easy.
So...
>
> gcc/Chang
Jennifer Schmitz writes:
> @@ -3698,6 +3706,24 @@ aarch64_partial_ptrue_length (rtx_vector_builder &builder,
>return vl;
> }
>
> +/* Return:
> +
> + * -1 if all bits of PRED are set
> + * N if PRED has N leading set bits followed by all clear bits
> + * 0 if PRED does not have any of
by zero-extending it.
For big-endian targets, the least significant byte should come from
address X+3 rather than address X. The byte at address X (i.e. the
byte with the equal offset) should instead go in the most significant
byte, typically using a shift left.
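As a rough illustration of the byte placement described above, the sketch below positions a single byte within a 4-byte value according to its address offset and the target's endianness. The function name place_byte_in_word and its parameters are hypothetical and not part of the patch under review; the fixed 4-byte width is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return a 32-bit value in which BYTE occupies the
// position it would have if it were stored at address X + OFFSET and
// the four bytes at X..X+3 were then loaded as one word.
std::uint32_t
place_byte_in_word (std::uint8_t byte, unsigned offset, bool big_endian)
{
  assert (offset < 4);
  // Little-endian: offset 0 is the least significant byte, so a byte at
  // address X is simply zero-extended.
  // Big-endian: offset 0 is the most significant byte, so a byte at
  // address X must be shifted left by 24 bits.
  unsigned shift = (big_endian ? 3 - offset : offset) * 8;
  return std::uint32_t (byte) << shift;
}
```

With this model, a byte at address X on a big-endian target ends up in bits 24 to 31, while the byte at address X+3 ends up in bits 0 to 7.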
Richard
>
> Konstanti
Richard Biener writes:
> The following amends gcc-15/changes.html with a note that reload
> is going to be removed for GCC 16.
>
> OK for www?
>
> * htdocs/gcc-15/changes.html: Mark GCC 15 as last release
> supporting reload.
My reading of the threads was that no-one is objecting to t
Jennifer Schmitz writes:
> About the tests: Non-power-of-2 patterns are already being tested in
> gcc.target/aarch64/sve/acle/general/whilelt_5.c.
OK
> For the case of svptrue_b16 ()
> with 8-bit load, I added a test case for it. Currently, it has a single test
> case, but if necessary I can
Dhruv Chawla writes:
> This patch modifies the intrinsic expanders to expand svlsl and svlsr to
> unpredicated forms when the predicate is a ptrue. It also folds the
> following pattern:
>
> lsl , ,
> lsr , ,
> orr , ,
>
> to:
>
> revb/h/w ,
>
> when the shift amount is equal to half t
Hi,
Thanks for the update. The patch mostly looks good, but one minor and
one more substantial comment below.
BTW, the patch seems to have been corrupted en route, in that unchanged
lines have too much space. Attaching is fine if that's easier.
Dhruv Chawla writes:
> diff --git a/gcc/config/a
Dimitar Dimitrov writes:
> After r16-160-ge6f89d78c1a752, late_combine2 started transforming the
> following RTL for pru-unknown-elf:
>
> (insn 3949 3948 3951 255 (set (reg:QI 56 r14.b0 [orig:1856 _619 ] [1856])
> (and:QI (reg:QI 1 r0.b1 [orig:1855 _201 ] [1855])
> (const
Indu Bhagat writes:
> On 4/15/25 9:21 AM, Richard Sandiford wrote:
>> Indu Bhagat writes:
>>> Store Allocation Tags (st2g) is an Armv8.5-A memory tagging (MTE)
>>> instruction. It stores an allocation tag to two tag granules of memory.
>>>
>>> TBD:
writes:
> From: Soumya AR
>
> Hi,
>
> This RFC and subsequent patch series introduces support for printing and
> parsing of aarch64 tuning parameters in the form of JSON.
Thanks for doing this. It looks really useful. My main question is:
rather than write the parsing and printing routines
Jennifer Schmitz writes:
> SVE loads/stores using predicates that select the bottom 8, 16, 32, 64,
> or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the
> predicate.
> For example,
> svuint8_t foo (uint8_t *x) {
> return svld1 (svwhilelt_b8 (0, 16), x);
> }
> was previous