Andrew Pinski writes:
> r15-6789-ge7f98d9603808b added a new RTL pass for hardreg PRE for the hard
> register FPM_REGNUM. This pass can get expensive if you have a large
> number of basic blocks where the hard register is never live, so it does
> nothing in the end.
> In the aarch64 case,
Andrew Pinski writes:
> r15-6789-ge7f98d9603808b added a new RTL pass for hardreg PRE for the hard
> register FPM_REGNUM, but this pass does nothing if there can't be any
> FPM_REGNUM register in it.
> So let's set HARDREG_PRE_REGNOS to include all zeros if !TARGET_FP8.
> Now the pass will on
Kyrylo Tkachov writes:
>> On 15 Jul 2025, at 15:50, Richard Sandiford
>> wrote:
>>
>> Kyrylo Tkachov writes:
>>> Hi all,
>>>
>>> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes:
>>> (x & z) | (~
Kyrylo Tkachov writes:
> From 930789b3c366777c49d4eb2f4dc84b0374601504 Mon Sep 17 00:00:00 2001
> From: Kyrylo Tkachov
> Date: Fri, 11 Jul 2025 02:50:32 -0700
> Subject: [PATCH 1/2] aarch64: Use SVE2 NBSL for vector NOR and NAND for
> Advanced SIMD modes
>
> We already have patterns to use the N
Kyrylo Tkachov writes:
> Hi all,
>
> SVE2 BSL2N (x, y, z) = (x & z) | (~y & ~z). When x == y this computes:
> (x & z) | (~x & ~z) which is ~(x ^ z).
> Thus, we can use it to match RTL patterns (not (xor (...) (...))) for both
> Advanced SIMD and SVE modes when TARGET_SVE2.
> This patch does that.
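The identity is easy to sanity-check in scalar C; a minimal sketch, not part of the patch:

#include <assert.h>
#include <stdint.h>

/* Check that (x & z) | (~x & ~z) == ~(x ^ z), the identity that lets
   BSL2N (x, x, z) compute a vector NOT-XOR.  */
int
main (void)
{
  uint32_t x = 0x12345678, z = 0x9abcdef0;
  assert (((x & z) | (~x & ~z)) == ~(x ^ z));
  return 0;
}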
ackports with the subreg_size_lowpart_offset change
mentioned below.
It doesn't look like you have commit access yet. If you'd like it,
please follow the instructions at https://gcc.gnu.org/gitwrite.html
(I'll sponsor).
Richard
> Konstantinos
>
> On Fri, Jul 4, 2
Alfie Richards writes:
> Hi all,
>
> This is a minor fixup to the previous patch I committed, fixing Spencer's
> comments.
>
> Bootstrapped and regtested on AArch64.
>
> Thanks,
> Alfie
>
> -- >8 --
>
> Fixup to the SME2+FAMINMAX intrinsics commit.
>
> gcc/ChangeLog:
>
> * config/aarch64/aar
Soumya AR writes:
> One additional change with this patch is that I had to update ldapr-sext.c
> too.
>
> During the combine pass, cases of UNSPECV_LDAP (with an offset) + sign_extend
> transform to LDAPUR + SEXT, and later to LDAPURS (with address folding).
>
> The aarch64 tests run with -moverr
Tamar Christina writes:
> One question I did have, not directly related to the unwinder changes:
> the ABI mentions that if any of the reserved bytes in the TPIDR2_EL0
> block are non-zero, TPIDR2_EL0 must be left unchanged [1].
The full requirement is:
If TPIDR2_EL0 is nonnull and if any r
Spencer Abson writes:
> This patch extends the splitting patterns for combining FP comparisons
> with predicated logical operations such that they cover all of SVE_F.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve.md (*fcm_and_combine):
> Extend from SVE_FULL_F to SVE_F.
> (
Richard Biener writes:
> On Mon, 14 Jul 2025, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > For loop masking we need to mask a mask AND operation with the loop
>> > mask. The following makes sure we have a corresponding mask
>> > available.
Spencer Abson writes:
> [...]
> +/* If REF describes the high half of a 128-bit vector, return this
> +   vector.  Otherwise, return NULL_TREE.  */
> +static tree
> +aarch64_v128_highpart_ref (const_tree ref)
> +{
> +  if (TREE_CODE (ref) != SSA_NAME)
> +    return NULL_TREE;
> +
> +  gassign *stm
Richard Biener writes:
> For loop masking we need to mask a mask AND operation with the loop
> mask. The following makes sure we have a corresponding mask
> available. There's no good way to distinguish loop masking from
> len masking here, so assume we have recorded a mask for the operands
> ma
Kyrylo Tkachov writes:
>> On 11 Jul 2025, at 16:48, Richard Sandiford
>> wrote:
>>> Shall I backport this for GCC 15.2 as well?
>>> The test case uses C operators which were enabled in GCC 15, though I
>>> suppose one could construct a pure ACLE intrins
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Friday, July 11, 2025 4:23 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Alex Coplan ; Alice Carlotti
>> ;
>> pins...@gmail.com; ktkac...@nvidia.com; Richard Earnsha
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334. That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined p
Kyrylo Tkachov writes:
>> On 10 Jul 2025, at 11:12, Kyrylo Tkachov wrote:
>>
>>
>>
>>> On 10 Jul 2025, at 10:40, Richard Sandiford
>>> wrote:
>>>
>>> Kyrylo Tkachov writes:
>>>> Hi all,
>>>>
>>>&
Richard Biener writes:
> On Thu, 10 Jul 2025, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > The following removes an optimization that wrongly triggers right now
>> > because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
>> > co
g:4b47acfe2b626d1276e229a0cf165e934813df6c caused a segfault
in aarch64_vector_costs::analyze_loop_vinfo when costing scalar
code, since we'd end up dividing by a zero VF.
Much of the structure of the aarch64 costing code dates from
a stage 4 patch, when we had to work within the bounds of what
th
Richard Biener writes:
> The following removes an optimization that wrongly triggers right now
> because it accesses LOOP_VINFO_COST_MODEL_THRESHOLD which might not be
> computed yet.
>
> Testing on x86_64 didn't reveal any testsuite coverage.
>
> Bootstrapped and tested on x86_64-unknown-linux-gn
The SVE svpfalse folding tests use CFI directives to delimit the
function bodies. That requires -funwind-tables to be enabled,
which is true by default for *-linux-gnu targets, but not for *-elf.
Tested on aarch64-linux-gnu and aarch64_be-elf. Pushed as obvious.
Richard
gcc/testsuite/
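The usual way to express that in the tests themselves is a directive along these lines (illustrative; the committed fix may differ in detail):

/* Force unwind tables on, since the scan patterns rely on CFI
   directives that only appear when tables are generated.  */
/* { dg-additional-options "-funwind-tables" } */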
LD1Q gathers and ST1Q scatters are unusual in that they operate
on 128-bit blocks (effectively VNx1TI). However, we don't have
modes or ACLE types for 128-bit integers, and 128-bit integers
are not the intended use case. Instead, the instructions are
intended to be used in "hybrid VLA" operations
Soumya AR writes:
>> On 10 Jul 2025, at 3:15 PM, Richard Sandiford
>> wrote:
>>
>>
>> Soumya AR writes:
>>>> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov wrote:
>>>
Soumya AR writes:
>> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov wrote:
>>
>>
>>
>>> On 1 Jul 2025, at 17:36, Richard Sandiford
>>> wrote:
>>>
>>> Soumya AR writes:
>>>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mo
Kyrylo Tkachov writes:
> Hi all,
>
> While the SVE2 NBSL instruction accepts MOVPRFX to add more flexibility
> due to its tied operands, the destination of the movprfx cannot be also
> a source operand. But the offending pattern in aarch64-sve2.md tries
> to do exactly that for the "=?&w,w,w" alte
Richard Sandiford writes:
> TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
> "hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
> This matching was conditional on !BYTES_BIG_ENDIAN.
>
> The ACLE code also lowered the associated SVE2.1 i
Matthieu Longo writes:
> Those methods' implementations rely on duck-typing at compile
> time.
> The structure corresponding to a node of a doubly linked list needs
> to define attributes 'prev' and 'next', which are pointers to the
> node type.
> The structure wrapping the nodes and o
aarch64_simd_valid_imm tries to decompose a constant into a repeating
series of 64 bits, since most Advanced SIMD and SVE immediate forms
require that. (The exceptions are handled first.) It does this by
building up a byte-level register image, lsb first. If the image does
turn out to repeat eve
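A minimal sketch of the repetition test described above (illustrative C, not the GCC implementation; the function name is invented):

#include <stdbool.h>
#include <stddef.h>

/* Given a byte-level register image, least significant byte first,
   check whether it is a repeating series of 64 bits (8 bytes).  */
static bool
repeats_every_64_bits (const unsigned char *bytes, size_t nbytes)
{
  for (size_t i = 8; i < nbytes; i++)
    if (bytes[i] != bytes[i % 8])
      return false;
  return true;
}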
When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets. For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates fr
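For a linear series the fix-up reduces to simple arithmetic; a hedged sketch (the function and its interface are illustrative, not GCC's):

/* Pick INDEX operands that reproduce { base, base+step, ... } of n
   elements in memory order.  On big-endian targets the register lanes
   are reversed from INDEX's point of view, so start from the last
   element and step backwards.  */
struct index_ops { long base; long step; };

static struct index_ops
index_for_memory_order (long base, long step, unsigned n, int big_endian)
{
  struct index_ops ops = { base, step };
  if (big_endian)
    {
      ops.base = base + (long) (n - 1) * step;
      ops.step = -step;
    }
  return ops;
}

For the V4SI constant { 0, 1, 2, 3 } this gives INDEX #3, #-1 on big-endian instead of INDEX #0, #1.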
TARGET_VECTORIZE_VEC_PERM_CONST has code to match the SVE2.1
"hybrid VLA" DUPQ, EXTQ, UZPQ{1,2}, and ZIPQ{1,2} instructions.
This matching was conditional on !BYTES_BIG_ENDIAN.
The ACLE code also lowered the associated SVE2.1 intrinsics into
suitable VEC_PERM_EXPRs. This lowering was not conditio
While working on a new testcase that uses the RTL frontend,
I hit a bug where a (reg ...) that spans multiple hard registers
had REG_NREGS set to 1. This caused various things to misbehave.
For example, if the (reg ...) in question was used as crtl->return_rtx,
only the first register in the group
Jeff Law writes:
> On 7/4/25 10:21 AM, Richard Sandiford wrote:
>> ext-dce had:
>>
>>if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
>> {
>>bit = subreg_lsb (dst).to_constant ();
>>if (bit
These tests required runtime support for -fstack-protector,
but didn't test for it.
Tested on aarch64-linux-gnu and aarch64_be-elf & pushed as obvious.
Richard
gcc/testsuite/
* gcc.target/aarch64/pr118348_1.c: Require fstack_protector.
* gcc.target/aarch64/pr118348_2.c: Likewise
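The directive form of that requirement, matching the ChangeLog entries above:

/* { dg-require-effective-target fstack_protector } */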
Kyrylo Tkachov writes:
> Thanks for your comments, do you mean something like the following?
Yeah, the patch LGTM, thanks.
Richard
> Or do you mean to have separate alternatives with each one individually tying
> one of operands 2 or 3 to r0?
>
> Kyrill
>
>
>>
>> Thanks,
>> Tamar
>>
>>> Than
Tamar Christina writes:
>> -Original Message-
>> From: Richard Sandiford
>> Sent: Monday, July 7, 2025 12:55 PM
>> To: Kyrylo Tkachov
>> Cc: GCC Patches ; Richard Earnshaw
>> ; Alex Coplan ; Andrew
>> Pinski
>> Subject: Re: [P
Richard Sandiford writes:
> Kyrylo Tkachov writes:
>> Hi all,
>>
>> To handle DImode BCAX operations we want to do them on the SIMD side only if
>> the incoming arguments don't require a cross-bank move.
>> This means we need to split back the combination
Kyrylo Tkachov writes:
> Hi all,
>
> To handle DImode BCAX operations we want to do them on the SIMD side only if
> the incoming arguments don't require a cross-bank move.
> This means we need to split back the combination to separate GP BIC+EOR
> instructions if the operands are expected to be in
Tamar Christina writes:
>> -Original Message-
>> From: Kyrylo Tkachov
>> Sent: Monday, July 7, 2025 10:38 AM
>> To: GCC Patches
>> Cc: Richard Sandiford ; Richard Earnshaw
>> ; Alex Coplan ; Andrew
>> Pinski
>> Subject: [PATCH] aar
In the tree codes and optabs, the "hi" in a vector hi/lo pair means
"most significant" and the "lo" means "least significant", with
significance following GCC's normal endian expectations. Thus on
big-endian targets, the hi part handles the first half of the elements
in memory order and the lo part
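A small sketch of that convention (illustrative C; the helper name is invented):

#include <stdbool.h>

/* Does memory-order element I of an N-element vector fall in the "hi"
   (most significant) half?  On big-endian targets the hi half is the
   first half in memory order; on little-endian it is the second.  */
static bool
element_in_hi_half (unsigned i, unsigned n, bool big_endian)
{
  return big_endian ? i < n / 2 : i >= n / 2;
}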
ext-dce had:
  if (SUBREG_P (dst) && SUBREG_BYTE (dst).is_constant ())
    {
      bit = subreg_lsb (dst).to_constant ();
      if (bit >= HOST_BITS_PER_WIDE_INT)
        bit = HOST_BITS_PER_WIDE_INT - 1;
      dst = SUBREG_REG (dst);
But a constant
Lowpart subregs are generally disallowed on big-endian SVE vector
registers, since the first memory element is stored at the least
significant end of the register, rather than the most significant end.
(See the comment at the head of aarch64-sve.md for details,
and aarch64_modes_compatible_p for th
aarch64_expand_vector_init contains some divide-and-conquer code
that tries to load the odd and even elements into 64-bit registers
and then ZIP them together. On big-endian targets, the even elements
are more significant than the odd elements and so should come second
in the ZIP.
This fixes many
Richard Biener writes:
> @@ -1738,8 +1738,13 @@ protected:
>unsigned int m_suggested_unroll_factor;
>
>/* The suggested mode to be used for a vectorized epilogue or VOIDmode,
> - determined at finish_cost. */
> + determined at finish_cost. m_masked_epilogue is whether the epilogue should u
Spencer Abson writes:
> This patch extends our vec_cmp expander to support partial FP modes.
>
> We use a predicate mode that is narrower than the operation's VPRED to govern
> unpacked FP operations under flag_trapping_math, so the expansion must
> handle cases where the comparison's target and govern
"H.J. Lu" writes:
> On Thu, Jul 3, 2025 at 11:02 PM Richard Sandiford
> wrote:
>>
>> "H.J. Lu" writes:
>> > Since a backend may ignore user type alignment for arguments passed on
>> > stack, update alignment for arguments passed on stack
Konstantinos Eleftheriou writes:
> On Wed, May 7, 2025 at 11:29 AM Richard Sandiford
> wrote:
>> But I thought the code was allowing multiple stores to be forwarded to
>> a single (wider) load. E.g. 4 individual byte stores at address X, X+1,
>> X+2 and X+3 could be fo
Alfie Richards writes:
> +/* string_slice inherits from array_slice, specifically to refer to a
> +   substring of a character array.
> +   It includes some string-like helpers.  */
> +class string_slice : public array_slice<const char>
> +{
> +public:
> +  string_slice () : array_slice<const char> () {}
> +  string_s
"H.J. Lu" writes:
> Since a backend may ignore user type alignment for arguments passed on
> stack, update alignment for arguments passed on stack when copying MEM's
> memory attributes.
>
> gcc/
>
> PR target/120839
> * emit-rtl.cc (set_mem_attrs): Update alignment for argument on
> stack.
>
> gc
Richard Biener writes:
> The following fixes bad alignment computation for epilog vectorization
> when, as in this case for 510.parest_r and masked epilog vectorization
> with AVX512, we end up choosing AVX to vectorize the main loop and
> masked AVX512 (sic!) to vectorize the epilog. In that case a
Karl Meakin writes:
> This patch series adds support for the CMPBR extension. It includes the
> new `+cmpbr` option and rules to generate the new instructions when
> lowering conditional branches.
Thanks for the update, LGTM. I've pushed the series to trunk.
Richard
> Changelog:
> * v9:
> -
Karl Meakin writes:
> Commit the test file `mask_load_2.c` before the vectorisation analysis
> is changed, so that the changes in codegen are more obvious in the next
> commit.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/sve/mask_load_2.c: New test.
OK, thanks.
Richard
> ---
> ..
Karl Meakin writes:
> On 01/07/2025 11:02, Richard Sandiford wrote:
>> Karl Meakin writes:
>>> @@ -763,6 +784,68 @@ (define_expand "cbranchcc4"
>>> ""
>>> )
>>>
>>> +;; Emit a `CB (register)` or `CB (immediate
Wilco Dijkstra writes:
> Since all Armv9 cores support shifted LDRH/STRH, use the correct cost of zero
> for these.
>
> Passes regress, OK for commit?
>
> gcc:
> * config/aarch64/tuning_models/generic_armv9_a.h
> (generic_armv9_a_addrcost_table): Use zero cost for himode.
OK if th
Richard Sandiford writes:
> Karl Meakin writes:
>> +// If the branch destination is out of range (1KiB), we have to generate an
>> +// extra B instruction (which can handle larger displacements) and branch
>> around
>> +// it
>> +int far_branch(i32 x, i32 y) {
&
Soumya AR writes:
> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
> From: Soumya AR
> Date: Mon, 30 Jun 2025 12:17:30 -0700
> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores with
> RCPC2
>
> This patch adds the ability to fold the address computati
Remi Machet writes:
> Attached is the updated patch for the aarch64 conversion of some
> mvn+shrn patterns into a mvni+subhn. Hopefully the attachment fixes the tab
> issues; the cover letter was updated to better explain what the patch
> does, code was changed to use emit_move_insn, and testcase w
Karl Meakin writes:
> Extract the hardcoded values for the minimum PC-relative displacements
> into named constants and document them.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (BRANCH_LEN_P_128MiB): New constant.
> (BRANCH_LEN_N_128MiB): Likewise.
> (BRANCH_LEN_P_1MiB):
Karl Meakin writes:
> Commit the test file `cmpbr.c` before rules for generating the new
> instructions are added, so that the changes in codegen are more obvious
> in the next commit.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Add `cmpbr` to the list of extensions.
>
Karl Meakin writes:
> @@ -763,6 +784,68 @@ (define_expand "cbranchcc4"
>""
> )
>
> +;; Emit a `CB (register)` or `CB (immediate)` instruction.
> +;; The immediate range depends on the comparison code.
> +;; Comparisons against immediates outside this range fall back to
> +;; CMP + B.
> +(de
Karl Meakin writes:
> @@ -729,30 +729,31 @@ (define_expand "cbranch4"
> (match_operator 0 "aarch64_comparison_operator"
>[(match_operand:GPF_F16 1 "register_operand")
> (match_operand:GPF_F16 2 "aarch64_fp_compare_operand")])
> - (label_ref
Christopher Bazley writes:
> The test added by r16-1671-ge7ff8e8d77df74 passed despite using
> regular expressions that would never match real assembly language
> output from the compiler. Because the regular expressions were not
> expected to match, and didn't, this was not noticeable; however,
>
Christoph Müllner writes:
> This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
> Along with this conversion, the way the pass collects information
> was completely reworked. Instead of visiting each instruction multiple
> times, this is now done only once.
>
> Most significant chan
Richard Biener writes:
> On Fri, 27 Jun 2025, Richard Biener wrote:
>
>> On Thu, 26 Jun 2025, Richard Sandiford wrote:
>>
>> > Richard Biener writes:
>> > > The following fixes the computation of supports_partial_vectors which
>> > > is used t
Richard Biener writes:
> The following fixes the computation of supports_partial_vectors which
> is used to prune the set of modes to iterate over for epilog
> vectorization. The used partial_vectors_supported_p predicate
> only looks for while_ult, while we also support predication when
> mask modes
Richard Biener writes:
> The following avoids re-analyzing the loop as epilogue when not
> using partial vectors and the mode is the same as the autodetected
> vector mode and that has a too high VF for a non-predicated loop.
> This situation occurs almost always on x86 and saves us one
> re-analy
Karl Meakin writes:
> Move the rules for CBZ/TBZ to be above the rules for
> CBB/CBH/CB. We want them to have higher priority
> because they can express larger displacements.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (aarch64_cbz1): Move
> above rules for CBB/CBH/CB.
> (
Richard Biener writes:
> On Tue, 24 Jun 2025, Richard Sandiford wrote:
>
>> Tamar Christina writes:
>> > store_bit_field_1 has an optimization where if a target is not a memory
>> > operand
>> > and the entire value is being set from something larger we
Christoph Müllner writes:
> insn_info::has_been_deleted () is documented to return true if an
> instruction is deleted. Such instructions have their `volatile` bit set,
> which can be tested via rtx_insn::deleted ().
>
> The current condition for insn_info::has_been_deleted () is:
> * m_rtl is no
Karl Meakin writes:
> Add rules for lowering `cbranch4` to CBB/CBH/CB when
> CMPBR extension is enabled.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-protos.h (aarch64_cb_rhs): New function.
> * config/aarch64/aarch64.cc (aarch64_cb_rhs): Likewise.
> * config/aarch64/aarch64.m
Christoph Müllner writes:
> On Tue, Jun 24, 2025 at 9:29 PM Richard Sandiford
> wrote:
>>
>> Christoph Müllner writes:
>> > insn_info::has_been_deleted () is documented to return true if an
>> > instruction is deleted. Such instructions have their `volatil
Richard Sandiford writes:
> Karl Meakin writes:
>> + "r"))
>> + (label_ref (match_operand 2))
>> + (pc)))]
>> + "TARGET_CMPBR"
>> + "cb\\t%0, %1, %l2";
Sorr
Karl Meakin writes:
> Give the `define_insn` rules used in lowering `cbranch4` to RTL
> more descriptive and consistent names: from now on, each rule is named
> after the AArch64 instruction that it generates. Also add comments to
> document each rule.
>
> gcc/ChangeLog:
>
> * config/aarch64
Karl Meakin writes:
> Make the formatting of the RTL templates in the rules for branch
> instructions more consistent with each other.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (cbranch4): Reformat.
> (cbranchcc4): Likewise.
> (condjump): Likewise.
> (*compare_cond
Richard Biener writes:
> On Tue, 24 Jun 2025, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
>> >
>> >> Tamar Christina writes:
>> >> > store_bit_field_1 has an optimization where
Richard Biener writes:
> On Tue, 24 Jun 2025, Richard Sandiford wrote:
>> Richard Biener writes:
>> > On Tue, 24 Jun 2025, Richard Sandiford wrote:
>> >> (from h8300). This is also why simplify_gen_subreg has:
>> >>
>> >> if (GET_CODE
process_uses_of_deleted_def seems to have been written on the assumption
that non-degenerate phis would be explicitly deleted by an insn_change,
and that the function therefore only needed to delete degenerate phis.
But that was inconsistent with the rest of the code, and wouldn't be
very convenien
lra-eliminations.cc:move_plus_up tries to:
Transform (subreg (plus reg const)) to (plus (subreg reg) const)
when it is possible.
Most of it is heavily conditional:
  if (!paradoxical_subreg_p (x)
      && GET_CODE (subreg_reg) == PLUS
      && CONSTANT_P (XEXP (subreg_reg, 1))
      && GET
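For lowpart subregs the transform is justified by truncation distributing over addition modulo 2^n; a quick scalar check of that fact (plain C, not GCC code):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  /* Narrowing (a lowpart subreg) commutes with addition in modular
     arithmetic, which is what makes the rewrite safe.  */
  uint64_t reg = 0x123456789abcdef0ULL, cst = 42;
  assert ((uint32_t) (reg + cst) == (uint32_t) reg + (uint32_t) cst);
  return 0;
}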
Tamar Christina writes:
> store_bit_field_1 has an optimization where if a target is not a memory
> operand
> and the entire value is being set from something larger we can just wrap a
> subreg around the source and emit a move.
>
> For vector constructors this is however problematic because the
Andrew Pinski writes:
> On Fri, Jun 20, 2025, 4:47 PM Wilco Dijkstra wrote:
>
>>
>> TARGET_CONST_ANCHOR appears to trigger too often, even on simple
>> immediates.
>> It inserts extra ADD/SUB instructions even when a single MOV exists.
>> Disable it to improve overall code quality: on SPEC2017 it
In this PR, we started with:
(subreg:V2DI (reg:DI virtual-reg) 0)
and vregs instantiated the virtual register to the argument pointer.
But:
(subreg:V2DI (reg:DI ap) 0)
is not a sensible subreg, since the argument pointer certainly can't
be referenced in V2DImode. This is (IMO correctly
Karl Meakin writes:
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index be5a97294dd..1d4ae73a963 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -944,16 +944,50 @@ static const char *
> svpattern_token (enum aarch64_svpatter
4.html
> - [4]: https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666178.html
> - [5]:
> https://inbox.sourceware.org/gcc-patches/20250604150612.1234394-1-matthieu.lo...@arm.com/
>
> ## Previous revisions
>
> Diff with revision 4 [5]:
> - address the comments from Richard Sandiford.
We generated inefficient code for bitfield references to Advanced
SIMD structure modes. In RTL, these modes are just extra-long
vectors, and so inserting and extracting an element is simply
a vec_set or vec_extract operation.
For the record, I don't think these modes should ever become fully
fled
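As an illustration of the RTL view (a hedged ACLE-level sketch, not code from the patch):

#include <arm_neon.h>

/* Setting one element of a two-vector structure.  If int32x4x2_t is
   treated as one extra-long vector in RTL, this is a single vec_set
   on element 6 (lane 2 of the second vector) rather than a shuffle
   of the whole structure.  */
int32x4x2_t
set_elem (int32x4x2_t v, int32_t x)
{
  v.val[1] = vsetq_lane_s32 (x, v.val[1], 2);
  return v;
}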
Alex Coplan writes:
> Hi Remi,
>
> On 12/06/2025 17:02, Richard Sandiford wrote:
>> Remi Machet writes:
>> > + "TARGET_SIMD"
>> > + "#"
>> > + "&& true"
>> > + [(const_int 0)]
>> > +{
>
Jakub Jelinek writes:
> On Fri, Jun 13, 2025 at 10:05:14AM +0200, Jakub Jelinek wrote:
>> On Fri, Jun 13, 2025 at 08:52:55AM +0100, Richard Sandiford wrote:
>> > > 2025-06-12 Jakub Jelinek
>> > >
>> > > * cfgexpand.cc (construct_init_block): If
Jiawei writes:
> This patch adds a new simplification rule to `simplify-rtx.cc` that
> handles a common bit manipulation pattern involving a single-bit set
> and clear followed by XOR.
>
> The transformation targets RTL of the form:
>
> (xor (and (rotate (~1) A) B) (ashift 1 A))
>
> which is sem
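From the later review discussion, the simplified form is B | (1 << A); a scalar C check of the equivalence (a sketch assuming 32-bit rotate semantics):

#include <assert.h>
#include <stdint.h>

/* Model (rotate (~1) A) as a 32-bit rotate-left of ~1 by A,
   i.e. ~(1 << A).  */
static uint32_t
rotl32 (uint32_t x, unsigned a)
{
  return (x << a) | (x >> ((32 - a) & 31));
}

int
main (void)
{
  uint32_t b = 0x12345678;
  for (unsigned a = 0; a < 32; a++)
    /* Clearing bit A and then XORing it on sets it: B | (1 << A).  */
    assert (((b & rotl32 (~1u, a)) ^ (1u << a)) == (b | (1u << a)));
  return 0;
}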
Jakub Jelinek writes:
> Hi!
>
> Andrew ran some extra ranger checking during bootstrap and found one more
> case (though much rarer than the GIMPLE_COND case).
>
> Seems on fold-const.cc (native_encode_expr) we end up with bb 2, ENTRY
> bb successor, having PHI nodes (usually there is some bb in b
Remi Machet writes:
> Add an optimization to aarch64 SIMD converting mvn+shrn into mvni+subhn
> which
> allows for better optimization when the code is inside a loop by using a
> constant.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Signed-off-by: Remi Machet
>
> gcc/ChangeLog:
>
>
Jakub Jelinek writes:
> Hi!
>
> My r16-1398 PR120434 ranger during expansion change broke profiled lto
> bootstrap on x86_64-linux, the following testcase is reduced from that.
>
> The problem is during expand_gimple_cond, if we are unlucky that neither
> of edge_true and edge_false point to the n
Jeff Law writes:
> On 6/9/25 12:40 PM, Dimitar Dimitrov wrote:
>> On Sun, Jun 08, 2025 at 09:09:44AM -0600, Jeff Law wrote:
>>>
>>>
>>> On 6/5/25 2:16 PM, Dimitar Dimitrov wrote:
PR119966 showed that combine could generate unfoldable hardware subregs
for pru-unknown-elf. To fix, strengt
Jeff Law writes:
> On 6/11/25 9:53 AM, Richard Sandiford wrote:
>
>>
>>> +into B | (1 << A). */
>>> + if (GET_CODE (op0) == AND
>>> + && GET_CODE (XEXP (op0, 0)) == ROTATE
>>> + && CONST_INT_P (XEXP (
Jiawei writes:
> This patch adds a new simplification rule to `simplify-rtx.cc` that
> handles a common bit manipulation pattern involving a single-bit set
> and clear followed by XOR.
>
> The transformation targets RTL of the form:
>
> (xor (and (rotate (~1), A) B) (ashift 1, A))
>
> which is s
Spencer Abson writes:
> On Tue, Jun 10, 2025 at 08:04:06PM +0100, Richard Sandiford wrote:
>> Spencer Abson writes:
>> > On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote:
>> >> Spencer Abson writes:
>> >> > @@ -8165,20 +8169,25 @@
Tamar Christina writes:
> @@ -360,8 +367,8 @@ The number of Newton iterations for calculating the
> reciprocal for double type.
>
> -param=aarch64-autovec-preference=
> Target Joined Var(aarch64_autovec_preference)
> Enum(aarch64_autovec_preference) Init(AARCH64_AUTOVEC_DEFAULT) Param
> ---p
Matthieu Longo writes:
> GCS (Guarded Control Stack, an Armv9.4-a extension) requires some
> caution at runtime. The runtime linker needs to reason about the
> compatibility of a set of relocatable object files that might not
> have been compiled with the same compiler.
> Up until now, those metadat
Matthieu Longo writes:
> The code emitting the GNU properties was moved to a separate file to
> improve modularity and "relieve" the 31000-line-long aarch64.cc file
> of a few lines.
>
> It introduces a new namespace "aarch64::" for the AArch64 backend, which
> reduces the length of function names by
Matthieu Longo writes:
> GNU properties are emitted to provide some information about the features
> used in the generated code, like BTI, GCS, or PAC. However, no debug
> comments are emitted in the generated assembly even if -dA is provided.
> It makes understanding the information stored in the .
The PCS defines a lazy save scheme for managing ZA across normal
"private-ZA" functions. GCC currently uses this scheme for calls
to all private-ZA functions (rather than using caller-save).
Therefore, before a sequence of calls to private-ZA functions, GCC emits
code to set up a lazy save. Afte