[PATCH/GCC16 v2 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH/GCC16 v2 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
documentation of these instructions can be found here: https://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29

[PATCH/GCC16 0/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Define the spaceship optab [PR117013] gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.cc | 73 +++ gcc/config/aarch64/aarch64.md | 43 .../g++.target

[PATCH/GCC16 1/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
This expansion ensures that exactly one comparison is emitted for spaceship-like sequences on floating-point operands, including when the result of such sequences is compared against members of std. For both integer and floating-point types, we optimize for the case in which the result of a sp

[PATCH/GCC16 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29 +- .../gcc.target/aarch64/_Float16_cmp_1.c | 54

[PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH 0/1][RFC] middle-end: target support checks for vectorizable_induction

2025-03-20 Thread Spencer Abson
original code? While this is an RFC, the patch itself has been bootstrapped and regtested on aarch64-linux-gnu. Thank you very much for any discussion. Spencer Abson Spencer Abson (1): Induction vectorizer: prevent ICE for scalable types gcc/tree-vect-loop.cc | 39 +++

[PATCH 1/1][RFC] Induction vectorizer: prevent ICE for scalable types

2025-03-20 Thread Spencer Abson
We currently check that the target supports PLUS_EXPR and MINUS_EXPR with step_vectype (a fix for pr103523). However, vectorizable_induction can emit a vectorized MULT_EXPR when calculating the step of each IV for SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float inductions.
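
For readers unfamiliar with the term, here is a minimal sketch of a float induction in C. The function name fill_ramp and its parameters are hypothetical and do not come from the patch or its tests. The variable x advances by a loop-invariant step on every iteration. When such a loop is vectorized, the initial vector is roughly init + {0, 1, 2, ...} * step, which is where an integer-to-float conversion and a vector multiply would come in.

```c
/* Hypothetical illustration only: 'x' is a float induction variable.
   Vectorizing this loop typically also needs relaxed FP semantics
   (for example -ffast-math), since the additions get reassociated.  */
void
fill_ramp (float *out, int n, float init, float step)
{
  float x = init;
  for (int i = 0; i < n; i++)
    {
      out[i] = x;   /* use of the induction variable */
      x += step;    /* loop-invariant step */
    }
}
```

Calling fill_ramp (buf, 4, 1.0f, 0.5f) stores 1.0, 1.5, 2.0 and 2.5 into buf.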

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
Hi Kyrill, Thanks for your comments, and for answering my question RE your work. Happy to apply those changes in the next revision. Cheers, Spencer

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote: > Thanks, this generally looks really good. Some comments on top of > Kyrill's, and Christophe's comment internally about -save-temps. > > Spencer Abson writes: > > +/* Build and return a new VECTOR_C

[PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH 0/1] AArch64: Fold builtin calls w/ highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
or stage-1? Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 77 ++ gcc/config/aarch64/aarch64-builtins.cc| 232 ++ .../aarch64/simd/fold_to_highpart_1.c

[PATCH v2 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH v2 0/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
lso tested on a cross-compiler targeting aarch64_be-none-linux-gnu. OK for stage-1? Thanks, Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 81 ++ gcc/config/aarch64/aarch64-builtins.

[PATCH 0/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR

2025-05-14 Thread Spencer Abson
for the issue fixed by commit 0eb5e901f6e2, if it is still relevant. Thanks Spencer Abson (1): middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR .../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c| 18 ++ .../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c| 6 ++

[PATCH 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR

2025-05-14 Thread Spencer Abson
Floating-point to integer conversions can be inexact or invalid (e.g., due to overflow or NaN). However, since users of operation_could_trap_p infer the bool FP_OPERATION argument from the expression's type, FIX_TRUNC_EXPR is considered non-trapping here. This patch handles FIX_TRUNC_EXPR explici

[pushed] MAINTAINERS: add myself to write after approval

2025-05-16 Thread Spencer Abson
. Name BZ account Email Soumya AR soumyaa +Spencer Abson sabson Mark G. Adams mgadams Ajit Kumar Agarwal aagarwa Pedro Alves palves

Re: [PATCH v2 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
On Tue, Jun 03, 2025 at 03:26:40PM +0200, Richard Biener wrote: > On Tue, Jun 3, 2025 at 3:09 PM Spencer Abson wrote: > > > > Floating-point to integer conversions can be inexact or invalid (e.g., due > > to > > overflow or NaN). However, since users of operation_coul

[PATCH 14/14] aarch64: Add support for unpacked SVE FP conditional ternary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for fma, fnma, fms, and fnms to support partial SVE FP modes. We add the missing BF16 tests, which we can now trigger for having implemented the conditional expander. We also add tests for the 'merging with multiplicand' case, which this expander canonicalizes (alb

Re: [PATCH 09/14] aarch64: Add support for unpacked SVE FDIV

2025-06-11 Thread Spencer Abson
On Tue, Jun 10, 2025 at 07:54:31PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > On Fri, Jun 06, 2025 at 12:46:32PM +0100, Richard Sandiford wrote: > >> Spencer Abson writes: > >> > This patch extends the unpredicated FP division expander to suppor

Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

2025-06-11 Thread Spencer Abson
On Tue, Jun 10, 2025 at 08:04:06PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote: > >> Spencer Abson writes: > >> > @@ -8165,20 +8169,25 @@ > >> > ;; > >> > ;;

Re: [PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

2025-06-09 Thread Spencer Abson
On Fri, Jun 06, 2025 at 03:52:12PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > @@ -8165,20 +8169,25 @@ > > ;; > > ;; For unpacked vectors, it doesn't really matter whether SEL uses the > > ;; the container size or the element size. If SEL use

Re: [PATCH 04/14] aarch64: Add support for unpacked SVE FP comparisons

2025-06-09 Thread Spencer Abson
On Fri, Jun 06, 2025 at 10:02:19AM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > @@ -27292,10 +27291,16 @@ aarch64_emit_sve_invert_fp_cond (rtx target, > > rtx_code code, rtx pred, > > void > > aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code

Re: [PATCH 08/14] aarch64: Add support for unpacked SVE FP binary arithmetic

2025-06-09 Thread Spencer Abson
On Fri, Jun 06, 2025 at 12:18:15PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > This patch extends the expanders for unpredicated smax, smin, add, sub, > > mul, min, and max, so that they support partial SVE FP modes. > > > > The relevant insn/split patt

Re: [PATCH 09/14] aarch64: Add support for unpacked SVE FDIV

2025-06-09 Thread Spencer Abson
On Fri, Jun 06, 2025 at 12:46:32PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > This patch extends the unpredicated FP division expander to support > > partial FP modes. It extends the existing patterns used to implement > > UNSPEC_COND_FDIV and it'

Re: [PATCH 13/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic

2025-06-09 Thread Spencer Abson
On Fri, Jun 06, 2025 at 04:04:18PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/ > > SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is > > SVE_RELAXED_GP. > > > > We ca

Re: [PATCH 02/14] aarch64: Add support for unpacked SVE FP conversions

2025-06-09 Thread Spencer Abson
On Thu, Jun 05, 2025 at 06:11:44PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > @@ -9487,21 +9489,39 @@ > > ;; - FCVTZU > > ;; > > - > > > > -;; Unpredicated conve

Re: [PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions

2025-06-09 Thread Spencer Abson
On Thu, Jun 05, 2025 at 09:24:27PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c > > b/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c > > new file mode 100644 > > ind

[PATCH 10/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary arithmetic

2025-06-02 Thread Spencer Abson
Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/ SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__2_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. (*con

[PATCH 02/14] aarch64: Add support for unpacked SVE FP conversions

2025-06-02 Thread Spencer Abson
This patch introduces expanders for FP<-FP conversions that leverage partial vector modes. We also extend the INT<-FP and FP<-INT conversions using the same approach. The ACLE enables vectorized conversions like the following: fcvt z0.h, p7/m, z1.s Modelling the source vector as VNx4SF: ... |

[PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for conditional smax, smin, add, sub, mul, min, max, and div to support partial SVE FP modes. The natural mask supplied to the unpacked operation leaves the undefined elements in each container unpredicated. This expansion modifies this mask to explicitly disable t

[PATCH 09/14] aarch64: Add support for unpacked SVE FDIV

2025-06-02 Thread Spencer Abson
This patch extends the unpredicated FP division expander to support partial FP modes. It extends the existing patterns used to implement UNSPEC_COND_FDIV and its approximation as needed. gcc/ChangeLog: * config/aarch64/aarch64-sve.md: (@aarch64_sve_): Extend from SVE_FULL_F to S

[PATCH 04/14] aarch64: Add support for unpacked SVE FP comparisons

2025-06-02 Thread Spencer Abson
This patch extends our vec_cmp expander to support partial FP modes. We use an unnatural predicate mode to govern unpacked FP operations under flag_trapping_math, so the expansion must handle cases where the comparison's target and governing predicates have different modes. While such predicates

[PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions

2025-06-02 Thread Spencer Abson
Add UNSPEC_SEL combiner patterns for unpacked FP conversions, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__nontrunc_relaxed): New FCVT/SEL combiner pattern. (*cond__trunc_relaxed): New FCVTZ{S,U}/SEL c

[PATCH 08/14] aarch64: Add support for unpacked SVE FP binary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expanders for unpredicated smax, smin, add, sub, mul, min, and max, so that they support partial SVE FP modes. The relevant insn/split patterns have also been updated. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (3): Extend from SVE_FULL_F to SVE_F, and

[PATCH 07/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary operations

2025-06-02 Thread Spencer Abson
Extend the unary op/UNSPEC_SEL combiner patterns from SVE_FULL_F to SVE_F, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__2_relaxed): Extend from SVE_FULL_F to SVE_F. (*cond__any_relaxed): Likewise. gcc/testsuite/Chang

[PATCH 05/14] aarch64: Compare/and splits for unpacked SVE FP comparisons

2025-06-02 Thread Spencer Abson
This patch extends the compare/and splitting patterns for FP comparisons from SVE_FULL_F to SVE_F. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*fcm_and_combine): Extend to SVE_F. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_fcm_1.c: Allow other tests

[PATCH 01/14] aarch64: Extend iterator support for partial SVE FP modes

2025-06-02 Thread Spencer Abson
Define new iterators for partial floating-point modes, and cover these in some existing mode_attrs. This patch serves as a starting point for a series that extends support for unpacked floating-point operations. To differentiate between BFloat mode iterators that need to test TARGET_SSVE_B16B16,

[PATCH 06/14] aarch64: Add support for unpacked SVE FP unary operations

2025-06-02 Thread Spencer Abson
This patch extends the expander for unpredicated round, nearbyint, floor, ceil, rint, and trunc, so that it can handle partial SVE FP modes. We move fabs and fneg to a separate expander, since they are not trapping instructions. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (2): Replace

[PATCH 13/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic

2025-06-02 Thread Spencer Abson
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/ SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is SVE_RELAXED_GP. We can only reliably test the 'merging with the third input' (addend) and 'independent value' patterns at this stage as the canonicalisation that reorder

[PATCH 00/14] aarch64: Add support for unpacked SVE FP operations

2025-06-02 Thread Spencer Abson
w_bug.cgi?id=118151. Bootstrapped & regtested on aarch64-linux-gnu. Thanks, Spencer Spencer Abson (14): aarch64: Extend iterator support for partial SVE FP modes aarch64: Add support for unpacked SVE FP conversions aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions a

[PATCH 12/14] aarch64: Add support for unpacked SVE FP ternary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for unconditional fma, fnma, fms, and fnms, so that it supports partial SVE FP modes. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (4): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use sve_fp_pred instead of aarch64_ptrue_reg. (

[PATCH v2 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
Floating-point to integer conversions can be inexact or invalid (e.g., due to overflow or NaN). However, since users of operation_could_trap_p infer the bool FP_OPERATION argument from the expression's type, the FIX_TRUNC family are considered non-trapping here. This patch handles them explicitly
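
As a rough illustration of why this matters, consider the sketch below. The function cond_convert and its arguments are hypothetical and are not taken from the patch or its testsuite. The conversion is only executed for elements whose mask entry is nonzero. If the conversion were treated as non-trapping, a transformation such as if-conversion could execute it for every element, including a NaN or out-of-range input that the source program never converts.

```c
/* Hypothetical illustration only: the float-to-int conversion is
   guarded by mask[i].  Executing it unconditionally could raise an
   FP exception for an in[i] that the original code never touches.  */
void
cond_convert (int *out, const float *in, const int *mask, int n)
{
  for (int i = 0; i < n; i++)
    if (mask[i])
      out[i] = (int) in[i];   /* truncating float-to-int conversion */
}
```

With the mask clear for an element, the corresponding out[i] is left unchanged, whatever value in[i] holds.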

[PATCH v2 0/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
-gnu. OK for master? Thanks, Spencer Spencer Abson (1): middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions .../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c | 19 +++ .../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c | 6 ++ .../gcc.target/aarch64/sve/pr96

Re: [PATCH 02/14] aarch64: Add support for unpacked SVE FP conversions

2025-06-03 Thread Spencer Abson
Thanks, Alfie. I agree that having a table with just one entry looks a little odd, but the rest of the file follows this pattern. For example: ;; - ;; [FP] Absolute difference ;;

Re: [PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions

2025-06-09 Thread Spencer Abson
On Mon, Jun 09, 2025 at 02:48:58PM +0100, Richard Sandiford wrote: > Spencer Abson writes: > > On Thu, Jun 05, 2025 at 09:24:27PM +0100, Richard Sandiford wrote: > >> Spencer Abson writes: > >> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_cond_cv

Re: [PATCH] aarch64: Fold NOT+PTEST to NOTS [PR118150]

2025-06-13 Thread Spencer Abson
On Fri, Jun 13, 2025 at 02:12:44PM +, Kyrylo Tkachov wrote: > Hi Spencer, > > Thanks for the patch. > > > On 13 Jun 2025, at 14:46, Spencer Abson wrote: > > > > Add the missing combiner patterns for folding NOT+PTEST to NOTS when > > they share the same

[PATCH] aarch64: Fold NOT+PTEST to NOTS [PR118150]

2025-06-13 Thread Spencer Abson
Add the missing combiner patterns for folding NOT+PTEST to NOTS when they share the same GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*one_cmpl3_cc): New combiner pattern. (*one_cmpl3_ptest): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle