Re: [PATCH v2 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
On Tue, Jun 03, 2025 at 03:26:40PM +0200, Richard Biener wrote: > On Tue, Jun 3, 2025 at 3:09 PM Spencer Abson wrote: > > > > Floating-point to integer conversions can be inexact or invalid (e.g., due > > to > > overflow or NaN). However, since users of operation_coul

[PATCH v2 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
Floating-point to integer conversions can be inexact or invalid (e.g., due to overflow or NaN). However, since users of operation_could_trap_p infer the bool FP_OPERATION argument from the expression's type, the FIX_TRUNC family are considered non-trapping here. This patch handles them explicitly

[PATCH v2 0/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions

2025-06-03 Thread Spencer Abson
-gnu. OK for master? Thanks, Spencer Spencer Abson (1): middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions .../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c | 19 +++ .../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c | 6 ++ .../gcc.target/aarch64/sve/pr96

Re: [PATCH 02/14] aarch64: Add support for unpacked SVE FP conversions

2025-06-03 Thread Spencer Abson
Thanks, Alfie. I agree that having a table with just one entry looks a little odd, but the rest of the file follows this pattern. For example: ;; - ;; [FP] Absolute difference ;;

[PATCH 10/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary arithmetic

2025-06-02 Thread Spencer Abson
Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__2_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. (*con

[PATCH 14/14] aarch64: Add support for unpacked SVE FP conditional ternary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for fma, fnma, fms, and fnms to support partial SVE FP modes. We add the missing BF16 tests, which we can now trigger for having implemented the conditional expander. We also add tests for the 'merging with multiplicand' case, which this expander canonicalizes (alb

[PATCH 12/14] aarch64: Add support for unpacked SVE FP ternary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for unconditional fma, fnma, fms, and fnms, so that it supports partial SVE FP modes. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (4): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16. Use sve_fp_pred instead of aarch64_ptrue_reg. (

[PATCH 00/14] aarch64: Add support for unpacked SVE FP operations

2025-06-02 Thread Spencer Abson
w_bug.cgi?id=118151. Bootstrapped & regtested on aarch64-linux-gnu. Thanks, Spencer Spencer Abson (14): aarch64: Extend iterator support for partial SVE FP modes aarch64: Add support for unpacked SVE FP conversions aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions a

[PATCH 13/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP ternary arithmetic

2025-06-02 Thread Spencer Abson
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is SVE_RELAXED_GP. We can only reliably test the 'merging with the third input' (addend) and 'independent value' patterns at this stage as the canonicalisation that reorder

[PATCH 01/14] aarch64: Extend iterator support for partial SVE FP modes

2025-06-02 Thread Spencer Abson
Define new iterators for partial floating-point modes, and cover these in some existing mode_attrs. This patch serves as a starting point for a series that extends support for unpacked floating-point operations. To differentiate between BFloat mode iterators that need to test TARGET_SSVE_B16B16,

[PATCH 06/14] aarch64: Add support for unpacked SVE FP unary operations

2025-06-02 Thread Spencer Abson
This patch extends the expander for unpredicated round, nearbyint, floor, ceil, rint, and trunc, so that it can handle partial SVE FP modes. We move fabs and fneg to a separate expander, since they are not trapping instructions. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (2): Replace

[PATCH 05/14] aarch64: Compare/and splits for unpacked SVE FP comparisons

2025-06-02 Thread Spencer Abson
This patch extends the compare/and splitting patterns for FP comparisons from SVE_FULL_F to SVE_F. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*fcm_and_combine): Extend to SVE_F. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/unpacked_fcm_1.c: Allow other tests

[PATCH 07/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary operations

2025-06-02 Thread Spencer Abson
Extend the unary op/UNSPEC_SEL combiner patterns from SVE_FULL_F to SVE_F, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__2_relaxed): Extend from SVE_FULL_F to SVE_F. (*cond__any_relaxed): Likewise. gcc/testsuite/Chang

[PATCH 08/14] aarch64: Add support for unpacked SVE FP binary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expanders for unpredicated smax, smin, add, sub, mul, min, and max, so that they support partial SVE FP modes. The relevant insn/split patterns have also been updated. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (3): Extend from SVE_FULL_F to SVE_F, and

[PATCH 04/14] aarch64: Add support for unpacked SVE FP comparisons

2025-06-02 Thread Spencer Abson
This patch extends our vec_cmp expander to support partial FP modes. We use an unnatural predicate mode to govern unpacked FP operations under flag_trapping_math, so the expansion must handle cases where the comparison's target and governing predicates have different modes. While such predicates

[PATCH 03/14] aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions

2025-06-02 Thread Spencer Abson
Add UNSPEC_SEL combiner patterns for unpacked FP conversions, where the strictness value is SVE_RELAXED_GP. gcc/ChangeLog: * config/aarch64/aarch64-sve.md (*cond__nontrunc_relaxed): New FCVT/SEL combiner pattern. (*cond__trunc_relaxed): New FCVTZ{S,U}/SEL c

[PATCH 11/14] aarch64: Add support for unpacked SVE FP conditional binary arithmetic

2025-06-02 Thread Spencer Abson
This patch extends the expander for conditional smax, smin, add, sub, mul, min, max, and div to support partial SVE FP modes. The natural mask supplied to the unpacked operation leaves the undefined elements in each container unpredicated. This expansion modifies this mask to explicitly disable t

[PATCH 09/14] aarch64: Add support for unpacked SVE FDIV

2025-06-02 Thread Spencer Abson
This patch extends the unpredicated FP division expander to support partial FP modes. It extends the existing patterns used to implement UNSPEC_COND_FDIV and its approximation as needed. gcc/ChangeLog: * config/aarch64/aarch64-sve.md: (@aarch64_sve_): Extend from SVE_FULL_F to S

[PATCH 02/14] aarch64: Add support for unpacked SVE FP conversions

2025-06-02 Thread Spencer Abson
This patch introduces expanders for FP<-FP conversions that leverage partial vector modes. We also extend the INT<-FP and FP<-INT conversions using the same approach. The ACLE enables vectorized conversions like the following: fcvt z0.h, p7/m, z1.s Modelling the source vector as VNx4SF: ... |

[pushed] MAINTAINERS: add myself to write after approval

2025-05-16 Thread Spencer Abson
. Name BZ account Email Soumya AR soumyaa +Spencer Abson sabson Mark G. Adams mgadams Ajit Kumar Agarwal aagarwa Pedro Alves palves

[PATCH 1/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR

2025-05-14 Thread Spencer Abson
Floating-point to integer conversions can be inexact or invalid (e.g., due to overflow or NaN). However, since users of operation_could_trap_p infer the bool FP_OPERATION argument from the expression's type, FIX_TRUNC_EXPR is considered non-trapping here. This patch handles FIX_TRUNC_EXPR explici

[PATCH 0/1] middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR

2025-05-14 Thread Spencer Abson
for the issue fixed by commit 0eb5e901f6e2, if it is still relevant. Thanks Spencer Abson (1): middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR .../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c| 18 ++ .../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c| 6 ++

[PATCH 1/1][RFC] Induction vectorizer: prevent ICE for scalable types

2025-03-20 Thread Spencer Abson
We currently check that the target supports PLUS_EXPR and MINUS_EXPR with step_vectype (a fix for pr103523). However, vectorizable_induction can emit a vectorized MULT_EXPR when calculating the step of each IV for SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float inductions.

[PATCH 0/1][RFC] middle-end: target support checks for vectorizable_induction

2025-03-20 Thread Spencer Abson
original code? While this is an RFC, the patch itself has been bootstrapped and regtested on aarch64-linux-gnu. Thank you very much for any discussion. Spencer Abson Spencer Abson (1): Induction vectorizer: prevent ICE for scalable types gcc/tree-vect-loop.cc | 39 +++

[PATCH v2 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH v2 0/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
lso tested on a cross-compiler targeting aarch64_be-none-linux-gnu. OK for stage-1? Thanks, Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 81 ++ gcc/config/aarch64/aarch64-builtins.

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote: > Thanks, this generally looks really good. Some comments on top of > Kyrill's, and Christophe's comment internally about -save-temps. > > Spencer Abson writes: > > +/* Build and return a new VECTOR_C

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
Hi Kyrill, Thanks for your comments, and for answering my question RE your work. Happy to apply those changes in the next revision. Cheers, Spencer

[PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH 0/1] AArch64: Fold builtin calls w/ highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
or stage-1? Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 77 ++ gcc/config/aarch64/aarch64-builtins.cc| 232 ++ .../aarch64/simd/fold_to_highpart_1.c

[PATCH/GCC16 v2 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH/GCC16 v2 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
documentation of these instructions can be found here: https://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29

[PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH/GCC16 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29 +- .../gcc.target/aarch64/_Float16_cmp_1.c | 54

[PATCH/GCC16 1/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
This expansion ensures that exactly one comparison is emitted for spaceship-like sequences on floating-point operands, including when the result of such sequences is compared against members of std. For both integer and floating-point types, we optimize for the case in which the result of a sp

[PATCH/GCC16 0/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Define the spaceship optab [PR117013] gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.cc | 73 +++ gcc/config/aarch64/aarch64.md | 43 .../g++.target