[PATCH/GCC16 v2 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH/GCC16 v2 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-31 Thread Spencer Abson
documentation of these instructions can be found here: https://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29

[PATCH/GCC16 0/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Define the spaceship optab [PR117013] gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.cc | 73 +++ gcc/config/aarch64/aarch64.md | 43 .../g++.target

[PATCH/GCC16 1/1] AArch64: Define the spaceship optab [PR117013]

2025-01-23 Thread Spencer Abson
This expansion ensures that exactly one comparison is emitted for spaceship-like sequences on floating-point operands, including when the result of such a sequence is compared against members of std. For both integer and floating-point types, we optimize for the case in which the result of a sp

[PATCH/GCC16 0/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
://developer.arm.com/documentation/ddi0602/2024-12 Successfully bootstrapped and regtested on aarch64-linux-gnu. OK for stage 1? Spencer Abson (1): AArch64: Emit half-precision FCMP/FCMPE gcc/config/aarch64/aarch64.md | 29 +- .../gcc.target/aarch64/_Float16_cmp_1.c | 54

[PATCH/GCC16 1/1] AArch64: Emit half-precision FCMP/FCMPE

2025-01-27 Thread Spencer Abson
Enable a target with FEAT_FP16 to emit the half-precision variants of FCMP/FCMPE. gcc/ChangeLog: * config/aarch64/aarch64.md: Update cbranch, cstore, fcmp and fcmpe to use the GPF_F16 iterator for floating-point modes. gcc/testsuite/ChangeLog: * gcc.target/aarch6

[PATCH v2 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g., we prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH v2 0/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-21 Thread Spencer Abson
lso tested on a cross-compiler targeting aarch64_be-none-linux-gnu. OK for stage-1? Thanks, Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 81 ++ gcc/config/aarch64/aarch64-builtins.

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
Hi Kyrill, Thanks for your comments, and for answering my question RE your work. Happy to apply those changes in the next revision. Cheers, Spencer

Re: [PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-18 Thread Spencer Abson
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote: > Thanks, this generally looks really good. Some comments on top of > Kyrill's, and Christophe's comment internally about -save-temps. > > Spencer Abson writes: > > +/* Build and return a new VECTOR_C

[PATCH 1/1] AArch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin if the arguments are better suited to it. This helps us avoid copying data between lanes before operation. E.g., we prefer to use UMULL2 rather than DUP+UMULL for the following: uint16x8_t foo(const uint8x16_t s) {

[PATCH 0/1] AArch64: Fold builtin calls w/ highpart args to highpart equivalent [PR117850]

2025-02-17 Thread Spencer Abson
or stage-1? Spencer Spencer Abson (1): AArch64: Fold builtins with highpart args to highpart equivalent [PR117850] gcc/config/aarch64/aarch64-builtin-pairs.def | 77 ++ gcc/config/aarch64/aarch64-builtins.cc | 232 ++ .../aarch64/simd/fold_to_highpart_1.c

[PATCH 0/1][RFC] middle-end: target support checks for vectorizable_induction

2025-03-20 Thread Spencer Abson
original code? While this is an RFC, the patch itself has been bootstrapped and regtested on aarch64-linux-gnu. Thank you very much for any discussion. Spencer Abson Spencer Abson (1): Induction vectorizer: prevent ICE for scalable types gcc/tree-vect-loop.cc | 39 +++

[PATCH 1/1][RFC] Induction vectorizer: prevent ICE for scalable types

2025-03-20 Thread Spencer Abson
We currently check that the target supports PLUS_EXPR and MINUS_EXPR with step_vectype (a fix for PR103523). However, vectorizable_induction can emit a vectorized MULT_EXPR when calculating the step of each IV for SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float inductions.
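
For readers unfamiliar with float inductions, the sketch below may help show where a multiply and an integer-to-float conversion come from. It is a hand-written model in plain C, not GCC output or GCC internals: the names fill_scalar, fill_blocked and LANES are hypothetical, and a fixed width of 4 lanes is assumed purely for illustration (scalable types have no such compile-time width).

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical fixed vector width, for illustration only */

/* Scalar form: x is a floating-point induction variable. */
void fill_scalar(float *out, size_t n, float init, float step)
{
    float x = init;
    for (size_t i = 0; i < n; i++) {
        out[i] = x;
        x += step;
    }
}

/* Hand-written model of a vectorized form of the same loop. */
void fill_blocked(float *out, size_t n, float init, float step)
{
    assert(n % LANES == 0); /* keep the sketch free of an epilogue loop */

    float vec[LANES];

    /* Initial vector: lane l holds init + (float)l * step.
       This needs an int-to-float conversion and a multiply. */
    for (size_t l = 0; l < LANES; l++)
        vec[l] = init + (float)l * step;

    /* Per-iteration step of LANES * step: another multiply. */
    const float vec_step = (float)LANES * step;

    for (size_t i = 0; i < n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            out[i + l] = vec[l];
            vec[l] += vec_step; /* the add in the loop body */
        }
    }
}
```

Calling both functions with init 1.0f and step 0.5f fills the arrays identically, because every intermediate value is exactly representable. In general the blocked form reassociates the floating-point additions, so results can differ in rounding from the scalar loop.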