Enable a target with FEAT_FP16 to emit the half-precision variants
of FCMP/FCMPE.
gcc/ChangeLog:
* config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
and fcmpe to use the GPF_F16 iterator for floating-point
modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch6
Documentation of these instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2024-12
Successfully bootstrapped and regtested on aarch64-linux-gnu.
OK for stage 1?
Spencer Abson (1):
AArch64: Emit half-precision FCMP/FCMPE
gcc/config/aarch64/aarch64.md | 29
Successfully bootstrapped and regtested on aarch64-linux-gnu.
OK for stage 1?
Spencer Abson (1):
AArch64: Define the spaceship optab [PR117013]
gcc/config/aarch64/aarch64-protos.h | 1 +
gcc/config/aarch64/aarch64.cc | 73 +++
gcc/config/aarch64/aarch64.md | 43
.../g++.target
This expansion ensures that exactly one comparison is emitted for
spaceship-like sequences on floating-point operands, including when
the results of such sequences are compared against members of
std.
For both integer and floating-point types, we optimize for the case
in which the result of a sp
gcc/config/aarch64/aarch64.md | 29 +-
.../gcc.target/aarch64/_Float16_cmp_1.c | 54
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin
when its arguments are better suited to it. This helps us avoid copying data
between lanes before the operation.
E.g., we prefer to use UMULL2 rather than DUP+UMULL for the following:
uint16x8_t
foo(const uint8x16_t s) {
Also tested on a cross-compiler targeting aarch64_be-none-linux-gnu.
OK for stage-1?
Thanks,
Spencer
Spencer Abson (1):
AArch64: Fold builtins with highpart args to highpart equivalent
[PR117850]
gcc/config/aarch64/aarch64-builtin-pairs.def | 81 ++
gcc/config/aarch64/aarch64-builtins.
Hi Kyrill,
Thanks for your comments, and for answering my question RE your work. Happy to
apply those changes in the next revision.
Cheers,
Spencer
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote:
> Thanks, this generally looks really good. Some comments on top of
> Kyrill's, and Christophe's comment internally about -save-temps.
>
> Spencer Abson writes:
> > +/* Build and return a new VECTOR_C
Spencer
Spencer Abson (1):
AArch64: Fold builtins with highpart args to highpart equivalent
[PR117850]
gcc/config/aarch64/aarch64-builtin-pairs.def | 77 ++
gcc/config/aarch64/aarch64-builtins.cc| 232 ++
.../aarch64/simd/fold_to_highpart_1.c
original code?
While this is an RFC, the patch itself has been bootstrapped and regtested on
aarch64-linux-gnu.
Thank you very much for any discussion.
Spencer Abson
Spencer Abson (1):
Induction vectorizer: prevent ICE for scalable types
gcc/tree-vect-loop.cc | 39 +++
We currently check that the target supports PLUS_EXPR and MINUS_EXPR
with step_vectype (a fix for PR103523). However, vectorizable_induction
can emit a vectorized MULT_EXPR when calculating the step of each IV for
SLP, and both MULT_EXPR and FLOAT_EXPR when calculating VEC_INIT for float
inductions.