On Tue, Jun 03, 2025 at 03:26:40PM +0200, Richard Biener wrote:
> On Tue, Jun 3, 2025 at 3:09 PM Spencer Abson wrote:
> >
> > Floating-point to integer conversions can be inexact or invalid (e.g., due to
> > overflow or NaN). However, since users of operation_coul
Floating-point to integer conversions can be inexact or invalid (e.g., due to
overflow or NaN). However, since users of operation_could_trap_p infer the
bool FP_OPERATION argument from the expression's type, the FIX_TRUNC family
are considered non-trapping here.
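To make the "inexact or invalid" distinction concrete, here is a minimal C sketch that classifies a double-to-int truncation by value. It is illustrative only: `classify_fix_trunc` and the enum are hypothetical names, not part of GCC, and the range check assumes a 32-bit `int` so that INT_MIN - 1 and INT_MAX + 1 are exactly representable as doubles.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical classification of a double -> int truncation.  */
enum fix_trunc_status
{
  FIX_TRUNC_EXACT,    /* Value converts without loss.  */
  FIX_TRUNC_INEXACT,  /* Fractional part is discarded.  */
  FIX_TRUNC_INVALID   /* NaN, infinity, or out of range.  */
};

static enum fix_trunc_status
classify_fix_trunc (double x)
{
  /* A valid input truncates to a value in [INT_MIN, INT_MAX], so it
     must lie strictly between INT_MIN - 1 and INT_MAX + 1.  NaN fails
     both comparisons and is therefore reported as invalid.  */
  if (!(x > (double) INT_MIN - 1.0 && x < (double) INT_MAX + 1.0))
    return FIX_TRUNC_INVALID;

  /* The conversion is now well defined; compare the round trip.  */
  return (double) (int) x == x ? FIX_TRUNC_EXACT : FIX_TRUNC_INEXACT;
}
```

Both the inexact and the invalid cases correspond to floating-point exceptions on real hardware, even though the result type of the expression is an integer.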
This patch handles them explicitly
-gnu.
OK for master?
Thanks,
Spencer
Spencer Abson (1):
middle-end: Fix operation_could_trap_p for FIX_TRUNC expressions
.../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c | 19 +++
.../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c | 6 ++
.../gcc.target/aarch64/sve/pr96
Thanks, Alfie. I agree that having a table with just one entry looks a
little odd, but the rest of the file follows this pattern. For example:
;; -
;; [FP] Absolute difference
;;
Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/
SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value
is SVE_RELAXED_GP.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond__2_relaxed):
Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16.
(*con
This patch extends the expander for fma, fnma, fms, and fnms to support
partial SVE FP modes.
We add the missing BF16 tests, which we can now trigger, having
implemented the conditional expander.
We also add tests for the 'merging with multiplicand' case, which this
expander canonicalizes (alb
This patch extends the expander for unconditional fma, fnma, fms, and
fnms, so that it supports partial SVE FP modes.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (4): Extend from
SVE_FULL_F_B16B16 to SVE_F_B16B16. Use sve_fp_pred instead
of aarch64_ptrue_reg.
(
w_bug.cgi?id=118151.
Bootstrapped & regtested on aarch64-linux-gnu.
Thanks,
Spencer
Spencer Abson (14):
aarch64: Extend iterator support for partial SVE FP modes
aarch64: Add support for unpacked SVE FP conversions
aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions
a
Extend the ternary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/
SVE_FULL_F_BF to SVE_F/SVE_F_BF, where the strictness value is
SVE_RELAXED_GP.
We can only reliably test the 'merging with the third input' (addend)
and 'independent value' patterns at this stage as the canonicalisation that
reorder
Define new iterators for partial floating-point modes, and cover these
in some existing mode_attrs. This patch serves as a starting point for
a series that extends support for unpacked floating-point operations.
To differentiate between BFloat mode iterators that need to test
TARGET_SSVE_B16B16,
This patch extends the expander for unpredicated round, nearbyint, floor,
ceil, rint, and trunc, so that it can handle partial SVE FP modes.
We move fabs and fneg to a separate expander, since they are not trapping
instructions.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (2): Replace
This patch extends the compare/and splitting patterns for FP comparisons
from SVE_FULL_F to SVE_F.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*fcm_and_combine):
Extend to SVE_F.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/unpacked_fcm_1.c: Allow other tests
Extend the unary op/UNSPEC_SEL combiner patterns from SVE_FULL_F to SVE_F,
where the strictness value is SVE_RELAXED_GP.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond__2_relaxed):
Extend from SVE_FULL_F to SVE_F.
(*cond__any_relaxed): Likewise.
gcc/testsuite/Chang
This patch extends the expanders for unpredicated smax, smin, add, sub,
mul, min, and max, so that they support partial SVE FP modes.
The relevant insn/split patterns have also been updated.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (3): Extend from
SVE_FULL_F to SVE_F, and
This patch extends our vec_cmp expander to support partial FP modes.
We use an unnatural predicate mode to govern unpacked FP operations under
flag_trapping_math, so the expansion must handle cases where the comparison's
target and governing predicates have different modes.
While such predicates
Add UNSPEC_SEL combiner patterns for unpacked FP conversions, where the
strictness value is SVE_RELAXED_GP.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(*cond__nontrunc_relaxed):
New FCVT/SEL combiner pattern.
(*cond__trunc_relaxed):
New FCVTZ{S,U}/SEL c
This patch extends the expander for conditional smax, smin, add, sub,
mul, min, max, and div to support partial SVE FP modes.
The natural mask supplied to the unpacked operation leaves the undefined
elements in each container unpredicated. This expansion modifies this
mask to explicitly disable t
This patch extends the unpredicated FP division expander to support
partial FP modes. It extends the existing patterns used to implement
UNSPEC_COND_FDIV and its approximation as needed.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@aarch64_sve_):
Extend from SVE_FULL_F to S
This patch introduces expanders for FP<-FP conversions that leverage
partial vector modes. We also extend the INT<-FP and FP<-INT conversions
using the same approach.
The ACLE enables vectorized conversions like the following:
fcvt z0.h, p7/m, z1.s
Modelling the source vector as VNx4SF:
... |
.
Name BZ account Email
Soumya AR soumyaa
+Spencer Abson sabson
Mark G. Adams mgadams
Ajit Kumar Agarwal aagarwa
Pedro Alves palves
Floating-point to integer conversions can be inexact or invalid (e.g., due to
overflow or NaN). However, since users of operation_could_trap_p infer the
bool FP_OPERATION argument from the expression's type, FIX_TRUNC_EXPR is
considered non-trapping here.
This patch handles FIX_TRUNC_EXPR explici
for the issue
fixed by commit 0eb5e901f6e2, if it is still relevant.
Thanks
Spencer Abson (1):
middle-end: Fix operation_could_trap_p for FIX_TRUNC_EXPR
.../gcc.dg/tree-ssa/ifcvt-fix-trunc-1.c| 18 ++
.../gcc.dg/tree-ssa/ifcvt-fix-trunc-2.c| 6 ++
We currently check that the target supports PLUS_EXPR and MINUS_EXPR
with step_vectype (a fix for pr103523). However, vectorizable_induction
can emit a vectorized MULT_EXPR when calculating the step of each IV for
SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float
inductions.
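As a rough illustration of where those operations come from, the sketch below models a float induction in plain C. It is not the vectorizer's code: `VF`, `scalar_induction` and `vector_model_induction` are hypothetical names, and a fixed four-lane array stands in for a vector register.

```c
#include <stddef.h>

#define VF 4  /* Hypothetical vectorization factor.  */

/* Scalar form: a[i] = init + i * step, built by repeated addition.  */
static void
scalar_induction (float *a, size_t n, float init, float step)
{
  float x = init;
  for (size_t i = 0; i < n; i++)
    {
      a[i] = x;
      x += step;
    }
}

/* Model of the vectorized form, using an array as the vector.  */
static void
vector_model_induction (float *a, size_t n, float init, float step)
{
  float vec[VF];
  float vec_step = (float) VF * step;  /* MULT_EXPR for the step.  */
  size_t i = 0;

  /* VEC_INIT: lane number converted to float (FLOAT_EXPR), scaled
     by the step (MULT_EXPR), then added to the initial value.  */
  for (int lane = 0; lane < VF; lane++)
    vec[lane] = init + (float) lane * step;

  /* Main loop: store the vector, then advance it (PLUS_EXPR).  */
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      {
        a[i + lane] = vec[lane];
        vec[lane] += vec_step;
      }

  /* Scalar epilogue for the remaining elements.  */
  float x = vec[0];
  for (; i < n; i++)
    {
      a[i] = x;
      x += step;
    }
}
```

The two functions agree exactly when every intermediate value is representable, as with init = 1.0 and step = 0.5; for general inputs the reassociated form can round differently.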
original code?
While this is an RFC, the patch itself has been bootstrapped and regtested on
aarch64-linux-gnu.
Thank you very much for any discussion.
Spencer Abson
Spencer Abson (1):
Induction vectorizer: prevent ICE for scalable types
gcc/tree-vect-loop.cc | 39 +++
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin
if the arguments are better suited to it. This helps us avoid copying data
between lanes before operation.
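The equivalence behind this fold can be modelled in portable C. This is only a sketch of the lane semantics, not the builtin fold itself: the function names are hypothetical, arrays stand in for 64-bit and 128-bit vectors, and multiplying by a broadcast scalar is an assumed use case rather than the example from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Lowpart form: copy the high half of S into a 64-bit vector,
   broadcast C across 8 lanes (DUP), then widen-multiply (UMULL).  */
static void
mul_high_via_lowpart (uint16_t out[8], const uint8_t s[16], uint8_t c)
{
  uint8_t hi[8], dup[8];

  memcpy (hi, s + 8, sizeof hi);  /* Extract the high half.  */
  memset (dup, c, sizeof dup);    /* DUP into a 64-bit vector.  */
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) (hi[i] * dup[i]);
}

/* Highpart form: broadcast C across all 16 lanes and let the
   multiply read lanes 8..15 of both inputs directly (UMULL2).  */
static void
mul_high_via_highpart (uint16_t out[8], const uint8_t s[16], uint8_t c)
{
  uint8_t dup[16];

  memset (dup, c, sizeof dup);
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) (s[8 + i] * dup[8 + i]);
}
```

Both functions compute the same result; the highpart form simply avoids the separate step that moves the upper lanes into a narrower vector.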
E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following:
uint16x8_t
foo(const uint8x16_t s) {
lso
tested on a cross-compiler targeting aarch64_be-none-linux-gnu.
OK for stage-1?
Thanks,
Spencer
Spencer Abson (1):
AArch64: Fold builtins with highpart args to highpart equivalent
[PR117850]
gcc/config/aarch64/aarch64-builtin-pairs.def | 81 ++
gcc/config/aarch64/aarch64-builtins.
On Tue, Feb 18, 2025 at 10:27:46AM +, Richard Sandiford wrote:
> Thanks, this generally looks really good. Some comments on top of
> Kyrill's, and Christophe's comment internally about -save-temps.
>
> Spencer Abson writes:
> > +/* Build and return a new VECTOR_C
Hi Kyrill,
Thanks for your comments, and for answering my question RE your work. Happy to
apply those changes in the next revision.
Cheers,
Spencer
Add a fold at gimple_fold_builtin to prefer the highpart variant of a builtin
if the arguments are better suited to it. This helps us avoid copying data
between lanes before operation.
E.g. We prefer to use UMULL2 rather than DUP+UMULL for the following:
uint16x8_t
foo(const uint8x16_t s) {
or stage-1?
Spencer
Spencer Abson (1):
AArch64: Fold builtins with highpart args to highpart equivalent
[PR117850]
gcc/config/aarch64/aarch64-builtin-pairs.def | 77 ++
gcc/config/aarch64/aarch64-builtins.cc| 232 ++
.../aarch64/simd/fold_to_highpart_1.c
Enable a target with FEAT_FP16 to emit the half-precision variants
of FCMP/FCMPE.
gcc/ChangeLog:
* config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
and fcmpe to use the GPF_F16 iterator for floating-point
modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch6
documentation of these instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2024-12
Successfully bootstrapped and regtested on aarch64-linux-gnu.
OK for stage 1?
Spencer Abson (1):
AArch64: Emit half-precision FCMP/FCMPE
gcc/config/aarch64/aarch64.md | 29
Enable a target with FEAT_FP16 to emit the half-precision variants
of FCMP/FCMPE.
gcc/ChangeLog:
* config/aarch64/aarch64.md: Update cbranch, cstore, fcmp
and fcmpe to use the GPF_F16 iterator for floating-point
modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch6
://developer.arm.com/documentation/ddi0602/2024-12
Successfully bootstrapped and regtested on aarch64-linux-gnu.
OK for stage 1?
Spencer Abson (1):
AArch64: Emit half-precision FCMP/FCMPE
gcc/config/aarch64/aarch64.md | 29 +-
.../gcc.target/aarch64/_Float16_cmp_1.c | 54
This expansion ensures that exactly one comparison is emitted for
spaceship-like sequences on floating-point operands, including when
the results of such sequences are compared against members of
std.
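For readers unfamiliar with the term, a spaceship-like sequence on floating-point operands looks roughly like the C sketch below. The function name is hypothetical, and returning 2 for the unordered case is an assumption made for this illustration.

```c
#include <math.h>

/* Three-way comparison written as a chain of separate tests.
   Returns 0 if equal, -1 if less, 1 if greater, and 2 (an assumed
   value for this sketch) if either operand is NaN.  */
static int
spaceship_float (float a, float b)
{
  if (a == b)
    return 0;
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 2;  /* Unordered: at least one operand is NaN.  */
}
```

Written naively, each branch is its own comparison; all four outcomes can instead be derived from the flags set by a single floating-point compare.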
For both integer and floating-point types, we optimize for the case
in which the result of a sp
aarch64-linux-gnu.
OK for stage 1?
Spencer Abson (1):
AArch64: Define the spaceship optab [PR117013]
gcc/config/aarch64/aarch64-protos.h | 1 +
gcc/config/aarch64/aarch64.cc | 73 +++
gcc/config/aarch64/aarch64.md | 43
.../g++.target