Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi Saurabh,
>
>> On 13 Sep 2024, at 11:06, saurabh....@arm.com wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
>> mandatory from Armv9.5-a. It introduces instructions for computing the
>> floating point absolute maximum and minimum of the two vectors
>> element-wise.
>> 
>> This patch adds code generation for famax and famin in terms of existing
>> unspecs. With this patch:
>> 1. famax can be expressed as taking fmax/fmaxnm of the two operands and
>>  then taking absolute value of their result.
>> 2. famin can be expressed as taking fmin/fminnm of the two operands and
>>  then taking absolute value of their result.
>> 
>> This fusion of operators is only possible when
>> -march=armv9-a+faminmax+sve flags are passed.
>> 
>> This code generation is only available on -O2 or -O3 as that is when
>> auto-vectorization is enabled.
>> 
>> gcc/ChangeLog:
>> 
>>       * config/aarch64/aarch64-sve.md
>>       (*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
>>       codegen.
>>       * config/aarch64/iterators.md: Attribute for faminmax codegen.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>       * gcc.target/aarch64/sve/faminmax.c: New test.
>> ---
>> gcc/config/aarch64/aarch64-sve.md             | 29 +++++++
>> gcc/config/aarch64/iterators.md               |  6 ++
>> .../gcc.target/aarch64/sve/faminmax.c         | 85 +++++++++++++++++++
>> 3 files changed, 120 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax.c
>> 
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index a5cd42be9d5..feb6438efde 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -11111,3 +11111,32 @@
>     return "sel\t%0.<Vetype>, %3, %2.<Vetype>, %1.<Vetype>";
>   }
> )
>
> A slight tangent, maybe more of a question for Richard, but should we be 
> putting these extensions into aarch64-sve2.md or aarch64-sve.md?
> It looks like the architecture has had a major extension with SVE2 or SVE so 
> it made sense to create aarch64-sve2.md but now the incremental improvements 
> can be considered as an extension to either?

Yeah, good question. :)  I guess the sve/sve2 split doesn't make much
sense any more.  But while we have it, new patterns that are specific
to SVE2+ should probably go in aarch64-sve2.md.

> +;; -------------------------------------------------------------------------
> +;; -- [FP] Absolute maximum and minimum
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - FAMAX
> +;; - FAMIN
> +;; -------------------------------------------------------------------------
> +
> +;; Predicated floating-point absolute maximum and minimum.
> +(define_insn "*aarch64_pred_faminmax_fused"
> +  [(set (match_operand:SVE_FULL_F 0 "register_operand" "=w")
> +     (unspec:SVE_FULL_F
> +       [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +        (match_operand:SI 4 "aarch64_sve_gp_strictness" "w")
> +        (unspec:SVE_FULL_F
> +          [(match_operand 5)
> +           (const_int SVE_RELAXED_GP)
> +           (match_operand:SVE_FULL_F 2 "register_operand" "w")]
> +          UNSPEC_COND_FABS)
> +        (unspec:SVE_FULL_F
> +          [(match_operand 6)
> +           (const_int SVE_RELAXED_GP)
> +           (match_operand:SVE_FULL_F 3 "register_operand" "w")]
> +          UNSPEC_COND_FABS)]
> +       SVE_COND_FP_MAXMIN))]
> +  "TARGET_SVE_FAMINMAX"
> +  "<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>”
>
> This output pattern is missing operand 2.

Yeah.  We should use the same formulation as elsewhere to support:

- operand 2 tied to operand 0
- operand 3 tied to operand 0 (through commutativity)
- all three are separate register (using movprfx)

@aarch64_pred_<su>abd<mode> is an example of a similar commutative operation.

I don't think this distinguishes between fmax(nm)s that came from
intrinsics and fmaxnms that came from the smax optab.  The former
can't be optimised, since famax has slightly different behaviour.
The latter can, because smax on a float is inherently somewhat fuzzy.

I think we should also have tests that something like:

#include <arm_sve.h>

svfloat32_t foo(svfloat32_t x, svfloat32_t y) {
    svbool_t pg = svptrue_b8();
    return svmax_x(pg, svabs_x(pg, x), svabs_x(pg, y));
}

and

#include <arm_sve.h>

svfloat32_t foo(svfloat32_t x, svfloat32_t y) {
    svbool_t pg = svptrue_b8();
    return svmaxnm_x(pg, svabs_x(pg, x), svabs_x(pg, y));
}

are not optimised to famax even when famax is available.  This can be
done using scan-assemblers for the three individual instructions and
a scan-assembler-not for famax.

As for how to fix that: I think we'll need to use UNSPEC_COND_SMAX
and UNSPEC_COND_SMIN for "smax" and "smin" (even for floating-point
modes), rather than the current UNSPEC_COND_FMAXNM and UNSPEC_COND_FMINNM.
Code that wants to generate UNSPEC_COND_FMAXNM or UNSPEC_COND_FMINNM
directly can do it via the separate fmax/fmin optabs.

I think that can all be done by judicious tweaking of existing iterators,
but I haven't tried...

Thanks,
Richard

Reply via email to