Re: [PATCH v3 1/2] aarch64: Add SVE2 faminmax intrinsics

Saurabh Jha Wed, 09 Oct 2024 01:12:37 -0700



On 10/8/2024 3:27 PM, Richard Sandiford wrote:

Saurabh Jha <saurabh....@arm.com> writes:

Thanks for the review. Wanted to clarify your comment:

On 10/8/2024 11:51 AM, Richard Sandiford wrote:

<saurabh....@arm.com> writes:

diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
new file mode 100644
index 00000000000..de4a6f8efaa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
@@ -0,0 +1,312 @@
[...]
+/*
+** amax_h4_f16_m_untied:
+**     mov     (z[0-9]+\.h), h4
+**     movprfx z0, z1
+**     famax   z0\.h, p0/m, z0\.h, \1
+**     ret
+*/
+TEST_UNIFORM_ZD (amax_h4_f16_m_untied, svfloat16_t, __fp16,
+                z0 = svamax_n_f16_m (p0, z1, d4),
+                z0 = svamax_m (p0, z1, d4))
+
+/*
+** amax_2_f16_m:
+**     fmov    (z[0-9]+\.h), #2\.0(?:e\+0)?
+**     famax   z0\.h, p0/m, z0\.h, \1
+**     ret
+*/
+TEST_UNIFORM_Z (amax_2_f16_m, svfloat16_t,
+               z0 = svamax_n_f16_m (p0, z0, 2),
+               z0 = svamax_m (p0, z0, 2))


Rather than dropping the tests for 0 and 1, I think we should keep
them and verify that we don't try to use non-existent immediate forms.
(It would be easy for that to happen, if they were added to the wrong
iterators.)

Maybe:

/*
** amax_0_f16_m_tied1:
**      ...
**      famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
**      ret
*/
TEST_UNIFORM_Z (amax_0_f16_m_tied1, svfloat16_t,
                z0 = svamax_n_f16_m (p0, z0, 0),
                z0 = svamax_m (p0, z0, 0))

/*
** amax_1_f16_m_tied1:
**      ...
**      famax   z0\.h, p0/m, z0\.h, z[0-9]+\.h
**      ret
*/
TEST_UNIFORM_Z (amax_1_f16_m_tied1, svfloat16_t,
                z0 = svamax_n_f16_m (p0, z0, 1),
                z0 = svamax_m (p0, z0, 1))

(untested).  Similarly for the other files, and for _z and _x.


Right now, if we do this, we get an ICE like this

unrecognizable insn:
(insn 18 17 19 2 (set (reg:VNx8HF 107)
          (unspec:VNx8HF [
                  (reg:VNx8BI 108)
                  (unspec:VNx8HF [
                          (reg:VNx8BI 108)
                          (const_int 1 [0x1])
                          (reg:VNx8HF 109)
                          (const_vector:VNx8HF repeat [
                                  (const_double:HF 0.0 [0x0.0p+0])
                              ])
                      ] UNSPEC_COND_FAMAX)
                  (reg:VNx8HF 109)
              ] UNSPEC_SEL))
"/home/saujha01/gnu-work/src/gcc/gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c":54:1
-1
       (nil))
during RTL pass: vregs


Ah, yeah, that's the kind of failure that the tests above would
defend against.

Most likely, ICE is not the right answer here. But if we want to have a
test for immediate forms, we would have to add support for it in
instruction patterns, isn't it? But these functions don't expect
immediate operands.

Should we add support for immediate operands in the instruction patterns
or should we somehow write test so that we make sure we don't
inadvertently add support for immediate forms? I think you meant the
latter but not sure how could we add that test without adding support
for it in the compiler.


Yeah, I meant the latter.  From a user's point of view, this is a
similar sort of situation to:

/*
** amax_2_f32_m:
**      fmov    (z[0-9]+\.s), #2\.0(?:e\+0)?
**      famax   z0\.s, p0/m, z0\.s, \1
**      ret
*/
TEST_UNIFORM_Z (amax_2_f32_m, svfloat32_t,
                z0 = svamax_n_f32_m (p0, z0, 2),
                z0 = svamax_m (p0, z0, 2))

which we do handle correctly.  The idea is the same for 0 and 1:
we should force the constant into a register and use a pure
register FAMIN/FAMAX.

(In the tests above, I dropped matching the fmov part, because being
overly specific about ways of moving zeros into registers forced Tamar
to do a lot of make-work for one of his recent patches.)

I think at least one of the bugs is:

  (define_int_attr sve_pred_fp_rhs2_operand
    [(UNSPEC_COND_FADD "aarch64_sve_float_arith_with_sub_operand")
+   (UNSPEC_COND_FAMAX "aarch64_sve_float_maxmin_operand")
+   (UNSPEC_COND_FAMIN "aarch64_sve_float_maxmin_operand")
     (UNSPEC_COND_FDIV "register_operand")
     (UNSPEC_COND_FMAX "aarch64_sve_float_maxmin_operand")
     (UNSPEC_COND_FMAXNM "aarch64_sve_float_maxmin_operand")

this should use register_operand rather than aarch64_sve_float_maxmin_operand,
since FAMIN and FAMAX don't have the same immediate forms as FMIN and FMAX.

Thanks for the explanation. Totally makes sense now. I will send a newversion shortly :)


Thanks,
Richard

Re: [PATCH v3 1/2] aarch64: Add SVE2 faminmax intrinsics

Reply via email to