Re: [PATCH] arm: Add new vector mode macros

Richard Sandiford Tue, 22 Sep 2020 09:21:11 -0700

Kyrylo Tkachov <[email protected]> writes:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Sandiford <[email protected]>
>> Sent: 16 September 2020 11:15
>> To: [email protected]
>> Cc: [email protected]; Richard Earnshaw <[email protected]>;
>> Ramana Radhakrishnan <[email protected]>; Kyrylo
>> Tkachov <[email protected]>; Dennis Zhang
>> <[email protected]>
>> Subject: Re: [PATCH] arm: Add new vector mode macros
>> 
>> Ping
>> 
>> Richard Sandiford <[email protected]> writes:
>> > [ This is related to Dennis's subtraction patch
>> >   https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553339.html
>> >   and the discussion about how the patterns were written.  I wanted
>> >   to see whether there was a way that we could simplify the current
>> >   addition handling that might perhaps make it easier to add other
>> >   MVE operations in future.  It seemed like one of those situations
>> >   in which the most productive thing would be to try it and see,
>> >   rather than just describe it in words.
>> >
>> >   One of the questions Ramana had in the thread above was: why does
>> >   MVE not need the flag_unsafe_math_optimizations flag?  AIUI the reason
>> >   is that MVE honours the FPSCR.FZ flag while SF Advanced SIMD always
>> >   flushes to zero.  (HF Advanced SIMD honours FPSCR.FZ16 and so also
>> >   doesn't need flag_unsafe_math_optimizations.) ]
>> >
>> > The AArch32 port now has three vector extensions: iwMMXt, Neon
>> > and MVE.  We already have some named expanders that are shared
>> > by all three, and soon we'll need more.
>> >
>> > One way of handling this would be to use define_mode_iterators
>> > that specify the condition for each mode.  For example,
>> >
>> >   (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
>> >   (V8QI "TARGET_NEON || TARGET_REALLY_IWMXXT")
>> >   ...
>> >   (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")
>> >
>> > etc.  However, we'll need several mode iterators, and it would
>> > be repetitive to specify the mode condition every time.
>> >
>> > This patch therefore introduces per-mode macros that say whether
>> > we can perform general arithmetic on the mode.  Initially there are
>> > two sets of macros:
>> >
>> > ARM_HAVE_NEON_<MODE>_ARITH
>> >   true if Neon can handle general arithmetic on <MODE>
>> >
>> > ARM_HAVE_<MODE>_ARITH
>> >   true if any vector extension can handle general arithmetic on <MODE>
>> >
>> > The macro definitions themselves are undeniably ugly, but hopefully
>> > they're justified by the simplifications they allow.
>> >
>> > The patch converts the addition patterns to use this scheme.
>> >
>> > Previously there were three copies of the V8HF and V4HF addition
>> > patterns for Neon:
>> >
>> > (1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
>> >     TARGET_NEON_FP16INST.  This was probably harmless since all the
>> >     named patterns had an appropriate guard, but it is possible that
>> >     something could have tried to generate the plus directly, such as
>> >     by using a REG_EQUAL note to generate a new pattern.
>> >
>> > (2) addv8hf3_neon and addv4hf3, which had the correct
>> >     TARGET_NEON_FP16INST target condition, but unnecessarily required
>> >     flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
>> >     operations do not force flush to zero.
>> >
>> > (3) add<VH:mode>3_fp16, which provided plus:VnHF with the
>> >     correct conditions (TARGET_NEON_FP16INST, with no
>> >     flag_unsafe_math_optimizations test).
>> >
>> > The patch in essence renames add<VH:mode>3_fp16 to
>> *add<VH:mode>3_neon
>> > (part of *add<VDQ:mode>3_neon) and removes the other two patterns.
>> >
>> > WDYT?  Does this look like a way forward?
>
> This looks like a productive way forward to me.
> Okay if the other maintainer don't object by the end of the week.


Thanks.  Dennis pointed out off-list that it regressed
armv8_2-fp16-arith-2.c, which was expecting FP16 vectorisation
to be rejected for -fno-fast-math.  As mentioned above, that shouldn't
be necessary given that FP16 arithmetic (unlike FP32 arithmetic) has a
flush-to-zero control.

This version therefore updates the test to expect the same output
as the -ffast-math version.

Tested on arm-linux-gnueabi (hopefully for real this time -- I must
have messed up the testing last time).  OK for trunk?

FWIW, the non-testsuite part is the same as before.

Richard


gcc/
        * config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
        (ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
        (ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
        (ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
        (ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
        (ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
        (ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
        (ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
        (ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
        Likewise.
        * config/arm/iterators.md (VNIM, VNINOTM): Delete.
        * config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
        (add<VNINOTM:mode>3): Replace with...
        (add<VDQ:mode>3): ...this new expander.
        * config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
        ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
        (addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
        favor of the above.
        (neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
        gen_add<mode>3_fp16.

gcc/testsuite/
        * gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
        even without -ffast-math.
---
 gcc/config/arm/arm.h                          | 41 ++++++++++++++++
 gcc/config/arm/iterators.md                   |  8 ----
 gcc/config/arm/neon.md                        | 47 +------------------
 gcc/config/arm/vec-common.md                  | 42 ++---------------
 .../gcc.target/arm/armv8_2-fp16-arith-2.c     | 20 +++++---
 5 files changed, 61 insertions(+), 97 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f4d3676c5bc..4a63d33c70d 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1110,6 +1110,47 @@ extern const int arm_arch_cde_coproc_bits[];
 #define VALID_MVE_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
 
+/* The conditions under which vector modes are supported for general
+   arithmetic using Neon.  */
+
+#define ARM_HAVE_NEON_V8QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2SI_ARITH TARGET_NEON
+
+#define ARM_HAVE_NEON_V16QI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V8HI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V4SI_ARITH TARGET_NEON
+#define ARM_HAVE_NEON_V2DI_ARITH TARGET_NEON
+
+/* HF operations have their own flush-to-zero control (FPSCR.FZ16).  */
+#define ARM_HAVE_NEON_V4HF_ARITH TARGET_NEON_FP16INST
+#define ARM_HAVE_NEON_V8HF_ARITH TARGET_NEON_FP16INST
+
+/* SF operations always flush to zero, regardless of FPSCR.FZ, so we can
+   only use them for general arithmetic when -funsafe-math-optimizations
+   is in effect.  */
+#define ARM_HAVE_NEON_V2SF_ARITH \
+  (TARGET_NEON && flag_unsafe_math_optimizations)
+#define ARM_HAVE_NEON_V4SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+/* The conditions under which vector modes are supported for general
+   arithmetic by any vector extension.  */
+
+#define ARM_HAVE_V8QI_ARITH (ARM_HAVE_NEON_V8QI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V4HI_ARITH (ARM_HAVE_NEON_V4HI_ARITH || TARGET_REALLY_IWMMXT)
+#define ARM_HAVE_V2SI_ARITH (ARM_HAVE_NEON_V2SI_ARITH || TARGET_REALLY_IWMMXT)
+
+#define ARM_HAVE_V16QI_ARITH (ARM_HAVE_NEON_V16QI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V8HI_ARITH (ARM_HAVE_NEON_V8HI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V4SI_ARITH (ARM_HAVE_NEON_V4SI_ARITH || TARGET_HAVE_MVE)
+#define ARM_HAVE_V2DI_ARITH ARM_HAVE_NEON_V2DI_ARITH
+
+#define ARM_HAVE_V4HF_ARITH ARM_HAVE_NEON_V4HF_ARITH
+#define ARM_HAVE_V2SF_ARITH ARM_HAVE_NEON_V2SF_ARITH
+
+#define ARM_HAVE_V8HF_ARITH (ARM_HAVE_NEON_V8HF_ARITH || TARGET_HAVE_MVE_FLOAT)
+#define ARM_HAVE_V4SF_ARITH (ARM_HAVE_NEON_V4SF_ARITH || TARGET_HAVE_MVE_FLOAT)
+
 /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
 extern int arm_regs_in_sequence[];
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..c70e3bc2731 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -66,14 +66,6 @@ (define_mode_iterator VSHFT [V4HI V2SI DI])
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
-;; Integer and float modes supported by Neon, IWMMXT and MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNIM [V16QI V8HI V4SI V4SF])
-
-;; Integer and float modes supported by Neon and IWMMXT but not MVE, used by
-;; arithmetic epxand patterns.
-(define_mode_iterator VNINOTM [V2SI V4HI V8QI V2SF V2DI])
-
 ;; Integer and float modes supported by Neon, IWMMXT and MVE.
 (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab6..96bf277f501 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -501,7 +501,7 @@ (define_insn "*add<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (plus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                  (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -509,49 +509,6 @@ (define_insn "*add<mode>3_neon"
                     (const_string "neon_add<q>")))]
 )
 
-;; As with SFmode, full support for HFmode vector arithmetic is only available
-;; when flag-unsafe-math-optimizations is enabled.
-
-;; Add pattern with modes V8HF and V4HF is split into separate patterns to add
-;; support for standard pattern addv8hf3 in MVE.  Following pattern is called
-;; from "addv8hf3" standard pattern inside vec-common.md file.
-
-(define_insn "addv8hf3_neon"
-  [(set
-    (match_operand:V8HF 0 "s_register_operand" "=w")
-    (plus:V8HF
-     (match_operand:V8HF 1 "s_register_operand" "w")
-     (match_operand:V8HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "addv4hf3"
-  [(set
-    (match_operand:V4HF 0 "s_register_operand" "=w")
-    (plus:V4HF
-     (match_operand:V4HF 1 "s_register_operand" "w")
-     (match_operand:V4HF 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_fp_addsub_s_q")]
-)
-
-(define_insn "add<mode>3_fp16"
-  [(set
-    (match_operand:VH 0 "s_register_operand" "=w")
-    (plus:VH
-     (match_operand:VH 1 "s_register_operand" "w")
-     (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST"
- "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set (attr "type")
-   (if_then_else (match_test "<Is_float_mode>")
-    (const_string "neon_fp_addsub_s<q>")
-    (const_string "neon_add<q>")))]
-)
-
 (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
@@ -1837,7 +1794,7 @@ (define_expand "neon_vadd<mode>"
    (match_operand:VH 2 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_add<mode>3 (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b7e3619caf4..c3c86c46355 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -81,43 +81,11 @@ (define_expand "movv8hf"
 ;; patterns separately for Neon, IWMMXT and MVE.
 
 (define_expand "add<mode>3"
-  [(set (match_operand:VNIM 0 "s_register_operand")
-       (plus:VNIM (match_operand:VNIM 1 "s_register_operand")
-                  (match_operand:VNIM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-                   || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
-   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
-   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(<MODE>mode))"
-{
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and MVE.
-
-(define_expand "addv8hf3"
-  [(set (match_operand:V8HF 0 "s_register_operand")
-       (plus:V8HF (match_operand:V8HF 1 "s_register_operand")
-                  (match_operand:V8HF 2 "s_register_operand")))]
-  "(TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE(V8HFmode))
-   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
-{
-  if (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)
-    emit_insn (gen_addv8hf3_neon (operands[0], operands[1], operands[2]));
-})
-
-;; Vector arithmetic.  Expanders are blank, then unnamed insns implement
-;; patterns separately for Neon and IWMMXT.
-
-(define_expand "add<mode>3"
-  [(set (match_operand:VNINOTM 0 "s_register_operand")
-       (plus:VNINOTM (match_operand:VNINOTM 1 "s_register_operand")
-                     (match_operand:VNINOTM 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-                   || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  [(set (match_operand:VDQ 0 "s_register_operand")
+       (plus:VDQ (match_operand:VDQ 1 "s_register_operand")
+                 (match_operand:VDQ 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c 
b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
index 24d0528540d..81bad225a1f 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-2.c
@@ -89,17 +89,23 @@ TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
 /* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vneg\.f16\td[0-9]+, d[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vneg\.f16\tq[0-9]+, q[0-9]+} 1 } }  */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vadd\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 
} }  */
+
+/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vsub\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 
} }  */
+
+/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\td[0-9]+, d[0-9]+, d[0-9]+} 1 
} }  */
+/* { dg-final { scan-assembler-times {vmul\.f16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 
} }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 
} }  */
-/* { dg-final { scan-assembler-times {vsub\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 
} }  */
-/* { dg-final { scan-assembler-times {vmul\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 
} }  */
 /* { dg-final { scan-assembler-times {vdiv\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 13 
} }  */
 /* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
-
 /* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, #0} 2 } }  */
-
-/* { dg-final { scan-assembler-not {vabs\.f16} } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f32} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f32} } }  */

Re: [PATCH] arm: Add new vector mode macros

Reply via email to