On Sun, Nov 11, 2018 at 11:28 AM Tamar Christina
<[email protected]> wrote:
>
> Hi All,
>
> This patch adds expander support for autovectorizing complex number
> operations such as complex addition with a rotation along the Argand
> plane. It also adds support for complex FMA.
>
> The instructions are described in the ArmARM [1] and are available from
> Armv8.3-A onwards.
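>
> As a scalar sketch, the 90° addition computes c = a + b*I, i.e. b is
> rotated by 90° in the Argand plane before the add. A hypothetical
> testcase equivalent to the loop compiled below (the actual source is
> not shown in this mail) would be:
>
>   #include <complex.h>
>
>   void f90 (float complex a[200], float complex b[200],
>             float complex c[200])
>   {
>     for (int i = 0; i < 200; i++)
>       /* Per element: (a.re - b.im) + (a.im + b.re) * I.  */
>       c[i] = a[i] + b[i] * I;
>   }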
>
> Concretely, this generates
>
> f90:
> add ip, r1, #15
> add r3, r0, #15
> sub r3, r3, r2
> sub ip, ip, r2
> cmp ip, #30
> cmphi r3, #30
> add r3, r0, #1600
> bls .L5
> .L3:
> vld1.32 {q8}, [r0]!
> vld1.32 {q9}, [r1]!
> vcadd.f32 q8, q8, q9, #90
> vst1.32 {q8}, [r2]!
> cmp r0, r3
> bne .L3
> bx lr
> .L5:
> vld1.32 {d16}, [r0]!
> vld1.32 {d17}, [r1]!
> vcadd.f32 d16, d16, d17, #90
> vst1.32 {d16}, [r2]!
> cmp r0, r3
> bne .L5
> bx lr
>
> now instead of
>
> f90:
> add ip, r1, #31
> add r3, r0, #31
> sub r3, r3, r2
> sub ip, ip, r2
> cmp ip, #62
> cmphi r3, #62
> add r3, r0, #1600
> bls .L2
> .L3:
> vld2.32 {d20-d23}, [r0]!
> vld2.32 {d24-d27}, [r1]!
> cmp r0, r3
> vsub.f32 q8, q10, q13
> vadd.f32 q9, q12, q11
> vst2.32 {d16-d19}, [r2]!
> bne .L3
> bx lr
> .L2:
> vldr d19, .L10
> .L5:
> vld1.32 {d16}, [r1]!
> vld1.32 {d18}, [r0]!
> vrev64.32 d16, d16
> cmp r0, r3
> vsub.f32 d17, d18, d16
> vadd.f32 d16, d16, d18
> vswp d16, d17
> vtbl.8 d16, {d16, d17}, d19
> vst1.32 {d16}, [r2]!
> bne .L5
> bx lr
> .L11:
> .align 3
> .L10:
> .byte 0
> .byte 1
> .byte 2
> .byte 3
> .byte 12
> .byte 13
> .byte 14
> .byte 15
>
>
> for complex additions with a 90° rotation along the Argand plane.
>
> [1]
> https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
>
> Bootstrap and regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and
> x86_64-pc-linux-gnu are still ongoing, but the previous version of the
> patch showed no regressions.
>
> The instructions have also been tested on aarch64-none-elf and arm-none-eabi
> on an Armv8.3-A model with -march=armv8.3-a+fp16, and all tests pass.
>
> Ok for trunk?
+;; The complex mla operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder. Because of this, expand early.
+(define_expand "fcmla<rot><mode>4"
+ [(set (match_operand:VF 0 "register_operand")
+ (plus:VF (match_operand:VF 1 "register_operand")
+ (unspec:VF [(match_operand:VF 2 "register_operand")
+ (match_operand:VF 3 "register_operand")]
+ VCMLA)))]
+ "TARGET_COMPLEX"
+{
+ emit_insn (gen_neon_vcmla<rotsplit1><mode> (operands[0], operands[1],
+ operands[2], operands[3]));
+ emit_insn (gen_neon_vcmla<rotsplit2><mode> (operands[0], operands[0],
+ operands[2], operands[3]));
+ DONE;
+})
What are the two halves?  And why hide this from the vectorizer if you
go down to this level of detail and expose the rotation to it?
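
For reference, my reading of the ArmARM is that the two instructions
selected by <rotsplit1>/<rotsplit2> compose a full complex
multiply-accumulate. For rot = 0, something like this scalar model
(the names below are mine, not from the patch):

  /* fcmla (rot 0) expanded as FCMLA #0 followed by FCMLA #90,
     per complex element pair.  */
  static void
  fcmla0_scalar (float *acc_r, float *acc_i,
                 float a_r, float a_i, float b_r, float b_i)
  {
    /* First instruction (FCMLA #0): uses only the real part of a.  */
    *acc_r += a_r * b_r;
    *acc_i += a_r * b_i;
    /* Second instruction (FCMLA #90): uses only the imaginary part of a.  */
    *acc_r -= a_i * b_i;
    *acc_i += a_i * b_r;
    /* Net effect: acc += a * b.  */
  }

If that is what the comment means by the two halves, please spell it
out there.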
+;; The vcadd and vcmla patterns are made UNSPECs explicitly because their
+;; use needs to guarantee that the source vectors are contiguous.  It would
+;; be wrong to describe the operation without being able to describe the
+;; permute that is also required, but even if that were done the permute
+;; would have been created as a LOAD_LANES, which means the values in the
+;; registers are in the wrong order.
Hmm, it's totally non-obvious to me how this relates to loads or what a
"non-contiguous" register would be.  That is, once you make this an
unspec, combine will never be able to synthesize this from intrinsics
code that doesn't use this form.
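
For reference, the two register layouts in question, as far as I can
tell (illustration only; the identifiers are mine):

  /* float mem[4] = { a0_re, a0_im, a1_re, a1_im };

     vld1.32 {q8}, [mem]       -> q8  = { a0_re, a0_im, a1_re, a1_im }
     vld2.32 {d16,d17}, [mem]  -> d16 = { a0_re, a1_re }
                                  d17 = { a0_im, a1_im }

     vcadd/vcmla pair up even/odd lanes within one register, so they
     need the vld1 layout; the de-interleaved vld2 (LOAD_LANES) layout
     is what the comment calls "the wrong order".  */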
+(define_insn "neon_vcadd<rot><mode>"
+ [(set (match_operand:VF 0 "register_operand" "=w")
+ (unspec:VF [(match_operand:VF 1 "register_operand" "w")
+ (match_operand:VF 2 "register_operand" "w")]
+ VCADD))]
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 2018-11-11 Tamar Christina <[email protected]>
>
> * config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
> * config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
> (arm_option_reconfigure_globals): Use them.
> * config/arm/iterators.md (VDF, VQ_HSF): New.
> (VCADD, VCMLA): New.
> (VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
> * config/arm/neon.md (neon_vcadd<rot><mode>, fcadd<rot><mode>3,
> neon_vcmla<rot><mode>, fcmla<rot><mode>4): New.
> * config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
> UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-11 Tamar Christina <[email protected]>
>
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: Add Arm
> support.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c:
> Likewise.
> * gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c:
> Likewise.
>
> --