> -----Original Message-----
> From: Tamar Christina <tamar.christ...@arm.com>
> Sent: 25 September 2020 15:32
> To: gcc-patches@gcc.gnu.org
> Cc: nd <n...@arm.com>; Ramana Radhakrishnan
> <ramana.radhakrish...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; ni...@redhat.com; Kyrylo Tkachov
> <kyrylo.tkac...@arm.com>
> Subject: [PATCH v2 15/16]Arm: Add MVE RTL patterns for Complex Addition,
> Multiply and FMA.
>
> Hi All,
>
> This adds implementation for the optabs for complex operations.  With this
> the following C code:
>
>   void f90 (int _Complex a[restrict N], int _Complex b[restrict N],
>             int _Complex c[restrict N])
>   {
>     for (int i=0; i < N; i++)
>       c[i] = a[i] + (b[i] * I);
>   }
>
> generates
>
> .L3:
>         mov       r3, r0
>         vldrw.32  q2, [r3]
>         mov       r3, r1
>         vldrw.32  q1, [r3]
>         mov       r3, r2
>         vcadd.i32 q3, q2, q1, #90
>         adds      r0, r0, #16
>         vstrw.32  q3, [r3]
>         adds      r1, r1, #16
>         adds      r2, r2, #16
>         le        lr, .L3
>         pop       {r4, r5, r6, r7, r8, pc}
>
> which is not ideal due to register allocation and addressing mode issues with
> MVE in general.  However -frename-registers cleans up the register allocation:
>
> .L3:
>         mov       r5, r0
>         mov       r6, r1
>         vldrw.32  q2, [r5]
>         vldrw.32  q1, [r6]
>         mov       r7, r2
>         vcadd.i32 q3, q2, q1, #90
>         adds      r0, r0, #16
>         vstrw.32  q3, [r7]
>         adds      r1, r1, #16
>         adds      r2, r2, #16
>         le        lr, .L3
>         pop       {r4, r5, r6, r7, r8, pc}
>
> but leaves the addressing mode problems.
>
> Before this patch it generated a scalar loop
>
> .L2:
>         ldr       r7, [r0, r3, lsl #2]
>         ldr       r5, [r6, r3, lsl #2]
>         ldr       r4, [r1, r3, lsl #2]
>         subs      r5, r7, r5
>         ldr       r7, [lr, r3, lsl #2]
>         add       r4, r4, r7
>         str       r5, [r2, r3, lsl #2]
>         str       r4, [ip, r3, lsl #2]
>         adds      r3, r3, #2
>         cmp       r3, #200
>         bne       .L2
>         pop       {r4, r5, r6, r7, pc}
>
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> Cross compiled arm-none-eabi and ran with -march=armv8.1-m.main+mve.fp
> -mfloat-abi=hard -mfpu=auto and regression is on-going.
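
For anyone following the example, here is a minimal scalar sketch of what the
#90 form of the complex add computes, matching the subs/add pair in the scalar
loop quoted above.  It assumes each int _Complex is stored as an interleaved
(real, imaginary) pair; the name cadd_rot90_i32 and the flat int32_t layout are
illustrative assumptions and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: computes c = a + b*I on
   n_complex complex integers stored as interleaved (re, im) pairs.
   Since b*I = -b.im + b.re*I, the result is
     c.re = a.re - b.im
     c.im = a.im + b.re
   Inputs are assumed small enough that the signed arithmetic does not
   overflow.  */
void
cadd_rot90_i32 (const int32_t *a, const int32_t *b, int32_t *c,
                size_t n_complex)
{
  for (size_t i = 0; i < n_complex; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* real part */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* imaginary part */
    }
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} treated as two complex elements
each, this gives c = {-5, 7, -5, 11}.
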
>
> Unfortunately MVE does not currently implement auto-vectorization of
> floating point values.  As such I cannot test this directly.  But since they
> share 90% of the code with NEON these should just work whenever support is
> added so I would still like to commit these.

I believe MVE modes are now supported for autovectorisation since
29c650cd899496c4f9bc069d03d0d7ecfb632176.
Could you try out the floating-point modes too?

>
> To support this I had to refactor the MVE bits a bit.  This now uses the same
> unspecs for both NEON and MVE and removes the unneeded different signed and
> unsigned unspecs since they both point to the signed instruction.
>
> I have tried multiple approaches to cleaning this up but I think this is the
> nicest it can get given the slight ISA differences.
>
> Ok for master if no issues?

Ok.
Thanks,
Kyrill

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
> 	__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
> 	__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16,
> 	__arm_vcaddq_rot90_s16, __arm_vcaddq_rot270_s16,
> 	__arm_vcaddq_rot90_u32, __arm_vcaddq_rot270_u32,
> 	__arm_vcaddq_rot90_s32, __arm_vcaddq_rot270_s32,
> 	__arm_vcmulq_rot90_f16, __arm_vcmulq_rot270_f16,
> 	__arm_vcmulq_rot180_f16, __arm_vcmulq_f16, __arm_vcaddq_rot90_f16,
> 	__arm_vcaddq_rot270_f16, __arm_vcmulq_rot90_f32,
> 	__arm_vcmulq_rot270_f32, __arm_vcmulq_rot180_f32, __arm_vcmulq_f32,
> 	__arm_vcaddq_rot90_f32, __arm_vcaddq_rot270_f32, __arm_vcmlaq_f16,
> 	__arm_vcmlaq_rot180_f16, __arm_vcmlaq_rot270_f16,
> 	__arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32, __arm_vcmlaq_rot180_f32,
> 	__arm_vcmlaq_rot270_f32, __arm_vcmlaq_rot90_f32): Update builtin calls.
> 	* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
> 	vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
> 	vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
> 	vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
> 	(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
> 	vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
> 	New.
> 	* config/arm/constraints.md (Dz): Include MVE.
> 	* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
> 	* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S,
> 	VCADDQ_ROT270_U, VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F,
> 	VCMULQ_F, VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F,
> 	VCMLAQ_F, VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F,
> 	VCADDQ_ROT270_S, VCADDQ_ROT270, VCADDQ_ROT90): Removed.
> 	(mve_rot, VCMUL): New.
> 	(mve_vcaddq_rot270_<supf><mode>, mve_vcaddq_rot90_<supf><mode>,
> 	mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>,
> 	mve_vcmulq_f<mode>, mve_vcmulq_rot180_f<mode>,
> 	mve_vcmulq_rot270_f<mode>, mve_vcmulq_rot90_f<mode>,
> 	mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>,
> 	mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed.
> 	(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
> 	mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3,
> 	mve_vcaddq<mve_rot><mode>): New.
> 	* config/arm/neon.md (cadd<rot><mode>3, cml<fcmac1><rot_op><mode>4):
> 	Moved.
> 	(cmul<rot_op><mode>3): Exclude MVE types.
> 	* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
> 	* config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3,
> 	arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New.
>
> --