Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex mutliplication and addition

Christophe Lyon Fri, 11 Jan 2019 02:02:36 -0800

Hi Tamar,


On Thu, 10 Jan 2019 at 16:41, Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi Christophe,
>
> It was introduced in a small refactoring after which I only retested the
> testcases I added, which don't trigger the issue.
>
> In any case it's a trivial fix and I'll submit a patch in a bit.
>
> Tamar
>
> ________________________________________
> From: Christophe Lyon <christophe.l...@linaro.org>
> Sent: Thursday, January 10, 2019 3:35:18 PM
> To: Tamar Christina
> Cc: Kyrill Tkachov; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; 
> Richard Earnshaw; ni...@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> mutliplication and addition
>
> Hi Tamar,
>
>
> On Thu, 10 Jan 2019 at 04:44, Tamar Christina <tamar.christ...@arm.com> wrote:
> >
> > Hi Kyrill,
> >
> > Committed with the addition of a few trivial defines and iterators that
> > were missing due to the patch being split.
> >
> > Thanks,
> > Tamar
> >
> > -----Original Message-----
> > From: Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>
> > Sent: Friday, December 21, 2018 11:40 AM
> > To: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: nd <n...@arm.com>; Ramana Radhakrishnan <ramana.radhakrish...@arm.com>; 
> > Richard Earnshaw <richard.earns...@arm.com>; ni...@redhat.com
> > Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> > mutliplication and addition
> >
> > Hi Tamar,
> >
> > On 11/12/18 15:46, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This patch adds NEON intrinsics and tests for the Armv8.3-a complex
> > > multiplication and add instructions with a rotate along the Argand plane.
> > >
> > > The instructions are documented in the ArmARM[1] and the intrinsics
> > > specification will be published on the Arm website [2].
> > >
> > > > The lane versions of these instructions are special in that they always
> > > > select a pair of lanes: using index 0 means selecting lanes 0 and 1.
> > > > Because of this, the range check for the intrinsics requires special
> > > > handling.
> > >
> > > > On Arm, in order to implement some of the lane intrinsics we're using
> > > > the structure of the register file.  The lane variant of these
> > > > instructions always selects a D register, but the data itself can be
> > > > stored in Q registers.  This means that for single precision complex
> > > > numbers you are only allowed to select D[0], but using the register
> > > > file layout you can get the range 0-1 for lane indices by selecting
> > > > between Dn[0] and Dn+1[0].
> > >
> > > > The same reasoning applies for half float complex numbers, except there
> > > > your D register indexes can be 0 or 1, so you have a total range of 4
> > > > elements (for a V8HF).
> > >
> > >
> > > [1]
> > > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-referen
> > > ce-manual-armv8-for-armv8-a-architecture-profile
> > > [2] https://developer.arm.com/docs/101028/latest
> > >
> > > > Bootstrapped and regtested on arm-none-gnueabihf with no issues.
> > >
> > > Ok for trunk?
> > >
> >
> > Ok.
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-12-11  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > >         * config/arm/arm-builtins.c
> > >         (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
> > >         (MAC_LANE_PAIR_QUALIFIERS): New.
> > >         (arm_expand_builtin_args): Use it.
> > >         (arm_expand_builtin_1): Likewise.
> > >         * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
> > >         * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
> > > >         * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
> > >         * config/arm/arm_neon.h:
> > >         (vcadd_rot90_f16): New.
> > >         (vcaddq_rot90_f16): New.
> > >         (vcadd_rot270_f16): New.
> > >         (vcaddq_rot270_f16): New.
> > >         (vcmla_f16): New.
> > >         (vcmlaq_f16): New.
> > >         (vcmla_lane_f16): New.
> > >         (vcmla_laneq_f16): New.
> > >         (vcmlaq_lane_f16): New.
> > >         (vcmlaq_laneq_f16): New.
> > >         (vcmla_rot90_f16): New.
> > >         (vcmlaq_rot90_f16): New.
> > >         (vcmla_rot90_lane_f16): New.
> > >         (vcmla_rot90_laneq_f16): New.
> > >         (vcmlaq_rot90_lane_f16): New.
> > >         (vcmlaq_rot90_laneq_f16): New.
> > >         (vcmla_rot180_f16): New.
> > >         (vcmlaq_rot180_f16): New.
> > >         (vcmla_rot180_lane_f16): New.
> > >         (vcmla_rot180_laneq_f16): New.
> > >         (vcmlaq_rot180_lane_f16): New.
> > >         (vcmlaq_rot180_laneq_f16): New.
> > >         (vcmla_rot270_f16): New.
> > >         (vcmlaq_rot270_f16): New.
> > >         (vcmla_rot270_lane_f16): New.
> > >         (vcmla_rot270_laneq_f16): New.
> > >         (vcmlaq_rot270_lane_f16): New.
> > >         (vcmlaq_rot270_laneq_f16): New.
> > >         (vcadd_rot90_f32): New.
> > >         (vcaddq_rot90_f32): New.
> > >         (vcadd_rot270_f32): New.
> > >         (vcaddq_rot270_f32): New.
> > >         (vcmla_f32): New.
> > >         (vcmlaq_f32): New.
> > >         (vcmla_lane_f32): New.
> > >         (vcmla_laneq_f32): New.
> > >         (vcmlaq_lane_f32): New.
> > >         (vcmlaq_laneq_f32): New.
> > >         (vcmla_rot90_f32): New.
> > >         (vcmlaq_rot90_f32): New.
> > >         (vcmla_rot90_lane_f32): New.
> > >         (vcmla_rot90_laneq_f32): New.
> > >         (vcmlaq_rot90_lane_f32): New.
> > >         (vcmlaq_rot90_laneq_f32): New.
> > >         (vcmla_rot180_f32): New.
> > >         (vcmlaq_rot180_f32): New.
> > >         (vcmla_rot180_lane_f32): New.
> > >         (vcmla_rot180_laneq_f32): New.
> > >         (vcmlaq_rot180_lane_f32): New.
> > >         (vcmlaq_rot180_laneq_f32): New.
> > >         (vcmla_rot270_f32): New.
> > >         (vcmlaq_rot270_f32): New.
> > >         (vcmla_rot270_lane_f32): New.
> > >         (vcmla_rot270_laneq_f32): New.
> > >         (vcmlaq_rot270_lane_f32): New.
> > >         (vcmlaq_rot270_laneq_f32): New.
> > > >         * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0,
> > > >         vcmla90, vcmla180, vcmla270, vcmla_lane0, vcmla_lane90,
> > > >         vcmla_lane180, vcmla_lane270, vcmla_laneq0, vcmla_laneq90,
> > > >         vcmla_laneq180, vcmla_laneq270, vcmlaq_lane0, vcmlaq_lane90,
> > > >         vcmlaq_lane180, vcmlaq_lane270): New.
> > >         * config/arm/neon.md (neon_vcmla_lane<rot><mode>,
> > >         neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2018-12-11  Tamar Christina  <tamar.christ...@arm.com>
> > >
> > > >         * gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add
> > > >         AArch32 regexpr.
> > > >         * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c:
> > > >         Likewise.
> > >
> > > --
> >
>
> Since r267796, I've noticed a regression on aarch64:
> FAIL: gcc.target/aarch64/pr68674.c (test for excess errors)
> Excess errors:
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33361:10:
> error: incompatible types when returning type 'int' but 'float16x4_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33385:10:
> error: incompatible types when returning type 'int' but 'float16x4_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33423:10:
> error: incompatible types when returning type 'int' but 'float16x4_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33477:10:
> error: incompatible types when returning type 'int' but 'float16x4_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33595:10:
> error: incompatible types when returning type 'int' but 'float32x2_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33648:10:
> error: incompatible types when returning type 'int' but 'float32x2_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33701:10:
> error: incompatible types when returning type 'int' but 'float32x2_t'
> was expected
> /home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33754:10:
> error: incompatible types when returning type 'int' but 'float32x2_t'
> was expected
>
> I'm surprised you didn't see this during validations?


I've noticed other problems on arm-none-linux-gnueabihf:
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:18323:10:
error: this builtin is not supported for this target
[....]
The testcase is compiled with -mfp16-format=ieee -march=armv8.3-a -O2
-march=armv8.3-a+fp16


In addition, guess what, some scan-assembler-times directives fail on
big-endian. On armeb-none-linux-gnueabihf:
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #180 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #270 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #180 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #270 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90
2

On aarch64_be, I'm seeing ICEs:
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:
In function 'test_vcmla_laneq_f32':
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:78:1:
internal compiler error: Segmentation fault
0xc3967f crash_signal
        /gcc/toplev.c:326
0xa70718 mark_jump_label_1
        /gcc/jump.c:1087
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa70c62 mark_all_labels
        /gcc/jump.c:332
0xa70c62 rebuild_jump_labels_1
        /gcc/jump.c:74
0x78c6af execute
        /gcc/cfgexpand.c:6549
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

and similar for vector-complex_f16.c

Maybe you've already fixed this later in the series?

Happy new year :)

Christophe
