Hi James,
Thanks for your comment.
It seems we need a 'dup' before the 'fmul' if we use the GCC vector extension
syntax.
Example:
dup v1.2s, v1.s[0]
fmul v0.2s, v1.2s, v0.2s
We would then need another pattern to combine these two insns into 'fmul
%0.2s, %1.2s, %2.s[0]', which is fairly complex.
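For reference, here is a minimal sketch of the two source-level forms being compared. It assumes only the generic GCC vector extension; the type name f32x2 and both function names are hypothetical stand-ins (not the arm_neon.h definitions), chosen so that the sketch builds on any target. The second function spells out the broadcast that corresponds to the 'dup' above.

```c
/* Hypothetical stand-in for float32x2_t, declared with the generic
   GCC vector extension so this builds on any target.  */
typedef float f32x2 __attribute__ ((vector_size (8)));

/* Vector-by-scalar multiply in vector extension syntax: the scalar
   operand is implicitly broadcast to every lane.  */
static inline f32x2
mul_n_f32 (f32x2 a, float b)
{
  return a * b;
}

/* The same operation with the broadcast written out explicitly,
   mirroring the dup + fmul pair.  */
static inline f32x2
mul_n_f32_dup (f32x2 a, float b)
{
  f32x2 bv = { b, b };
  return a * bv;
}
```

Both functions compute the same result; whether the compiler then emits a single by-element 'fmul' depends on the combine pattern discussed above.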
BTW, maybe it would be better to revisit this issue after this patch goes in?
Thanks.
Jiang jiji
On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> Hi,
> This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
> Regtested with aarch64-linux-gnu on QEMU.
> This patch also shows no regressions on the aarch64_be-linux-gnu big-endian
> target.
> OK for the trunk?
>
> Thanks.
> Jiang jiji
>
>
> ----------
> Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
>
> Hi, Kyrill
> Thank you for your suggestion.
> I fixed it and regtested with aarch64-linux-gnu on QEMU.
> This patch also shows no regressions on the aarch64_be-linux-gnu big-endian
> target.
> OK for the trunk?
Hi Jiang,
I'm sorry that I've taken so long to get to this; I've been out of the office
for several weeks. I have one comment.
> +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> +vmul_n_f32 (float32x2_t __a, float32_t __b)
> +{
> +  return __builtin_aarch64_mul_nv2sf (__a, __b);
> +}
> +
For vmul_n_* intrinsics, is there a reason we don't want to use the GCC vector
extension syntax to allow us to write these as:
  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
  vmul_n_f32 (float32x2_t __a, float32_t __b)
  {
    return __a * __b;
  }
It would be great if we could make that work.
Thanks,
James