https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048

Devin Hussey <husseydevin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |husseydevin at gmail dot com

--- Comment #5 from Devin Hussey <husseydevin at gmail dot com> ---
ARM/AArch64 NEON use these:

From            To           Intrinsic      ARMv7-a          AArch64
intXxY_t     -> int2XxY_t    vmovl_sX       vmovl.sX         sshll #0?
uintXxY_t    -> uint2XxY_t   vmovl_uX       vmovl.uX         ushll #0?
[u]int2XxY_t -> [u]intXxY_t  vmovn_[us]2X   vmovn.i2X        xtn
floatXxY_t   -> intXxY_t     vcvt[q]_sX_fX  vcvt.sX.fX       fcvtzs
floatXxY_t   -> uintXxY_t    vcvt[q]_uX_fX  vcvt.uX.fX       fcvtzu
intXxY_t     -> floatXxY_t   vcvt[q]_fX_sX  vcvt.fX.sX       scvtf
uintXxY_t    -> floatXxY_t   vcvt[q]_fX_uX  vcvt.fX.uX       ucvtf
float32x2_t  -> float64x2_t  vcvt_f64_f32   2x vcvt.f64.f32  fcvtl
float64x2_t  -> float32x2_t  vcvt_f32_f64   2x vcvt.f32.f64  fcvtn

Clang optimizes vmovl to vshll by zero for some reason. 

float32x2_t <-> float64x2_t requires 2 VFP instructions on ARMv7-a.
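
For reference, here is a rough scalar sketch of what the widening, narrowing
and float-to-int rows of the table do per lane. It is portable C, not NEON
code, and the helper names are made up for illustration (they are not part of
arm_neon.h). It assumes finite, in-range float inputs: the hardware
conversions saturate, while an out-of-range C cast is undefined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vmovl_s8: sign-extend each 8-bit lane. */
static void widen_s8_to_s16(const int8_t *src, int16_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = (int16_t)src[i];
}

/* Hypothetical scalar model of vmovl_u8: zero-extend each 8-bit lane. */
static void widen_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = (uint16_t)src[i];
}

/* Hypothetical scalar model of vmovn_u16: keep the low 8 bits of each
 * 16-bit lane (plain truncation, no saturation). */
static void narrow_u16_to_u8(const uint16_t *src, uint8_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = (uint8_t)(src[i] & 0xFF);
}

/* Hypothetical scalar model of vcvt_s32_f32: convert each lane, rounding
 * toward zero. Assumes every input is finite and fits in int32_t. */
static void convert_f32_to_s32(const float *src, int32_t *dst, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        dst[i] = (int32_t)src[i];
}
```

A vectorizer that recognizes loops of this shape could map each one to the
single instruction listed in the table, except for the float32/float64 pair
on ARMv7-a.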
