https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
Devin Hussey <husseydevin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |husseydevin at gmail dot com

--- Comment #5 from Devin Hussey <husseydevin at gmail dot com> ---
ARM/AArch64 NEON use these:

From            To             Intrinsic       ARMv7-a          AArch64
intXxY_t     -> int2XxY_t      vmovl_sX        vmovl.sX         sshll #0?
uintXxY_t    -> uint2XxY_t     vmovl_uX        vmovl.uX         ushll #0?
[u]int2XxY_t -> [u]intXxY_t    vmovn_[us]X     vmovn.iX         xtn
floatXxY_t   -> intXxY_t       vcvt[q]_sX_fX   vcvt.sX.fX       fcvtzs
floatXxY_t   -> uintXxY_t      vcvt[q]_uX_fX   vcvt.uX.fX       fcvtzu
intXxY_t     -> floatXxY_t     vcvt[q]_fX_sX   vcvt.fX.sX       scvtf
uintXxY_t    -> floatXxY_t     vcvt[q]_fX_uX   vcvt.fX.uX       ucvtf
float32x2_t  -> float64x2_t    vcvt_f64_f32    2x vcvt.f64.f32  fcvtl
float64x2_t  -> float32x2_t    vcvt_f32_f64    2x vcvt.f32.f64  fcvtn

Clang optimizes vmovl to vshll by zero for some reason.

float32x2_t <-> float64x2_t requires 2 VFP instructions on ARMv7-a, one per
lane.
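
For illustration, here is a minimal C sketch of three of the conversions listed
above (widen, narrow, float to signed int). The helper names widen_s16x4,
narrow_s32x4 and f32x4_to_s32x4 are hypothetical and not part of any library.
The NEON path uses the arm_neon.h intrinsics with X = 16 or 32; the scalar
fallback is only an assumption about equivalent behavior, so that the sketch
also builds on non-ARM targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: int16x4_t -> int32x4_t (vmovl_s16, sign-extending). */
static void widen_s16x4(const int16_t in[4], int32_t out[4])
{
#if defined(__ARM_NEON)
    vst1q_s32(out, vmovl_s16(vld1_s16(in)));
#else
    for (int i = 0; i < 4; i++)
        out[i] = in[i];                     /* implicit sign extension */
#endif
}

/* Hypothetical helper: int32x4_t -> int16x4_t (vmovn_s32, keeps low 16 bits). */
static void narrow_s32x4(const int32_t in[4], int16_t out[4])
{
#if defined(__ARM_NEON)
    vst1_s16(out, vmovn_s32(vld1q_s32(in)));
#else
    for (int i = 0; i < 4; i++)
        /* Assumes the usual two's-complement wrap (as GCC and Clang do). */
        out[i] = (int16_t)(uint16_t)in[i];
#endif
}

/* Hypothetical helper: float32x4_t -> int32x4_t (vcvtq_s32_f32, rounds
 * toward zero). The scalar fallback matches only for in-range inputs;
 * the NEON instruction saturates, the C cast does not. */
static void f32x4_to_s32x4(const float in[4], int32_t out[4])
{
#if defined(__ARM_NEON)
    vst1q_s32(out, vcvtq_s32_f32(vld1q_f32(in)));
#else
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)in[i];
#endif
}
```

Each helper loads four lanes from memory, converts them, and stores the result,
so the caller only deals with plain arrays regardless of target.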