Ping.
Alan Lawrence wrote:
The vld1_lane intrinsics ICE at -O0 because they contain a call to the vset_lane
intrinsics, through which the lane index is not constant-propagated (they are
fine at -O1 and higher). This patch fixes the ICE by replacing that call with a
macro.
Rather than defining many individual macros
__aarch64_vset(q?)_lane_[uspf](8|16|32|64), this introduces a
__AARCH64_NUM_LANES macro using sizeof(), so that a single
__aarch64_vset_lane_any macro handles all variants (with bounds-checking and
endianness-flipping). This leaves less room for error than writing out the
number of lanes for each variant by hand, as was done previously.
Also factor the endianness-flipping out to a separate macro __aarch64_lane; I
intend to use this for vget_lane too in another patch.
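
For readers without the patch to hand, here is a rough, standalone sketch of
the idea described above. It is not the code from arm_neon.h: every SKETCH_*
and sketch_* name is hypothetical, the vector types are generic GCC vector
extensions rather than the NEON types, SKETCH_BIG_ENDIAN stands in for the
target's real endianness test, and a runtime assert stands in for whatever
compile-time bounds check the header performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for NEON vector types (GCC vector extensions).  */
typedef float sketch_f32x2 __attribute__ ((vector_size (8)));
typedef short sketch_s16x8 __attribute__ ((vector_size (16)));

/* Lane count derived from the vector's own type, so no per-variant
   constant has to be written by hand.  */
#define SKETCH_NUM_LANES(v) (sizeof (v) / sizeof ((v)[0]))

/* Endianness flip factored out on its own.  SKETCH_BIG_ENDIAN is an
   assumed macro for this sketch only.  */
#ifdef SKETCH_BIG_ENDIAN
#define SKETCH_LANE(v, i) (SKETCH_NUM_LANES (v) - 1 - (i))
#else
#define SKETCH_LANE(v, i) (i)
#endif

/* One macro for every element type and width.  Because it is a macro,
   the lane index reaches the bounds check as written at the call site,
   without relying on the optimizer to propagate it through a call.
   Note that it writes into V, so V should be a local copy.  */
#define SKETCH_SET_LANE_ANY(elem, v, i)              \
  __extension__                                      \
  ({                                                 \
    assert ((size_t) (i) < SKETCH_NUM_LANES (v));    \
    (v)[SKETCH_LANE (v, i)] = (elem);                \
    (v);                                             \
  })

/* A per-type wrapper no longer needs to state the number of lanes.  */
static inline sketch_f32x2
sketch_vset_lane_f32 (float elem, sketch_f32x2 vec, const int lane)
{
  return SKETCH_SET_LANE_ANY (elem, vec, lane);
}
```

In this sketch a load-lane wrapper would invoke SKETCH_SET_LANE_ANY directly
instead of calling sketch_vset_lane_f32, which mirrors the change described for
the vld1_lane intrinsics.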
Tested with check-gcc on aarch64-none-elf and aarch64_be-none-elf (including a
new test that FAILs without this patch).
Ok for trunk?
gcc/ChangeLog:
* config/aarch64/arm_neon.h (__AARCH64_NUM_LANES, __aarch64_lane *2):
New.
(__aarch64_vset_lane_any): Redefine using previous, same for BE + LE.
(vset_lane_f32, vset_lane_f64, vset_lane_p8, vset_lane_p16,
vset_lane_s8, vset_lane_s16, vset_lane_s32, vset_lane_s64,
vset_lane_u8, vset_lane_u16, vset_lane_u32, vset_lane_u64): Remove
number of lanes.
(vld1_lane_f32, vld1_lane_f64, vld1_lane_p8, vld1_lane_p16,
vld1_lane_s8, vld1_lane_s16, vld1_lane_s32, vld1_lane_s64,
vld1_lane_u8, vld1_lane_u16, vld1_lane_u32, vld1_lane_u64): Call
__aarch64_vset_lane_any rather than vset_lane_xxx.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vld1_lane-o0.c: New test.