Ping.

Alan Lawrence wrote:
The vld1_lane intrinsics ICE at -O0 because they call the vset_lane intrinsics, and the lane index is not constant-propagated through that call. (They are fine at -O1 and higher.) This patch fixes the ICE by replacing the call with a macro.
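As a rough, target-independent illustration of the underlying issue (not the ICE itself), the GCC documentation for __builtin_constant_p notes that a numeric constant passed through an inline function's parameter is not reported as constant unless optimization is enabled, whereas the same constant passed through a macro is. The names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: the lane index arrives as a function parameter.
   Per the GCC manual, without -O this is not reported as constant even
   when the caller passes a literal.  */
static inline __attribute__ ((always_inline)) int
sketch_index_is_constant_via_function (const int lane)
{
  return __builtin_constant_p (lane);
}

/* Hypothetical macro: the literal is substituted textually, so the
   builtin sees it directly at any optimization level.  */
#define SKETCH_INDEX_IS_CONSTANT_VIA_MACRO(lane) __builtin_constant_p (lane)

/* Small self-check for the macro path only; the function path is
   deliberately not asserted because its result depends on -O level.  */
static inline void
sketch_check_macro_path (void)
{
  assert (SKETCH_INDEX_IS_CONSTANT_VIA_MACRO (1));
  (void) sketch_index_is_constant_via_function (1);
}
```

The lane-index builtins need an immediate in the same way, which is why going through a macro instead of a nested intrinsic call avoids the problem at -O0.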

Rather than defining many individual macros __aarch64_vset(q?)_lane_[uspf](8|16|32|64), this introduces a __AARCH64_NUM_LANES macro using sizeof(), so that a single __aarch64_vset_lane_any macro handles all variants (with bounds-checking and endianness-flipping). This reduces the potential for error compared with writing the number of lanes for each variant by hand, as was done previously.
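A minimal sketch of the sizeof() idea, for readers without the patch to hand. It is not the code from arm_neon.h: the SKETCH_ names and types are hypothetical, the vectors are built with the generic GCC vector extension so it compiles on any target, the bounds check is a run-time assert standing in for the compile-time check, and endianness-flipping is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for 64-bit NEON vector types.  */
typedef int8_t  sketch_int8x8_t    __attribute__ ((vector_size (8)));
typedef int16_t sketch_int16x4_t   __attribute__ ((vector_size (8)));
typedef float   sketch_float32x2_t __attribute__ ((vector_size (8)));

/* Lane count derived from the vector itself: total size / element size.
   No per-variant lane number has to be written by hand.  */
#define SKETCH_NUM_LANES(v) (sizeof (v) / sizeof ((v)[0]))

/* One setter for every element type.  Unlike the intrinsic, this
   sketch modifies VEC in place rather than returning a new vector.  */
#define SKETCH_SET_LANE_ANY(elem, vec, idx)                         \
  do                                                                \
    {                                                               \
      assert ((idx) >= 0 && (size_t) (idx) < SKETCH_NUM_LANES (vec)); \
      (vec)[idx] = (elem);                                          \
    }                                                               \
  while (0)
```

For example, SKETCH_SET_LANE_ANY (7, v, 2) on a sketch_int16x4_t checks the index against 4 lanes, and the same macro on a sketch_int8x8_t checks against 8, with no lane count passed by the caller.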

This also factors the endianness-flipping out into a separate macro, __aarch64_lane; I intend to use this for vget_lane too in another patch.
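The lane mapping can be sketched in the same style. Again this is an illustration under assumptions rather than the header's code: the SKETCH_ names are hypothetical, both mappings are given explicit names so they can be compared on any host, and the selection uses the compiler's __BYTE_ORDER__ macro as a generic stand-in for whatever test the header uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit NEON vector of four 16-bit lanes.  */
typedef int16_t sketch_int16x4_t __attribute__ ((vector_size (8)));

#define SKETCH_NUM_LANES(v) (sizeof (v) / sizeof ((v)[0]))

/* Little-endian: the lane number is used as the vector index unchanged.  */
#define SKETCH_LANE_LE(v, idx) (idx)

/* Big-endian: the lane number is mirrored, so lane 0 maps to the last
   vector index.  */
#define SKETCH_LANE_BE(v, idx) (SKETCH_NUM_LANES (v) - 1 - (idx))

/* Pick one mapping for the current target (assumption: byte order is
   detected via __BYTE_ORDER__ in this sketch).  */
#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_LANE(v, idx) SKETCH_LANE_BE (v, idx)
#else
#define SKETCH_LANE(v, idx) SKETCH_LANE_LE (v, idx)
#endif

/* A setter written once against SKETCH_LANE, so the same definition
   serves both byte orders.  Modifies VEC in place.  */
#define SKETCH_SET_LANE(elem, vec, idx)                      \
  do                                                         \
    {                                                        \
      assert ((size_t) (idx) < SKETCH_NUM_LANES (vec));      \
      (vec)[SKETCH_LANE (vec, idx)] = (elem);                \
    }                                                        \
  while (0)
```

Keeping the mapping in one macro means a getter could reuse it unchanged, which is the kind of reuse intended for vget_lane.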

Tested with check-gcc on aarch64-none-elf and aarch64_be-none-elf (including new test that FAILs without this patch).

Ok for trunk?


gcc/ChangeLog:

        * config/aarch64/arm_neon.h (__AARCH64_NUM_LANES, __aarch64_lane *2):
        New.
        (__aarch64_vset_lane_any): Redefine using previous, same for BE + LE.
        (vset_lane_f32, vset_lane_f64, vset_lane_p8, vset_lane_p16,
        vset_lane_s8, vset_lane_s16, vset_lane_s32, vset_lane_s64,
        vset_lane_u8, vset_lane_u16, vset_lane_u32, vset_lane_u64): Remove
        number of lanes.
        (vld1_lane_f32, vld1_lane_f64, vld1_lane_p8, vld1_lane_p16,
        vld1_lane_s8, vld1_lane_s16, vld1_lane_s32, vld1_lane_s64,
        vld1_lane_u8, vld1_lane_u16, vld1_lane_u32, vld1_lane_u64): Call
        __aarch64_vset_lane_any rather than vset_lane_xxx.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vld1_lane-o0.c: New test.

