https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88361
--- Comment #1 from Daniel Fruzynski <bugzi...@poradnik-webmastera.com> ---
For reference, this is the NEON code I used on AArch64:

[code]
#include <arm_neon.h>
#include <stdint.h>

// SIZE, src, dst1 and dst2 are defined elsewhere (not shown in this comment).

void test2(void)
{
    int n = 0;
    // Vector loop: 4 elements per iteration
    for (; n < SIZE*SIZE-3; n += 4)
    {
        // Copy data
        uint32x4_t v = vld1q_u32((uint32_t*)(&src[0][0] + n));
        vst1q_u32((uint32_t*)(&dst1[0][0] + n), v);

        // Calculate bitmasks: each lane becomes 1 << v[lane]
        v = vshlq_u32(vdupq_n_u32(1), vreinterpretq_s32_u32(v));
        vst1q_u32((uint32_t*)(&dst2[0][0] + n), v);
    }
    // Scalar tail: remaining 0-3 elements
    for (; n < SIZE*SIZE; n++)
    {
        int x = *(&src[0][0] + n);
        *(&dst1[0][0] + n) = x;
        *(&dst2[0][0] + n) = 1 << x;
    }
}
[/code]
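The NEON loop above can be checked against a plain C reference on any target. The sketch below is not from the original report: SIZE = 16, the int element type of src, dst1 and dst2, and the function name test_ref are assumptions made for illustration. It produces the same dst1/dst2 contents as test2() for shift counts 0 to 30; a count of 31 is excluded because 1 << 31 overflows a signed int in C, whereas the unsigned NEON shift simply yields 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical size for this sketch; the real test case defines its own SIZE.
#define SIZE 16

int src[SIZE][SIZE];
int dst1[SIZE][SIZE];
int dst2[SIZE][SIZE];

// Scalar reference for test2(): copy src into dst1 and store the
// bitmask 1 << src[i][j] into dst2. Intended as a correctness check,
// not as a performance version.
void test_ref(void)
{
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            int x = src[i][j];
            // 1 << x on a signed int is only defined for 0 <= x <= 30.
            assert(x >= 0 && x <= 30);
            dst1[i][j] = x;
            dst2[i][j] = 1 << x;
        }
    }
}
```

Filling src with values in 0..30, calling test_ref(), and comparing dst1/dst2 element by element with the arrays produced by test2() on an AArch64 machine gives a quick correctness check. The assert is only a debugging aid and is compiled out with -DNDEBUG.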