https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82518
--- Comment #46 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Wonder if that:
  vect_array.11[0] = vect_vec_iv_.7_45;
  vect_array.11[1] = vect__4.8_48;
on armeb shouldn't have been [1] and [0] instead, otherwise we end up with:
(insn 35 37 38 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 0)
        (reg:V4SI 110 [ vect_vec_iv_.7 ])) "pr82518.c":8 939 {*neon_movv4si}
     (nil))
(insn 38 35 41 5 (set (subreg:V4SI (reg:OI 155 [ vect_array.11 ]) 16)
        (plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
            (reg:V4SI 171))) "pr82518.c":8 998 {*addv4si3_neon}
     (nil))
(insn 41 38 39 5 (set (reg:V4SI 110 [ vect_vec_iv_.7 ])
        (plus:V4SI (reg:V4SI 110 [ vect_vec_iv_.7 ])
            (reg:V4SI 169))) 998 {*addv4si3_neon}
     (nil))
(insn 39 41 43 5 (set (mem:OI (post_inc:SI (reg:SI 152 [ ivtmp.31 ])) [2 MEM[(int *)vectp_p.9_49]+0 S32 A32])
        (unspec:OI [
                (reg:OI 155 [ vect_array.11 ])
                (unspec:V4SI [
                        (const_int 0 [0])
                    ] UNSPEC_VSTRUCTDUMMY)
            ] UNSPEC_VST2)) "pr82518.c":8 2396 {neon_vst2v4si}
     (expr_list:REG_INC (reg:SI 152 [ ivtmp.31 ])
        (nil)))
where pseudo 110 is the vect_vec_iv_.7_45 ({i, i + 1, i + 2, i + 3}) and insn 38 adds {1, 1, 1, 1} to that.

It really depends on what exactly the neon_vst2v4si instruction does on armeb.
        vmov.i32        q10, #4  @ v4si
        vmov.i32        q9, #1  @ v4si
...
        vldr    d16, .L19
        vldr    d17, .L19+8
.L4:
        vadd.i32        q11, q8, q9
        vst1.64 {d16-d17}, [sp:64]
        vadd.i32        q8, q8, q10
        vstr    d22, [sp, #16]
        vstr    d23, [sp, #24]
        vld1.64 {d22-d25}, [sp:64]
        vst2.32 {d22-d25}, [r3]!
If it works like on armel, except the elements of the vectors are byte-swapped, then it should be [1] and [0].
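
To make the lane-order question concrete, here is a minimal C model of the interleaving store. It assumes the armel behaviour, where vst2.32 writes lane 0 of the first vector, then lane 0 of the second, then lane 1 of each, and so on. The helper names model_vst2 and store_pair are hypothetical, and the model says nothing about how armeb actually numbers lanes; it only shows how swapping the two array slots changes the order of the values in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Assumed armel-style model of a two-vector interleaving store:
   memory receives v0[0], v1[0], v0[1], v1[1], ...
   Whether armeb uses the same lane numbering is the open question.  */
static void model_vst2(int32_t *dst, const int32_t v0[LANES],
                       const int32_t v1[LANES])
{
    for (size_t k = 0; k < LANES; k++) {
        dst[2 * k] = v0[k];
        dst[2 * k + 1] = v1[k];
    }
}

/* Hypothetical helper: build iv = {i, i+1, i+2, i+3} and iv + {1,1,1,1},
   then store them as vect_array[0]/[1] (swapped == 0) or as
   vect_array[1]/[0] (swapped != 0).  */
static void store_pair(int32_t *dst, int32_t i, int swapped)
{
    int32_t iv[LANES], iv_plus1[LANES];

    for (size_t k = 0; k < LANES; k++) {
        iv[k] = i + (int32_t)k;
        iv_plus1[k] = iv[k] + 1;
    }

    if (swapped)
        model_vst2(dst, iv_plus1, iv);
    else
        model_vst2(dst, iv, iv_plus1);
}
```

Under this model, store_pair(out, 0, 0) leaves 0 1 1 2 2 3 3 4 in out, while store_pair(out, 0, 1) leaves 1 0 2 1 3 2 4 3, so picking the wrong slot order swaps each adjacent pair of stored elements.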