https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114566
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems like a vectorizer bug to me. The _42 + 128 store is to
MEM <vector(16) float> [(float *)_42 + 128B];
aka:
<target_mem_ref 0x7fffea146580
type <vector_type 0x7fffea038930
type <real_type 0x7fffea1532a0 float sizes-gimplified SF
size <integer_cst 0x7fffea12cfd8 constant 32>
unit-size <integer_cst 0x7fffea14f000 constant 4>
align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type
0x7fffea1532a0 precision:32
pointer_to_this <pointer_type 0x7fffea153930>>
V16SF
size <integer_cst 0x7fffea14f480 constant 512>
unit-size <integer_cst 0x7fffea299960 constant 64>
user align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7fffea2ac2a0 nunits:16
pointer_to_this <pointer_type 0x7fffea038888>>
arg:0 <ssa_name 0x7fffea03fee8
type <pointer_type 0x7fffea153000 type <void_type 0x7fffea14bf18 void>
public unsigned DI
size <integer_cst 0x7fffea12cd98 constant 64>
unit-size <integer_cst 0x7fffea12cdb0 constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type
0x7fffea153000
pointer_to_this <pointer_type 0x7fffea15b9d8>>
def_stmt _42 = (void *) ivtmp.49_58;
version:42
ptr-info 0x7fffea068540>
arg:1 <integer_cst 0x7fffea047c30 type <pointer_type 0x7fffea153930>
constant 128>>
so the access type has only 32-bit alignment there, and so it uses the movmisalign optab.
The _42 + 192 store is
MEM <vector(8) float> [(float *)_42 + 192B];
aka
<target_mem_ref 0x7fffea146600
type <vector_type 0x7fffea2a4f18
type <real_type 0x7fffea1532a0 float sizes-gimplified SF
size <integer_cst 0x7fffea12cfd8 constant 32>
unit-size <integer_cst 0x7fffea14f000 constant 4>
align:32 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type
0x7fffea1532a0 precision:32
pointer_to_this <pointer_type 0x7fffea153930>>
V8SF
size <integer_cst 0x7fffea14f108 constant 256>
unit-size <integer_cst 0x7fffea14f1f8 constant 32>
align:256 warn_if_not_align:0 symtab:0 alias-set 1 canonical-type
0x7fffea2a4f18 nunits:8
pointer_to_this <pointer_type 0x7fffea2a7e70>>
arg:0 <ssa_name 0x7fffea03fee8
type <pointer_type 0x7fffea153000 type <void_type 0x7fffea14bf18 void>
public unsigned DI
size <integer_cst 0x7fffea12cd98 constant 64>
unit-size <integer_cst 0x7fffea12cdb0 constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set 2 canonical-type
0x7fffea153000
pointer_to_this <pointer_type 0x7fffea15b9d8>>
def_stmt _42 = (void *) ivtmp.49_58;
version:42
ptr-info 0x7fffea068540>
arg:1 <integer_cst 0x7fffea0682e8 type <pointer_type 0x7fffea153930>
constant 192>>
and so it expects 256-bit alignment (despite only 128-bit alignment being guaranteed).
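
To make the arithmetic explicit, here is a minimal sketch in C. It is not GCC code. The helper names guaranteed_align_bytes and store_is_safe are hypothetical, and the only inputs taken from this comment are the 128-bit (16-byte) guarantee on _42, the byte offsets 128 and 192, and the 32-bit and 256-bit alignments of the two access types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: largest power of two (in bytes)
   guaranteed to divide base + offset, when all we know about base is
   that it is a multiple of base_align (itself a power of two).  */
static uint64_t
guaranteed_align_bytes (uint64_t base_align, uint64_t offset)
{
  assert (base_align != 0 && (base_align & (base_align - 1)) == 0);
  if (offset == 0)
    return base_align;
  /* Lowest set bit of offset, i.e. the alignment of the offset alone.  */
  uint64_t off_align = offset & (~offset + 1);
  return off_align < base_align ? off_align : base_align;
}

/* Hypothetical helper: true if an access whose type demands
   required_align bytes can be treated as aligned at base + offset.  */
static bool
store_is_safe (uint64_t base_align, uint64_t offset, uint64_t required_align)
{
  return guaranteed_align_bytes (base_align, offset) >= required_align;
}
```

With a 16-byte guarantee on _42, store_is_safe (16, 128, 4) holds for the V16SF store, whose type only asks for 32-bit alignment, while store_is_safe (16, 192, 32) does not hold for the V8SF store, whose type asks for 256-bit alignment. The offset 192 is itself a multiple of 32, but that does not help when the base is only known to be 16-byte aligned.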