https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118873
Bug ID: 118873
Summary: -favoid-store-forwarding makes a mess out of a STLF fail
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

The following testcase, created from PR90579, shows that
-favoid-store-forwarding on x86_64 with -O2 -mavx2 doubles the number of
STLF fails rather than doing any good.

typedef int v4si __attribute__((vector_size(16)));
typedef int v8si __attribute__((vector_size(32)));
v8si a;
v4si b;
void foo (int *p)
{
  v8si aa = a;
  v4si bb = b;
  *(v8si *)p = a;
  *(v4si *)(p + 8) = b;
  a = *(v8si *)(p + 4);
}

At -O2 -mavx2 this generates

foo:
.LFB0:
        .cfi_startproc
        vmovdqa a(%rip), %ymm0
        vmovdqa %ymm0, (%rdi)      <--- store
        vmovdqa b(%rip), %xmm0
        vmovdqa %xmm0, 32(%rdi)    <--- store
        vmovdqa 16(%rdi), %ymm0    <--- STLF FAIL
        vmovdqa %ymm0, a(%rip)
        vzeroupper

and with -favoid-store-forwarding

foo:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vmovdqa a(%rip), %ymm0
        vmovdqa %ymm0, (%rdi)      <--- original first store
        vmovdqa b(%rip), %xmm0
        vmovdqa 16(%rdi), %ymm1    <--- STLF fail plus uninit memory read
        vmovdqa %ymm1, -32(%rsp)
        vmovdqa %xmm0, -16(%rsp)
        vmovdqa -32(%rsp), %ymm2   <--- STLF fail newly introduced
        vmovdqa %xmm0, 32(%rdi)
        vmovdqa %ymm2, a(%rip)
        vzeroupper

We introduce a read of uninitialized memory, and the attempt to set the
upper part results in a spill:

Store forwarding avoided with bit inserts:
With sequence:
(insn 15 0 0 (set (subreg:V4SI (reg:V8SI 100 [ _3 ]) 16)
        (reg:V4SI 104)) 2397 {movv4si_internal}
     (nil))

Subregs are not the canonical form for vector inserts; instead you are
expected to use (vec_concat (vec_select ...) (...)) IIRC. The first load
with the STLF fail should have been narrowed using knowledge that reg:V4SI
and reg:V8SI use the same register file (are tieable?).
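
For reference, the 32-byte load at p + 4 covers bytes 16..47 of the buffer:
the upper half of the first store (elements 4..7 of a) followed by all of
the second store (b). The sketch below spells out, at the source level, the
value a register-only sequence would have to produce. It is only an
illustration under stated assumptions, not what the pass emits: the function
names via_memory and via_registers are hypothetical, the vectors are passed
by pointer instead of being globals, and memcpy replaces the pointer casts so
the example makes no alignment assumption about p.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));
typedef int v8si __attribute__((vector_size(32)));

/* Hypothetical reference: same stores and load as foo() in the testcase,
   but written with memcpy so p needs no particular alignment.  */
static void via_memory(int *p, const v8si *a, const v4si *b, v8si *out)
{
    memcpy(p, a, sizeof *a);          /* bytes  0..31 <- a */
    memcpy(p + 8, b, sizeof *b);      /* bytes 32..47 <- b */
    memcpy(out, p + 4, sizeof *out);  /* bytes 16..47 -> result */
}

/* Hypothetical register-only equivalent: the stores still happen, but the
   result is assembled from the stored values instead of being reloaded.  */
static void via_registers(int *p, const v8si *a, const v4si *b, v8si *out)
{
    v8si av = *a;
    v4si bv = *b;

    memcpy(p, &av, sizeof av);
    memcpy(p + 8, &bv, sizeof bv);

    /* Low half of the result is the high half of a, high half is b.  */
    v8si r = { av[4], av[5], av[6], av[7], bv[0], bv[1], bv[2], bv[3] };
    *out = r;
}
```

In RTL terms the initializer in via_registers corresponds to a concatenation
of a selected high half with a V4SI value, which is the shape the
(vec_concat (vec_select ...) (...)) suggestion above describes.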