https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118873

            Bug ID: 118873
           Summary: -favoid-store-forwarding makes a mess out of a STLF
                    fail
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The following testcase, derived from PR90579, shows that on x86_64 with -O2
-mavx2, -favoid-store-forwarding doubles the number of store-to-load
forwarding (STLF) failures rather than doing any good.

typedef int v4si __attribute__((vector_size(16)));
typedef int v8si __attribute__((vector_size(32)));

v8si a;
v4si b;

void foo (int *p)
{
  v8si aa = a;
  v4si bb = b;
  *(v8si *)p = a;
  *(v4si *)(p + 8) = b;
  a = *(v8si *)(p + 4);
}


At -O2 -mavx2 this generates:

foo:
.LFB0:
        .cfi_startproc
        vmovdqa a(%rip), %ymm0
        vmovdqa %ymm0, (%rdi)    <--- store
        vmovdqa b(%rip), %xmm0
        vmovdqa %xmm0, 32(%rdi)  <--- store
        vmovdqa 16(%rdi), %ymm0  <--- STLF FAIL
        vmovdqa %ymm0, a(%rip)
        vzeroupper

And with -favoid-store-forwarding:

foo:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vmovdqa a(%rip), %ymm0
        vmovdqa %ymm0, (%rdi)   <--- original first store
        vmovdqa b(%rip), %xmm0
        vmovdqa 16(%rdi), %ymm1 <--- STLF fail plus uninit memory read
        vmovdqa %ymm1, -32(%rsp)
        vmovdqa %xmm0, -16(%rsp)
        vmovdqa -32(%rsp), %ymm2 <--- STLF fail newly introduced
        vmovdqa %xmm0, 32(%rdi)
        vmovdqa %ymm2, a(%rip)
        vzeroupper

We introduce a read of uninitialized memory, and the attempt to set the
upper part results in a spill:

Store forwarding avoided with bit inserts:
With sequence:
  (insn 15 0 0 (set (subreg:V4SI (reg:V8SI 100 [ _3 ]) 16)
        (reg:V4SI 104)) 2397 {movv4si_internal}
     (nil))

Subregs are not the canonical form for vector inserts; instead, you are
expected to use (vec_concat (vec_select ...) (...)), IIRC.

The first load, the one with the STLF fail, should have been narrowed to its
lower 16 bytes, using the knowledge that reg:V4SI and reg:V8SI use the same
register file (are they tieable?).