[Bug target/123631] New: Odd choice for vector constant materialization

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 16 Jan 2026 03:23:07 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123631


            Bug ID: 123631
           Summary: Odd choice for vector constant materialization
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I'm seeing

void foo (int *q)
{
  q[0] = 10;
  q[1] = 10;
  q[2] = 10;
  q[3] = 10;
}

with -march=znver2

   0:   b8 0a 00 00 00          mov    $0xa,%eax
   5:   c5 f9 6e c0             vmovd  %eax,%xmm0
   9:   c4 e2 79 58 c0          vpbroadcastd %xmm0,%xmm0
   e:   c5 fa 7f 07             vmovdqu %xmm0,(%rdi)

and -march=znver4

   0:   b8 0a 00 00 00          mov    $0xa,%eax
   5:   62 f2 7d 08 7c c0       vpbroadcastd %eax,%xmm0
   b:   c5 fa 7f 07             vmovdqu %xmm0,(%rdi)

Both sequences are larger in code size (18 and 15 bytes) than what we emit
for a non-uniform vector constant (12 bytes), which is loaded from memory:

   0:   c5 f9 6f 05 00 00 00    vmovdqa 0x0(%rip),%xmm0        # 8 <foo+0x8>
   7:   00 
   8:   c5 fa 7f 07             vmovdqu %xmm0,(%rdi)

I think the load also has comparable (if not lower) latency when the constant
is in cache, since the broadcast sequences need a GPR<->XMM move, and it
certainly needs fewer uops and causes less port pressure.
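
For reference, a sketch of the kind of non-uniform variant meant above.  The
report does not include that test case, so the function name foo_nonuniform
and the values 10..13 are assumptions made for illustration; the expectation
that it compiles to the constant-pool load shown above is taken from the
description, not re-verified here.

```c
/* Hypothetical non-uniform counterpart of foo (); name and values are
   illustrative only and not taken from the report.  */
void foo_nonuniform (int *q)
{
  /* Four different values, so the vector constant cannot be built by
     broadcasting a single scalar.  */
  q[0] = 10;
  q[1] = 11;
  q[2] = 12;
  q[3] = 13;
}
```

Compiling both functions with the same -march setting and comparing the
disassembly side by side should show the difference in materialization.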

With FP we're broadcasting from scalar memory using vbroadcastss.  For
integer data of the same size that should be possible as well; the
instruction is one byte larger than the full vector load, but possibly
better for the dcache, especially when broadcasting to %ymm or %zmm.
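
A minimal sketch of the FP counterpart, assuming the same store pattern as
foo ().  The name foo_fp and the value 10.0f are made up for illustration;
per the description above this is the case where a vbroadcastss from a
scalar constant in memory would be expected, though the exact output is not
shown in the report.

```c
/* Hypothetical FP counterpart of foo (); name and value are illustrative
   only and not taken from the report.  */
void foo_fp (float *q)
{
  /* Uniform float constant: a candidate for a broadcast from a 4-byte
     scalar in memory instead of a 16-byte vector constant.  */
  q[0] = 10.0f;
  q[1] = 10.0f;
  q[2] = 10.0f;
  q[3] = 10.0f;
}
```

The dcache argument scales with vector width: a scalar constant stays at
4 bytes while a full %ymm or %zmm constant takes 32 or 64 bytes.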
