[Bug target/121230] New: x86: Inefficient code generation with -m3dnow -msse since GCC 12

manx-bugzilla at problemloesungsmaschine dot de via Gcc-bugs Thu, 24 Jul 2025 00:25:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121230


            Bug ID: 121230
           Summary: x86: Inefficient code generation with -m3dnow -msse
                    since GCC 12
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: manx-bugzilla at problemloesungsmaschine dot de
  Target Milestone: ---

Consider the following C code:
```
typedef struct {
    float a;
    float b;
} f32_2;

f32_2 add32_2(f32_2 x, f32_2 y) {
    return (f32_2){
        x.a + y.a,
        x.b + y.b};
}
```

Godbolt link: https://godbolt.org/z/T6To8qbe1

GCC 15.1 -m32 -march=athlon-xp -std=c11 -O3 generates (Godbolt pane: top left):
```
add32_2:
        sub     esp, 12
        fld     DWORD PTR [esp+28]
        fadd    DWORD PTR [esp+20]
        mov     eax, DWORD PTR [esp+16]
        fstp    DWORD PTR [esp+4]
        fld     DWORD PTR [esp+32]
        fadd    DWORD PTR [esp+24]
        movss   xmm0, DWORD PTR [esp+4]
        fstp    DWORD PTR [esp+4]
        movss   xmm1, DWORD PTR [esp+4]
        unpcklps        xmm0, xmm1
        movlps  QWORD PTR [eax], xmm0
        add     esp, 12
        ret     4
```
which needlessly routes both sums through a stack temporary and XMM registers
(fstp, movss, unpcklps, movlps) instead of storing them directly to the return
slot.

This does not happen with
GCC 15.1 -m32 -march=pentium3 -std=c11 -O3 (Godbolt pane: top center):
```
add32_2:
        fld     DWORD PTR [esp+20]
        fadd    DWORD PTR [esp+12]
        mov     eax, DWORD PTR [esp+4]
        fld     DWORD PTR [esp+16]
        fadd    DWORD PTR [esp+8]
        fstp    DWORD PTR [eax]
        fstp    DWORD PTR [eax+4]
        ret     4
```

or with GCC 11.4 -m32 -march=athlon-xp -std=c11 -O3 (Godbolt pane: top right):
```
add32_2:
        fld     DWORD PTR [esp+20]
        fadd    DWORD PTR [esp+12]
        mov     eax, DWORD PTR [esp+4]
        fld     DWORD PTR [esp+16]
        fadd    DWORD PTR [esp+8]
        fstp    DWORD PTR [eax]
        fstp    DWORD PTR [eax+4]
        ret     4
```
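All three listings follow the i386 System V convention for returning a struct in memory: the caller passes a hidden pointer to the result slot as the first stack argument (loaded into eax from [esp+4], which is [esp+16] after the `sub esp, 12` in the first listing), x and y follow at [esp+8] and [esp+16], and `ret 4` pops the hidden pointer. The sketch below writes that lowering out in C; `add32_2_lowered` is a hypothetical name, and the callee-pops part cannot be expressed in C. In the pentium3 and GCC 11.4 listings each sum goes straight from the x87 stack into its field (fstp [eax], fstp [eax+4]), which is all this form requires.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float a;
    float b;
} f32_2;

/* Layout matching the [eax] / [eax+4] stores in the listings above. */
static_assert(sizeof(f32_2) == 8, "f32_2 should be two adjacent floats");
static_assert(offsetof(f32_2, b) == 4, "b should directly follow a");

/* Hypothetical explicit form of add32_2 after struct-return lowering:
   `ret` is the hidden result pointer (read from [esp+4] into eax),
   x occupies [esp+8]..[esp+15] and y occupies [esp+16]..[esp+23]. */
f32_2 *add32_2_lowered(f32_2 *ret, f32_2 x, f32_2 y) {
    ret->a = x.a + y.a;   /* fld [esp+16]; fadd [esp+8];  fstp [eax]   */
    ret->b = x.b + y.b;   /* fld [esp+20]; fadd [esp+12]; fstp [eax+4] */
    return ret;           /* the i386 ABI also hands the pointer back in eax */
}
```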

Note: Athlon-XP supports MMX, 3DNow!, and SSE1, while Pentium3 supports MMX and
SSE1, and GCC apparently chooses -mfpmath=387 rather than -mfpmath=sse for both
(which is probably fine, and not the subject of this issue).
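Which of these features and which -mfpmath setting a given command line actually selects can be checked through GCC's predefined macros. A minimal sketch, assuming the GCC in use defines `__SSE__`, `__SSE_MATH__` and `__3dNOW__` for the corresponding options; the function names are hypothetical. Compiling it with each flag set from this report and calling `print_fp_config()` shows whether 3DNow! is enabled and whether float math runs on the x87 or in SSE registers.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if GCC performs scalar float math in SSE registers. GCC defines
   __SSE_MATH__ only when SSE is enabled and -mfpmath=sse is in effect
   (explicitly or as the target default); otherwise the x87 is used. */
int uses_sse_math(void) {
#ifdef __SSE_MATH__
    return 1;
#else
    return 0;
#endif
}

/* 1 if SSE1 is enabled, via -msse or an -march that implies it. */
int has_sse(void) {
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* 1 if 3DNow! is enabled, via -m3dnow or an -march that implies it.
   Assumption: the compiler still defines __3dNOW__ for -m3dnow. */
int has_3dnow(void) {
#ifdef __3dNOW__
    return 1;
#else
    return 0;
#endif
}

/* Prints the floating-point configuration selected by the current flags. */
void print_fp_config(void) {
    /* SSE float math is only ever selected when SSE itself is enabled. */
    assert(!uses_sse_math() || has_sse());
    printf("3dnow=%d sse=%d fpmath=%s\n",
           has_3dnow(), has_sse(), uses_sse_math() ? "sse" : "387");
}
```

The same information is available without writing code by dumping the predefined macros, e.g. `gcc -m32 -march=athlon-xp -dM -E - </dev/null`.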

Even if I force -mfpmath=sse, the code generation still looks a bit weird:

GCC 15.1 -m32 -march=i686 -mmmx -m3dnow -msse -msse2 -mfpmath=sse -std=c11 -O3 (Godbolt pane: bottom left):
```
add32_2:
        movss   xmm0, DWORD PTR [esp+8]
        movss   xmm1, DWORD PTR [esp+12]
        addss   xmm0, DWORD PTR [esp+16]
        addss   xmm1, DWORD PTR [esp+20]
        mov     eax, DWORD PTR [esp+4]
        unpcklps        xmm0, xmm1
        movlps  QWORD PTR [eax], xmm0
        ret     4
```

compared to the same flags without -m3dnow,
GCC 15.1 -m32 -march=i686 -mmmx -msse -msse2 -mfpmath=sse -std=c11 -O3 (Godbolt pane: bottom center):
```
add32_2:
        movss   xmm0, DWORD PTR [esp+12]
        movss   xmm1, DWORD PTR [esp+8]
        mov     eax, DWORD PTR [esp+4]
        addss   xmm0, DWORD PTR [esp+20]
        addss   xmm1, DWORD PTR [esp+16]
        movss   DWORD PTR [eax+4], xmm0
        movss   DWORD PTR [eax], xmm1
        ret     4
```

For comparison, Clang defaults to generating SSE1 instructions, and supports
neither -mfpmath=387 combined with -msse nor -m3dnow.
Clang 20.1.0 -m32 -march=i686 -mmmx -msse -msse2 -std=c11 -O3 (Godbolt pane: bottom right):
```
add32_2:
        mov     eax, dword ptr [esp + 4]
        movsd   xmm0, qword ptr [esp + 8]
        movsd   xmm1, qword ptr [esp + 16]
        addps   xmm1, xmm0
        movlps  qword ptr [eax], xmm1
        ret     4
```
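The Clang listing uses a packed strategy: each 8-byte struct is loaded into the low half of an XMM register with movsd, both lanes are added at once with addps, and the low 64 bits are stored with movlps. The sketch below expresses the same strategy with the SSE intrinsics from <xmmintrin.h>; `add32_2_packed` is a hypothetical name, and a scalar fallback keeps the file building on targets without SSE. It only illustrates what the Clang code does and is not a workaround that GCC is guaranteed to compile the same way.

```c
#include <assert.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

typedef struct {
    float a;
    float b;
} f32_2;

static_assert(sizeof(f32_2) == 8, "f32_2 must fill exactly the low 64 bits of an XMM register");

#if defined(__SSE__)
/* Packed add mirroring the Clang listing: load 64 bits per operand into
   the low half of a zeroed XMM register (movsd/movlps), add all lanes
   with addps (the zero upper lanes just produce 0 + 0), then store the
   low 64 bits back (movlps). */
f32_2 add32_2_packed(f32_2 x, f32_2 y) {
    f32_2 r;
    __m128 vx = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)&x);
    __m128 vy = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)&y);
    _mm_storel_pi((__m64 *)&r, _mm_add_ps(vx, vy));
    return r;
}
#else
/* Scalar fallback for targets without SSE: two independent adds. */
f32_2 add32_2_packed(f32_2 x, f32_2 y) {
    return (f32_2){ x.a + y.a, x.b + y.b };
}
#endif
```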

As far as I know, GCC does not generate 3DNow! instructions by itself, which
makes it even weirder that -m3dnow appears to influence (and worsen) code
generation for both x87 and SSE1 instructions.

The problem appears to have been introduced in GCC 12.
