http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284

           Summary: Lack of proper optimization for certain SSE
                    operations, and weird behavior with similar source
                    codes
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: rydenci...@gmail.com


I am using this source code: http://pastebin.com/tMpQ2Bzv
Compile it with -O3 -march=core2 -std=c++0x.
Note that line 178 is commented out.
In that state, GCC produces the following assembly in the final binary to
initialize v1 and v2:
mov     dword ptr [esp+60h+var_30], 3F800000h
mov     dword ptr [esp+60h+var_30+4], 40000000h
mov     dword ptr [esp+60h+var_30+8], 40400000h
mov     dword ptr [esp+60h+var_30+0Ch], 40800000h
mov     dword ptr [esp+60h+var_20], 41000000h
mov     dword ptr [esp+60h+var_20+4], 40E00000h
mov     dword ptr [esp+60h+var_20+8], 40C00000h
mov     dword ptr [esp+60h+var_20+0Ch], 40A00000h

Uncommenting that line changes the generated code, and the initialization
becomes:
movaps  xmm1, oword ptr ds:oword_47D090
movaps  xmm0, oword ptr ds:oword_47D0A0
movaps  oword ptr [esp+80h+var_50], xmm1
movaps  oword ptr [esp+80h+var_40], xmm0

which seems to make no sense.

Also, the complete assembly for the first case (line 178 commented out) looks
like this:
mov     dword ptr [esp+60h+var_30], 3F800000h
mov     dword ptr [esp+60h+var_30+4], 40000000h
mov     dword ptr [esp+60h+var_30+8], 40400000h
mov     dword ptr [esp+60h+var_30+0Ch], 40800000h
mov     dword ptr [esp+60h+var_20], 41000000h
mov     dword ptr [esp+60h+var_20+4], 40E00000h
mov     dword ptr [esp+60h+var_20+8], 40C00000h
mov     dword ptr [esp+60h+var_20+0Ch], 40A00000h
movaps  xmm0, oword ptr [esp+60h+var_30]
mov     [esp+60h+var_60], offset aResultadoFFFF ; "Resultado: %f %f %f %f\n"
addps   xmm0, oword ptr [esp+60h+var_20]
movaps  oword ptr [esp+60h+var_10], xmm0
fld     dword ptr [esp+60h+var_10+0Ch]
fstp    [esp+60h+var_44]
fld     dword ptr [esp+60h+var_10+8]
fstp    [esp+60h+var_4C]
fld     dword ptr [esp+60h+var_10+4]
fstp    [esp+60h+var_54]
fld     dword ptr [esp+60h+var_10]
fstp    [esp+60h+var_5C]
call    printf
xor     eax, eax

But the creation of those vector objects should be eliminated altogether, and
the code should operate on SSE registers whenever possible.
