http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46284
Summary: Lack of proper optimization for certain SSE operations, and weird behavior with similar source codes
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: rydenci...@gmail.com

I am using this source code: http://pastebin.com/tMpQ2Bzv

Compile with -O3 -march=core2 -std=c++0x

Notice that line 178 has been commented out. GCC produces the following assembly in the final binary to initialize v1 and v2:

    mov dword ptr [esp+60h+var_30], 3F800000h
    mov dword ptr [esp+60h+var_30+4], 40000000h
    mov dword ptr [esp+60h+var_30+8], 40400000h
    mov dword ptr [esp+60h+var_30+0Ch], 40800000h
    mov dword ptr [esp+60h+var_20], 41000000h
    mov dword ptr [esp+60h+var_20+4], 40E00000h
    mov dword ptr [esp+60h+var_20+8], 40C00000h
    mov dword ptr [esp+60h+var_20+0Ch], 40A00000h

Uncommenting that line changes the generated assembly, and the initialization becomes:

    movaps xmm1, oword ptr ds:oword_47D090
    movaps xmm0, oword ptr ds:oword_47D0A0
    movaps oword ptr [esp+80h+var_50], xmm1
    movaps oword ptr [esp+80h+var_40], xmm0

which seems to make no sense.

Also, the complete assembly for the first case looks like this:

    mov dword ptr [esp+60h+var_30], 3F800000h
    mov dword ptr [esp+60h+var_30+4], 40000000h
    mov dword ptr [esp+60h+var_30+8], 40400000h
    mov dword ptr [esp+60h+var_30+0Ch], 40800000h
    mov dword ptr [esp+60h+var_20], 41000000h
    mov dword ptr [esp+60h+var_20+4], 40E00000h
    mov dword ptr [esp+60h+var_20+8], 40C00000h
    mov dword ptr [esp+60h+var_20+0Ch], 40A00000h
    movaps xmm0, oword ptr [esp+60h+var_30]
    mov [esp+60h+var_60], offset aResultadoFFFF ; "Resultado: %f %f %f %f\n"
    addps xmm0, oword ptr [esp+60h+var_20]
    movaps oword ptr [esp+60h+var_10], xmm0
    fld dword ptr [esp+60h+var_10+0Ch]
    fstp [esp+60h+var_44]
    fld dword ptr [esp+60h+var_10+8]
    fstp [esp+60h+var_4C]
    fld dword ptr [esp+60h+var_10+4]
    fstp [esp+60h+var_54]
    fld dword ptr [esp+60h+var_10]
    fstp [esp+60h+var_5C]
    call printf
    xor eax, eax

The vector objects should not be materialized on the stack at all: v1 and v2 are built with eight scalar stores and then immediately reloaded with movaps. The code should operate on SSE registers wherever possible.
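
For readers without access to the pastebin, the following is a minimal sketch of the kind of code that could produce the listings above. It is not the original test case: the names Vec4, store and add_and_print are hypothetical, and the real source may be structured quite differently. The constants are taken from the immediates in the assembly, which are the IEEE-754 single-precision encodings of 1, 2, 3, 4 (3F800000h through 40800000h) and 8, 7, 6, 5 (41000000h through 40A00000h).

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set_ps, _mm_add_ps, _mm_storeu_ps
#include <cstdio>

// Hypothetical stand-in for the vector class in the report: a thin
// wrapper around one 128-bit SSE register holding four floats.
struct Vec4 {
    __m128 v;

    // _mm_set_ps takes its arguments from the highest lane to the lowest,
    // so x ends up in lane 0 (the lowest address when stored).
    Vec4(float x, float y, float z, float w) : v(_mm_set_ps(w, z, y, x)) {}
    explicit Vec4(__m128 m) : v(m) {}
};

// Lane-wise addition; with optimization this is expected to map to addps.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    return Vec4(_mm_add_ps(a.v, b.v));
}

// Copies the four lanes into a plain array (no alignment requirement).
inline void store(const Vec4& a, float out[4]) {
    _mm_storeu_ps(out, a.v);
}

// Mirrors the sequence visible in the listing: build v1 and v2,
// add them, and print the four result lanes.
void add_and_print() {
    Vec4 v1(1.0f, 2.0f, 3.0f, 4.0f);
    Vec4 v2(8.0f, 7.0f, 6.0f, 5.0f);
    float r[4];
    store(v1 + v2, r);
    std::printf("Resultado: %f %f %f %f\n", r[0], r[1], r[2], r[3]);
}
```

Calling add_and_print() from main and compiling with the flags given above (-O3 -march=core2 -std=c++0x) should make it possible to compare how the two constant vectors are initialized. Whether this sketch reproduces the exact code generation described in the report is untested.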