------- Additional Comments From rth at gcc dot gnu dot org  2005-01-30 10:59 -------

Ah hah. This is a bit of "cleverness" in the backend. It turns out that for K8, imul with an 8-bit immediate is vector decoded, and imul with a register is direct decoded. In theory, splitting out the constant allows greater throughput through the instruction decoder. Now, we *only* do this split after register allocation, and only if there is in fact a free register. So in no case should this be causing more register spills.
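To make the instruction forms concrete, here is a small Python sketch that hand-encodes them using the standard IA-32 encodings: `imul r32, r/m32, imm8` (opcode 6B /r ib), the split sequence `mov r32, imm32` (B8+rd id) followed by the register form `imul r32, r/m32` (0F AF /r), and `imul r32, r/m32, imm32` (69 /r id). The helper names and the choice of ecx as the "free" scratch register are hypothetical, for illustration only. The sketch only produces bytes; it says nothing about how any particular CPU decodes them.

```python
import struct

# Register numbers for 32-bit GPRs as encoded in the ModRM byte / opcode.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm_rr(reg: str, rm: str) -> int:
    """ModRM byte for a register-direct operand pair (mod = 0b11)."""
    return 0xC0 | (REG[reg] << 3) | REG[rm]


def imul_imm8(dst: str, src: str, imm: int) -> bytes:
    """imul dst, src, imm8: opcode 6B /r ib (immediate is sign-extended)."""
    if not -128 <= imm <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    return bytes([0x6B, modrm_rr(dst, src)]) + struct.pack("<b", imm)


def imul_imm32(dst: str, src: str, imm: int) -> bytes:
    """imul dst, src, imm32: opcode 69 /r id (little-endian immediate)."""
    return bytes([0x69, modrm_rr(dst, src)]) + struct.pack("<i", imm)


def mov_imm32(dst: str, imm: int) -> bytes:
    """mov dst, imm32: opcode B8+rd id."""
    return bytes([0xB8 + REG[dst]]) + struct.pack("<i", imm)


def imul_reg(dst: str, src: str) -> bytes:
    """imul dst, src (two-operand register form): opcode 0F AF /r."""
    return bytes([0x0F, 0xAF, modrm_rr(dst, src)])


if __name__ == "__main__":
    # Multiply ebx by 0x30, three ways.
    print("imm8 :", imul_imm8("ebx", "ebx", 0x30).hex(" "))
    # Split form: load the constant into a free register first
    # (ecx is a hypothetical choice of scratch register).
    split = mov_imm32("ecx", 0x30) + imul_reg("ebx", "ecx")
    print("split:", split.hex(" "))
    print("imm32:", imul_imm32("ebx", "ebx", 0x30).hex(" "))
```

Running it prints `6b db 30` for the imm8 form, `b9 30 00 00 00 0f af d9` for the split sequence, and `69 db 30 00 00 00` for the imm32 form. The split sequence is also longer (8 bytes versus 3) and ties up a register, which is why it is only worth doing when a register is actually free.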
As an aside, if we had proper control over the exact instruction form being used, as opposed to yielding that control to the assembler, we'd emit the imul with a 32-bit immediate (0x69 db 30 00 00 00), since that's direct decoded too. But emitting machine code directly from the compiler is pie-in-the-sky territory.

Other than that, I guess the rest is just generic register-allocation sucking. Fancy attaching the assembly produced with icc for the second test case, so that we know what we're aiming for?

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
     Ever Confirmed|                            |1
           Keywords|                            |ra
   Last reconfirmed|0000-00-00 00:00:00         |2005-01-30 10:59:44
               date|                            |
            Summary|[missed-optimization] gcc4  |sub-optimial register
                   |is really reluctant to use  |allocation with sse
                   |fancy x86 addressing modes  |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680