------- Comment #4 from ubizjak at gmail dot com 2007-03-01 13:47 -------
Current mainline produces really horrible code:

.L4:
        movl    (%edx), %ebx
        addl    $1, %eax
        movl    4(%edx), %esi
        addl    $8, %edx
        movl    %ebx, 8(%esp)
        movl    %esi, 12(%esp)
        movq    8(%esp), %mm0
        paddq   (%ecx), %mm0
        addl    $8, %ecx
        cmpl    %edi, %eax
        movq    %mm0, 8(%esp)
        movl    8(%esp), %ebx
        movl    12(%esp), %esi
        jne     .L4

This is due to two problems:

1) For some reason, ivopts doesn't use fancy i386 addressing modes.
-fno-ivopts produces slightly better code:

.L4:
        movl    (%edi,%eax,8), %edx
        movl    4(%edi,%eax,8), %ecx
        movl    %edx, 8(%esp)
        movl    %ecx, 12(%esp)
        movq    8(%esp), %mm0
        paddq   (%esi,%eax,8), %mm0
        addl    $1, %eax
        cmpl    %eax, %ebx
        movq    %mm0, 8(%esp)
        movl    8(%esp), %edx
        movl    12(%esp), %ecx
        ja      .L4

2) A DImode register is used in the middle of the RTL stream, which leads to
this reload sequence:

(insn:HI 21 20 53 4 (set (reg:DI 1 dx)
        (mem:DI (plus:SI (mult:SI (reg/v:SI 0 ax [orig:59 i ] [59])
                    (const_int 8 [0x8]))
                (reg/v/f:SI 5 di [orig:64 a ] [64])) [2 S8 A64])) 56 {*movdi_2} (nil)
    (nil))

(insn 53 21 54 4 (set (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])
        (reg:DI 1 dx)) 56 {*movdi_2} (nil)
    (nil))

(insn 54 53 22 4 (set (reg:DI 29 mm0)
        (mem/c:DI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])) 56 {*movdi_2} (nil)
    (nil))

(insn:HI 22 54 55 4 (set (reg:DI 29 mm0)
        (unspec:DI [
                (plus:DI (reg:DI 29 mm0)
                    (mem:DI (plus:SI (mult:SI (reg/v:SI 0 ax [orig:59 i ] [59])
                                (const_int 8 [0x8]))
                            (reg/v/f:SI 4 si [orig:65 b ] [65])) [2 S8 A64]))
            ] 38)) 612 {mmx_adddi3} (insn_list:REG_DEP_TRUE 21 (nil))
    (nil))

The DImode register in insn 21 gets allocated to the dx/cx DImode pair, but
insn 22 wants an MMX register. Reload then inserts insns 53 and 54 to satisfy
the input and output constraints.

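To make point 1 concrete at the source level, here is a minimal C sketch of the two loop shapes that the assembly listings above correspond to. It is an illustration only, not the test case from this PR: the function names are hypothetical, plain uint64_t stands in for the MMX type, and the loop body is merely inferred from the names i, a, b and sum that appear in the RTL dump. A compiler is free to turn either form into the other, so neither function is guaranteed to produce a particular addressing mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed form: a single counter, with each address formed as
   base + i * 8.  This is the shape of the -fno-ivopts loop, which
   uses (%edi,%eax,8) and (%esi,%eax,8).  */
uint64_t last_sum_indexed(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = a[i] + b[i];
    return sum;
}

/* Strength-reduced form: the counter is kept, and both pointers are
   advanced by 8 bytes per iteration.  This is the shape of the loop
   produced with ivopts enabled (addl $8, %edx / addl $8, %ecx).  */
uint64_t last_sum_pointer(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i != n; i++) {
        sum = *a + *b;
        a++;
        b++;
    }
    return sum;
}
```

Both functions return the same value for the same inputs. The second form needs three live induction variables where the first needs one, which is why the scaled-index addressing modes are the better fit on i386.
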
The same story repeats at the end of the loop, but this time dx/cx gets
allocated to a V2SImode pseudo (?!):

(insn:HI 24 55 26 4 (set (reg/v:V2SI 1 dx [orig:60 sum ] [60])
        (mem/c:V2SI (plus:SI (reg/f:SI 7 sp)
                (const_int 8 [0x8])) [5 S8 A8])) 581 {*movv2si_internal} (insn_list:REG_DEP_TRUE 22 (nil))
    (nil))

In the above case, the mm0 register (in DImode) gets reloaded from mm0
(V2SImode) via memory.

It looks like MMX DImode _really_ upsets the register allocator, as it can be
allocated either to a si/si register pair or to an MMX register. Perhaps we
need a V1DI mode to separate pure DImodes (either 2*32bit for i686 or 64bit
for x86_64) from MMX DImodes.

It is possible to change the delicate allocation balance by changing register
preferences in the movdi_2 and mov<mode>_internal MMX move patterns, but we
really need a more robust solution for this problem.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22152