http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46524
Summary: Code size regression due to not reusing immediate operands of moves
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: [email protected]
ReportedBy: [email protected]
Created attachment 22433
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22433
preprocessed testcase
This is another CSiBE module. The testcase stores a lot of zeros and ones
into many places. GCC 4.3 keeps 0 in %ebp and does:
f29: f3 0f 5c 84 24 e8 00 subss 0xe8(%rsp),%xmm0
f30: 00 00
f32: 8b 84 24 0c 01 00 00 mov 0x10c(%rsp),%eax
f39: c7 84 24 80 00 00 00 movl $0x3f800000,0x80(%rsp)
f40: 00 00 80 3f
f44: 89 ac 24 84 00 00 00 mov %ebp,0x84(%rsp)
f4b: 89 ac 24 88 00 00 00 mov %ebp,0x88(%rsp)
f52: 89 ac 24 8c 00 00 00 mov %ebp,0x8c(%rsp)
f59: 89 44 24 40 mov %eax,0x40(%rsp)
f5d: 89 44 24 54 mov %eax,0x54(%rsp)
f61: 89 ac 24 90 00 00 00 mov %ebp,0x90(%rsp)
f68: c7 84 24 94 00 00 00 movl $0x3f800000,0x94(%rsp)
f6f: 00 00 80 3f
f73: 89 ac 24 98 00 00 00 mov %ebp,0x98(%rsp)
f7a: 89 ac 24 9c 00 00 00 mov %ebp,0x9c(%rsp)
f81: f3 0f 59 05 00 00 00 mulss 0x0(%rip),%xmm0 # f89 <main+0xf89>
f88: 00
f89: 89 ac 24 a0 00 00 00 mov %ebp,0xa0(%rsp)
f90: 89 ac 24 a4 00 00 00 mov %ebp,0xa4(%rsp)
f97: c7 84 24 a8 00 00 00 movl $0x3f800000,0xa8(%rsp)
f9e: 00 00 80 3f
fa2: 89 ac 24 ac 00 00 00 mov %ebp,0xac(%rsp)
fa9: 89 ac 24 b0 00 00 00 mov %ebp,0xb0(%rsp)
fb0: 89 ac 24 b4 00 00 00 mov %ebp,0xb4(%rsp)
fb7: 89 ac 24 b8 00 00 00 mov %ebp,0xb8(%rsp)
fbe: c7 84 24 bc 00 00 00 movl $0x3f800000,0xbc(%rsp)
fc5: 00 00 80 3f
fc9: 89 6c 24 44 mov %ebp,0x44(%rsp)
fcd: 89 6c 24 48 mov %ebp,0x48(%rsp)
fd1: 89 6c 24 4c mov %ebp,0x4c(%rsp)
fd5: 89 6c 24 50 mov %ebp,0x50(%rsp)
fd9: 89 6c 24 58 mov %ebp,0x58(%rsp)
fdd: 89 6c 24 5c mov %ebp,0x5c(%rsp)
fe1: 89 6c 24 60 mov %ebp,0x60(%rsp)
fe5: 89 6c 24 64 mov %ebp,0x64(%rsp)
fe9: f3 0f 11 44 24 68 movss %xmm0,0x68(%rsp)
fef: 89 6c 24 6c mov %ebp,0x6c(%rsp)
ff3: 89 6c 24 70 mov %ebp,0x70(%rsp)
ff7: 89 6c 24 74 mov %ebp,0x74(%rsp)
ffb: 89 6c 24 78 mov %ebp,0x78(%rsp)
fff: c7 44 24 7c 00 00 80 movl $0x3f800000,0x7c(%rsp)
1006: 3f
Mainline uses immediate stores instead:
13a1: f3 0f 5c 8c 24 dc 00 subss 0xdc(%rsp),%xmm1
13a8: 00 00
13aa: 49 03 44 24 40 add 0x40(%r12),%rax
13af: f3 0f 11 40 2c movss %xmm0,0x2c(%rax)
13b4: c7 40 20 00 00 00 00 movl $0x0,0x20(%rax)
13bb: c7 40 24 00 00 00 00 movl $0x0,0x24(%rax)
13c2: c7 40 28 00 00 00 00 movl $0x0,0x28(%rax)
13c9: 8b 84 24 f8 00 00 00 mov 0xf8(%rsp),%eax
13d0: f3 0f 11 44 24 40 movss %xmm0,0x40(%rsp)
13d6: f3 0f 59 0d 00 00 00 mulss 0x0(%rip),%xmm1 # 13de <main+0x13de>
13dd: 00
13de: f3 0f 11 44 24 54 movss %xmm0,0x54(%rsp)
13e4: f3 0f 11 44 24 68 movss %xmm0,0x68(%rsp)
13ea: c7 44 24 44 00 00 00 movl $0x0,0x44(%rsp)
13f1: 00
13f2: 89 84 24 80 00 00 00 mov %eax,0x80(%rsp)
13f9: f3 0f 11 44 24 7c movss %xmm0,0x7c(%rsp)
13ff: 89 84 24 94 00 00 00 mov %eax,0x94(%rsp)
1406: c7 44 24 48 00 00 00 movl $0x0,0x48(%rsp)
140d: 00
140e: c7 44 24 4c 00 00 00 movl $0x0,0x4c(%rsp)
1415: 00
1416: c7 44 24 50 00 00 00 movl $0x0,0x50(%rsp)
141d: 00
141e: c7 44 24 58 00 00 00 movl $0x0,0x58(%rsp)
1425: 00
1426: c7 44 24 5c 00 00 00 movl $0x0,0x5c(%rsp)
142d: 00
142e: c7 44 24 60 00 00 00 movl $0x0,0x60(%rsp)
1435: 00
1436: c7 44 24 64 00 00 00 movl $0x0,0x64(%rsp)
143d: 00
143e: c7 44 24 6c 00 00 00 movl $0x0,0x6c(%rsp)
1445: 00
1446: c7 44 24 70 00 00 00 movl $0x0,0x70(%rsp)
144d: 00
144e: c7 44 24 74 00 00 00 movl $0x0,0x74(%rsp)
1455: 00
1456: c7 44 24 78 00 00 00 movl $0x0,0x78(%rsp)
145d: 00
145e: c7 84 24 84 00 00 00 movl $0x0,0x84(%rsp)
1465: 00 00 00 00
1469: c7 84 24 88 00 00 00 movl $0x0,0x88(%rsp)
1470: 00 00 00 00
1474: c7 84 24 8c 00 00 00 movl $0x0,0x8c(%rsp)
147b: 00 00 00 00
147f: c7 84 24 90 00 00 00 movl $0x0,0x90(%rsp)
1486: 00 00 00 00
148a: c7 84 24 98 00 00 00 movl $0x0,0x98(%rsp)
1491: 00 00 00 00
1495: c7 84 24 9c 00 00 00 movl $0x0,0x9c(%rsp)
149c: 00 00 00 00
14a0: c7 84 24 a0 00 00 00 movl $0x0,0xa0(%rsp)
14a7: 00 00 00 00
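To put a number on the difference between the two listings, here is a small sketch that computes the size of a 32-bit store to disp(%rsp) from its encoding parts. The function names are made up for illustration; the byte counts are taken from the encodings shown above (for example 89 6c 24 44 versus c7 44 24 44 00 00 00 00).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a 32-bit store to disp(%rsp) on x86-64:
   opcode (1) + ModRM (1) + SIB (1) + displacement (0, 1 or 4),
   plus a 4-byte immediate when the source is a constant.
   Illustrative helper, not part of GCC. */
size_t store_size(int32_t disp, int from_immediate)
{
    size_t disp_bytes;

    if (disp == 0)
        disp_bytes = 0;                 /* (%rsp) needs no displacement */
    else if (disp >= -128 && disp <= 127)
        disp_bytes = 1;                 /* disp8 */
    else
        disp_bytes = 4;                 /* disp32 */

    return 3 + disp_bytes + (from_immediate ? 4 : 0);
}

/* Extra bytes spent when n stores at this displacement use an
   immediate operand instead of a register holding the same value. */
size_t immediate_overhead(int32_t disp, size_t n)
{
    return n * (store_size(disp, 1) - store_size(disp, 0));
}

/* Cross-check the model against encodings from the two listings. */
void check_against_listings(void)
{
    assert(store_size(0x44, 0) == 4);   /* mov  %ebp,0x44(%rsp) */
    assert(store_size(0x44, 1) == 8);   /* movl $0x0,0x44(%rsp) */
    assert(store_size(0x84, 0) == 7);   /* mov  %ebp,0x84(%rsp) */
    assert(store_size(0x84, 1) == 11);  /* movl $0x0,0x84(%rsp) */
}
```

Under this model every store that takes the immediate form is 4 bytes larger than the register form, regardless of whether the displacement fits in 8 bits.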
The RTL cprop1 pass manages to propagate the constants everywhere, so the zero
is no longer kept in a register.
-fno-gcse leads to proper codegen here, but we still get a text section about
7% bigger than with 4.3.
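For readers without the attachment, the store pattern in the 4.3 listing (0x3f800000 at 0x80, 0x94, 0xa8 and 0xbc, zeros in between) looks like a 4x4 float identity matrix being written. The following is a hypothetical stand-in for that kind of code, not the attached testcase; the function names are invented, and it has not been verified that this reduced form reproduces the regression.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced example: fill a 4x4 float matrix with the
   identity, i.e. twelve 0.0f stores and four 1.0f stores. */
void set_identity(float m[4][4])
{
    for (size_t i = 0; i < 4; i++)
        for (size_t j = 0; j < 4; j++)
            m[i][j] = (i == j) ? 1.0f : 0.0f;
}

/* Return the bit pattern of a float, to relate the constants in the
   disassembly to the values in the source. Assumes a 32-bit IEEE 754
   float, as on x86-64. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Here float_bits(1.0f) is 0x3f800000 and float_bits(0.0f) is 0, which are the two immediates that appear in the listings. Comparing the assembly generated for such a function with and without -fno-gcse is one way to check whether a given compiler keeps the zero in a register.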