IMHO, the current register save/restore strategy is not optimal. Look:

# cat test.c
#include <stdio.h>

void print(char *mess, char *format, int text)
{
    printf(mess);
    printf(format,text);
}

void main()
{
    print("X=","%d\n",1);
}

# gcc --version
gcc (GCC) 4.5.0 20090601 (experimental)

# gcc -o test test.c -O2
# objdump -d test

00000000004004d0 <print>:
  4004d0:  48 89 5c 24 f0    mov    %rbx,-0x10(%rsp)    <----
  4004d5:  48 89 6c 24 f8    mov    %rbp,-0x8(%rsp)     <----
  4004da:  48 89 f3          mov    %rsi,%rbx
  4004dd:  48 83 ec 18       sub    $0x18,%rsp          <----
  4004e1:  89 d5             mov    %edx,%ebp
  4004e3:  31 c0             xor    %eax,%eax
  4004e5:  e8 ce fe ff ff    callq  4003b8 <pri...@plt>
  4004ea:  89 ee             mov    %ebp,%esi
  4004ec:  48 89 df          mov    %rbx,%rdi
  4004ef:  48 8b 6c 24 10    mov    0x10(%rsp),%rbp     <----
  4004f4:  48 8b 5c 24 08    mov    0x8(%rsp),%rbx      <----
  4004f9:  31 c0             xor    %eax,%eax
  4004fb:  48 83 c4 18       add    $0x18,%rsp          <----
  4004ff:  e9 b4 fe ff ff    jmpq   4003b8 <pri...@plt>

=========

Let's replace the current save/restore:

  48 89 5c 24 f0    mov    %rbx,-0x10(%rsp)
  48 89 6c 24 f8    mov    %rbp,-0x8(%rsp)
  48 83 ec 18       sub    $0x18,%rsp
  ...
  48 8b 6c 24 10    mov    0x10(%rsp),%rbp
  48 8b 5c 24 08    mov    0x8(%rsp),%rbx
  48 83 c4 18       add    $0x18,%rsp

with a faster and shorter save/restore:

  55                push   %rbp
  53                push   %rbx
  53                push   %rbx    ; dummy push
  ...
  5b                pop    %rbx    ; dummy pop
  5b                pop    %rbx
  5d                pop    %rbp

IMPORTANT note: for faster execution, the "dummy push" has to use the same register as the previous push!

Measurement results on Core2: the new save/restore is 5 ticks faster than the current one.

Regards,
Vladimir Volynsky

--
           Summary: Nonoptimal save/restore registers
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: vvv at ru dot ru

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40363