Let it be clear from the start: this is a potshot. While these trends aren't exactly new or specific to my code, I haven't tried to provide anything but specific data from one of my apps, on win32/cygwin.
Primo, gcc getting much better wrt inlining exacerbates the fact that it's not as good as other compilers at shrinking the stack frame size. As Uros suggested when that point was discussed, a pass to address it would perhaps make sense. As I'm too lazy to properly measure cruft across multiple compilers, I'll use my rtrt app, where I mostly control large scale inlining by hand.

    objdump -wdrfC --no-show-raw-insn $1 | perl -pe 's/^\s+\w+:\s+//' | perl -ne 'printf "%4d\n", hex($1) if /sub\s+\$(0x\w+),%esp/' | sort -r | head -n 10

The ten largest 'sub $imm,%esp' adjustments (in bytes) for each compiler:

    msvc: 2196 2100 1772 1692 1688 1444 1428 1312 1308 1160
    icc:  2412 2280 2172 2044 1928 1848 1820 1588 1428 1396
    gcc:  2604 2596 2412 2076 2028 1932 1900 1756 1720 1132

That's with msvc8 sp1, icc 9.1.033 and g++ 4.3-20070119, each compiler being configured to optimize as much as possible for speed. That confirms what I see when checking codegen for specific functions.

Secundo, while I very much appreciate the brand new string ops, it seems that on ia32 some array initialization cases were left out, hence I still see oodles of 'movl $0x0' when generating code for k8. Also, those zeroings get coalesced at the top of functions on ia32, and I have a function where there are 3 pages of them right after the prologue. See the attached "grep 'movl $0x0'" dump.
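For anyone who wants to repeat the measurement without the perl one-liners, here is a minimal Python sketch of the same idea. It assumes the disassembly text comes from 'objdump -d' in AT&T syntax, as in the pipeline above; the function names top_frame_sizes and count_zero_stores are made up for illustration and are not part of any tool.

```python
import re

# Matches the stack allocation in a prologue, e.g. "sub    $0x894,%esp".
# Same pattern idea as the perl filter above (AT&T syntax, ia32).
FRAME_RE = re.compile(r"sub\s+\$(0x[0-9a-fA-F]+),%esp")

# Matches 32-bit stores of an immediate zero, e.g. "movl   $0x0,0x10(%esp)".
ZERO_STORE_RE = re.compile(r"movl\s+\$0x0,")


def top_frame_sizes(disassembly, n=10):
    """Return the n largest 'sub $imm,%esp' amounts, in bytes, largest first."""
    sizes = [int(m.group(1), 16) for m in FRAME_RE.finditer(disassembly)]
    return sorted(sizes, reverse=True)[:n]


def count_zero_stores(disassembly):
    """Count 'movl $0x0,...' instructions in the disassembly text."""
    return len(ZERO_STORE_RE.findall(disassembly))
```

Feeding it the saved output of 'objdump -d' for each binary gives the same kind of per-compiler list as above. Note that, like the original pipeline, it only sees frames allocated with an explicit 'sub' on %esp, so it is a rough indicator rather than an exact accounting.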
movl0.S.bz2
Description: BZip2 compressed data