------- Comment #3 from hjl dot tools at gmail dot com 2010-02-05 19:35 -------

This happens because X86_TUNE_INTER_UNIT_MOVES is off by default. I used pextrd to avoid one more memory access, since -mtune=core2 isn't faster than the default in most cases.
For this case, icc generates:

	movl 4(%esp), %eax
	ret

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42968