Compiling i387.c from the Linux kernel using:

    -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__ -Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float -m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts -mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4 -mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement -Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387 -c arch/i386/kernel/i387.c

(these are the flags generated by rpmbuild on a Fedora Core 4 system)

Using 4.0, the restore_fpu function looks like:

    restore_fpu:
            testb   $1, boot_cpu_data+15
            je      .L23
    [snip]

Using 4.1, it looks like:

    restore_fpu:
            movl    %eax, %edx
            movl    boot_cpu_data+12, %eax
            testl   $16777216, %eax
            je      .L24
    [snip]

Similar code sequences appear in other functions in the same file: get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd, set_fpregs.
The size of these functions increases by 5 bytes (i.e. 20%).

Some of these functions might be on a critical path in the kernel, so the size increase (and possibly a speed penalty) could have an impact.

For 4.0, the 00.expand dump looks like:

    (insn 9 7 10 1 (set (reg/f:SI 59)
            (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] <var_decl 0xb7ee2d80 boot_cpu_data>)
                    (const_int 12 [0xc])))) -1 (nil)
        (nil))

    (insn 10 9 11 1 (set (reg:SI 60)
            (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil)
        (nil))

    (insn 11 10 12 1 (parallel [
                (set (reg:SI 61)
                    (and:SI (reg:SI 60)
                        (const_int 16777216 [0x1000000])))
                (clobber (reg:CC 17 flags))
            ]) -1 (nil)
        (nil))

    (insn 12 11 13 1 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg:SI 61)
                (const_int 0 [0x0]))) -1 (nil)
        (nil))

The 4.1 dump is identical except for insn 10, which has mem/s/v/j:SI instead of mem/s/j:SI.

The combine pass in 4.0 deletes insn 10; that does not happen in 4.1.

For 4.1, the generated code does not change when using -Os or -march=pentium4.

This is one of the causes of PR23153.

-- 
           Summary: mov + mov + testl generated instead of testb
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
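
A note on why the two instruction sequences in this report test the same bit. On little-endian i386, bit 24 (mask 16777216, i.e. 0x1000000) of the 32-bit word at boot_cpu_data+12 is bit 0 of the byte at boot_cpu_data+15, so the 4.0 testb is a narrowed form of the 4.1 movl + testl. The /v flag on the 4.1 mem marks the reference as volatile, which is presumably why combine no longer folds the load into a narrower test. The sketch below only checks the offset and mask arithmetic. The struct and the helper names are hypothetical and are not taken from the kernel or from GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel structure. Only the offset of the
 * capability words (12, from "boot_cpu_data+12") comes from the report. */
struct fake_cpuinfo {
    unsigned char pad[12];
    uint32_t x86_capability[1];
};

/* Byte index, within a little-endian 32-bit word, of the byte that holds
 * the mask. Assumes all set bits of the mask lie within a single byte. */
unsigned byte_offset_of_mask(uint32_t mask)
{
    unsigned off = 0;
    while (mask > 0xffu) {
        mask >>= 8;
        off++;
    }
    return off;
}

/* The same mask, shifted down so that it applies to that single byte. */
unsigned char byte_mask_of_mask(uint32_t mask)
{
    while (mask > 0xffu)
        mask >>= 8;
    return (unsigned char)mask;
}

/* The 4.1 shape: load the whole word, then test it with a 32-bit mask. */
int wide_test(uint32_t word, uint32_t mask)
{
    return (word & mask) != 0;
}

/* The 4.0 shape: test one byte only. The little-endian byte image is
 * built explicitly so the result does not depend on the host byte order. */
int narrow_test(uint32_t word, uint32_t mask)
{
    unsigned char bytes[4];
    unsigned i;
    for (i = 0; i < 4; i++)
        bytes[i] = (unsigned char)((word >> (8 * i)) & 0xffu);
    return (bytes[byte_offset_of_mask(mask)] & byte_mask_of_mask(mask)) != 0;
}

/* Ties the helpers back to the constants that appear in the report. */
void check_report_constants(void)
{
    size_t base = offsetof(struct fake_cpuinfo, x86_capability);
    assert(base == 12);
    assert(base + byte_offset_of_mask(16777216u) == 15); /* boot_cpu_data+15 */
    assert(byte_mask_of_mask(16777216u) == 1);           /* testb $1 */
}
```

Calling check_report_constants() from a test program, and comparing wide_test() against narrow_test() for a few word values, is enough to confirm that the narrowing is value-preserving for this mask. It says nothing about whether narrowing a volatile access is permissible, which is the actual question for the combine pass.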