Compiling arch/i386/kernel/i387.c from the Linux kernel with:
 -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float
-m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts
-mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4
-mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement
-Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387
-c arch/i386/kernel/i387.c
(These are the flags generated by rpmbuild on a Fedora Core 4 system.)

With GCC 4.0, the restore_fpu function starts with:
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L23
        [snip]

With GCC 4.1, it starts with:
restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L24
        [snip]
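Both sequences test the same bit. 16777216 is 0x1000000, i.e. bit 24 of the 32-bit word at boot_cpu_data+12. On little-endian x86 that bit is bit 0 of the byte at offset 12 + 24/8 = 15, which is exactly what the 4.0 `testb $1, boot_cpu_data+15` reads. The sketch below checks that equivalence on a plain byte buffer; the function names are hypothetical, and it assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0x1000000 == 16777216 == bit 24, the mask used by the 4.1 testl. */
static_assert((1u << 24) == 16777216u, "mask is bit 24");

/* 4.1 style: load the whole 32-bit word at offset 12, then test bit 24. */
static int test_word(const unsigned char *base)
{
    uint32_t word;
    memcpy(&word, base + 12, sizeof word); /* host byte order: little-endian assumed */
    return (word & 16777216u) != 0;
}

/* 4.0 style: load only the byte at offset 15 and test bit 0.
   On a little-endian machine byte 15 holds bits 24..31 of the word
   at offset 12, so bit 24 of the word is bit 0 of this byte. */
static int test_byte(const unsigned char *base)
{
    return (base[15] & 1u) != 0;
}
```

Narrowing the access this way is only a size/speed optimization: both functions return the same answer for any buffer contents on a little-endian target.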

Similar code sequences appear in other functions in the same file:
get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd and set_fpregs.
The size of these functions increases by 5 bytes (i.e. 20%).
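The 5-byte figure can be cross-checked against the standard IA-32 encodings of the instructions shown above, assuming boot_cpu_data is addressed with an absolute 32-bit displacement (non-PIC code, as in the kernel) and that the je branches encode the same in both versions. The per-instruction sizes are worked out by hand from the opcode formats; the constant names are made up for this sketch.

```c
#include <assert.h>

/* testb $1, boot_cpu_data+15: opcode F6 /0, ModRM 05, disp32, imm8 */
enum { TESTB_IMM8_MEM = 1 + 1 + 4 + 1 };   /* 7 bytes */

/* movl %eax, %edx: opcode 89, ModRM C2 */
enum { MOVL_REG_REG = 1 + 1 };             /* 2 bytes */

/* movl boot_cpu_data+12, %eax: short form A1 moffs32 */
enum { MOVL_MOFFS_EAX = 1 + 4 };           /* 5 bytes */

/* testl $16777216, %eax: short form A9 imm32 */
enum { TESTL_IMM32_EAX = 1 + 4 };          /* 5 bytes */

enum {
    SEQ_GCC40 = TESTB_IMM8_MEM,
    SEQ_GCC41 = MOVL_REG_REG + MOVL_MOFFS_EAX + TESTL_IMM32_EAX,
};

static_assert(SEQ_GCC41 - SEQ_GCC40 == 5, "4.1 sequence is 5 bytes longer");
```

The extra movl %eax, %edx counts toward the difference because %eax is needed for the load in the 4.1 sequence.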

Some of these functions may be on critical paths in the kernel, so the
size increase (and a possible speed penalty) could have a measurable impact.

For 4.0, the 00.expand RTL dump looks like:

(insn 9 7 10 1 (set (reg/f:SI 59)
        (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] <var_decl 0xb7ee2d80 boot_cpu_data>)
                (const_int 12 [0xc])))) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (reg:SI 60)
        (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil)
    (nil))

(insn 11 10 12 1 (parallel [
            (set (reg:SI 61)
                (and:SI (reg:SI 60)
                    (const_int 16777216 [0x1000000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 12 11 13 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 61)
            (const_int 0 [0x0]))) -1 (nil)
    (nil))


For 4.1 the dump is identical except for insn 10, which has mem/s/v/j:SI
instead of mem/s/j:SI; the extra /v flag marks the memory reference as volatile.

The combine pass in 4.0 deletes insn 10 by merging the load into the following insns; this does not happen in 4.1.
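A hypothetical reduced testcase for this pattern (a volatile 32-bit load from offset 12 of a global, masked with bit 24) might look like the sketch below. The struct layout and all names are assumptions chosen only to reproduce the offsets seen in the dumps; they are not the kernel's definitions. Compiling it with -O2 -m32 -march=i686 -S under both compilers and comparing the assembly is one way to check whether it shows the same difference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU data structure: 12 bytes of other
   fields, then the capability words, so the first capability word sits
   at offset 12, matching "boot_cpu_data+12" in the dumps. */
struct cpu_data_sketch {
    unsigned char misc[8];
    int level;
    unsigned int capability[4];
};

static_assert(offsetof(struct cpu_data_sketch, capability) == 12,
              "capability word at offset 12");

struct cpu_data_sketch boot_data_sketch;

/* Reads the capability word through a volatile-qualified pointer,
   which makes the access volatile (the /v flag on the MEM). */
int has_bit24(void)
{
    const volatile unsigned int *cap = boot_data_sketch.capability;
    return (cap[0] & 0x1000000u) != 0;
}
```

Replacing the volatile pointer with a plain `const unsigned int *` gives a non-volatile variant to compile alongside it for comparison.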


With 4.1, the generated code does not change when using -Os or -march=pentium4.

This is one of the causes of PR23153.


-- 
           Summary: mov + mov + testl generated instead of testb
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
