Forgot to attach the patch:

Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.795.4.33
diff -c -p -r1.795.4.33 i386.c
*** i386.c      15 Aug 2005 23:36:10 -0000      1.795.4.33
--- i386.c      25 Aug 2005 17:08:33 -0000
*************** ix86_rtx_costs (rtx x, int code, int out
*** 15730,15740 ****
        else
        switch (standard_80387_constant_p (x))
          {
!         case 1: /* 0.0 */
!           *total = 1;
!           break;
!         default: /* Other constants */
!           *total = 2;
            break;
          case 0:
          case -1:
--- 15730,15737 ----
        else
        switch (standard_80387_constant_p (x))
          {
!         default: /* All constants */
!           *total = 0;
            break;
          case 0:
          case -1:

On Aug 25, 2005, at 11:09 AM, Fariborz Jahanian wrote:

(Note: I am starting a new thread for an old one because the old thread was corrupted, which prevented me from responding.)

The following test case:

struct S {
        double d1, d2, d3;
};

struct S ms()
{
        struct S s = {0,0,0};
        return s;
}

Compiled with -O1 -mdynamic-no-pic -march=pentium4 produces:

        pxor    %xmm0, %xmm0
        movsd   %xmm0, 16(%eax)
        movsd   %xmm0, 8(%eax)
        movsd   %xmm0, (%eax)

But the following code yields a 7% performance gain in eon, as reported by one of Apple's performance people:

        movl    $0, 16(%eax)
        movl    $0, 20(%eax)
        movl    $0, 8(%eax)
        movl    $0, 12(%eax)
        movl    $0, (%eax)
        movl    $0, 4(%eax)
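
The two store sequences leave the same bytes in memory because +0.0 in IEEE 754 double precision is the all-zero bit pattern. A minimal sketch that checks this from C follows. The function names are hypothetical, and it assumes an 8-byte IEEE 754 double with no padding in struct S, as on x86:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S {
        double d1, d2, d3;
};

/* Fill *s the way the pxor/movsd sequence does:
   three 8-byte stores of +0.0. */
static void zero_with_doubles(struct S *s)
{
        s->d3 = 0.0;
        s->d2 = 0.0;
        s->d1 = 0.0;
}

/* Fill *s the way the movl sequence does: six 4-byte stores
   of integer 0.  memcpy keeps the byte-wise writes well defined. */
static void zero_with_words(struct S *s)
{
        const uint32_t zero = 0;
        unsigned char *p = (unsigned char *) s;
        size_t off;

        for (off = 0; off < sizeof *s; off += sizeof zero)
                memcpy(p + off, &zero, sizeof zero);
}

/* Returns 1 when both fills leave identical bytes in memory. */
int zero_fills_match(void)
{
        struct S a, b;

        /* Assumption: 8-byte double and no padding, as on x86. */
        assert(sizeof(struct S) == 24);

        zero_with_doubles(&a);
        zero_with_words(&b);
        return memcmp(&a, &b, sizeof a) == 0;
}
```

This only shows that the two sequences store the same value. It says nothing about which one is faster.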

GCC does not emit the movl sequence because cse does not propagate the constant in this RTL (note that cse is capable of picking up a constant from a REG_EQUAL note):

(insn 12 7 13 0 (set (reg:DF 59)
        (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) 64 {*movdf_nointeger} (nil)
    (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
        (nil)))

(insn 13 12 15 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 16 [0x10])) [0 <result>.d3+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))

(insn 15 13 17 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 8 [0x8])) [0 <result>.d2+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))

(insn 17 15 20 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1470 ]) [0 <result>.d1+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))

The reason it does not is the definition of the COST macro, which returns a higher cost for a const_double than for the same constant already available in a register. On x86 this cost is computed in the call to ix86_rtx_costs, which returns 1 or 2. I had a lengthy conversation with Ian Lance Taylor, who suggested lowering the const_double cost to 0. That does indeed lower the cost enough for the const_double to win. But the costs in ix86_rtx_costs appear to have been chosen carefully, which makes me cautious: the change may hurt performance on other flavors of the x86 architecture and/or on other benchmarks. Comments from anyone familiar with this cost function, or suggestions for another way to let cse do its job (such as a special new cost function), are appreciated.
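
The cost comparison can be pictured with a toy model. This is only a sketch, not GCC code: the helper name and TOY_REG_COST are hypothetical, a value already in a register is assumed to cost 0, and ties are assumed to go to the constant (consistent with the observation that a cost of 0 lets the const_double win):

```c
#include <assert.h>

/* Assumption: cost of a value that is already in a register. */
enum { TOY_REG_COST = 0 };

/* Hypothetical helper, not a GCC function.  Returns 1 when the toy
   model would substitute the constant for the register, 0 otherwise.
   Assumption: on a tie the constant is chosen. */
int toy_prefers_constant(int const_cost)
{
        assert(const_cost >= 0);
        return const_cost <= TOY_REG_COST;
}
```

In this model the current costs of 1 (for 0.0) and 2 (for other constants) both keep the register, while the patched cost of 0 selects the constant.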

- Thanks, fariborz ([EMAIL PROTECTED]).





