Forgot to attach the patch:
Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.795.4.33
diff -c -p -r1.795.4.33 i386.c
*** i386.c 15 Aug 2005 23:36:10 -0000 1.795.4.33
--- i386.c 25 Aug 2005 17:08:33 -0000
*************** ix86_rtx_costs (rtx x, int code, int out
*** 15730,15740 ****
else
switch (standard_80387_constant_p (x))
{
! case 1: /* 0.0 */
! *total = 1;
! break;
! default: /* Other constants */
! *total = 2;
break;
case 0:
case -1:
--- 15730,15737 ----
else
switch (standard_80387_constant_p (x))
{
! default: /* All constants */
! *total = 0;
break;
case 0:
case -1:
On Aug 25, 2005, at 11:09 AM, Fariborz Jahanian wrote:
(Note: I am starting a new thread because the old thread was
corrupted, which prevented me from responding to it.)
The following test case:
struct S {
  double d1, d2, d3;
};
struct S ms()
{
  struct S s = {0,0,0};
  return s;
}
compiled with -O1 -mdynamic-no-pic -march=pentium4 produces:
pxor %xmm0, %xmm0
movsd %xmm0, 16(%eax)
movsd %xmm0, 8(%eax)
movsd %xmm0, (%eax)
But the following code gives a 7% performance gain in eon, as
reported by one of Apple's performance people:
movl $0, 16(%eax)
movl $0, 20(%eax)
movl $0, 8(%eax)
movl $0, 12(%eax)
movl $0, (%eax)
movl $0, 4(%eax)
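The two sequences write the same bytes, because the bit pattern of
+0.0 in IEEE 754 is all zeros. Here is a minimal sketch of that
equivalence, assuming IEEE 754 doubles and a struct S with no padding;
the helper names are made up for illustration and are not part of gcc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct S {
  double d1, d2, d3;
};

/* Zero the struct with three 8-byte floating-point stores,
   as in the pxor/movsd sequence. */
static void zero_with_doubles(struct S *s)
{
  s->d1 = 0.0;
  s->d2 = 0.0;
  s->d3 = 0.0;
}

/* Zero the struct with 32-bit integer stores,
   as in the movl $0 sequence. */
static void zero_with_words(struct S *s)
{
  uint32_t zero = 0;
  unsigned char *p = (unsigned char *) s;
  size_t i;

  /* Assumption: the struct size is a multiple of 4 bytes. */
  assert(sizeof *s % sizeof zero == 0);
  for (i = 0; i < sizeof *s; i += sizeof zero)
    memcpy(p + i, &zero, sizeof zero);
}

/* Returns 1 if both ways of zeroing leave identical bytes. */
static int same_bytes(void)
{
  struct S a, b;

  /* Start from different garbage so the comparison is meaningful. */
  memset(&a, 0xAA, sizeof a);
  memset(&b, 0x55, sizeof b);
  zero_with_doubles(&a);
  zero_with_words(&b);
  return memcmp(&a, &b, sizeof a) == 0;
}
```

This only shows that the stores are interchangeable for 0.0; it says
nothing about which sequence is faster on a given processor.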
The compiler emits the movsd form because cse does not propagate the
constant in the following rtl (note that cse is capable of picking up
a constant from a REG_EQUAL note):
(insn 12 7 13 0 (set (reg:DF 59)
        (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) 64 {*movdf_nointeger} (nil)
    (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
        (nil)))
(insn 13 12 15 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 16 [0x10])) [0 <result>.d3+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
(insn 15 13 17 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                (const_int 8 [0x8])) [0 <result>.d2+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
(insn 17 15 20 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1470 ]) [0 <result>.d1+0 S8 A32])
        (reg:DF 59)) 64 {*movdf_nointeger} (nil)
    (nil))
The reason cse does not propagate the constant is the definition of
the COST macro, which returns a higher cost for a const_double than
for the same constant held in a register. On x86 this cost is
computed by ix86_rtx_costs, which returns 1 or 2. I had a lengthy
conversation with Ian Lance Taylor, who suggested lowering the
const_double cost to 0. Indeed, with that change the const_double
wins the COST comparison. But the careful selection of these costs
in ix86_rtx_costs makes me cautious: the change may hurt performance
on other flavors of the x86 architecture and/or on other benchmarks.
Any comments from those familiar with this cost function (or on any
other way to let cse do its job, such as a special new cost
function) are appreciated.
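To make the cost argument concrete, here is a toy model of the
comparison. It is not the code in cse.c; it assumes, for
illustration only, that a register costs 0 and that the constant is
substituted when it is no more expensive than the register. All names
in it are hypothetical:

```c
#include <assert.h>

/* Hypothetical stand-in for the cost of using a value that is
   already in a register (assumed to be 0 in this sketch). */
static int toy_reg_cost(void)
{
  return 0;
}

/* Returns 1 if this toy model would replace the register with the
   constant, given the cost the target reports for the constant.
   Assumption: a tie goes to the constant. */
static int toy_prefers_constant(int const_double_cost)
{
  assert(const_double_cost >= 0);
  return const_double_cost <= toy_reg_cost();
}
```

Under these assumptions, toy_prefers_constant(1) and
toy_prefers_constant(2) return 0, which matches the current behavior
(the register is kept), while toy_prefers_constant(0) returns 1,
which matches what the patch is meant to achieve.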
- Thanks, fariborz ([EMAIL PROTECTED]).