Compiling a simple function like

double foo(double x) { return x + 1.0; }

on x86 with -O2 -march=pentium4 -mtune=prescott -mfpmath=sse -fpic, the load of 1.0 is done as

        cvtss2sd        [EMAIL PROTECTED](%ecx), %xmm0

(This is on Linux; the same happens on Darwin.)
This is not a good choice: loading the double-precision 1.0 directly with movsd is faster than loading a single-precision constant and widening it with cvtss2sd. The narrowing from double to single precision is done in compress_float_constant, which does no cost computation; presumably the RTL optimizers are expected to change it back when that is beneficial.
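As I understand it, the condition that makes narrowing legal is that the double constant survives a round trip through float unchanged. The sketch below is a simplified model of that test, not GCC code. `should_narrow` and its cost arguments are hypothetical placeholders showing where a cost comparison could sit; they are not GCC's rtx costs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the exactness test behind narrowing a DF constant
   to SF: the value may be stored as a float and widened back (cvtss2sd)
   only if the round trip through float reproduces it exactly. */
static bool narrows_exactly_to_float(double d)
{
    volatile float f = (float) d;   /* volatile store forces rounding to
                                       float even with x87 excess precision */
    return (double) f == d;
}

/* Hypothetical cost-aware variant. The costs are placeholders supplied
   by the caller: narrow only when the value is exact AND "load float +
   widen" is no more expensive than loading the double directly. */
static bool should_narrow(double d, int cost_load_and_widen,
                          int cost_load_double)
{
    assert(cost_load_and_widen >= 0 && cost_load_double >= 0);
    return narrows_exactly_to_float(d)
           && cost_load_and_widen <= cost_load_double;
}
```

With d = 1.0 the exactness test passes, which is why the constant gets narrowed; 0.1 fails it and would stay double. Only the second function takes cost into account.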

Without -fpic, this does happen, in cse_insn. (mem/u/i:SF (symbol_ref/u:SI ("*.LC0"))) is run through fold_rtx, which recognizes it as a reference to a constant-pool entry. The known equivalent CONST_DOUBLE 1.0 is then passed to force_const_mem, producing (mem/u/i:DF (symbol_ref/u:SI ("*.LC1"))). That is tried in place of the FLOAT_EXTEND and selected as valid and cheaper. This all seems to be working as expected.
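The last step described above, choosing among equivalent sources for the SET, can be modeled as "keep the cheapest candidate the target accepts". This is a toy model with hypothetical names; the forms, validity flags and costs are supplied by the caller and are not taken from GCC's actual cse_insn logic or cost tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One equivalent form for the source of
   (set (reg:DF) (float_extend:DF (mem:SF ...))). */
typedef struct {
    const char *form;  /* textual description, for readability only */
    bool valid;        /* would the target accept this operand? */
    int cost;          /* relative cost, lower is better */
} candidate;

/* Index of the cheapest valid candidate, or -1 if none is valid.
   In this model, ties keep the earlier candidate. */
static int pick_cheapest_valid(const candidate *c, size_t n)
{
    assert(c != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].valid)
            continue;
        if (best < 0 || c[i].cost < c[best].cost)
            best = (int) i;
    }
    return best;
}
```

For example, given a valid FLOAT_EXTEND form at cost 8 and a valid direct DF load at cost 4 (placeholder numbers), the model picks the DF load, matching the non-PIC behavior; if the DF load is marked invalid, it falls back to the FLOAT_EXTEND.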

With -fpic, first, fold_rtx doesn't recognize the PIC form of the address as referring to a constant, so cse_insn never tries forcing the CONST_DOUBLE into memory. Hacking around that doesn't help, because force_const_mem doesn't produce the PIC form of the constant reference even though we're in PIC mode; we get the same (mem/u/i:DF (symbol_ref/u:SI ("*.LC1"))), which (correctly) does not test as valid in PIC mode.
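The two obstacles under -fpic can be captured in a small toy model. Everything here is hypothetical and only mirrors the description above; in particular, describing the PIC form as "PIC register + .LCn@GOTOFF" is an assumption about how ia32 PIC code addresses the constant pool, and the `hacked` flag stands in for the fold_rtx workaround.

```c
#include <assert.h>
#include <stdbool.h>

/* Two address forms for a constant-pool entry. */
typedef enum {
    ADDR_SYMBOL,      /* bare (symbol_ref "*.LCn") */
    ADDR_PIC_GOTOFF   /* PIC register + .LCn@GOTOFF (assumed ia32 PIC form) */
} addr_form;

/* Obstacle 1: the pool-constant recognizer only understands the bare
   symbol form, unless the hypothetical hack teaches it the PIC form. */
static bool recognized_as_pool_constant(addr_form a, bool hacked)
{
    return a == ADDR_SYMBOL || (hacked && a == ADDR_PIC_GOTOFF);
}

/* Obstacle 2: the new DF pool entry always comes back in bare form. */
static addr_form force_const_mem_result(bool pic)
{
    (void) pic;  /* PIC mode is ignored, which is the problem */
    return ADDR_SYMBOL;
}

/* Under PIC a bare symbol address is rejected; without PIC this model
   accepts any form. */
static bool address_valid(addr_form a, bool pic)
{
    return !pic || a == ADDR_PIC_GOTOFF;
}

/* Can CSE replace the FLOAT_EXTEND of the SF load with a DF load? */
static bool cse_can_use_df_load(addr_form sf_addr, bool pic, bool hacked)
{
    assert(pic || sf_addr == ADDR_SYMBOL);  /* non-PIC uses the bare form */
    if (!recognized_as_pool_constant(sf_addr, hacked))
        return false;                       /* never tries force_const_mem */
    return address_valid(force_const_mem_result(pic), pic);
}
```

In this model `cse_can_use_df_load(ADDR_SYMBOL, false, false)` succeeds, while `cse_can_use_df_load(ADDR_PIC_GOTOFF, true, true)` still fails: teaching the recognizer the PIC form only moves the failure to the address returned by force_const_mem.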

At this point I'm wondering if this is the right place to be attacking the problem at all.
Advice?  Thanks.
