When compiling a simple function like
double foo(double x) { return x+1.0; }
on x86 with -O2 -march=pentium4 -mtune=prescott -mfpmath=sse -fpic, the
load of 1.0 is done as
cvtss2sd [EMAIL PROTECTED](%ecx), %xmm0
(This is on Linux; the same happens on Darwin.)
This is not really a good idea, as a movsd of a double-precision 1.0 is
faster than loading a single-precision 1.0 and converting it.
The change from double to single precision is done in
compress_float_constant, and there's no cost computation there;
presumably the RTL optimizers are expected to change it back if that's
beneficial.
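For anyone not familiar with that transformation: narrowing is only
value-preserving when the double constant is exactly representable in
single precision, which is true of 1.0. Below is a minimal standalone
sketch of that exactness test. The helper name fits_in_float is made up
for illustration; this is not GCC's code, which works on its internal
real-number representation rather than on host floats.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from GCC: returns true when d survives
   a round trip through float unchanged, i.e. the single-precision
   constant plus a float-to-double extension reproduces d exactly. */
static bool fits_in_float(double d)
{
    /* volatile forces the value to really be rounded to float, even on
       targets that keep intermediates in wider registers (x87). */
    volatile float narrowed = (float)d;
    return (double)narrowed == d;
}
```

With this, fits_in_float(1.0) is true and fits_in_float(0.1) is false.
Note that the test says nothing about whether the narrowed form is
cheaper, which is exactly the missing cost question here.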
Without -fpic, this does happen in cse_insn. The
(mem/u/i:SF (symbol_ref/u:SI ("*.LC0"))) gets run through fold_rtx, which
recognizes it as a pool constant. This causes the known equivalent
CONST_DOUBLE 1.0 to be run through force_const_mem, producing
(mem/u/i:DF (symbol_ref/u:SI ("*.LC1"))), which is then tried in place of
the FLOAT_EXTEND and selected as valid and cheaper. This all seems to be
working as expected.
With -fpic, this breaks down in two places. First, fold_rtx doesn't
recognize the PIC form as representing a constant, so cse_insn never
tries forcing the CONST_DOUBLE into memory. Second, hacking around that
doesn't help, because force_const_mem doesn't produce the PIC form of
the constant reference, even though we're in PIC mode; we get the same
(mem/u/i:DF (symbol_ref/u:SI ("*.LC1"))), which (correctly) doesn't test
as valid in PIC mode.
At this point I'm wondering if this is the right place to be attacking
the problem at all.
Advice? Thanks.