This patch series fixes a number of issues in rs6000_rtx_costs, the
aim being to provide costing somewhat closer to reality. Probably the
most important patch of the series is patch 4, which just adds a
comment. Without the analysis that went into that comment, I found
myself making what seemed to be good changes but which introduced
regressions.
So far these changes have not introduced any testsuite regressions
on --with-cpu=power8 and --with-cpu=power9 all lang bootstraps on
powerpc64le-linux. Pat spec tested on power9 against a baseline
master from a few months ago, seeing a few small improvements and no
degradations above the noise.
Some notes:
Examination of varasm.o shows quite a number of cases where
if-conversion succeeds due to different seq_cost. One example:
extern int foo ();
int
default_assemble_integer (unsigned size)
{
extern unsigned long rs6000_isa_flags;
if (size > (!((rs6000_isa_flags & (1UL << 35)) != 0) ? 4 : 8))
return 0;
return foo ();
}
This rather horrible code turns the rs6000_isa_flags value into either
4 or 8:
rldicr 9,9,28,0
srdi 9,9,28
addic 9,9,-1
subfe 9,9,9
rldicr 9,9,0,61
addi 9,9,8
Better would be
rldicl 9,9,29,63
sldi 9,9,2
addi 9,9,4
There is also a "rlwinm ra,rb,3,0,26" instead of "rldicr ra,rb,3,60",
and "li r31,0x4000; rotldi r31,r31,17" vs.
"lis r31,0x8000; clrldi r31,r31,32".
Neither of these is a real change. I saw one occurrence of a 5 insn
sequence being replaced with a load from memory in
default_function_rodata_section, for ".rodata", and others elsewhere.
Sometimes correct insn cost leads to unexpected results. For
example:
extern unsigned bar (void);
unsigned
f1 (unsigned a)
{
if ((a & 0x01000200) == 0x01000200)
return bar ();
return 0;
}
emits for a & 0x01000200
(set (reg) (and (reg) (const_int 0x01000200)))
at expand time (two rlwinm insns) rather than the older
(set (reg) (const_int 0x01000200))
(set (reg) (and (reg) (reg)))
which is three insns. However, since 0x01000200 is needed later the
older code after optimisation is smaller.