https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #9 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Also reported by Donald Stence this week:
The compiler produces excessively long instruction sequences to synthesize some
literal constants. This contributes excess path length and potentially latency.
Constants requiring only 2 or 3 instructions are acceptable. Anything needing
more than 3 should be optimized via a load from the GOT (i.e., data in GOT).
Compile the test case with either -O or -O3, default processor (AT-11.0-0).
Example constant from perlbench: 0x0000800004100001.
Resulting sequence:
li 3,0
ori 3,3,0x8000
sldi 3,3,32
oris 3,3,0x410
ori 3,3,0x1
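
As a sanity check, the five instructions above can be modeled in C to confirm
that they build the perlbench constant. This is only a sketch: the helper
names mirror the mnemonics but are hypothetical, and each one models just the
register effect of its instruction on a 64-bit value (no condition codes or
other side effects).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each models one instruction on a 64-bit register. */
static uint64_t li(int16_t imm)                /* load sign-extended immediate */
{
    return (uint64_t)(int64_t)imm;
}

static uint64_t ori(uint64_t r, uint16_t imm)  /* OR with low 16 bits */
{
    return r | imm;
}

static uint64_t oris(uint64_t r, uint16_t imm) /* OR with immediate shifted left 16 */
{
    return r | ((uint64_t)imm << 16);
}

static uint64_t sldi(uint64_t r, unsigned sh)  /* shift left, sh in 0..63 */
{
    return r << sh;
}

/* Replays the five-instruction sequence from the report on "r3". */
uint64_t synthesize_perlbench_constant(void)
{
    uint64_t r3 = li(0);
    r3 = ori(r3, 0x8000);
    r3 = sldi(r3, 32);
    r3 = oris(r3, 0x410);
    r3 = ori(r3, 0x1);
    return r3;
}
```

Each instruction depends on the result of the previous one, so the five form a
serial dependency chain, which is where the path-length and latency concern
comes from.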
Replacing the two longer sequences with loads made the block of some 30
instructions prior to the switch in the top function of perlbench (S_regmatch)
~20% faster. That section of code contained two longer sequences (one of 4
instructions, the other of 5, though the 5-instruction one can be done in 4,
as [Segher] pointed out); the rest is 1- or 2-instruction constant synthesis
or addi, plus an out-of-bounds check for the switch. I replaced the two longer
ones with ld off r2 to get the ~20%. Of course this is in isolation, but I
believe this to be sound.