Hi, I'm having trouble working out, from available docs like
    https://gcc.gnu.org/onlinedocs/gccint/LTO.html
just what the gcc LTO framework is intended to be
architecturally capable of.

As a concrete motivating example, I have a 32K embedded
program about 5% of which consists of sequences like

        movhi   r2,0
        addi    r2,r2,26444
        stw     r15,0(r2)

This is on a 32-bit RISC architecture (Nios2) whose
instructions carry 16-bit immediate values, so in general
a sequence like

        movhi   r2,high_half_of_address
        addi    r2,r2,low_half_of_address

is required to assemble an arbitrary 32-bit address in
a register.
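
To make the split concrete, here is a minimal sketch in Python of how a 32-bit address maps onto the two immediates. The function names are hypothetical, and it assumes that addi sign-extends its 16-bit immediate, so the high half is bumped by one whenever bit 15 of the low half is set (the adjustment the assembler's %hiadj operator is meant to provide).

```python
def split_address(addr):
    """Split a 32-bit address into (high, low) immediates for movhi/addi."""
    addr &= 0xFFFFFFFF
    low = addr & 0xFFFF
    # Assumption: addi sign-extends, so a low half with bit 15 set
    # effectively subtracts 0x10000; compensate in the high half.
    high = ((addr >> 16) + ((addr >> 15) & 1)) & 0xFFFF
    return high, low


def rebuild_address(high, low):
    """Model movhi followed by addi on a 32-bit register."""
    reg = (high << 16) & 0xFFFFFFFF                 # movhi rN,high
    simm = low - 0x10000 if low & 0x8000 else low   # sign-extend low half
    return (reg + simm) & 0xFFFFFFFF                # addi rN,rN,low
```

For the address in the example above, split_address(26444) gives a high half of 0 and a low half of 26444, matching the movhi/addi pair shown.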

However, if the high half of the address happens to be
zero (which is universally true in this program because
code+data fit in 64KB -- forced by hardware constraints),
we can collapse

        movhi   r2,0
        addi    r2,r2,26444
        stw     r15,0(r2)

to just

        stw     r15,26444(r0)

saving two instructions. (On this architecture
r0 is hardwired to zero.)
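
The rewrite can be sketched as a small Python check. This is only an illustration with hypothetical names, not anything gcc or binutils provides. It assumes that stw sign-extends its 16-bit offset the same way addi does, and that r2 is dead after the store. Under the first assumption only addresses below 0x8000 (or in the top 32KB of the address space) are reachable from r0, so a zero high half alone is not quite sufficient.

```python
def fits_simm16(addr):
    """True if addr is reachable as a sign-extended 16-bit offset from r0."""
    addr &= 0xFFFFFFFF
    return addr < 0x8000 or addr >= 0xFFFF8000


def collapse_store(high, low, src):
    """Return the instructions needed to store src at the movhi/addi address.

    Assumption: the scratch register r2 is not used after the store.
    """
    simm = low - 0x10000 if low & 0x8000 else low   # addi sign-extends
    addr = ((high << 16) + simm) & 0xFFFFFFFF
    if fits_simm16(addr):
        # Express the address as a signed offset from r0.
        off = addr if addr < 0x8000 else addr - 0x100000000
        return [f"stw {src},{off}(r0)"]
    # Otherwise keep the original three-instruction sequence.
    return [f"movhi r2,{high}", f"addi r2,r2,{low}", f"stw {src},0(r2)"]
```

With the values from the example, collapse_store(0, 26444, "r15") yields the single instruction stw r15,26444(r0), while an address of 0x8000 keeps all three instructions.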

This seems like a natural peephole optimization at
linktime -- *if* data addresses are resolved in some
(preliminary?) fashion during linktime code generation.

Is this a plausible optimization to implement in gcc
+ binutils with the current -flto support architecture?

If so, what doc/mechanism/approach/sourcefile should
I be studying in order to implement this?

If not, is there some other productive way to tickle
gcc + binutils here?

Thanks in advance,
 -Jeff

