Hi, I'm having trouble working out, from available docs such as https://gcc.gnu.org/onlinedocs/gccint/LTO.html, just what the gcc LTO framework is intended to be architecturally capable of.

As a concrete motivating example, I have a 32K embedded program, about 5% of which consists of sequences like

    movhi r2,0
    addi  r2,r2,26444
    stw   r15,0(r2)

This is on a 32-bit RISC architecture (Nios2) with 16-bit immediate values in instructions, where in general a sequence like

    movhi r2,high_half_of_address
    addi  r2,r2,low_half_of_address

is required to assemble an arbitrary 32-bit address in a register for use.

However, if the high half of the address happens to be zero (which is universally true in this program because code+data fit in 64KB -- forced by hardware constraints), we can collapse

    movhi r2,0
    addi  r2,r2,26444
    stw   r15,0(r2)

to just

    stw   r15,26444(r0)

saving two instructions. (On this architecture r0 is hardwired to zero.)

This seems like a natural peephole optimization at link time -- *if* data addresses are resolved in some (preliminary?) fashion during link-time code generation.

Is this a plausible optimization to implement in gcc + binutils with the current -flto support architecture?

If so, what doc/mechanism/approach/sourcefile should I be studying in order to implement this?

If not, is there some other productive way to tickle gcc + binutils here?

Thanks in advance,
-Jeff
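
P.S. In case it makes the question more concrete, here is a rough Python sketch of the rewrite I have in mind. It works on assembly text rather than on real relocations or linker data structures, the name collapse_zero_high is made up for illustration, and it simply assumes the scratch register (r2 above) is dead after the store, which a real implementation would have to prove.

```python
import re

# Textual patterns for the three instructions in the sequence above.
MOVHI = re.compile(r"movhi\s+(r\d+)\s*,\s*(\d+)")
ADDI = re.compile(r"addi\s+(r\d+)\s*,\s*(r\d+)\s*,\s*(-?\d+)")
STW = re.compile(r"stw\s+(r\d+)\s*,\s*(-?\d+)\((r\d+)\)")


def collapse_zero_high(insns):
    """Collapse movhi rX,0 / addi rX,rX,LOW / stw rY,0(rX) into stw rY,LOW(r0).

    Hypothetical helper. ASSUMPTION: rX is not read again after the store;
    this sketch does no liveness analysis.
    """
    out = []
    i = 0
    while i < len(insns):
        if i + 2 < len(insns):
            hi = MOVHI.fullmatch(insns[i].strip())
            lo = ADDI.fullmatch(insns[i + 1].strip())
            st = STW.fullmatch(insns[i + 2].strip())
            if hi and lo and st:
                tmp = hi.group(1)
                low = int(lo.group(3))
                ok = (
                    int(hi.group(2)) == 0          # high half is zero
                    and lo.group(1) == tmp         # addi writes the same register
                    and lo.group(2) == tmp         # ... and reads it
                    and st.group(3) == tmp         # store uses it as the base
                    and st.group(1) != tmp         # stored value is not the address
                    and int(st.group(2)) == 0      # no extra offset on the store
                    and -32768 <= low <= 32767     # fits a signed 16-bit immediate
                )
                if ok:
                    out.append(f"stw {st.group(1)},{low}(r0)")
                    i += 3
                    continue
        out.append(insns[i])
        i += 1
    return out


if __name__ == "__main__":
    for line in collapse_zero_high(
        ["movhi r2,0", "addi r2,r2,26444", "stw r15,0(r2)"]
    ):
        print(line)
```

Sequences whose high half is nonzero, or whose store already carries an offset, are passed through unchanged.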