2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
> Denis Chertykov wrote:
>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>> Denis Chertykov wrote:
>>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch
>>>>> register.
>>>>>
>>>>> The original *addhi3 insn is still there and is located before the new
>>>>> addhi3_clobber insn, because addhi3 is special to reload (thanks, Denis,
>>>>> for this note), so there is one version with and one version without a
>>>>> scratch register.
>>>>>
>>>>> The patch passes without regressions.
>>>>>
>>>> Which improvements does this patch add?
>>>>
>>>> Denis.
>>> If addhi3 is expanded early, the addition uses a QI scratch register,
>>> which avoids reloading the constant if the target register is in NO_LD.
>>> It also reduces register pressure, as only a QI register is needed
>>> instead of reloading the constant into an HI register.
>>>
>>> Otherwise, there might be sequences like
>>>
>>> ldi r31, 2      ; *reload_inhi
>>> mov r12, r31
>>> clr r13
>>>
>>> add r14, r12    ; *addhi3
>>> adc r15, r13
>>>
>>> which will now become
>>>
>>> ldi r31, 2      ; addhi3_clobber
>>> add r14, r31
>>> adc r15, __zero_reg__
>>>
>>> The same applies if the constant is reloaded into LD regs:
>>>
>>> ldi r30, 2      ; *movhi
>>> clr r31
>>>
>>> add r14, r30    ; *addhi3
>>> adc r15, r31
>>>
>>> will become
>>>
>>> ldi r30, 2      ; addhi3_clobber
>>> add r14, r30
>>> adc r15, __zero_reg__
>>>
>>> For *addhi3 insns, the register pressure is not reduced, but the insn
>>> sequence might be smarter if peep2 comes up with a QI scratch or if it
>>> detects a *reload_inhi insn just before the addition (and the reg that
>>> holds the reloaded constant dies after the addition).
>>>
>>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>>> without a scratch. This is easier because, e.g., prologue and epilogue
>>> generation emit add insns directly (not via the addhi3 expander but by an
>>> explicit gen_rtx_PLUS).
>>> Yet the addhi3 expander factors out the situations in which an addhi3
>>> insn is to be generated via the addhi3 expander late in the compilation
>>> process.
>>
>> Please provide a real-world example.
>>
>> Denis.
>
> Consider avr-libc (under the assumption that it is "real world" code).
>
> In avr-libc's build directory, with the patch integrated:
>
> $ cd avr/lib/avr4
> $ make clean && make CFLAGS='-save-temps -dp -Os'
> $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
> $ grep 'addhi3_clobber\/2' *.s | wc -l
> 33
>
> This shows that the insns are already there before peep2, so no reload of
> a 16-bit constant is needed; an 8-bit scratch is sufficient.
>
> Alternatively, the implementation could omit the expansion to
> addhi3_clobber in the addhi3 expander and instead rely entirely on peep2.
> However, that does not reduce register pressure, because a 16-bit register
> will still be allocated; peep2 just prints things smarter and needs only a
> QI scratch to call avr_out_plus_clobber.
>
> For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC, respectively,
> leaves cc0 in a mess. As most loops use +/-1 on the counter variable,
> LDI/SUB/SBC is not shorter but better, because it sets cc0.
>
> So do you like this patch?
> Or would you prefer a patch that is neutral with respect to the register
> allocator and just uses peep2 to print things smarter?

I'm interested in code improvements.
What is the difference in the size of avr-libc?

Denis.