Re: [Patch,AVR]: PR50447: Tweak addhi3

Denis Chertykov Tue, 18 Oct 2011 10:39:37 -0700

2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
> Denis Chertykov schrieb:
>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>> Denis Chertykov schrieb:
>>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>>> This patch makes some tweaks to addhi3, such as adding a QI scratch register.
>>>>>
>>>>> The original *addhi3 insn is still there and is located before the new
>>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Denis for
>>>>> this note), so there is a version with and a version without a scratch
>>>>> register.
>>>>>
>>>>> Patch passes without regressions.
>>>>>
>>>> Which improvements does this patch add?
>>>>
>>>> Denis.
>>> If addhi3 is expanded early, the addition happens with a QI scratch, which
>>> avoids a reload of the constant if the target register is in NO_LD. It also
>>> reduces register pressure, because only a QI register is needed instead of
>>> a reload of the constant to HI.
>>>
>>> Otherwise, there might be sequences like
>>>
>>> ldi r31, 2    ; *reload_inhi
>>> mov r12, r31
>>> clr r13
>>>
>>> add r14, r12  ; *addhi3
>>> adc r15, r13
>>>
>>> which now will be
>>>
>>> ldi r31, 2    ; addhi3_clobber
>>> add r14, r31
>>> adc r15, __zero_reg__
>>>
>>> Similar applies if the reload of the constant happens to LD regs:
>>>
>>> ldi r30, 2    ; *movhi
>>> clr r31
>>>
>>> add r14, r30  ; *addhi3
>>> adc r15, r31
>>>
>>> will become
>>>
>>> ldi r30, 2    ; addhi3_clobber
>>> add r14, r30
>>> adc r15, __zero_reg__
>>>
>>> For *addhi3 insns the register pressure is not reduced but the insn sequence
>>> might be smarter if peep2 comes up with a QI scratch or if it detects a
>>> *reload_inhi insn just prior to the addition (and the reg that holds the
>>> reloaded constant dies after the addition).
>>>
>>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>>> without scratch. This is easier because, e.g., prologue and epilogue
>>> generation emit add insns (not by means of the addhi3 expander but by
>>> explicit gen_rtx_PLUS). Yet the addhi3 expander factors out the situations
>>> when an addhi3 insn is to be generated via the addhi3 expander late in the
>>> compilation process.
>>
>> Please provide a real-world example.
>>
>> Denis.
>
> Consider avr-libc (under the assumption that it is "real world" code):
>
> In avr-libc's build directory, and with the patch integrated:
>
> $ cd avr/lib/avr4
> $ make clean && make CFLAGS='-save-temps -dp -Os'
> $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
> $ grep 'addhi3_clobber\/2' *.s | wc -l
> 33
>
> This shows that the insns are already there before peep2 and thus no reload of
> a 16-bit constant is needed; an 8-bit scratch is sufficient.
>
> Alternatively, the implementation could omit the expansion to addhi3_clobber
> in the addhi3 expander and instead rely completely on peep2. However, that
> does not reduce register pressure, because a 16-bit register will still be
> allocated; the peep2 just prints things smarter and needs only a QI scratch
> to call avr_out_plus_clobber.
>
> For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC leaves cc0 in a mess.
> As most loops use +/-1 on the counter variable, LDI/SUB/SBC is not shorter
> but is better because it sets cc0.
>
> So do you like this patch?
> Or do you prefer a patch that is neutral with respect to the register
> allocator and just uses peep2 to print things smarter?


I'm interested in code improvements.
What is the difference in the size of avr-libc?

Denis.
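
As a rough illustration of the quoted sequences above, the following Python
sketch models the byte-wise 16-bit addition in both forms: the older one, where
the constant is first reloaded into a 16-bit register pair, and the
addhi3_clobber form, which needs only an 8-bit scratch plus __zero_reg__. It is
a toy model, not the GCC implementation; the function names are hypothetical
and it assumes a constant in the range 0..255.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry-in; returns (result, carry_out) like ADD/ADC."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def addhi3_via_hi_reload(value, const):
    """Model of the old sequence: constant lives in a 16-bit register pair."""
    r12, r13 = const & 0xFF, 0            # ldi / mov / clr
    lo, carry = add8(value & 0xFF, r12)   # add r14, r12
    hi, _ = add8(value >> 8, r13, carry)  # adc r15, r13
    return (hi << 8) | lo


def addhi3_clobber(value, const):
    """Model of the new sequence: 8-bit scratch, high byte uses zero reg."""
    scratch = const & 0xFF                      # ldi r31, const
    zero_reg = 0                                # __zero_reg__
    lo, carry = add8(value & 0xFF, scratch)     # add r14, r31
    hi, _ = add8(value >> 8, zero_reg, carry)   # adc r15, __zero_reg__
    return (hi << 8) | lo
```

Under these assumptions both functions return the same 16-bit result for any
input, while the second one ties up a single 8-bit register for the constant
instead of a register pair.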
