Re: Improve addsi3 for CONST_INT

Denis Chertykov Sun, 10 Jul 2011 09:57:37 -0700

2011/7/8 Georg-Johann Lay:
> Denis Chertykov wrote:
>> 2011/7/7 Georg-Johann Lay:
>>> Hi Denis.
>>
>> I think that it's a good question to discuss on the gcc mailing list.
>> Maybe someone more qualified can give a better suggestion than me.
>
> ...bringing this over to the gcc mailing list
>
>>> I think about improving addsi3 insn when a const_int is added.
>>> The hard case is when the register the value is added to is not in "d".
>>> In the current implementation the constant simply gets reloaded into
>>> some register (provided it is not an "easy" const like 1 or -1).
>>>
>>> That's a waste because
>>>
>>> * a 32-bit register gets initialized with the const, but an 8-bit
>>>  d-register would be sufficient.
>>>
>>> * Instead of setting an intermediate register, the value could be
>>>  added byte-by-byte if one d-reg was available.
>>>
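The byte-by-byte idea can be modelled in plain C to see why one 8-bit d-register is enough. This is a hypothetical sketch, not avr.c code: `add_const_bytewise` and `add_and_check` are made-up names, `scratch` stands in for the single d-register (only r16..r31 accept LDI), and the comments name the AVR instruction each step would correspond to. It assumes, as on real AVR, that LDI leaves the carry flag alone, so one scratch can be reloaded between the ADD and the ADCs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not avr.c code: add the 32-bit constant K to a
   register quad r[0..3] (little-endian, as on AVR) one byte at a time.
   SCRATCH stands in for a single 8-bit "d" register.  Returns the
   number of instructions the sequence would need.  */
static int
add_const_bytewise (uint8_t r[4], uint32_t k)
{
  unsigned carry = 0;
  int n_insns = 0;
  int started = 0;                  /* set once the first ADD is emitted */

  for (int i = 0; i < 4; i++)
    {
      uint8_t kbyte = (uint8_t) (k >> (8 * i));

      /* Zero bytes below the first non-zero byte change nothing.  */
      if (!started && kbyte == 0)
        continue;

      /* LDI scratch,kbyte (LDI does not touch the carry flag).
         A zero byte needs no LDI: ADC rN,__zero_reg__ does the job.  */
      uint8_t scratch = kbyte;
      if (kbyte != 0)
        n_insns++;

      /* ADD for the first byte, ADC for the following ones.  */
      unsigned sum = (unsigned) r[i] + scratch + carry;
      r[i] = (uint8_t) sum;
      carry = sum >> 8;
      n_insns++;
      started = 1;
    }
  return n_insns;
}

/* Pack X into a register quad, add K byte-wise and check the result
   against a plain 32-bit addition.  Returns the instruction count.  */
static int
add_and_check (uint32_t x, uint32_t k)
{
  uint8_t r[4];
  for (int i = 0; i < 4; i++)
    r[i] = (uint8_t) (x >> (8 * i));

  int n = add_const_bytewise (r, k);

  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
    v |= (uint32_t) r[i] << (8 * i);
  assert (v == (uint32_t) (x + k));
  return n;
}
```

For example, `add_and_check (0x1200ffff, 0x00010001)` returns 6 (two LDIs, one ADD, three ADCs). In general, zero bytes of the constant below its lowest non-zero byte need no instruction, and zero bytes above it need only an ADC with __zero_reg__.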
>>> As I am a bit unsure how to approach that, I'd like to ask what you
>>> think about it.  I can think of three approaches:
>>>
>>> 1) Use a peep2 on sequence like
>>>    *reload_insi
>>>    *addsi3
>>>   or
>>>    *movsi
>>>    *addsi3
>>>
>>> 2) Add new insn with (clobber (match_scratch:QI "=&d"))
>>>
>>> 3) Allow all const_int in *addsi3. If peep2 comes up with
>>>   a d-clobber, that's fine. Else have to cook up a clobber
>>>   by saving a d-reg to tmp_reg similar to movsi implementation.
>>>
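Approach (3)'s cooked-up clobber could produce an output sequence like the one generated below. The sketch is hypothetical: `emit_addsi_const`, the convention that a negative `clobber` means "peep2 found no scratch", and the choice of r31 (REG_Z + 1) as the borrowed register are assumptions for illustration. It also assumes the destination is a quad in r2..r15, so it overlaps neither __tmp_reg__ (r0), __zero_reg__ (r1) nor r31.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define REG_Z 30          /* low byte of the Z pair, as in GCC's avr.h */
#define LINE_LEN 32

/* Hypothetical output function for approach (3): add the constant K to
   the non-"d" register quad DEST..DEST+3.  CLOBBER is the QImode "d"
   scratch found by peep2, or negative if there is none; in that case a
   clobber is cooked up by saving r31 (REG_Z + 1) in __tmp_reg__.
   OUT must have room for 10 lines.  Returns the number of lines.  */
static int
emit_addsi_const (char out[][LINE_LEN], int dest, int clobber, uint32_t k)
{
  int n = 0;
  int cooked = clobber < 0;

  /* r2..r15: neither r0/r1 (__tmp_reg__, __zero_reg__) nor "d".  */
  assert (dest >= 2 && dest + 3 <= 15);
  assert (cooked || (clobber >= 16 && clobber <= 31));

  if (cooked)
    {
      clobber = REG_Z + 1;
      snprintf (out[n++], LINE_LEN, "mov __tmp_reg__,r%d", clobber);
    }

  /* MOV and LDI leave SREG alone, so the carry flows from ADD to ADC.  */
  for (int i = 0; i < 4; i++)
    {
      unsigned kbyte = (unsigned) (k >> (8 * i)) & 0xff;
      snprintf (out[n++], LINE_LEN, "ldi r%d,%u", clobber, kbyte);
      snprintf (out[n++], LINE_LEN, "%s r%d,r%d",
                i == 0 ? "add" : "adc", dest + i, clobber);
    }

  if (cooked)
    snprintf (out[n++], LINE_LEN, "mov r%d,__tmp_reg__", clobber);

  return n;
}
```

With a scratch from peep2 the sequence is 8 instructions (LDI/ADD, then LDI/ADC three times); the cooked clobber adds the two MOVs around it.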
>>> Approach (1) has the advantage that it is "neutral" with respect
>>> to reloading.  However, reload will always allocate a SI register,
>>> even if peep2 comes up with a scratch.
>>
>> But CSE will always have a chance to optimize it.
>
> You mean CSE or reload-CSE?


As I remember, reload itself can optimize such cases, and so can postreload CSE.

>
> If the same constant is used multiple times, CSE can arrange for that and
> combine can compose an addsi+clobber. combine will only insert the
> constant if it can reduce the number of insns, i.e. if two insns need
> the same constant it won't do the replacement.
>
>>> Approach (2) has the advantage that it has just "pressure 1 on d-regs"
>>> whilst (1) and the current implementation have "pressure 4 or even
>>> r-regs".
>>
>> IMHO it's good idea to check.
>>
>>> I am in favor of approach (3): it is an easier pattern, like (1), and it
>>> does not put pressure on regs at all.  Also, (3) needs no additional
>>> work to support SF, or a fixed-point mode if there were one.
>>
>> It's a fake.
>> Generally, all fake methods are the wrong way. (Look at fake addressing.)
>
> Yes, it's a fake.  But note that the movsi "r,i" alternative is also a
> fake, and maybe you remember my failed approach to use secondary
> reloads for that:
>   http://gcc.gnu.org/ml/gcc/2011-03/msg00290.html
>
> A word on fake addressing (PR46278):
>
> You were right with that.
> At the time you wrote the AVR port, there was nothing like
> MODE_CODE_BASE_REG_CLASS or REGNO_MODE_CODE_OK_FOR_BASE_P.
> And even today it's not possible to specify different costs for
> different addressing modes (ADDRESS_COST won't do it because it works
> on the anatomy of an address and not on the address reg(s) involved).
> If there were a way to tell IRA/reload the costs, and these passes
> cared about them, fake addressing would be no problem!
>
> There are architectures where
>  LOAD R, [A]
> or
>  LOAD R, [B]
> do exactly the same but have different costs because A can be used
> implicitly and thus yields shorter code.  There's no way to tell it to
> the allocator.  All you can do is define reg-alloc-order and hope for
> the best.
>
>> I think that I WAS WRONG when I decided to create __zero_reg__ and
>> __tmp_reg__. (It stemmed from my lack of experience at the start of
>> EGCS hacking.)
>
> I have thought about that topic more than once already.
> I have no idea how it could be done better.
>
> OPTIMIZE_MODE_SWITCHING looks tempting, but it runs prior to reload,
> and many uses of ZERO or TMP are generated in reload or peep2.
> We would need a pass with similar abilities running after peep2.
>
> To test in an ISR whether ZERO or TMP is needed, insn attributes could help.
> But the stack layout would have to be changed ad hoc, a new backend pass
> would be needed, and it's error-prone IMHO.
>
> Introducing a new ZERO_REG like R2 would change the ABI, and ISR
> prologues/epilogues would grow even more because some insns
> change/restore ZERO.
>
>
>>> Or do you think the improvement is not needed?
>>
>> I vote for 2, but I can be wrong.
>> You know the best way ;-) Try all methods and compare results.
>> (profile optimizing yourself ;-)
>>
>>> Thanks for your recommendation.
>>
>> I want to give you a recommendation: please clean up the port so that it
>> no longer uses immediate numbers as register numbers (your patches
>> frequently do that).
>
> I intend to do more backend cleanup, the elfos patch just was the first.
>
>> Please use constants!
>> I forgot to point them out in your last patches,
>> e.g.
>>
>>               /* We have no clobber reg but need one.  Cook one up.
>>                  That's cheaper than loading from constant pool.  */
>>
>>               cooked_clobber_p = true;
>>               clobber_reg = gen_rtx_REG (QImode, 31);
>>
>>
>> Use REG_Z + 1 instead of 31.
>
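Denis's request in stand-alone form: REG_Z mirrors the definition in GCC's avr.h (Z is the pair r30/r31, which is why REG_Z + 1 is 31), and `COOKED_CLOBBER_REGNO` is a hypothetical name used only for this sketch. In avr.c the expression would simply be written inline, as in gen_rtx_REG (QImode, REG_Z + 1).

```c
#include <assert.h>

/* Low register of the Z pointer pair (r30/r31), as in GCC's avr.h.  */
#define REG_Z 30

/* Hypothetical name for the cooked-up clobber register: the high byte
   of Z.  Writing REG_Z + 1 instead of the magic number 31 documents
   where the register comes from.  */
#define COOKED_CLOBBER_REGNO (REG_Z + 1)

/* Compile-time check that the named form still means r31.  */
static_assert (COOKED_CLOBBER_REGNO == 31, "high byte of Z is r31");
```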
> Already committed:
>   http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00634.html
>
>> A few years ago I had an idea to use register renumbering to allow
>> using r26,r27,r30,r31 as one SImode register.
>>
>>
>> Denis.
>
> Sounds crazy. You mean making X fixed and having a soft-X after Z, like so?
>
> r26/27: X fix
> r28/29: Y
> r30/31: Z
> r32/33: soft-X

No, no.
I used Y as the frame pointer because of `icall' and `ijmp'.
The register allocator can't allocate an SImode register to X,Z because Y
lies between them.
(It's because of the poor AVR architecture.)
Just imagine that the Z register is in r28 and r29 and Y is in r30 and r31
(only for GCC).
GCC will use r28,r29 as Z, but the AVR port will print r28 as r30 and r29 as r31,
r30 as r28 and r31 as r29.
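The renumbering idea can be sketched as a print-time mapping. This is a hypothetical illustration, not code from the AVR port: `avr_print_regno` and `print_simode_reg` are made-up names. Internally GCC would see X in r26/r27 followed by the renamed Z in r28/r29, so an SImode value starting at r26 occupies consecutive register numbers; only the assembler output swaps the numbers back to the real registers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mapping from GCC's internal register number to the real
   AVR register printed in assembler output: internally Z lives in
   r28/r29 and Y in r30/r31; the printer swaps the two pairs back.  */
static int
avr_print_regno (int internal)
{
  assert (internal >= 0 && internal <= 31);
  switch (internal)
    {
    case 28: return 30;   /* internal Z low  -> real r30 (ZL) */
    case 29: return 31;   /* internal Z high -> real r31 (ZH) */
    case 30: return 28;   /* internal Y low  -> real r28 (YL) */
    case 31: return 29;   /* internal Y high -> real r29 (YH) */
    default: return internal;
    }
}

/* Print the real registers behind an SImode value that GCC placed in
   the internal registers FIRST..FIRST+3.  */
static void
print_simode_reg (int first)
{
  assert (first >= 0 && first + 3 <= 31);
  for (int i = 0; i < 4; i++)
    printf ("r%d%s", avr_print_regno (first + i), i < 3 ? "," : "\n");
}
```

For instance, `print_simode_reg (26)` prints `r26,r27,r30,r31`: the allocator sees one contiguous SImode register, while the output uses exactly the X and Z pairs.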

>
> That would increase pressure on XYZ even more. I don't think GCC's
> register allocator is ready for that.

It would allow allocating SImode registers to r26,r27,r30,r31; that's
definitely a win.


Denis.
