2011/7/8 Georg-Johann Lay <[email protected]>:
> Denis Chertykov wrote:
>> 2011/7/7 Georg-Johann Lay:
>>> Hi Denis.
>>
>> I think that it's a good question to discuss inside gcc mailing list.
>> Maybe somebody more qualified can give a better suggestion than me.
>
> ...bringing this over to gcc mailing list
>
>>> I think about improving addsi3 insn when a const_int is added.
>>> The hard case is when the register to add the value is not in "d".
>>> In the current implementation the constant simply gets reloaded into
>>> some register (provided it is not an "easy" const like 1 or -1).
>>>
>>> That's a waste because
>>>
>>> * a 32-bit register gets initialized with the const, but an 8-bit
>>>   d-register would be sufficient.
>>>
>>> * Instead of setting an intermediate register, the value could be
>>>   added byte-by-byte if one d-reg was available.
>>>
>>> A bit unsure about how to approach that, I'd ask you what you think
>>> about it. I can think of three approaches:
>>>
>>> 1) Use a peep2 on sequence like
>>>    *reload_insi
>>>    *addsi3
>>>    resp.
>>>    *movsi
>>>    *addsi3
>>>
>>> 2) Add new insn with (clobber (match_scratch:QI "=&d"))
>>>
>>> 3) Allow all const_int in *addsi3. If peep2 comes up with
>>>    a d-clobber, that's fine. Else have to cook up a clobber
>>>    by saving a d-reg to tmp_reg similar to movsi implementation.
>>>
>>> Approach (1) has the advantage that it is "neutral" with respect
>>> to reloading. However, reload will always allocate a SI register,
>>> even if peep2 comes up with a scratch.
>>
>> But, CSE will always have a chance to optimize it.
>
> You mean CSE or reload-CSE?
As I remember, reload itself can optimize such cases, and so can postreload CSE.
>
> If same constant is used multiple times, CSE can arrange for that and
> combine can compose a addsi+clobber. combine will only insert the
> constant if it can reduce the number of insns, i.e. if two insns need
> the same constant it won't do the replacement.
>
>>> Approach (2) has the advantage that it has just "pressure 1 on d-regs"
>>> whilst (1) and the current implementation have "pressure 4 or even
>>> r-regs".
>>
>> IMHO it's a good idea to check.
>>
>>> I am in favor of approach (3): It's easier pattern like (1) and it
>>> does not put pressure on regs at all. Also (3) makes no additional
>>> work to support SF or if there were fixed-point mode.
>>
>> It's a fake.
>> Generally, all fake methods are a wrong way. (Look at fake addressing)
>
> Yes, it's a fake. But note that movsi "r,i" alternative is also a
> fake, and maybe you remember my failed approach to use secondary
> reloads for that:
> http://gcc.gnu.org/ml/gcc/2011-03/msg00290.html
>
> A word on fake addressing (PR46278):
>
> You were right with that.
> At the time you wrote the AVR port, there was nothing like
> MODE_CODE_BASE_REG_CLASS or REGNO_MODE_CODE_OK_FOR_BASE_P.
> And even today it's not possible to specify different costs for
> different addressing modes (ADDRESS_COST won't do it because it works
> on anatomy of an address and not on the address reg(s) invented).
> If there was a way to tell the costs to IRA/reload and these passes
> cared for the costs, fake addressing would be no problem!
>
> There are architectures where
>    LOAD R, [A]
> resp.
>    LOAD R, [B]
> do exactly the same but have different costs because A can be used
> implicitly and thus yields shorter code. There's no way to tell it to
> the allocator. All you can do is define reg-alloc-order and hope for
> the best.
>
>> I think that I WAS WRONG when I decided to create __zero_reg__ and
>> __tmp_reg__.
>> (It was derived from my low qualification at the start of EGCS
>> hacking)
>
> I thought about that topic already more than once.
> I have no idea how it could be done better.
>
> OPTIMIZE_MODE_SWITCHING looks tempting but it runs prior to reload,
> and many uses of ZERO or TMP are generated in reload or peep2.
> We would need a pass with similar abilities running after peep2.
>
> To test in ISR if ZERO or TMP is needed, insn attributes could help.
> But the stack layout would have to be changed ad-hoc, new BE-pass was
> needed and it's error-prone IMHO.
>
> Introducing new ZERO_REG like R2 would change the ABI and ISR
> pro/epilogue would increase even more because some insns
> change/restore ZERO.
>
>
>>> Or do you think the improvement is not needed?
>>
>> I vote for 2, but I can be wrong.
>> You know the best way ;-) Try all methods and compare results.
>> (profile optimizing yourself ;-)
>>
>>> Thanks for your recommendation.
>>
>> I want to give you a recommendation: please, clean up the port from
>> using immediate numbers as register numbers. (in your patches you
>> frequently use it)
>
> I intend to do more backend cleanup, the elfos patch just was the first.
>
>> Please, use constants !
>> I forgot to point to them in your last patches.
>> e.g.
>>
>>       /* We have no clobber reg but need one. Cook one up.
>>          That's cheaper than loading from constant pool. */
>>
>>       cooked_clobber_p = true;
>>       clobber_reg = gen_rtx_REG (QImode, 31);
>>
>>
>> Use REG_Z + 1 instead of 31.
>
> Already committed:
> http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00634.html
>
>> A few years ago I had an idea to use register renumbering to allow
>> using r26,r27,r30,r31 as one SImode register.
>>
>>
>> Denis.
>
> Sounds crazy. You mean make X fixed and to have a soft-X after Z like so?
>
> r26/27: X fix
> r28/29: Y
> r30/31: Z
> r32/33: soft-X

No, no.
I have used Y for frame pointer because of `icall' and `ijmp'.
Register allocator can't allocate an SImode register to X,Z because Y is
between them.
(It's because of poor AVR architecture)

Just imagine that the Z register is in r28 and r29 and Y is in r30 and r31.
(only for GCC)
GCC will use r28,r29 as Z but AVR port will print r28 as r30 and r29 as
r31, r30 as r28 and r31 as r29.
>
> That would increase pressure on XYZ even more. I don't think GCC's
> register allocator is ready for that.

It will make it possible to allocate SImode registers to r26,r27,r30,r31;
it's definitely a win.

Denis.
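
For illustration only, the renumbering described above could be sketched in C roughly like this. The helper name avr_print_regno is hypothetical and does not exist in the AVR port; it only models the "print r28 as r30, r29 as r31, r30 as r28, r31 as r29" mapping, assuming all other registers print unchanged.

```c
#include <assert.h>

/* Hypothetical helper (not part of the real AVR backend): translate the
   register number GCC uses internally into the hardware register number
   that would be printed in the assembler output.

   Assumption: internally Z lives in r28/r29 and Y in r30/r31, so that
   internal r26..r29 is a contiguous block covering hardware X and Z.  */
unsigned
avr_print_regno (unsigned internal_regno)
{
  assert (internal_regno < 32);

  switch (internal_regno)
    {
    case 28: return 30;   /* internal Z, low byte  -> hardware r30 */
    case 29: return 31;   /* internal Z, high byte -> hardware r31 */
    case 30: return 28;   /* internal Y, low byte  -> hardware r28 */
    case 31: return 29;   /* internal Y, high byte -> hardware r29 */
    default: return internal_regno;   /* everything else is unchanged */
    }
}
```

With such a mapping, an SImode value that the allocator places in internal r26..r29 would be printed as hardware r26, r27, r30, r31, while Y (the frame pointer) still ends up in hardware r28/r29.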

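
Regarding the byte-by-byte addition proposed at the start of the thread: the arithmetic can be modelled in plain C as below. This is only a simulation of the idea, not backend code; the function name add_const_bytewise is made up for the example. On the real target each loop step would presumably be an ldi into the one available d-reg followed by add (first byte) or adc (remaining bytes).

```c
#include <stdint.h>

/* Model of adding a 32-bit constant K to a 32-bit value one byte at a
   time, using a single 8-bit scratch and a carry flag.  The name and
   the structure are assumptions for illustration only.  */
uint32_t
add_const_bytewise (uint32_t value, uint32_t k)
{
  uint8_t byte[4];
  unsigned carry = 0;
  uint32_t result = 0;

  /* Split the value into its four bytes, least significant first.  */
  for (int i = 0; i < 4; i++)
    byte[i] = (uint8_t) (value >> (8 * i));

  for (int i = 0; i < 4; i++)
    {
      /* Load byte I of the constant into the 8-bit scratch.  */
      uint8_t scratch = (uint8_t) (k >> (8 * i));

      /* Add with carry-in; the sum is at most 255 + 255 + 1 = 511.  */
      unsigned sum = (unsigned) byte[i] + scratch + carry;
      byte[i] = (uint8_t) sum;
      carry = sum >> 8;
    }

  /* Reassemble the 32-bit result; the final carry is dropped, as in
     ordinary wrap-around 32-bit addition.  */
  for (int i = 0; i < 4; i++)
    result |= (uint32_t) byte[i] << (8 * i);

  return result;
}
```

Only one 8-bit scratch is live at any time, which is the point of the "pressure 1 on d-regs" argument for approach (2).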