Hi Vlad,

Thanks for the great reply, and sorry for not replying sooner.
Things have been a bit hectic for me recently.

Vladimir Makarov <[EMAIL PROTECTED]> writes:
> Richard Sandiford wrote:
>> Although I suspect it isn't intentional, I can imagine it doesn't show
>> up much on targets whose memory move costs are significantly higher than
>> their register move costs.
>>
>> Which brings us to MIPS.  The big question there is: what do we
>> do with the accumulator registers?  Do we put them in the same
>> cover class as GPRs?
>>
>> Or perhaps that's jumping the gun.  Perhaps the first question is:
>> should we mark the accumulator registers as fixed, or at least hide them
>> from the register allocator?  I'm planning to do the latter for MIPS16,
>> but I don't think it's a good idea for normal MIPS for two reasons:
>>
>>   - The DSP ASE provides 4 accumulators.  We want to apply
>>     normal register allocation to them.
>>
>>   - Some targets have multiply-accumulate instructions that operate on
>>     LO and HI.  But it isn't always a win to use them.  If a target has
>>     both multiply-accumulate _and_ pipelined three-operand multiplication
>>     instructions, it is often better to use the latter for parallel
>>     multiply-accumulate chains.  We've traditionally treated the
>>     choice as a register-allocation problem, which seems to have
>>     worked reasonably well.
>>
>>     Also, the macc instruction on some targets can copy the LO result
>>     to a GPR too.  The register allocator can currently take advantage
>>     of that, allocating a GPR when it's useful and not wasting one
>>     otherwise.  (From what I've seen in the past, JPEG FFT loops tend
>>     to be on the borderline as far as register pressure on MIPS goes,
>>     so this can be an important win.)
>>
>> But there are only a limited number of accumulator registers (1 or 4,
>> depending on the target).  There's quite a high likelihood that
>> any given value will need to be spilled from the accumulators
>> at some point.  When that happens, it's better to spill to a GPR
>> than it is to spill to memory, since any load or store has to go
>> through a GPR anyway.  It therefore seems better to put GPRs and
>> accumulator registers in the same cover class.
>>
>>   
> It is better to put GPRs and ACCs in the same class if it is better to
> spill ACC_REGS to GPR than to memory, but it should be reflected in some
> way in memory and register costs.

Great!  That's what the WIP patch did, and it seems to work well
with IRA for the most part.  This unnecessary spilling thing was the
only real problem I've found.

>> We currently give moves between GPRs and accumulators a higher cost than
>> GPR loads and stores.[*]  On the targets for which this cost is accurate,
>> we _don't_ want to use LO and HI as spill space.  We also don't want
>> to move between one accumulator and another if we can help it.
>> And IRA generally seems happy with this.
>>
>>   [*] Which isn't an accurate reflection of all targets, but that's
>>       another story.  We ought eventually to put this in the CPU cost
>>       table.
>>
>> The hitch is that the cost of storing an accumulator to memory
>> is the cost of a GPR<->accumulator move plus the cost of a load
>> or store.  The cost of moving between one accumulator and another
>> is the cost of two GPR<->accumulator moves.  Both of these aggregate
>> costs are accurate, to the extent that the constituent costs are
>> accurate (see [*] above).  So we have a situation in which the
>> worst-case register<->register cost (acc<->acc) outweighs the
>> worst-case register<->memory cost (acc<->mem).  And that goes
>> against the cover class documentation.
>>
>>   
> The documentation can be changed.  It is just a recommendation, or how I
> see it.  We have to separate reg classes into non-intersecting ones
> because Chaitin-Briggs coloring needs this for its work.  The bigger the
> cover classes, the higher the probability that RA puts more pseudos into
> hard registers.  On the other hand, CB coloring does not understand
> register and memory move costs (assign_hard_reg does understand them, but
> it can be used for other coloring algorithms such as Chow's priority
> coloring).  It is hard to find a balance in defining cover classes to put
> more pseudos into hard registers and still generate cheap code.  I should
> think about a better IRA_COVER_CLASSES description.

Thanks.  (This certainly wasn't an attack on the documentation btw.
My point was more: "I realise that, because the MIPS set-up isn't
really sanctioned by the documentation, it would be reasonable to
classify this as a bug in the MIPS port".)

But it's good to hear this is more a recommendation than a hard rule.
Like I say, IRA seems to cope pretty well with things despite the
unusual costs.
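
To make the arithmetic behind those unusual costs concrete, here is a
minimal sketch in C.  The struct, the function names and the numbers
6 and 4 are made up for illustration; they are not taken from the MIPS
port or from IRA.  The only property assumed from the discussion is that
a GPR<->accumulator move costs more than a GPR load or store.

```c
/* Hypothetical cost model; the numbers below are illustrative only and
   are not the values used by the MIPS port.  */
struct move_costs
{
  int gpr_acc;  /* Cost of one GPR<->accumulator move.  */
  int gpr_mem;  /* Cost of one GPR load or store.  */
};

/* An accumulator load or store has to go through a GPR, so it costs
   one GPR<->accumulator move plus one GPR load or store.  */
int
acc_mem_cost (const struct move_costs *c)
{
  return c->gpr_acc + c->gpr_mem;
}

/* An accumulator<->accumulator move also goes through a GPR, so it
   costs two GPR<->accumulator moves.  */
int
acc_acc_cost (const struct move_costs *c)
{
  return 2 * c->gpr_acc;
}

/* Nonzero if the worst-case register<->register cost exceeds the
   worst-case register<->memory cost.  */
int
reg_move_exceeds_mem_move (const struct move_costs *c)
{
  return acc_acc_cost (c) > acc_mem_cost (c);
}

/* Made-up example values with gpr_acc > gpr_mem.  */
const struct move_costs example_costs = { 6, 4 };
```

With these example values acc<->mem comes out at 10 and acc<->acc at 12.
More generally, under this model the register<->register cost exceeds
the register<->memory cost exactly when gpr_acc is greater than gpr_mem.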

>> For the most part things Just Work.  But in the code quoted
>> above, this cost:
>>
>>            cost = (ira_register_move_cost[mode][rclass][rclass] 
>>                    * (exit_freq + enter_freq));
>>            ALLOCNO_HARD_REG_COSTS (subloop_allocno)[index] -= cost;
>>
>> outweighs this cost:
>>
>>                cost
>>                  += (ira_memory_move_cost[mode][rclass][1] * exit_freq
>>                      + ira_memory_move_cost[mode][rclass][0] * enter_freq);
>>
>> and so move_spill_restore can actually spill an allocno in cases
>> where ira-emit.c wouldn't need to do anything.  In particular,
>> I saw this in zlib's infblock.c (part of CSiBE).  We had a pseudo
>> register P with several allocnos, and every allocno was allocated
>> the same register ($21, as it happens).  $21 wasn't used by any other
>> pseudo register, so no spill code was needed for P.  But move_spill_restore
>> nevertheless spilled some of the allocnos, introducing lots of redundant
>> loads and stores.  If you disable move_spill_restore, ira-emit.c does
>> the right thing.
>>
>> So the move_spill_restore problem affects brok^Wunusual targets like MIPS
>> more than most, and I realise that, in the scenario above, MIPS is going
>> against design.  On the other hand, the code doesn't look quite as I'd
>> expect.
>>
>> In summary:
>>
>>   - Vlad, could you clarify when costs are and aren't accumulated?
>>
>>   
> The costs should always be accumulated.  That was the original design.  I
> rewrote IR building so many times to speed it up and probably lost some 
> places where it is not accumulated.

Thanks.  (And understandable about the updates.)

>>   - I'd appreciate suggestions about how to handle the MIPS accumulators.
>>
>>   
> It is hard to say without actual experiments.  I'd try to
> 1) define ACCs and GPRs as one cover class for subtargets where the move 
> between GPRs and ACCs is cheaper than memory.
> 2) Separate them for the rest of subtargets.
>
> Define correct register and memory costs in any case.  To avoid spilling
> to HI and LO, define move costs for them which result in avoiding
> them in assign_hard_reg.

OK.

FWIW, a move between GPRs and ACCs is always cheaper than a load
or store of an ACC, so hopefully (1) should work with everything.

>> There's an auxiliary question about whether we should use
>> REGNO_REG_CLASS more often in cases where a hard register has already
>> been chosen, since this can make a big difference on some targets.
>>
>>   
> Yes, I think we should.  I have some comments marked by ??? to use hard 
> register class instead of cover_class if we know the actual 
> hard-register.  But I remember one problem that costs might be wrong for 
> some targets for some class to class movements.  It is safer to use 
> cover classes.  Still I think we should move to use hard register 
> classes where it is possible and fix register_move_costs if they are wrong
> for some register class pairs.

Sounds good.
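
As a rough illustration of why the class used for the lookup matters,
the sketch below evaluates the two expressions from the quoted
move_spill_restore code with made-up tables.  The enum, the table names
and every number are hypothetical.  In particular, it is an assumption
of this sketch that the cover-class entries reflect the worst case over
the class's members.  It is not IRA code.

```c
/* Hypothetical register classes: GPRs alone, accumulators alone, and a
   cover class containing both.  */
enum example_class { EX_GPR, EX_ACC, EX_COVER, EX_NUM_CLASSES };

/* Made-up costs.  The EX_COVER entries are assumed to be the worst
   case over the members of the class.  */
static const int example_reg_move_cost[EX_NUM_CLASSES] = { 2, 12, 12 };
static const int example_mem_move_cost[EX_NUM_CLASSES][2] = {
  { 4, 4 },    /* EX_GPR */
  { 10, 10 },  /* EX_ACC */
  { 10, 10 }   /* EX_COVER */
};

/* Mirrors the shape of the quoted register-move expression.  */
int
reg_move_term (enum example_class rclass, int enter_freq, int exit_freq)
{
  return example_reg_move_cost[rclass] * (exit_freq + enter_freq);
}

/* Mirrors the shape of the quoted memory-move expression.  */
int
mem_move_term (enum example_class rclass, int enter_freq, int exit_freq)
{
  return (example_mem_move_cost[rclass][1] * exit_freq
          + example_mem_move_cost[rclass][0] * enter_freq);
}
```

With both frequencies set to 1, looking up EX_COVER gives a register
term of 24 against a memory term of 20, whereas looking up EX_GPR (the
class of a register like $21) gives 4 against 8.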

>> For the record, the current CSiBE results for MIPS are:
>>
>>   
> Sorry, Richard.  It is hard for me to understand.  Is the first number
> for IRA?
>> -O0 -EB -mips16 -mabi=32 -mhard-float          4499177  3998513 :   88.87%
>> ...
>> -Os -EB -mips16 -mabi=32 -mhard-float          2839673  2840465 :  100.03%
>> ...
>> and from what I've seen, the increases seem to be from this kind of needless
>> spilling.  (A lot of the individual -Os tests are better with IRA,
>> but the ones for which this problem triggers are much worse).

No, -fno-ira was first, -fira was second.  The summary is: -O0 is
already significantly better with IRA, but -Os was slightly worse
for all option combinations I tried.

These figures are the overall ones for the whole of CSiBE.
If you look at the size of individual objects, the -Os results
are also generally better with IRA.  It's just the occasional
large pessimisation in single objects that causes the overall
-Os results to be slightly worse.  The cases I've seen so far were
due to the unnecessary spilling.
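
For reference, the percentage column is the second size divided by the
first.  A small C helper (the function name is mine, purely for
illustration) shows the calculation:

```c
/* Size with -fira expressed as a percentage of the size with -fno-ira.
   Values below 100 mean the code is smaller with IRA.  */
double
ira_size_percent (long no_ira_size, long ira_size)
{
  return 100.0 * (double) ira_size / (double) no_ira_size;
}
```

Feeding in the -O0 pair (4499177, 3998513) gives roughly 88.87, and the
-Os pair (2839673, 2840465) gives roughly 100.03.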

Richard
