On Mon, 24 Sep 2012, Richard Sandiford wrote:

> >  From the context I am assuming none of this matters for the 74K (and 
> > presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall 
> > isn't it something that should be decided based on instruction costs from 
> > DFA schedulers?  Is there anything that I've missed here?  It doesn't 
> > appear to me that your proposal (or the original one) takes instruction 
> > cost calculation into consideration.
> 
> In practice, we only move 0 into HI and LO for MADD- and MSUB-style
> operations.  We deliberately don't use HI and LO as scratch space.
> 
> I think it's a reasonable default assumption that anything that supports
> those instructions also has a fast path from MULT to MADD or MULT to MSUB.

 According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4 
cycles on that processor).  An MTHI/MTLO pair will take 2 cycles; 
obviously the resulting larger code may adversely affect cache performance 
in some scenarios.

> I certainly don't know of any counter-examples.  The decision is deliberately
> centralised in one place so that the condition can be tweaked in future
> if necessary.

 Sure, the cost comparison could be done in that single place as well.

  Maciej
