On Mon, 24 Sep 2012, Richard Sandiford wrote:

> > From the context I am assuming none of this matters for the 74K (and
> > presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
> > isn't it something that should be decided based on instruction costs from
> > DFA schedulers? Is there anything that I've missed here? It doesn't
> > appear to me your (and neither the original) proposal takes instruction
> > cost calculation into consideration.
>
> In practice, we only move 0 into HI and LO for MADD- and MSUB-style
> operations. We deliberately don't use HI and LO as scratch space.
>
> I think it's a reasonable default assumption that anything that supports
> those instructions also has a fast path from MULT to MADD or MULT to MSUB.

According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4
cycles on that processor). An MTHI/MTLO pair will take 2 cycles;
obviously the resulting larger code may adversely affect cache performance
in some scenarios.

> I certainly don't know of any counter-examples. The decision is deliberately
> centralised in one place so that the condition can be tweaked in future
> if necessary.

Sure, the cost comparison could be done in that single place as well.

  Maciej
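
To make the suggested comparison concrete, here is a rough sketch of what such a centralised check might look like. It is not GCC code: the structure, the field names and the function are all hypothetical, and the only figures taken from this thread are the R4650 ones (4-cycle MULT, 2 cycles for the MTHI/MTLO pair).

```c
#include <stdbool.h>

/* Hypothetical per-processor costs, in cycles.  This is an illustrative
   structure, not something that exists in GCC.  */
struct hilo_costs {
    int mult_latency;    /* assumed cost of clearing HI/LO with MULT $0, $0 */
    int mthi_mtlo_pair;  /* assumed total cost of MTHI $0 followed by MTLO $0 */
};

/* Return true when a single MULT $0, $0 should be used to clear HI/LO.
   When optimizing for size the single instruction is preferred, since
   the MTHI/MTLO pair produces larger code.  Otherwise pick MULT only
   when it is no slower than the pair.  */
static bool use_mult_to_clear_hilo(const struct hilo_costs *c,
                                   bool optimize_size)
{
    if (optimize_size)
        return true;
    return c->mult_latency <= c->mthi_mtlo_pair;
}
```

With the R4650 figures above ({ 4, 2 }) this picks the MTHI/MTLO pair when optimizing for speed and MULT when optimizing for size. A processor with a fast path from MULT to MADD would simply supply a lower MULT cost.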