"Maciej W. Rozycki" <ma...@codesourcery.com> writes:
> On Mon, 24 Sep 2012, Richard Sandiford wrote:
>
>> >  From the context I am assuming none of this matters for the 74K (and 
>> > presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall 
>> > isn't it something that should be decided based on instruction costs from 
>> > DFA schedulers?  Is there anything that I've missed here?  It doesn't 
>> > appear to me your (and neither the original) proposal takes instruction 
>> > cost calculation into consideration.
>> 
>> In practice, we only move 0 into HI and LO for MADD- and MSUB-style
>> operations.  We deliberately don't use HI and LO as scratch space.
>> 
>> I think it's a reasonable default assumption that anything that supports
>> those instructions also has a fast path from MULT to MADD or MULT to MSUB.
>
>  According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4 
> cycles on that processor).  An MTHI/MTLO pair will take 2 cycles; 
> obviously the resulting larger code may adversely affect cache performance 
> in some scenarios.

That's not how the 4650 DFA models it though.

(define_insn_reservation "generic_hilo" 1
  (eq_attr "type" "mfhi,mflo,mthi,mtlo")
  "imuldiv*3")

(define_insn_reservation "r4650_imul" 4
  (and (eq_attr "cpu" "r4650")
       (eq_attr "type" "imul,imul3,imadd"))
  "imuldiv*4")

So if we believed the DFA, MTLO + MTHI would occupy the imuldiv unit for
6 cycles (two reservations of 3 cycles each), compared with 4 cycles for a
single MULT.  Any attempt to use the DFA would still favour MULT.
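
As a rough sketch of that arithmetic (Python, purely illustrative): the
table and helper names below are made up and only mirror the two
reservations quoted above.  It assumes the instructions issue back to back
on the single imuldiv unit and ignores everything else the scheduler does.

```python
# Hypothetical table mirroring the reservations quoted above:
# instruction type -> cycles reserved on the imuldiv unit.
# Not GCC code; the names are illustrative only.
IMULDIV_CYCLES = {
    "mthi": 3,  # generic_hilo: "imuldiv*3"
    "mtlo": 3,  # generic_hilo: "imuldiv*3"
    "mult": 4,  # r4650_imul:   "imuldiv*4"
}


def imuldiv_occupancy(insns):
    """Total cycles a back-to-back sequence keeps the imuldiv unit busy."""
    return sum(IMULDIV_CYCLES[insn] for insn in insns)


# Two ways of zeroing HI and LO before a MADD/MSUB-style operation.
zero_via_moves = imuldiv_occupancy(["mtlo", "mthi"])
zero_via_mult = imuldiv_occupancy(["mult"])

print(zero_via_moves, zero_via_mult)
```

Under those assumptions the MTLO + MTHI pair comes out at 6 cycles and the
MULT at 4.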

Richard
