"Maciej W. Rozycki" <ma...@codesourcery.com> writes:
> On Mon, 24 Sep 2012, Richard Sandiford wrote:
>
>> > From the context I am assuming none of this matters for the 74K (and
>> > presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
>> > isn't it something that should be decided based on instruction costs from
>> > DFA schedulers? Is there anything that I've missed here? It doesn't
>> > appear to me your (and neither the original) proposal takes instruction
>> > cost calculation into consideration.
>>
>> In practice, we only move 0 into HI and LO for MADD- and MSUB-style
>> operations. We deliberately don't use HI and LO as scratch space.
>>
>> I think it's a reasonable default assumption that anything that supports
>> those instructions also has a fast path from MULT to MADD or MULT to MSUB.
>
> According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4
> cycles on that processor). An MTHI/MTLO pair will take 2 cycles;
> obviously the resulting larger code may adversely affect cache performance
> in some scenarios.

That's not how the 4650 DFA models it though.

    (define_insn_reservation "generic_hilo" 1
      (eq_attr "type" "mfhi,mflo,mthi,mtlo")
      "imuldiv*3")

    (define_insn_reservation "r4650_imul" 4
      (and (eq_attr "cpu" "r4650")
           (eq_attr "type" "imul,imul3,imadd"))
      "imuldiv*4")

So if we believed the DFA, MTLO + MTHI would occupy the muldiv unit
for 6 rather than 4 cycles. Any attempt to use the DFA would still
favour MULT.

Richard
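
For reference, the occupancy arithmetic in the message above can be sketched as follows. This is only an illustration, not GCC code: the table and the function name `unit_cycles` are made up for the example, the cycle counts are read from the two reservations quoted above ("imuldiv*3" and "imuldiv*4"), and it assumes that consecutive instructions reserving the same unit cannot overlap.

```python
# Hypothetical table: cycles each insn type reserves the imuldiv unit,
# taken from the "imuldiv*3" and "imuldiv*4" reservation strings quoted above.
IMULDIV_CYCLES = {
    "mfhi": 3, "mflo": 3, "mthi": 3, "mtlo": 3,   # generic_hilo
    "imul": 4, "imul3": 4, "imadd": 4,            # r4650_imul
}

def unit_cycles(insn_types):
    """Total cycles the imuldiv unit is reserved by a sequence of insns.

    Assumption: every insn in the sequence needs the same unit, so the
    reservations serialize and their lengths simply add up.
    """
    return sum(IMULDIV_CYCLES[t] for t in insn_types)

mtlo_mthi = unit_cycles(["mtlo", "mthi"])  # zeroing LO and HI separately
mult_zero = unit_cycles(["imul"])          # MULT $0, $0
print(mtlo_mthi, mult_zero)
```

Under those assumptions the MTLO + MTHI pair reserves the unit for longer than a single MULT, which is the comparison the message draws.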