[RFC, ARM] cortex_a8_call cost

Dmitry Melnik Thu, 29 Dec 2011 05:44:44 -0800

Hi,

In cortex-a8.md call cost is set to 32, while in cortex-a9.md it's 0:


== cortex-a8.md ==
;; Call latencies are not predictable.  A semi-arbitrary very large
;; number is used as "positive infinity" so that everything should be
;; finished by the time of return.
(define_insn_reservation "cortex_a8_call" 32
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")

== cortex-a9.md ==
;; Call latencies are essentially 0 but make sure
;; dual issue doesn't happen i.e the next instruction
;; starts at the next cycle.
(define_insn_reservation "cortex_a9_call"  0
  (and (eq_attr "tune" "cortexa9")
       (eq_attr "type" "call"))
  "cortex_a9_issue_branch + cortex_a9_multcycle1 + cortex_a9_ls + ca9_issue_vfp_neon")

====

Do these CPUs differ much? Which cost is the right one?

Here's why I'm asking. In the following example, the dependence cost of 32 for cortex_a8_call causes insns 464 and 575 to be separated by insn 308 (in spite of having the same priority), because 575 is not ready at tick 12: the call issues at tick 9, so 575 only becomes ready at tick 9 + 32 = 41. This causes separate IT-blocks to be generated for them on Thumb-2.


;;<---->  9-->   300 r0=call [`spec_putc']             :cortex_a8_issue_branch
;;<---->  9-->   306 r3=sl 0>>0x18^r8                  :cortex_a8_default
;;<----> 10-->   309 cc=cmp(r5,r8)                     :cortex_a8_default
;;<----> 11-->   307 r3=[r3*0x4+r9]                    :cortex_a8_load_store_1
;;<----> 12-->   464 (cc) r2=0x1                       :cortex_a8_default
;;<----> 13-->   308 sl=sl<<0x8^r3                     :cortex_a8_default
;;<----> 41-->   575 (cc) [sp+0x4]=r2                  :cortex_a8_load_store_1

Insn 575 has a true dependency on call insn 300 through r2, which is a CALL_USED_REG. Because 464 is conditional, it does not fully overwrite r2, so 575 retains its true dependency on 300.

Setting the cortex_a8_call cost to 1 saves 186 bytes on SPEC2000 INT (but I'm not sure whether that's only because of less IT-block splitting).
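
A minimal sketch of that experiment, assuming the only change is the latency field of the existing reservation (the comment wording is illustrative, and whether 1 is the right value is exactly the open question):

== cortex-a8.md (experiment) ==
;; Experiment only: use a latency of 1 instead of the "positive infinity"
;; value of 32, so that insns depending on call-clobbered registers
;; become ready on the tick after the call issues.
(define_insn_reservation "cortex_a8_call" 1
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")

With this latency, a call issued at tick 9 would make its dependents ready at tick 10 rather than tick 41.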


--
Best regards,
  Dmitry
