Hi,

In cortex-a8.md call cost is set to 32, while in cortex-a9.md it's 0:

== cortex-a8.md ==
;; Call latencies are not predictable.  A semi-arbitrary very large
;; number is used as "positive infinity" so that everything should be
;; finished by the time of return.
(define_insn_reservation "cortex_a8_call" 32
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")

== cortex-a9.md ==
;; Call latencies are essentially 0 but make sure
;; dual issue doesn't happen i.e the next instruction
;; starts at the next cycle.
(define_insn_reservation "cortex_a9_call"  0
  (and (eq_attr "tune" "cortexa9")
       (eq_attr "type" "call"))
"cortex_a9_issue_branch + cortex_a9_multcycle1 + cortex_a9_ls + ca9_issue_vfp_neon")
====

Do these CPUs differ much? Which cost is the right one?

Here's why I'm asking. In the following example, the dependence cost of 32 for cortex_a8_call causes insns 464 and 575 to be separated by insn 308, in spite of their having the same priority, because 575 is not ready at tick 12. On Thumb-2 this results in separate IT-blocks being generated for them.

;;   9-->   300 r0=call [`spec_putc']             :cortex_a8_issue_branch
;;   9-->   306 r3=sl 0>>0x18^r8                  :cortex_a8_default
;;  10-->   309 cc=cmp(r5,r8)                     :cortex_a8_default
;;  11-->   307 r3=[r3*0x4+r9]                    :cortex_a8_load_store_1
;;  12-->   464 (cc) r2=0x1                       :cortex_a8_default
;;  13-->   308 sl=sl<<0x8^r3                     :cortex_a8_default
;;  41-->   575 (cc) [sp+0x4]=r2                  :cortex_a8_load_store_1

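Insn 575 has a true dependence on call insn 300 through r2, which is a CALL_USED_REG and is therefore clobbered by the call. Because 464 sets r2 only conditionally, it does not kill that dependence, so 575 retains its true dependence on 300 and cannot become ready before tick 9 + 32 = 41.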

Setting the cortex_a8_call cost to 1 saves 186 bytes on SPEC2000 INT (though I'm not sure whether that is due only to less IT-block splitting).
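
For concreteness, that experiment amounts to something like the sketch below. Only the latency value differs from the cortex-a8.md entry quoted above; the comment wording is illustrative, and whether 1 is the right value for the hardware is exactly the open question.

== cortex-a8.md (experimental, latency 32 -> 1) ==
;; Sketch only: call latency lowered from 32 to 1 so that insns
;; depending on call-clobbered registers are not held back by the
;; "positive infinity" cost.  The reservation itself is unchanged.
(define_insn_reservation "cortex_a8_call" 1
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")
====

With a latency of 1, insn 575 in the dump above would be allowed to become ready at tick 9 + 1 = 10 as far as its dependence on the call is concerned, instead of tick 41.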

--
Best regards,
  Dmitry
