Hi,
In cortex-a8.md call cost is set to 32, while in cortex-a9.md it's 0:
== cortex-a8.md ==
;; Call latencies are not predictable. A semi-arbitrary very large
;; number is used as "positive infinity" so that everything should be
;; finished by the time of return.
(define_insn_reservation "cortex_a8_call" 32
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")
== cortex-a9.md ==
;; Call latencies are essentially 0 but make sure
;; dual issue doesn't happen i.e the next instruction
;; starts at the next cycle.
(define_insn_reservation "cortex_a9_call" 0
  (and (eq_attr "tune" "cortexa9")
       (eq_attr "type" "call"))
  "cortex_a9_issue_branch + cortex_a9_multcycle1 + cortex_a9_ls +
   ca9_issue_vfp_neon")
====
Do these CPUs differ much? Which cost is the right one?
Here's why I'm asking. In the following example, the dependence cost of 32
for cortex_a8_call causes insns 464 and 575 to be separated by 308 (in
spite of having the same priority): 575 depends on the call issued at tick 9,
so it is not ready at tick 12 and only becomes ready at tick 41 (9 + 32).
On Thumb-2 this results in separate IT-blocks being generated for them.
;;<----> 9--> 300 r0=call [`spec_putc'] :cortex_a8_issue_branch
;;<----> 9--> 306 r3=sl 0>>0x18^r8 :cortex_a8_default
;;<----> 10--> 309 cc=cmp(r5,r8) :cortex_a8_default
;;<----> 11--> 307 r3=[r3*0x4+r9] :cortex_a8_load_store_1
;;<----> 12--> 464 (cc) r2=0x1 :cortex_a8_default
;;<----> 13--> 308 sl=sl<<0x8^r3 :cortex_a8_default
;;<----> 41--> 575 (cc) [sp+0x4]=r2 :cortex_a8_load_store_1
Insn 575 has a true dependency on call insn 300 through r2, which is a
CALL_USED_REG. Since 464 sets r2 only conditionally, it does not break
that dependency, so 575 retains its true dependency on 300.
Setting the cortex_a8_call cost to 1 saves 186 bytes on SPEC2000 INT (but
I'm not sure whether that's only due to less IT-block splitting).
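For concreteness, the experiment amounts to roughly the following change.
This is only a sketch of what I tried, not a proposed patch: the latency
value is the only difference from the current reservation, and the comment
text is just my wording.
== cortex-a8.md (experiment) ==
;; Experimental: latency lowered from 32 to 1, so that insns depending
;; on call-clobbered registers become ready on the tick after the call.
(define_insn_reservation "cortex_a8_call" 1
  (and (eq_attr "tune" "cortexa8")
       (eq_attr "type" "call"))
  "cortex_a8_issue_branch")
====
With this, in the example above 575 would become ready at tick 10 (9 + 1)
instead of 41, so the scheduler would be free to place it right after 464.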
--
Best regards,
Dmitry