On 14/11/14 15:12, Maxim Kuvyrkov wrote:
On Nov 14, 2014, at 8:38 AM, Jeff Law <l...@redhat.com> wrote:
On 10/20/14 22:06, Maxim Kuvyrkov wrote:
Hi,
Ramana, this change requires benchmarking, which I can't easily do at
the moment. I would appreciate any benchmarking results that you can
share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
needs to be tuned/confirmed for Cortex-A15.
What were the results of that benchmarking? IIRC I tabled reviewing this work
waiting for those results (and I probably should have let you know that;
sorry, my bad there).
I don't have the benchmarking results yet, and I was hoping for ARM to help
with getting the numbers. The ARM maintainers still need to OK the
ARM-specific portion of the patch, which, I imagine, will happen only if
benchmark scores improve.
I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the
trees you gave me links to against the equivalent version on trunk.
The results with SPEC2k on A15 were in the noise with the default value
for PARAM_SCHED_AUTOPREF_QUEUE_DEPTH, which is 2 in the backend. I'm
still waiting on results for values 0, 1 and 3 and hopefully something
will come back soon for SPEC2k.
@@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate > 1 ? issue_rate : 0;
 }
+/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
+static int
+arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
+{
+  switch (arm_tune)
+    {
+    case cortexa15:
+      return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+
+    default:
+      return 0;
+    }
+}
+
It would be better to have this as a flag in the tuning tables rather
than hardcoding it for one core here. The backend has been moving in that
direction for all core-centric information, and it is preferable that this
be continued.
So this logic here should just be:

  if (current_tune->multipass_lookahead)
    return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
  else
    return 0;
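
To make the tuning-table idea concrete, here is a minimal standalone sketch in C. It is not GCC code: `struct tune_sketch`, the two tuning records, and `autopref_guard_stub` are hypothetical stand-ins, and only the field name `multipass_lookahead` is taken from the suggestion in this message. The point is only that the hook consults a per-core flag instead of switching on the core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the backend's per-core tuning
   record; the real structure has many more fields.  */
struct tune_sketch
{
  const char *core_name;
  bool multipass_lookahead;   /* Field name as suggested in this thread.  */
};

/* Illustrative records only; they do not reflect real tuning values.  */
static const struct tune_sketch cortex_a15_sketch = { "cortex-a15", true };
static const struct tune_sketch generic_sketch = { "generic", false };

/* Stub standing in for the scheduler's generic guard.  It ignores
   INSN_UID and returns 1 for every READY_INDEX above zero, only so that
   the two paths below are distinguishable.  */
static int
autopref_guard_stub (int insn_uid, int ready_index)
{
  (void) insn_uid;
  return ready_index > 0 ? 1 : 0;
}

/* Guard hook driven by the tuning record rather than a switch on the
   core: cores that do not set the flag always return 0.  */
static int
lookahead_guard_sketch (const struct tune_sketch *tune, int insn_uid,
                        int ready_index)
{
  assert (tune != NULL);
  if (tune->multipass_lookahead)
    return autopref_guard_stub (insn_uid, ready_index);
  return 0;
}
```

With this shape, enabling the model for another core would mean setting one field in that core's record; the hook itself would not change.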
regards
Ramana
...
Can this be built on top of Bin's work for insn fusion? There's a lot of
commonality in the structure of the insns you care about. He's already got a
nice little priority function that I think you could utilize to ensure the
insns with smaller offsets fire first.
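
As a rough illustration of what "smaller offsets fire first" means, the standalone C sketch below orders memory accesses by base register and then by increasing offset. It is neither Bin's priority function nor the mechanism in the patch; `struct mem_access_sketch` and both function names are hypothetical, and a real scheduler would fold this ordering into its priority or guard logic rather than sort a list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical summary of a memory access: the base register number and
   the constant offset from it.  Not a GCC data structure.  */
struct mem_access_sketch
{
  int base_regno;
  long offset;
};

/* qsort comparator: group accesses by base register, then order each
   group by increasing offset.  */
static int
compare_mem_access (const void *pa, const void *pb)
{
  const struct mem_access_sketch *a = pa;
  const struct mem_access_sketch *b = pb;

  if (a->base_regno != b->base_regno)
    return a->base_regno < b->base_regno ? -1 : 1;
  if (a->offset != b->offset)
    return a->offset < b->offset ? -1 : 1;
  return 0;
}

/* Sort N accesses in place so that, per base register, smaller offsets
   come first.  */
static void
order_by_offset (struct mem_access_sketch *accesses, size_t n)
{
  assert (accesses != NULL || n == 0);
  qsort (accesses, n, sizeof accesses[0], compare_mem_access);
}
```

For example, accesses at offsets 8 and 0 from the same base register would come out as 0 then 8.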
I would argue that macro-fusion should have been implemented the way
autopref_model is -- via
targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook. To implement the
autopref model I cleaned up and generalized existing infrastructure (max_issue
and dfa_lookahead_guard hook) instead of adding yet another decision-making
primitive to the scheduler.
My biggest concern would be sched2 coming along and undoing that work since
you're not going to fuse those into move-multiple types of instructions.
The autoprefetcher model will be active only during sched2. It is disabled
during sched1 because max_issue is not used when scheduling for register
pressure.
Thanks,
--
Maxim Kuvyrkov
www.linaro.org