On 14/11/14 15:12, Maxim Kuvyrkov wrote:
On Nov 14, 2014, at 8:38 AM, Jeff Law <l...@redhat.com> wrote:

On 10/20/14 22:06, Maxim Kuvyrkov wrote:
Hi,
Ramana, this change requires benchmarking, which I can't easily do
at
the moment. I would appreciate any benchmarking results that you can
share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
needs to be tuned/confirmed for Cortex-A15.
What were the results of that benchmarking?  IIRC I tabled reviewing this work 
waiting for those results (and I probably should have let you know that.  
Sorry,  my bad there).

I don't have the benchmarking results yet, and I was hoping for ARM to help 
with getting the numbers.  The arm maintainers still need to OK the 
arm-specific portion of the patch, which, I imagine, will happen only of 
benchmark scores improve.


I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees you gave me links to against the equivalent version on trunk.

The results with SPEC2k on A15 were in the noise with the default value for PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still waiting on results for values 0, 1 and 3 and hopefully something will come back soon for SPEC2k.



@@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate > 1 ? issue_rate : 0;
 }

+/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
+static int
+arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
+{
+  switch (arm_tune)
+    {
+    case cortexa15:
+      return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+
+    default:
+      return 0;
+    }
+}
+


It would be better to have this as a flag in the tuning tables rather than hardcoding for a core here. The backend has been moving in that direction for all core centric information and it is preferable that be continued.

So this logic here should just be

if (current_tune->multipass_lookahead)
  return autopref_multipass_lookahead_guard (insn, ready_index);
else
  return 0;




regards
Ramana





...
Can this be built on top of Bin's work for insn fusion?  There's a lot of 
commonality in the structure of the insns you care about.  He's already got a 
nice little priority function that I think you could utilize to to ensure the 
insns with smaller offsets fire first.

I would argue that macro-fusion should have been implemented the way 
autopref_model is -- via 
targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook.  To implement the 
autopref model I cleaned up and generalized existing infrastructure (max_issue 
and dfa_lookahead_guard hook) instead of adding yet another decision-making 
primitive to the scheduler.



My biggest concern would be sched2 coming along and undoing that work since 
you're not going to fuse those into move-multiple types of instructions.

The autoprefetcher will be active only during sched2.  It is disabled during 
sched1 by the fact that max_issue is not used when scheduling for register 
pressure.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org


Reply via email to