Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring multiplier

Rogozhkin, Dmitry V Wed, 27 Dec 2017 09:43:20 -0800

>> I definitely asked what happens if the GT request is bigger than the IA 
>> request. But that was a year ago and I don't remember the answer. Let me ask 
>> again. I will mail back in a few days.


Hi Chris, here is the response.

Question was: "Can we hit the RING transition penalty (at least theoretically) 
if the GT request is higher than the IA request, with a dominant IA load and a 
tiny GT load, i.e. the reverse of the situation we actually faced? For example, 
if we pin the IA frequency to 800MHz (x1 multiplier) and the GT frequency to 
700MHz (x2 multiplier), the requests for the ring will be 800 vs. 1400."

Answer is: "In this case, if the GT toggles between RC0 and RC6, it will force 
the ring frequency to toggle between 800 and 1400, which will stall IA 
execution during each toggle. This will lead to performance loss." However, 
this only matters if there really are a few toggle events every few 
milliseconds. It is quite probable that the GT driver will not let that happen, 
simply because it does not toggle between RC0 and RC6 that often. Considering 
that the GT driver probably handles far fewer interrupts than IA, this may well 
be the case. So, Chris, the question now goes to you: how often do you toggle 
between RC0 and RC6, and is it often enough for the reversed issue to show up? 
If you don't toggle much, then the RING will simply stay at 1400 almost all the 
time and you will see no issue.
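
To make the arithmetic in the example concrete, here is a minimal sketch in 
Python. It is only a toy model: the function names are made up for 
illustration, and the rule "ring follows the highest request among the agents 
that are currently awake, with IA always awake" is an assumption chosen to 
match the description above, not the actual Gen9 arbitration logic.

```python
def ring_request_mhz(freq_mhz, multiplier):
    """Ring frequency an agent asks for: its own frequency times its multiplier."""
    return freq_mhz * multiplier


def ring_trace(ia_request, gt_request, gt_awake_trace):
    """Ring frequency per time step.

    Assumption: IA is always active, and the GT request only counts while
    GT is in RC0 (True in gt_awake_trace); in RC6 (False) it is dropped.
    """
    return [max(ia_request, gt_request) if awake else ia_request
            for awake in gt_awake_trace]


def count_transitions(trace):
    """Number of ring frequency changes; each one would stall IA."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


ia = ring_request_mhz(800, 1)   # IA pinned to 800MHz, x1 multiplier
gt = ring_request_mhz(700, 2)   # GT pinned to 700MHz, x2 multiplier

# GT bouncing between RC0 and RC6 vs. GT staying in RC0.
toggling = ring_trace(ia, gt, [True, False, True, False, True])
steady = ring_trace(ia, gt, [True, True, True, True, True])

print(ia, gt)
print(toggling, count_transitions(toggling))
print(steady, count_transitions(steady))
```

In this toy model the toggling trace changes ring frequency on every step, 
while the steady trace never leaves the GT level, which is the "no issue" case.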

Again, a reminder that all of this applies to Gen9 only.

Dmitry.

-----Original Message-----
From: Intel-gfx [mailto:[email protected]] On Behalf Of 
Rogozhkin, Dmitry V
Sent: Tuesday, December 26, 2017 9:39 AM
To: Chris Wilson <[email protected]>; Li, Yaodong 
<[email protected]>; [email protected]
Cc: Widawsky, Benjamin <[email protected]>
Subject: Re: [Intel-gfx] [RFC] drm/i915: Add a new modparam for customized ring 
multiplier

>> To clarify, the HW will flip between the two GT/IA requests rather than 
>> stick to the highest?

Yes, it will flip on Gen9. On Gen8 there was a HW mechanism which flattened 
that, but it was removed/substituted in Gen9. In Gen10 it was tuned to close 
the mentioned issue.

>> Do you know anything about the opposite position? I heard a suggestion that 
>> simply increasing the ringfreq universally caused thermal throttling in some 
>> other workloads. Do you have any knowledge of those?

Initially we tried to just increase the GT multiplier to x3 and ran into 
throttling. Thus, we introduced a parameter to be able to mitigate all that 
depending on the SKU and user needs. I definitely asked what happens if the GT 
request is bigger than the IA request. But that was a year ago and I don't 
remember the answer. Let me ask again. I will mail back in a few days.

>> You are thinking of plugging into intel_pstate to make it smarter for ia 
>> freq transitions?

Yep. This seems like the right step toward automatic support instead of a 
parameter/hardcoded multiplier.

Dmitry.

-----Original Message-----
From: Chris Wilson [mailto:[email protected]] 
Sent: Tuesday, December 26, 2017 8:59 AM
To: Rogozhkin, Dmitry V <[email protected]>; Li, Yaodong 
<[email protected]>; [email protected]
Cc: Gong, Zhipeng <[email protected]>; Widawsky, Benjamin 
<[email protected]>; Mateo Lozano, Oscar <[email protected]>; 
Kamble, Sagar A <[email protected]>; Li, Yaodong <[email protected]>
Subject: RE: [RFC] drm/i915: Add a new modparam for customized ring multiplier

Quoting Rogozhkin, Dmitry V (2017-12-26 16:39:23)
> Clarification on the issue. Consider a massive load on GT and just a tiny 
> one on IA. If GT programs the RING frequency lower than the IA frequency, 
> the RING frequency constantly transitions from the GT level to the IA level 
> and back. Each RING frequency transition is a full system stall. At a 
> "good" transition rate of a few transitions every few milliseconds you lose 
> ~10% of performance. Media workloads easily step into this because 
> 1) media uses a few GPU engines, so with a few parallel workloads at least 
> 1 engine is _always_ doing something, and 2) media BBs are relatively 
> small, so the IA wakes up regularly to manage requests. This affects Gen9 
> platforms due to a HW design change (we spotted this on SKL). It does not 
> happen on Gen8 (old HW design). It will be fixed in Gen10+ (CNL+).

To clarify, the HW will flip between the two GT/IA requests rather than stick 
to the highest? Iirc, the expectation was that we were setting a requested 
minimum frequency for the ring/ia based off the gpu freq.

> On SKL we ran into this with the GPU frequency pinned to 700MHz, CPU to 2GHz. 
> Multipliers were x2 for GT, x1 for IA.

Basically, with the GPU clocked to mid frequency, memory throughput is 
insufficient to keep the fixed functions occupied, and you need to increase the 
ring frequency. Is there ever a case where we don't need max ring frequency? 
(Perhaps we still need to set low frequency for GT idle?) I guess media is more 
susceptible to this as that workload should be sustainable at reduced clocks, 
GL et al are much more likely to keep the clocks ramped all the way up.

Do you know anything about the opposite position? I heard a suggestion that 
simply increasing the ringfreq universally caused thermal throttling in some 
other workloads. Do you have any knowledge of those?
 
> So, effectively, what we need to do is to make sure that RING frequency 
> request from GT is _not_ below the request from IA. If IA requests 2GHz, we 
> can't request 1.4GHz, we need to request at least 2GHz. Multiplier patch was 
> intended to do exactly that, but manually. Can we somehow automate that by 
> managing IA frequency requests to the RING?
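
As a rough sketch of the constraint in the quoted paragraph (the GT ring 
request must not fall below the IA ring request), the smallest multiplier that 
satisfies it can be computed as below. The helper name is hypothetical and 
whole-number multipliers are an assumption made for illustration; this is not 
code from the patch.

```python
def min_gt_multiplier(ia_freq_mhz, gt_freq_mhz, ia_multiplier=1):
    """Smallest whole GT multiplier whose ring request is >= the IA ring request.

    Assumption: multipliers are whole numbers and each agent's ring request
    is simply its frequency times its multiplier.
    """
    ia_request = ia_freq_mhz * ia_multiplier
    # Integer ceiling division, avoiding floating-point rounding.
    return (ia_request + gt_freq_mhz - 1) // gt_freq_mhz


# SKL case from the thread: GPU pinned to 700MHz, CPU to 2GHz, IA multiplier x1.
mult = min_gt_multiplier(2000, 700)
print(mult, 700 * mult)
```

With the x2 multiplier the GT request is 1400MHz, below the 2000MHz IA 
request; under these assumptions the first whole multiplier that clears it 
is x3.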

You are thinking of plugging into intel_pstate to make it smarter for ia freq 
transitions? That seems possible, certainly. I'm not sure if the ring frequency 
is actually poked from anywhere else in the kernel, would be interesting to 
find out.
-Chris
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx