> On May 10, 2023, at 9:15 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> 
>> Honza,
>>> The main motivation for this was profiling programs that contain specific
>>> code paths for different CPUs (such as the graphics library in Firefox or the
>>> Linux kernel). When the training machine differs from the machine the
>>> program later runs on, we end up optimizing for size all code paths
>>> except the ones taken by that specific CPU.  This patch essentially tells
>>> GCC to treat every non-trained function as built without profile
>>> feedback.
>> Makes sense.
>>> 
>>> For Firefox it had an important impact on graphics rendering tests back
>>> then, since the build machine had AVX while the benchmarking machine did not.
>>> Some benchmarks improved several times over, which is not a surprise if you
>>> consider a tight graphics rendering loop optimized for size versus
>>> a vectorized one.
>> 
>> That’s a lot of improvement. So, without -fprofile-partial-training, PGO
>> hurt performance in those cases?
> 
> Yes, to get code size improvements we assume that the non-trained part
> of the code is cold, and with -Os we are very aggressive in optimizing for
> size.  We now have a two-level optimize_for_size, so I think we could
> make this more fine-grained this stage1.

Okay. I see. 

Thanks a lot for the info.

Another question (which is confusing us very much right now):

When we lower the following parameter from 999 to 950 (in GCC 8):

DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
         "hot-bb-count-ws-permille",
         "A basic block profile count is considered hot if it contributes to "
         "the given permillage of the entire profiled execution.",
         999, 0, 1000)

the size of the ".text.hot" section becomes 4x SMALLER than with the default
value. Is this expected behavior?
(From my reading of the GCC 8 source code, when this parameter gets
smaller, more basic blocks and functions should be considered HOT by GCC,
so the .text.hot section should get larger, not smaller. Am I missing
anything here?)

Thanks a lot for your help.

Qing

> 
> Honza
>> 
>>> The patch has a bad effect on code size, which in turn
>>> impacts performance too, so I think it makes sense to use
>>> -fprofile-partial-training with a bit of care (i.e. only on code where
>>> such scenarios are likely).
>> 
>> Right. 
>>> 
>>> As for backporting, I do not have a checkout of GCC 8 right now. It
>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>> GCC 8), so the patch may backport quite easily.  I am not 100% sure
>>> what shape the infrastructure was in in the first version, but I am quite
>>> convinced it had the necessary bits - it was able to distinguish between
>>> a 0 profile count and missing profile feedback.
>> 
>> This is good to know. I will try to backport it to GCC 8 and let them test
>> to see whether it has any good impact.
>> 
>> Qing
>>> 
>>> Honza
