On Fri, Aug 5, 2011 at 7:40 AM, Jan Hubicka <j...@suse.de> wrote:
> On Fri 05 Aug 2011 09:32:05 AM CEST, Richard Guenther
> <richard.guent...@gmail.com> wrote:
>
>> On Thu, Aug 4, 2011 at 8:42 PM, Jan Hubicka <j...@suse.de> wrote:
>>>>>
>>>>> Did you try using FDO with -Os? FDO should make hot code parts
>>>>> optimized similar to -O3 but leave other pieces optimized for
>>>>> size. Using FDO with -O3 gives you the opposite, cold portions
>>>>> optimized for size while the rest is optimized for speed.
>>>
>>> FDO with -Os still optimizes for size, even in hot parts.
>>
>> I don't think so. Or at least that would be a bug. Shouldn't 'hot'
>> BBs/functions be optimized for speed even at -Os? Hm, I see
>> predict.c indeed always returns false for optimize_size :(
>
> It was the outcome of a discussion held some time ago. I think it was
> Mark promoting the point that users optimize for size when they use
> -Os, period.
>
> I thought we had just the parts that are neither cold nor hot
> optimized according to optimize_size. I originally wanted attribute
> HOT to override -Os, so that well-annotated sources (i.e. the kernel)
> could compile with -Os by default and explicitly declare the hot
> parts hot to get them compiled appropriately.
>
> With profile feedback, however, the current logic is binary: blocks
> are either hot, when their count is bigger than the threshold, or
> cold. We don't really have an "I don't know" state there. In some
> cases it would make sense - i.e. there are optimizations that we want
> to do only in the hottest parts of the code, but we don't have any
> logic for that.
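[A minimal sketch of the annotation scheme Honza describes above,
assuming GCC's hot/cold function attributes. Whether attribute hot
actually overrides -Os is exactly the question under discussion here;
the function names and bodies are invented for illustration:]

  /* Rarely executed: error handling, fine to keep small.  */
  __attribute__((cold)) void report_error (const char *msg);

  /* Known-hot inner loop: the intent is that this gets optimized for
     speed even when the translation unit is built with -Os.  */
  __attribute__((hot)) long
  checksum (const unsigned char *buf, long n)
  {
    long sum = 0;
    for (long i = 0; i < n; i++)
      sum += buf[i];
    return sum;
  }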
For the profile summary at the function/cgraph_node level, there are
three states: hot, unlikely, and normal. At the BB/EDGE level there are
three states too, but the implementation turns them into two (by
querying only 'maybe_hot_bb'): hot and not hot --- instead of 'hot',
'neither hot nor cold', and 'cold'.

David

> My plan is to extend ipa-profile to do better hot/cold partitioning
> first: at the moment we decide based on a fixed fraction of the
> maximal count in the program. This is unnecessarily conservative for
> programs with not terribly flat profiles. At the IPA level we could
> collect a histogram of instruction counts (i.e. figure out how much
> time we spend on instructions executed N times) and then figure out
> where the threshold is so that 99% of executed instructions belong to
> the hot region. This should give noticeably smaller binaries.
>
>>> So to get reasonable speedups you need -O3+FDO. -O3+FDO effectively
>>> defaults to -Os in cold portions of the program.
>>
>> Well, but unless your training coverage is 100%, all parts with no
>> coverage get optimized with -O3 instead of -Os. And I bet coverage
>> for mozilla isn't even close to 100%. Thus I think recommending -O3
>> for FDO is usually a bad idea.
>
> Code with no coverage is cold in our model (as is code executed only
> once or so) and thus optimized for size even at -O3+FDO. This is a
> bit aggressive on the optimizing-for-size side. We might consider
> changing this policy, but so far I didn't see any complaints about
> this...
>
> Honza
>
>> So - did you try FDO with -O2? ;)
>>
>>> Still, -Os+FDO should be somewhat faster than -Os alone, so a
>>> slowdown is a bug. It is not tested very thoroughly since it is not
>>> really used in practice.
>>>
>>>>> Also, do you get any warnings on profile mismatches? Perhaps
>>>>> something is wrong to the degree that the relevant part of the
>>>>> profile gets misapplied.
>>>>
>>>> I don't get any warnings on profile mismatches. I only get a "few"
>>>> missing gcda files warnings, but that's expected.
>>>
>>> Perhaps you could compile one of the less trivial files you are
>>> sure is covered by the train run and send me -fdump-tree-all-blocks
>>> -fdump-ipa-all dumps of the compilation so I can double-check that
>>> the profile seems sane. This could be a good start to rule out
>>> something stupid.
>>>
>>> Honza
>>>>
>>>> Cheers,
>>>>
>>>> Mike
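[A hedged sketch of the histogram-based threshold selection Honza
proposes above - not actual ipa-profile code; all names and types are
invented for illustration. Given the execution count of each static
instruction, it finds the smallest count such that instructions at or
above it account for 99% of all dynamically executed instructions:]

  #include <stdlib.h>

  static int
  cmp_desc (const void *a, const void *b)
  {
    unsigned long x = *(const unsigned long *) a;
    unsigned long y = *(const unsigned long *) b;
    return (x < y) ? 1 : (x > y) ? -1 : 0;
  }

  /* counts[i] is the execution count of static instruction i.
     Sorts COUNTS in place; overflow of the total * 99 product is
     ignored for the sake of the sketch.  */
  unsigned long
  hot_threshold (unsigned long *counts, size_t n)
  {
    unsigned long total = 0, covered = 0;
    for (size_t i = 0; i < n; i++)
      total += counts[i];

    qsort (counts, n, sizeof *counts, cmp_desc);

    /* Walk from the hottest instructions down until 99% of the
       dynamic execution is covered; the last count consumed is the
       hot/cold threshold.  */
    for (size_t i = 0; i < n; i++)
      {
        covered += counts[i];
        if (covered * 100 >= total * 99)
          return counts[i];
      }
    return 0;
  }

[Unlike the current fixed-fraction-of-maximal-count heuristic, such a
threshold adapts to how flat the profile is, which appears to be the
point of the plan.]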