> Hi Honza,
>
> > So merging the profiles will also lead to inconsistencies making the
> > .part variant to seem more hot than it is...
>
> I am looking into this and will post the patch as a follow-up patch.
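To make the inconsistency concrete, here is a toy sketch of what a naive merge of a head function's profile with its .part does. The record layout and the merge rule are invented for illustration; they are not gcov's or auto-profile's actual format or logic.

```python
# Toy AutoFDO-like record: {"head": entry_count, "body": {offset: samples}}.
# This layout is an assumption for illustration only.
def naive_merge(head_fn, part_fn):
    """Fold a .part.N profile into its head function by summing counts
    at matching offsets.  Offsets in the .part are relative to the
    split-out body, not to the head function, so the folded counts land
    at positions that do not correspond to the head's statements -- one
    way the merged data can make the .part look hotter (or differently
    hot) than it really is."""
    merged = {"head": head_fn["head"], "body": dict(head_fn["body"])}
    for off, n in part_fn["body"].items():
        merged["body"][off] = merged["body"].get(off, 0) + n
    return merged

# Using the counts from the dump discussed below:
merged = naive_merge({"head": 71, "body": {2: 71}},
                     {"head": 5, "body": {143: 141}})
# merged["body"] now holds offset 143, which was relative to the .part
# body, as if it were an offset inside the head function.
```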
Thanks.  Note that now, with merging being done recursively on inline
instances while offlining inline differences, partial functions will need
a bit more care.  Currently merging assumes that the head is a full
function with everything offlined, and it will offline everything from
the .part.  I suppose we need to recognize split functions and, in the
offline pass, offline all inlined head functions and produce a combined
offline instance with the parts...

afdo_offline now has infrastructure for rewriting function instance
names, since I needed to deal with the additional problem of inline
functions appearing under their dwarf names.  I think we should move
removing the clone suffixes there too, instead of handling the renaming
directly at afdo read time.

> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node
> >> total:212 head:71
> >> 2: 71
> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0:5
> >> 143: 141
> >>
> >> This looks odd. Looks like create_gcov getting mixed up with the offset
> >> of inlined functions
> >
> > I am not sure I follow what you mean here?
>
> Head count here is 5 (head:5). But the sample counts for the offsets do
> not match this. Except that:
> >> 6: lookup_attribute total:40
> >> 4: 5
> >
> This looks wrong?

Isn't it because the head count is determined from the call instructions
reaching the function body, while the sample counts of offsets are
accounted from the jump instructions in the body?

>
> Here is the revised patch for get_original_name. Also added a test case.
> Is this OK?

OK, thanks!
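For reference, the clone-suffix stripping discussed around get_original_name can be sketched roughly like this. This is only an illustration of the idea, not the actual gcc implementation, and the exact set of suffixes it recognizes is an assumption here.

```python
import re

# Suffixes GCC appends to compiler-generated clones.  The precise set
# handled by get_original_name may differ; this list is illustrative.
_CLONE_RE = re.compile(r"\.(part|constprop|isra|cold)(\.\d+)?$")

def strip_clone_suffixes(name: str) -> str:
    """Strip trailing clone suffixes, e.g. 'foo.part.0' -> 'foo'.
    Applies repeatedly so stacked suffixes such as
    'foo.isra.0.part.1' also reduce to 'foo'."""
    while True:
        stripped = _CLONE_RE.sub("", name)
        if stripped == name:
            return name
        name = stripped
```

Doing this once, in the offline pass alongside the dwarf-name rewriting, would keep all instance renaming in one place instead of spreading it across the afdo reader.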
Honza

> Thanks,
> Kugan
>
> > This is my current benchmark run with -Ofast -mtune=native (without LTO)
> > comparing no feedback (base) to autofdo (peak)
> >
> > 500.perlbench_r      1   167    9.51 *    1   155    10.3 *
> > 502.gcc_r            1   132    10.7 *    1   126    11.2 *
> > 505.mcf_r            1   226    7.16 *    1   225    7.20 *
> > 520.omnetpp_r        1   203    6.47 *    1   203    6.47 *
> > 523.xalancbmk_r     NR                   NR
> > 525.x264_r           1  84.7    20.7 *    1  90.7    19.3 *
> > 531.deepsjeng_r      1   208    5.50 *    1   209    5.47 *
> > 541.leela_r          1   295    5.61 *    1   318    5.21 *
> > 548.exchange2_r      1  85.9    30.5 *    1  93.3    28.1 *
> > 557.xz_r             1   225    4.79 *    1   220    4.90 *
> > Est. SPECrate2017_int_base    9.13
> > Est. SPECrate2017_int_peak               9.05
> >
> > So there are regressions in x264, deepsjeng, leela and exchange, neither
> > of them very bad.  I think it would be interesting to understand
> > 541.leela_r first.
> >
> > Honza
> >>
> >> Thanks,
> >> Kugan
> >