> Hi Honza,
>
> > So merging the profiles will also lead to inconsistencies making the
> > .part variant to seem more hot than it is...
>
> I am looking into this and will post the patch as a follow-up patch.
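To make the inconsistency concrete, here is a toy sketch of what a naive merge of a head function's profile with its .part does. The record layout and the merge rule are invented for illustration; they are not gcov's or auto-profile's actual format or logic.

```python
# Toy AutoFDO-like record: {"head": entry_count, "body": {offset: samples}}.
# This layout is an assumption for illustration only.
def naive_merge(head_fn, part_fn):
    """Fold a .part.N profile into its head function by summing counts
    at matching offsets.  Offsets in the .part are relative to the
    split-out body, not to the head function, so the folded counts land
    at positions that do not correspond to the head's statements -- one
    way the merged data can make the .part look hotter (or differently
    hot) than it really is."""
    merged = {"head": head_fn["head"], "body": dict(head_fn["body"])}
    for off, n in part_fn["body"].items():
        merged["body"][off] = merged["body"].get(off, 0) + n
    return merged

# Using the counts from the dump discussed below:
merged = naive_merge({"head": 71, "body": {2: 71}},
                     {"head": 5, "body": {143: 141}})
# merged["body"] now holds offset 143, which was relative to the .part
# body, as if it were an offset inside the head function.
```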
Thanks.  Note that now, with merging being done recursively on inline
instances while offlining inline differences, partial functions will need
a bit more care.  Currently merging assumes that the head is a full
function with everything offlined, and it will offline everything from
the .part.  I suppose we need to recognize split functions and, in the
offline pass, offline all inlined head functions and produce a combined
offline instance with the parts...

afdo_offline now has infrastructure for rewriting function instance
names, since I needed to deal with the additional problem of inline
functions appearing under their dwarf names.  I think we should move
removing the clone suffixes there too, instead of handling the renaming
directly at afdo read time.

> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node
> >> total:212 head:71
> >> 2: 71
> >> _Z22init_attr_rdwr_indicesP8hash_mapI16rdwr_access_hash11attr_access21simple_hashmap_traitsI19default_hash_traitsIS0_ES1_EEP9tree_node.part.0:5
> >> 143: 141
> >>
> >> This looks odd. Looks like create_gcov getting mixed up with the offset
> >> of inlined functions
> >
> > I am not sure I follow what you mean here?
>
> Head count here is 5 (head:5). But the sample counts for the offsets do
> not match this. Except that:
> >> 6: lookup_attribute total:40
> >> 4: 5
> >
> This looks wrong?

Isn't it because the head count is determined from the call instructions
reaching the function body, while the sample counts of offsets are
accounted from the jump instructions in the body?

>
> Here is the revised patch for get_original_name. Also added a test case.
> Is this OK?

OK, thanks!
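For reference, the clone-suffix stripping discussed around get_original_name can be sketched roughly like this. This is only an illustration of the idea, not the actual gcc implementation, and the exact set of suffixes it recognizes is an assumption here.

```python
import re

# Suffixes GCC appends to compiler-generated clones.  The precise set
# handled by get_original_name may differ; this list is illustrative.
_CLONE_RE = re.compile(r"\.(part|constprop|isra|cold)(\.\d+)?$")

def strip_clone_suffixes(name: str) -> str:
    """Strip trailing clone suffixes, e.g. 'foo.part.0' -> 'foo'.
    Applies repeatedly so stacked suffixes such as
    'foo.isra.0.part.1' also reduce to 'foo'."""
    while True:
        stripped = _CLONE_RE.sub("", name)
        if stripped == name:
            return name
        name = stripped
```

Doing this once, in the offline pass alongside the dwarf-name rewriting, would keep all instance renaming in one place instead of spreading it across the afdo reader.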
Honza

> Thanks,
> Kugan
>
> > This is my current benchmark run with -Ofast -mtune=native (without LTO)
> > comparing no feedback (base) to autofdo (peak)
> >
> > 500.perlbench_r      1   167    9.51 *    1   155    10.3 *
> > 502.gcc_r            1   132    10.7 *    1   126    11.2 *
> > 505.mcf_r            1   226    7.16 *    1   225    7.20 *
> > 520.omnetpp_r        1   203    6.47 *    1   203    6.47 *
> > 523.xalancbmk_r     NR                   NR
> > 525.x264_r           1  84.7    20.7 *    1  90.7    19.3 *
> > 531.deepsjeng_r      1   208    5.50 *    1   209    5.47 *
> > 541.leela_r          1   295    5.61 *    1   318    5.21 *
> > 548.exchange2_r      1  85.9    30.5 *    1  93.3    28.1 *
> > 557.xz_r             1   225    4.79 *    1   220    4.90 *
> > Est. SPECrate2017_int_base    9.13
> > Est. SPECrate2017_int_peak               9.05
> >
> > So there are regressions in x264, deepsjeng, leela and exchange, neither
> > of them very bad.  I think it would be interesting to understand
> > 541.leela_r first.
> >
> > Honza
> >>
> >> Thanks,
> >> Kugan
> >