On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote: > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.ch...@gmail.com> wrote: > > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <a...@linux.intel.com> wrote: > > > > > > "bin.cheng" <bin.ch...@linux.alibaba.com> writes: > > > > > > > Hi, > > > > > > > > Due to ICE and mal-functional bugs, indirect call value profile > > > > transformation > > > > is disabled on GCC-7/8/trunk. This patch restores the transformation. > > > > The > > > > main issue is AutoFDO should store cgraph_node's profile_id of callee > > > > func in > > > > the first histogram value's counter, rather than pointer to callee's > > > > name string > > > > as it is now. > > > > With the patch, some "Indirect call -> direct call" tests pass with > > > > autofdo, while > > > > others are unstable. I think the instability is caused by poor perf > > > > data collected > > > > during regrets run, and can confirm these tests pass if good perf data > > > > could be > > > > collected in manual experiments. > > > > > > Would be good to make the tests stable, otherwise we'll just have > > > regressions in the future again. > > > > > > The problem is that the tests don't run long enough and don't get enough > > > samples? > > Yes, take g++.dg/tree-prof/morefunc.C as an example: > > - int i; > > - for (i = 0; i < 1000; i++) > > + int i, j; > > + for (i = 0; i < 1000000; i++) > > + for (j = 0; j < 50; j++) > > g += tc->foo(); > > if (g<100) g++; > > } > > @@ -27,8 +28,9 @@ void test1 (A *tc) > > static __attribute__((always_inline)) > > void test2 (B *tc) > > { > > - int i; > > + int i, j; > > for (i = 0; i < 1000000; i++) > > + for (j = 0; j < 50; j++) > > > > I have to increase loop count like this to get stable pass on my > > machine. The original count (1000) is too small to be sampled. > > > > > > > > Could add some loop? > > > Or possibly increase the sampling frequency in perf (-F or -c)? > > Maybe, I will have a try. > Turned out all "Indirect call" test can be resolved by adding -c 100 > to perf command line: > diff --git a/gcc/config/i386/gcc-auto-profile > b/gcc/config/i386/gcc-auto-profile > ... > -exec perf record -e $E -b "$@" > +exec perf record -e $E -c 100 -b "$@" > > Is 100 too small here? Or is it fine for all scenarios?
-c 100 is risky because it can cause perf throttling, which makes it lose data. perf has a limiter that if the PMU handler uses too much CPU time it stops measuring for some time. A PMI is 10k+ cycles, so doing one every 100 branches is a lot of CPU time. I wouldn't go down that low. It is better to increase the iteration count. -Andi