On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> Hi,
> I had to tweak the testcase a bit to not compute minimum: GCC optimizes
> this early into MIN_EXPR throwing away any profile information.  If we
> get serious here we can maintain it via histogram, but I am not sure it
> is worth the effort at least until IL is sanitized and expansion cleaned
> up with tuples branch.
>
> I also had to fix bug in branch prediction ignoring __builtin_expect of
> any early inlined function and update your testcase to not use
> __builtin_expect in predictable case.

I guess you mean, not to use it in the _unpredictable_ case?

> However this is what I get on AthlonXP:
> no deps, predictable -- C code took 13.71ns per iteration
> no deps, predictable -- cmov code took 13.83ns per iteration
> no deps, predictable -- jmp code took 13.94ns per iteration
> has deps, predictable -- C code took 15.54ns per iteration
> has deps, predictable -- cmov code took 22.21ns per iteration
> has deps, predictable -- jmp code took 16.55ns per iteration
> no deps, unpredictable -- C code took 13.99ns per iteration
> no deps, unpredictable -- cmov code took 13.76ns per iteration
> no deps, unpredictable -- jmp code took 26.12ns per iteration
> has deps, unpredictable -- C code took 120.37ns per iteration
> has deps, unpredictable -- cmov code took 120.76ns per iteration
> has deps, unpredictable -- jmp code took 165.82ns per iteration

At least for the __builtin_expect case, I guess this is showing that gcc
now does exactly what we'd like of it.

> The patch is quite SPEC neutral, saving 190Kb in FDO binaries.  Still I
> think it is worthwhile to have especially because I do believe that all
> the target COST predicates should be populated by hotness argument so we
> get same results for -Os or -O2 with profile feedback specifying that
> nothing is executed or if one marks all functions cold.
> At the moment profile feedback with all functions not executed leads to
> code smaller than -O2 but closer to -O2 than -Os so there is quite some
> fruit here.  With LTO or for codebases with more __builtin_expect and
> cold hints like kernel or libstdc++ we can get a lot of this benefits
> without FDO too.

I hope so too. For the kernel we have some parts where __builtin_expect
is used quite a lot and noticeably helps, and could help even more if we
cut down the use of cmov too. I guess on architectures with even more
predicated instructions it could be even more useful too.

Thanks,
Nick
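
P.S. For anyone reading this in the archive without the testcase at hand,
here is a minimal sketch of the kind of code being discussed. It is not
the testcase from this thread: the function names and the clamping loop
are made up for illustration, and the likely()/unlikely() macros are
local definitions in the style the kernel uses. The first function marks
its branch as rarely taken, which is the situation where we would like
gcc to emit a jump rather than a cmov; the second gives no hint, so the
compiler is free to choose. Whether either actually becomes a cmov or a
jmp depends on the gcc version, target and flags.

```c
#include <stddef.h>

/* Hypothetical local hint macros in the kernel style; they rely on the
   GCC builtin __builtin_expect (also accepted by compatible compilers). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hinted variant: negative inputs are declared rare, so the branch is
   expected to be well predicted and a plain jump should be cheap. */
long sum_clamped(const long *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long x = v[i];
        if (unlikely(x < 0))
            x = 0;          /* rarely executed path */
        sum += x;
    }
    return sum;
}

/* Unhinted variant: same result, but the compiler has no static
   information about the branch and may if-convert it to a cmov. */
long sum_clamped_nohint(const long *v, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long x = v[i];
        if (x < 0)
            x = 0;
        sum += x;
    }
    return sum;
}
```

Comparing the output of gcc -O2 -S for the two functions is an easy way
to see which form a given compiler picks.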