Re: [RFA] optimizing predictable branches on x86

Nick Piggin Mon, 03 Mar 2008 04:26:05 -0800

On Monday 03 March 2008 22:38, Jan Hubicka wrote:
> Hi,
> I had to tweak the testcase a bit to not compute minimum: GCC optimizes
> this early into MIN_EXPR throwing away any profile information.  If we
> get serious here we can maintain it via histogram, but I am not sure it
> is worth the effort at least until IL is sanitized and expansion cleaned
> up with tuple branch.
>
> I also had to fix a bug in branch prediction ignoring __builtin_expect of
> any early inlined function and update your testcase to not use
> __builtin_expect in predictable case.


I guess you mean, not to use it in the _unpredictable_ case?


> However this is what I get on AthlonXP:
>  no deps,   predictable -- C    code took  13.71ns per iteration
>  no deps,   predictable -- cmov code took  13.83ns per iteration
>  no deps,   predictable -- jmp  code took  13.94ns per iteration
> has deps,   predictable -- C    code took  15.54ns per iteration
> has deps,   predictable -- cmov code took  22.21ns per iteration
> has deps,   predictable -- jmp  code took  16.55ns per iteration
>  no deps, unpredictable -- C    code took  13.99ns per iteration
>  no deps, unpredictable -- cmov code took  13.76ns per iteration
>  no deps, unpredictable -- jmp  code took  26.12ns per iteration
> has deps, unpredictable -- C    code took 120.37ns per iteration
> has deps, unpredictable -- cmov code took 120.76ns per iteration
> has deps, unpredictable -- jmp  code took 165.82ns per iteration

At least for the __builtin_expect case, I guess this is showing
that gcc now does exactly what we'd like of it.
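
For readers without the testcase at hand, here is a rough sketch of the
kind of loop being timed. It is not the actual testcase from this thread;
the function name and the reading of "has deps" (the selected value feeds
the next iteration) are assumptions made for illustration. The point is
only that GCC may expand the conditional either as a cmov or as a
compare-and-jump, and that __builtin_expect tells it the branch is
predictable.

```c
#include <stddef.h>

/* Hypothetical sketch, not the testcase behind the timings above. */

/*
 * Loop-carried select: the value chosen in one iteration is an input
 * to the next one (assumed meaning of "has deps").
 */
long fold_hinted(const unsigned char *rare, size_t n, long a, long b)
{
    long x = 0;

    for (size_t i = 0; i < n; i++) {
        long t = x + a;                     /* common path */

        /* Hint: rare[i] is expected to be zero almost always. */
        if (__builtin_expect(rare[i] != 0, 0))
            t = x - b;                      /* rare path */

        x = t;                              /* feeds the next iteration */
    }
    return x;
}
```

Comparing the output of "gcc -O2 -S" with and without the hint should show
whether a given compiler version picks cmov or a jump here.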


> The patch is quite SPEC neutral, saving 190Kb in FDO binaries.  Still I
> think it is worthwhile to have especially because I do believe that all
> the target COST predicates should be populated by hotness argument so we
> get same results for -Os or -O2 with profile feedback specifying that
> nothing is executed or if one marks all functions cold.
> At the moment profile feedback with all functions not executed leads to
> code smaller than -O2 but closer to -O2 than -Os so there is quite some
> fruit here. With LTO or for codebases with more __builtin_expect and
> cold hints like kernel or libstdc++ we can get a lot of these benefits
> without FDO too.

I hope so too. For the kernel we have some parts where
__builtin_expect is used quite a lot and noticeably helps, and it could
help even more if we cut down the use of cmov too. I guess on
architectures with even more predicated instructions it could be
more useful still.
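
As a concrete illustration of the hints mentioned above, here is a small
sketch in the style the kernel uses. The likely()/unlikely() wrappers
match the usual kernel definitions; the two functions and their names are
hypothetical, made up for this example. The cold attribute is the GCC
function attribute for rarely executed code.

```c
#include <stddef.h>

/* Kernel-style wrappers around __builtin_expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical error path, marked cold so it is treated as rarely run. */
__attribute__((cold, noinline))
static int report_bad_length(size_t len)
{
    (void)len;
    return -1;
}

/* Hypothetical hot-path function: sums the bytes of a buffer. */
int byte_sum(const unsigned char *buf, size_t len, unsigned *out)
{
    unsigned sum = 0;

    /* The empty-buffer case is declared unlikely. */
    if (unlikely(len == 0))
        return report_bad_length(len);

    for (size_t i = 0; i < len; i++)
        sum += buf[i];

    *out = sum;
    return 0;
}
```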

Thanks,
Nick
