On Tue, Jun 17, 2025 at 5:43 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 6/16/25 10:08 PM, Dongyan Chen wrote:
> > Hi, I've come across a question regarding the branch cost of gcc. In the
> > link https://gcc.godbolt.org/z/hnddevd5h, gcc fails to recognize the
> > optimization branch judgment, while llvm does. I eventually discovered that
> > the value of the branch cost was too small. Moreover, in that link, if I
> > add "-mbranch-cost=4" (a larger number can also be used) for gcc, the
> > zicond extension functions properly. So, is it necessary to modify the
> > branch cost for gcc? According to the source code, the default -mtune is
> > rocket, which has a branch cost of 3. I think it should be set to 4.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv.cc: Change the branch cost.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/zicond-primitiveSemantics_compare_reg_reg_return_reg_reg.c:
> >       New test.
> So I'd be a lot more comfortable with this if someone that knows the
> rocket uarch could chime in or if we had wider data on how this behaves
> in general.

I designed Rocket, so I can confirm Yangyu's comment that the branch
misprediction penalty is usually 3 cycles.  (It's actually 4 if the
correct-path instruction is a 4-byte instruction that isn't naturally
aligned, but that should only happen around a quarter of the
time.)  Since it's a single-issue core, that means a
mispredicted branch has a cost of around 4 instructions, and a
correctly predicted one of course has a cost of around 1 instruction.
A branch cost of 4 is therefore an overestimate, since even programs
dominated by unpredictable branches will exhibit correct predictions
some fraction of the time.
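
As a rough sketch of the arithmetic above (not a measured model), the
average cost of a branch can be written as one instruction slot plus the
misprediction penalty weighted by the miss rate.  The function name and
the miss rates are illustrative assumptions; only the 3- and 4-cycle
penalties and the "quarter of the time" figure come from the text.

```python
# Rough per-branch cost model for a single-issue core such as Rocket.
# From the message: 3-cycle misprediction penalty, or 4 cycles when the
# correct-path instruction is a misaligned 4-byte instruction (roughly a
# quarter of the time).
# ASSUMPTION: the miss rate is a free parameter, not a measured value.

def expected_branch_cost(miss_rate, penalty=3.0, base=1.0):
    """Average cost in instruction slots: one slot plus expected penalty."""
    return base + miss_rate * penalty

# Penalty averaged over the aligned (3) and misaligned (4) cases.
avg_penalty = 0.75 * 3 + 0.25 * 4

always_wrong = expected_branch_cost(1.0)                       # upper bound
coin_flip = expected_branch_cost(0.5)                          # unpredictable
coin_flip_misaligned = expected_branch_cost(0.5, avg_penalty)  # with 3.25

print(always_wrong, coin_flip, coin_flip_misaligned)
```

Under these assumptions a cost of 4 is reached only when every branch
mispredicts, which is the sense in which 4 is an overestimate.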

For the loop in the original post, Rocket would take 7
cycles/iteration with Zicond.  Without Zicond, if we assume the branch
is taken half the time and is completely unpredictable (thus predicted
correctly half the time), then it will take either 4, 5, 7, or 8
cycles with equal probability, averaging out to 6.  Under that model,
Zicond appears to be a de-optimization for Rocket.  And of course the
branch might be more predictable than that, depending on the
distribution of values and where the search key falls within that
distribution.
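
The comparison above can be sketched numerically.  The 7-cycle Zicond
figure and the four branch outcomes come from the text.  Splitting the
outcomes into two correctly predicted paths (4 and 5 cycles) plus a
3-cycle penalty is an assumption made here for illustration; it is one
reading that reproduces 4, 5, 7, and 8, not something the text states.

```python
from statistics import mean

ZICOND_CYCLES = 7               # cycles/iteration with Zicond (from the text)
BRANCH_OUTCOMES = [4, 5, 7, 8]  # equally likely outcomes (from the text)

# Average cycles/iteration for the branchy loop under the 50/50 model.
branch_avg = mean(BRANCH_OUTCOMES)

# ASSUMPTION: 4 and 5 are the two correctly predicted paths, and a
# misprediction adds 3 cycles to either one.
def branch_cycles(miss_rate, path_costs=(4, 5), penalty=3):
    """Average cycles/iteration for a given misprediction rate."""
    return mean(path_costs) + miss_rate * penalty

def break_even_miss_rate(path_costs=(4, 5), penalty=3):
    """Miss rate at which the branchy loop costs as much as Zicond."""
    return (ZICOND_CYCLES - mean(path_costs)) / penalty

print(branch_avg, branch_cycles(0.5), break_even_miss_rate())
```

If that reading holds, the branchy version stays ahead of Zicond until
the miss rate gets well above one half, which is consistent with the
conclusion that Zicond looks like a de-optimization for Rocket here.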

I would think the best answer here is to use the right -mtune setting
for a given core (and of course make sure that core's parameters are
accurate), but I'll butt out of that decision.

>  One pico-sized benchmark isn't a great way to evaluate
> something like this.
>
> WRT a followup from Yangyu which touches on the BPI.  My son is
> currently putting together a tuning and scheduler model for the spacemit
> x60 chip in that system under my guidance.  I expect we'll start
> benchmarking that work next week.  We can look at the branch costing
> model for that design as a part of that work.
>
>
> Jeff
