[Bug rtl-optimization/25285] pessimization of goto * ("computed goto")

anton at mips dot complang dot tuwien dot ac dot at Thu, 08 Dec 2005 13:31:31 -0800


------- Comment #6 from anton at mips dot complang dot tuwien dot ac dot at  
2005-12-08 21:31 -------
Subject: Re:  pessimization of goto * ("computed goto")

pinskia at gcc dot gnu dot org wrote:
> ------- Comment #5 from pinskia at gcc dot gnu dot org  2005-12-06 21:58 
> -------
> (In reply to comment #4)
> > So, no, the code is not worse, but much better.  I hope this
> > workaround will continue to work in future versions.
> 
> You are wrong in general since this is a conditional indirect jump.  Since it
> is conditional it means that it is going to do a jump and the locality reasons
> are that important as like in the old days when there was a little code 
> cache. 
> In fact, doing jne instead of jeq might cause the branch to be mispredicted.

Sorry, you lost me here.  Conditional branch predictors on current
general-purpose CPUs are history-based, and I would not expect any
difference in the accuracy of the conditional branch prediction.
However, for BTB-based indirect branch predictors (Pentium 3, Athlon
64, and (modulo replication from the trace cache) Pentium 4), the
branch prediction accuracy suffers quite a lot if you combine several
well-predictable indirect branches with different targets into a
single indirect branch.
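
To make the effect concrete, here is a minimal sketch of a toy model,
not a description of any real CPU: it assumes a BTB that simply
predicts the target an indirect branch took the last time it was
executed. The function name btb_mispredictions, the constant NOPS and
the opcode numbering are made up for this illustration.

```c
#include <stddef.h>

enum { NOPS = 8 };  /* hypothetical number of VM opcodes in this model */

/* Count mispredictions of a "last target" BTB while a loop of VM
   instructions runs `iterations` times.  prog[i] must be an opcode in
   [0, NOPS).  If `shared` is nonzero, every VM instruction dispatches
   through one indirect branch (one BTB entry); otherwise each opcode
   has its own dispatch branch and therefore its own BTB entry. */
static long btb_mispredictions(const int *prog, size_t len,
                               long iterations, int shared)
{
    int btb[NOPS];
    long misses = 0;

    for (int i = 0; i < NOPS; i++)
        btb[i] = -1;                      /* empty entry: first use misses */

    for (long it = 0; it < iterations; it++) {
        for (size_t pc = 0; pc < len; pc++) {
            int cur = prog[pc];
            int next = prog[(pc + 1) % len];  /* actual branch target */
            int entry = shared ? 0 : cur;     /* which branch is executed */

            if (btb[entry] != next)
                misses++;
            btb[entry] = next;            /* remember the last target */
        }
    }
    return misses;
}
```

In this model, the three-instruction loop 0, 1, 2 run 100 times gives
300 mispredictions with the shared branch (every dispatch misses) and
3 with separate branches (only the cold misses).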

See [ertl&gregg03] for a deeper discussion, in particular Section 3.
You can also read in Section 5.2 (towards the end) why we don't want
to have a jump to far-away places.

> Note if you were actually using a target which have conditional indirect jumps
> this would be a bug (PPC for an example from either lr or ctr register, see PR
> 25287 for a bug report about that).

Sure, having a conditional indirect jump in-line would be nice.

But if the architecture does not have it (or if gcc does not utilize
it), what I would like to see in the resulting code is:

1) We compile with -fno-reorder-blocks, so the indirect branch should
be in the place corresponding to the source code, not somewhere else.

2) If you do reorder the blocks, you should not merge indirect
branches on CPUs with BTBs, for better branch prediction.

BTW, the __asm__("") workaround works nicely (for now), so I could
produce numbers for the slowdown for this bug, if you are interested.
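
For readers who have not seen the pattern, the following is a minimal
sketch of a threaded-code dispatch loop with one goto * per VM
instruction. It relies on the GNU C labels-as-values extension. The
placement of the empty __asm__("") directly before each goto * is an
assumption made for this illustration (the exact form of the
workaround is not shown in this message), and the opcodes, the NEXT
macro and the function run are hypothetical. Whether a given gcc
version keeps the indirect branches separate has to be checked in the
generated assembly.

```c
enum { OP_INC, OP_DEC, OP_HALT };   /* hypothetical VM opcodes */

/* Interpret a byte-coded program and return the accumulator.
   Uses GNU C labels-as-values ("computed goto"). */
static long run(const unsigned char *code)
{
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    const unsigned char *ip = code;
    long acc = 0;

/* Assumed form of the workaround: an empty asm statement before each
   goto *, intended to discourage the compiler from merging the
   per-instruction indirect branches into a single one. */
#define NEXT do { __asm__(""); goto *dispatch[*ip++]; } while (0)

    NEXT;
do_inc:
    acc++;
    NEXT;
do_dec:
    acc--;
    NEXT;
do_halt:
    return acc;
#undef NEXT
}
```

Compiling this with and without the __asm__(""), and with
-fno-reorder-blocks, and then counting the indirect jumps in the
assembly output is one way to see whether the branches were merged.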

@InProceedings{ertl&gregg03,
  author =       "M. Anton Ertl and David Gregg",
  title =        "Optimizing Indirect Branch Prediction Accuracy in Virtual
                  Machine Interpreters",
  crossref =     "sigplan03",
  OPTpages =     "",
  url =          "http://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz",
  abstract =     "Interpreters designed for efficiency execute a huge
                  number of indirect branches and can spend more than
                  half of the execution time in indirect branch
                  mispredictions.  Branch target buffers are the best
                  widely available\mn{on all recent general-purpose
                  machines?} form of indirect branch prediction;
                  however, their prediction accuracy for existing
                  interpreters is only 2\%--50\%.  In this paper we
                  investigate two methods for improving the prediction
                  accuracy of BTBs for interpreters: replicating
                  virtual machine (VM) instructions and combining
                  sequences of VM instructions into superinstructions.
                  We investigate static (interpreter build-time) and
                  dynamic (interpreter run-time) variants of these
                  techniques and compare them and several combinations
                  of these techniques.  These techniques can eliminate
                  nearly all of the dispatch branch mispredictions,
                  and have other benefits, resulting in speedups by a
                  factor of up to 3.17 over efficient threaded-code
                  interpreters, and speedups by a factor of up to 1.3
                  over techniques relying on superinstructions alone."
}

- anton

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25285
