------- Comment #6 from anton at mips dot complang dot tuwien dot ac dot at  2005-12-08 21:31 -------
Subject: Re:  pessimization of goto * ("computed goto")

pinskia at gcc dot gnu dot org wrote:
> ------- Comment #5 from pinskia at gcc dot gnu dot org  2005-12-06 21:58 -------
> (In reply to comment #4)
> > So, no, the code is not worse, but much better.  I hope this
> > workaround will continue to work in future versions.
>
> You are wrong in general since this is a conditional indirect jump.  Since it
> is conditional it means that it is going to do a jump and the locality reasons
> are that important as like in the old days when there was a little code
> cache.
> In fact have doing jne instead of jeq might cause the branch mispredicted.

Sorry, you lost me here.  Conditional branch predictors on current
general-purpose CPUs are history-based, and I would not expect any
difference in the accuracy of the conditional branch prediction.

However, for BTB-based indirect branch predictors (Pentium 3, Athlon
64, and (modulo replication from the trace cache) Pentium 4), the
branch prediction accuracy suffers quite a lot if you combine several
well-predictable indirect branches with different targets into a
single indirect branch.  See [ertl&gregg03] for a deeper discussion,
in particular Section 3.  You can also read in Section 5.2 (towards
the end) why we don't want to have a jump to far-away places.

> Note if you were actually using a target which have conditional indirect jumps
> this would be a bug (PPC for an example from either lr or ctr register, see PR
> 25287 for a bug report about that).

Sure, having a conditional indirect jump in-line would be nice.  But
if the architecture does not have it (or if gcc does not utilize it),
what I would like to see in the resulting code is:

1) We compile with -fno-reorder-blocks, so the indirect branch should
be in the place corresponding to the source code, not somewhere else.

2) If you do reorder the blocks, you should not merge indirect
branches on CPUs with BTBs, for better branch prediction.

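To make the two requests above concrete, here is a minimal sketch (not taken from the bug report) of a threaded-code interpreter in GNU C in which every VM instruction ends in its own `goto *`.  The names `run`, `NEXT` and the `OP_*` opcodes are hypothetical and exist only for this illustration.  Placing the empty `__asm__("")` directly in front of each `goto *` is an assumption about how the workaround discussed in this thread is applied; whether it keeps the indirect branches separate depends on the compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy VM: each opcode is an index into the label table. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Dispatch to the next VM instruction.  Because the macro is expanded at
 * the end of every instruction, the source contains one indirect branch
 * per instruction, which is what a BTB-based predictor benefits from.
 * The empty asm statement is an assumed placement of the workaround. */
#define NEXT do { __asm__(""); goto *labels[*ip++]; } while (0)

int run(const int *program)
{
    /* GNU C labels-as-values: addresses of the instruction bodies. */
    static void *const labels[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    const int *ip = program;   /* VM instruction pointer */
    int acc = 0;               /* single accumulator register */

    assert(program != NULL);
    NEXT;

do_inc:    acc++;    NEXT;
do_dec:    acc--;    NEXT;
do_double: acc *= 2; NEXT;
do_halt:   return acc;
}
```

If the compiler merges all the `goto *` statements into one shared indirect jump, the source still behaves the same, but every dispatch then goes through a single BTB entry, which is the slowdown described above.
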
BTW, the __asm__("") workaround works nicely (for now), so I could
produce numbers for the slowdown for this bug, if you are interested.

@InProceedings{ertl&gregg03,
  author =       "M. Anton Ertl and David Gregg",
  title =        "Optimizing Indirect Branch Prediction Accuracy in Virtual
                  Machine Interpreters",
  crossref =     "sigplan03",
  OPTpages =     "",
  url =          "http://www.complang.tuwien.ac.at/papers/ertl%26gregg03.ps.gz",
  abstract =     "Interpreters designed for efficiency execute a huge
                  number of indirect branches and can spend more than
                  half of the execution time in indirect branch
                  mispredictions. Branch target buffers are the best
                  widely available\mn{on all recent general-purpose
                  machines?} form of indirect branch prediction;
                  however, their prediction accuracy for existing
                  interpreters is only 2\%--50\%. In this paper we
                  investigate two methods for improving the prediction
                  accuracy of BTBs for interpreters: replicating
                  virtual machine (VM) instructions and combining
                  sequences of VM instructions into superinstructions.
                  We investigate static (interpreter build-time) and
                  dynamic (interpreter run-time) variants of these
                  techniques and compare them and several combinations
                  of these techniques. These techniques can eliminate
                  nearly all of the dispatch branch mispredictions,
                  and have other benefits, resulting in speedups by a
                  factor of up to 3.17 over efficient threaded-code
                  interpreters, and speedups by a factor of up to 1.3
                  over techniques relying on superinstructions alone."
}

- anton

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25285