Hi Joern,

I've also been thinking about implementing something like that.  First I would
experiment with implementing switches using perfect hashing functions to see if
this really speeds up execution on current CPUs.  Then I would try to come up
with some heuristic that tells GCC when it is advantageous to do this.  Cost
modeling sounds good but maybe one could start with a prototype that uses
simpler heuristics.

I'm still relatively a newcomer though, so this would take me some non-trivial
amount of time and I sadly already have a rather big project on my plate.
But I'm certainly up to try this in the future!  I'm planning to attend the
upcoming Cauldron so maybe we'll chat about this there.

Cheers,
Filip Kastl


On Sun 2025-05-25 04:57:15, Joern Wolfgang Rennecke wrote:
> This has come up several time over the years:
> https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00158.html
> https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00155.html
> https://gcc.gnu.org/pipermail/gcc/2010-March/190234.html
> 
> , but maybe now (or maybe a while ago) is the right time to do this,
> considering the changes in relative costs of basic operations?
> Multiply and barrel shift are cheap on many modern microarchitectures.
> Control flow and non-linear memory access is expensive.
> 
> FWIW, [Dietz92]
> https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1312&context=ecetr
> mentions multiply in passing as unpractical for SPARC because of cost, but
> modern CPUs can often do a multiply in a single cycle.
> 
> Approximating division and scaling, for a case value x, we can calculate an
> index or offset into a table as
> f(x) = x*C1 >> C2 & M
> 
> For an index, M masks off the upper bits so that the index fits into
> a table that has a number of elements that is a power of two.
> For architectures where a non-scaled index in cheaper to use than
> a scaled one, we compute an offset by having M also mask off the lower
> bits.
> 
> Each table entry contains a jump address (or offset) and a key - at least if
> both are the same size; for different sizes, it might be cheaper to have two
> tables.
> If we have found values for C1 and C2 that give a perfect hash, we can
> immediately dispatch to the default case for a non-match; otherwise, we can
> have decision trees at the jump destinations, each using the comparison with
> the key from the table for the first decision.
> No separate range check is necessary, so if the multiply is fast enough,
> this should be close in performance to an ordinary tablejump.
> 
> This dispatch method can be used for tables that are too sparse for a
> tablejump,but have enough cases to justify the overhead (depending on
> multiple conditional vs single indirect branch costs, the latter might be a
> low bar).
> 
> I suppose we could make tree-switch-conversion.cc use rtx costs to compare
> the hash implementation to a decision tree, or have a hook make the decision
> - and the default for the hook might use rtx costs...

Reply via email to