Florian Weimer via Gcc <gcc@gcc.gnu.org> writes:

> * Andi Kleen via Gcc:
>
>> Joern Wolfgang Rennecke <g...@amylaar.uk> writes:
>>
>>> This has come up several time over the years:
>>> https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00158.html
>>> https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00155.html
>>> https://gcc.gnu.org/pipermail/gcc/2010-March/190234.html
>>>
>>> , but maybe now (or maybe a while ago) is the right time to do this,
>>> considering the changes in relative costs of basic operations?
>>> Multiply and barrel shift are cheap on many modern microarchitectures.
>>
>> There are much more powerfull primitives for hashes on modern CPUs, like
>> CRC or AES rounds. I don't know if anybody has tried to do a
>> perfect hash on top of those though, but perhaps it wouldn't need
>> to be.
>
> Also carry-less multiply persumably.
>
> It's challenging to use those instructions for compiling switch
> statements because they would then be used all over the place.  But
> maybe Fibonacci hashing is good enough.

If generic vectorization makes sense this makes sense too.

>>> Control flow and non-linear memory access is expensive.
>>
>> Before going to hashes I would rather exploit SIMD first, e.g. to do
>> multiple comparisons in a single instruction. Modern hash
>> table implementations like Swiss tables are already doing that
>> for good results.
>
> What are you proposing?  Broadcast the scrutinee to a vector register,
> perform a vector compare against possible switch values, and then
> extract the index of the non-zero component, and then use that for a
> table lookup?

There are many things you can do. One simple case is multiple cases
for a single label where you don't need to extract an index.
But yes extracting indexes works too for more complex cases.

Another approach is to use the shuffle instructions or the SSE 4.2
string instructions. More complex techniques are here [2]

Once you have an index you don't necessarily need a table, for example
upto four cases can be looked up with just condition codes on x86 (see [1])
although one has to be careful to not exceed branch predictor density
limits this way.

The vector compare can also do ranges.

It can also be used to short cut upper level search tree lookups.

[1] https://halobates.de/blog/p/214
[2] http://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html

-Andi

Reply via email to