[Cython] Resurrecting __dict__ for extension types

2012-06-06 Thread Ian Bell
As per a couple of discussions online (
http://mail.python.org/pipermail/cython-devel/2011-February/000122.html),
it looks like Cython was at one point pretty close to being able to
automatically generate a __dict__ for extension types, as CPython classes
have.  Right now I have to manually code a function that does exactly what
__dict__ should do, and for classes with tens of attributes that is a real
pain.  Automatic __dict__ support would be especially useful for enabling
deepcopy and pickling of extension types.

While on the pickling theme, it seems it really ought to be pretty
straightforward to automatically pickle extension types.  Don't you already
have all the necessary information at compile time?  This was on the wish
list at one point if I am not mistaken and would be very useful to me and
lots of other people.
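
For illustration, here is a rough sketch of the kind of hand-written
boilerplate being described (the class, attribute and helper names are made
up, not a proposed API):

cdef class State:
    cdef public double temperature
    cdef public double pressure

    def as_dict(self):
        # What an auto-generated __dict__ would give us for free; with tens
        # of attributes this mapping has to be maintained by hand.
        return {'temperature': self.temperature, 'pressure': self.pressure}

    def __reduce__(self):
        # Pickling/deepcopy support built on the same hand-maintained mapping.
        return (_rebuild_state, (self.as_dict(),))

def _rebuild_state(attrs):
    obj = State()
    for name, value in attrs.items():
        setattr(obj, name, value)
    return obj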

I'm really enjoying coding in Cython and am finally making sense of how best
to use extension types.

Regards,
Ian
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Stefan Behnel
mark florisson, 05.06.2012 22:33:
> It doesn't even necessarily have to be about running user code, a user
> could craft data input which causes such a situation. For instance,
> let's say we have a just-in-time specializer which specializes a
> function for the runtime input types, and the types depend on the user
> input. For instance, if we write a web application we can post arrays
> described by a custom dtype, which draws pictures in some weird way
> for us. We can get it to specialize pretty much any array type, so
> that gives us a good opportunity to find collisions.

Yes, and the bad thing is that even a very high probability of having no
collisions, combined with the huge amount of brute force needed to find one,
is not enough. An attacker (or otherwise interested user) may just be lucky,
and given how low in the application stack this will be used, such a bit of
luck may have massive consequences.

Stefan
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn


Stefan Behnel  wrote:

>mark florisson, 05.06.2012 22:33:
>> It doesn't even necessarily have to be about running user code, a user
>> could craft data input which causes such a situation. For instance,
>> let's say we have a just-in-time specializer which specializes a
>> function for the runtime input types, and the types depend on the user
>> input. For instance, if we write a web application we can post arrays
>> described by a custom dtype, which draws pictures in some weird way
>> for us. We can get it to specialize pretty much any array type, so
>> that gives us a good opportunity to find collisions.
>
>Yes, and the bad thing is that even a very high probability of having no
>collisions, combined with the huge amount of brute force needed to find
>one, is not enough. An attacker (or otherwise interested user) may just
>be lucky, and given how low in the application stack this will be used,
>such a bit of luck may have massive consequences.

Following that line of argument, I guess you keep your money in a mattress 
then? Our modern world is built around the assumption that people don't get 
*that* lucky.

(I agree though that 64 bits is not enough for the security usecase! I'm just 
saying that 160 or 256 bits would be.)

Dag


>
>Stefan

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread mark florisson
On 6 June 2012 10:11, Dag Sverre Seljebotn  wrote:
>
>
> Stefan Behnel  wrote:
>
>>mark florisson, 05.06.2012 22:33:
>>> It doesn't even necessarily have to be about running user code, a user
>>> could craft data input which causes such a situation. For instance,
>>> let's say we have a just-in-time specializer which specializes a
>>> function for the runtime input types, and the types depend on the user
>>> input. For instance, if we write a web application we can post arrays
>>> described by a custom dtype, which draws pictures in some weird way
>>> for us. We can get it to specialize pretty much any array type, so
>>> that gives us a good opportunity to find collisions.
>>
>>Yes, and the bad thing is that even a very high probability of having no
>>collisions, combined with the huge amount of brute force needed to find
>>one, is not enough. An attacker (or otherwise interested user) may just
>>be lucky, and given how low in the application stack this will be used,
>>such a bit of luck may have massive consequences.
>
> Following that line of argument, I guess you keep your money in a mattress 
> then? Our modern world is built around the assumption that people don't get 
> *that* lucky.
>
> (I agree though that 64 bits is not enough for the security usecase! I'm just 
> saying that 160 or 256 bits would be.)
>
> Dag
>

I think we're arguing past each other. You agree about the security
problem, but Stefan was still emphasizing his earlier point.

>>
>>Stefan
>
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn

On 06/06/2012 11:11 AM, Dag Sverre Seljebotn wrote:



Stefan Behnel  wrote:


mark florisson, 05.06.2012 22:33:

It doesn't even necessarily have to be about running user code, a

user

could craft data input which causes such a situation. For instance,
let's say we have a just-in-time specializer which specializes a
function for the runtime input types, and the types depend on the

user

input. For instance, if we write a web application we can post arrays
described by a custom dtype, which draws pictures in some weird

way

for us. We can get it to specialize pretty much any array type, so
that gives us a good opportunity to find collisions.


Yes, and the bad thing is that a very high probability of having no
collisions even in combination with the need for a huge amount of brute
force work to find one is not enough. An attacker (or otherwise
interested
user) may just be lucky, and given how low in the application stack
this
will be used, such a bit of luck may have massive consequences.


Following that line of argument, I guess you keep your money in a mattress 
then? Our modern world is built around the assumption that people don't get 
*that* lucky.

(I agree though that 64 bits is not enough for the security usecase! I'm just 
saying that 160 or 256 bits would be.)


(And just to be clear, my current stance is in favour of using interning 
for the ID comparison, in the other head of this thread. I just couldn't 
resist Stefan's bait.)


Dag
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn

On 06/05/2012 12:30 AM, Robert Bradshaw wrote:

I just found http://cmph.sourceforge.net/ which looks quite
interesting. Though the resulting hash functions are supposedly cheap,
I have the feeling that branching is considered cheap in this context.


Actually, this lead was *very* promising. I believe the very first 
reference I actually read through and didn't eliminate after the 
abstract totally swept away our home-grown solutions!


"Hash & Displace" by Pagh (1999) is actually very simple, easy to 
understand, and fast both for generation and (the branch-free) lookup:


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf

The idea is:

 - Find a hash `g(x)` to partition the keys into `b` groups (the paper 
requires b > 2n, though I think in practice you can often get away with 
less)


 - Find a hash `f(x)` such that f is 1:1 within each group (which is 
easily achieved since groups only have a few elements)


 - For each group, from largest to smallest: Find a displacement 
`d[group]` so that `f(x) ^ d` doesn't cause collisions.
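
For illustration, a rough Python sketch of those three steps (this is not
the pagh99.py prototype mentioned below; the hash choices and the
brute-force displacement search are simplified):

import random

def build_displace_table(keys, table_size, n_groups, tries=100):
    """keys: 64-bit pre-hashes; table_size and n_groups: powers of two."""
    m1, m2 = table_size - 1, n_groups - 1
    for _ in range(tries):
        r1, r2 = random.randrange(64), random.randrange(64)
        # step 1: g(x) partitions the keys into groups
        groups = {}
        for k in keys:
            groups.setdefault((k >> r2) & m2, []).append(k)
        d = [0] * n_groups
        slots = [None] * table_size
        ok = True
        # steps 2-3: f(x) must be 1:1 within each group; then, largest
        # group first, pick a displacement so f(x) ^ d hits only free slots
        for gid, members in sorted(groups.items(), key=lambda g: -len(g[1])):
            f = [(k >> r1) & m1 for k in members]
            if len(set(f)) != len(f):
                ok = False          # f not 1:1 in this group; retry r1, r2
                break
            for disp in range(table_size):
                targets = [fx ^ disp for fx in f]
                if all(slots[t] is None for t in targets):
                    d[gid] = disp
                    for k, t in zip(members, targets):
                        slots[t] = k
                    break
            else:
                ok = False          # no displacement fit; retry r1, r2
                break
        if ok:
            return r1, r2, d, slots
    return None

The lookup side then matches the displace3 expression below:
slot = ((h >> r1) ^ d[(h >> r2) & m2]) & m1.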


It requires extra storage for the displacement table. However, I think 8 
bits per element might suffice even for vtables of 512 or 1024 in size. 
Even with 16 bits it's rather negligible compared to the minimum-128-bit 
entries of the table.


I benchmarked these hash functions:

displace1: ((h >> r1) ^ d[h & 63]) & m1
displace2: ((h >> r1) ^ d[h & m2]) & m1
displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1

Only the third one is truly in the spirit of the algorithm, but I think 
the first two should work well too (and when h is known compile-time, 
looking up d[h & 63] isn't harder than looking up r1 or m1).


My computer is acting up and all my numbers today are slower than the 
earlier ones (yes, I disabled turbo-mode in the BIOS a year ago, and yes, 
I've pinned the CPU speed). But here are today's numbers, compiled with 
-DIMHASH:


     direct: min=5.37e-09  mean=5.39e-09  std=1.96e-11  val=24.00
      index: min=6.45e-09  mean=6.46e-09  std=1.15e-11  val=18.00
   twoshift: min=6.99e-09  mean=7.00e-09  std=1.35e-11  val=18.00
 threeshift: min=7.53e-09  mean=7.54e-09  std=1.63e-11  val=18.00
  displace1: min=6.99e-09  mean=7.00e-09  std=1.66e-11  val=18.00
  displace2: min=6.99e-09  mean=7.02e-09  std=2.77e-11  val=18.00
  displace3: min=7.52e-09  mean=7.54e-09  std=1.19e-11  val=18.00



I did a dirty prototype of the table-finder as well and it works:

https://github.com/dagss/hashvtable/blob/master/pagh99.py

Dag
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn

On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:

On 06/05/2012 12:30 AM, Robert Bradshaw wrote:

I just found http://cmph.sourceforge.net/ which looks quite
interesting. Though the resulting hash functions are supposedly cheap,
I have the feeling that branching is considered cheap in this context.


Actually, this lead was *very* promising. I believe the very first
reference I actually read through and didn't eliminate after the
abstract totally swept away our home-grown solutions!

"Hash & Displace" by Pagh (1999) is actually very simple, easy to
understand, and fast both for generation and (the branch-free) lookup:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf


The idea is:

- Find a hash `g(x)` to partition the keys into `b` groups (the paper
requires b > 2n, though I think in practice you can often get away with
less)

- Find a hash `f(x)` such that f is 1:1 within each group (which is
easily achieved since groups only has a few elements)

- For each group, from largest to smallest: Find a displacement
`d[group]` so that `f(x) ^ d` doesn't cause collisions.

It requires extra storage for the displacement table. However, I think 8
bits per element might suffice even for vtables of 512 or 1024 in size.
Even with 16 bits it's rather negligible compared to the minimum-128-bit
entries of the table.

I benchmarked these hash functions:

displace1: ((h >> r1) ^ d[h & 63]) & m1
displace2: ((h >> r1) ^ d[h & m2]) & m1
displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1

Only the third one is truly in the spirit of the algorithm, but I think
the first two should work well too (and when h is known compile-time,
looking up d[h & 63] isn't harder than looking up r1 or m1).

My computer is acting up and all my numbers today are slower than the
earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago,
and yes, I've pinned the CPU speed). But here's today's numbers,
compiled with -DIMHASH:

direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=24.00
index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=18.00
twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=18.00
threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=18.00
displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=18.00
displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=18.00
displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=18.00


I did a dirty prototype of the table-finder as well and it works:

https://github.com/dagss/hashvtable/blob/master/pagh99.py


The paper obviously puts more effort into minimizing table size than into a 
fast lookup. My hunch is that our choice should be


((h >> table.r) ^ table.d[h & m2]) & m1

and use an 8-bit d (because even if you have 1024 methods, you'd rather 
double the number of bins than pay for the 2 extra bits needed for more 
displacement options).


Then keep incrementing the size of d and the number of table slots (in 
such an order that the total vtable size is minimized) until success. In 
practice this should almost always just increase the size of d, and keep 
the table size at the lowest 2**k that fits the slots (even for 64 
methods or 128 methods :-))


Essentially we avoid the shift in the argument to d[] by making d larger.
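
For illustration, a rough sketch of that search order (the 16-byte entry
size and the try_build table-finder hook are assumptions; try_build stands
in for something along the lines of pagh99.py and returns None on failure):

def choose_layout(keys, try_build, entry_bytes=16, max_exp=16):
    """Try power-of-two (table_size, |d|) pairs, smallest total size first."""
    min_table = 1
    while min_table < len(keys):
        min_table *= 2
    candidates = []
    for t_exp in range(min_table.bit_length() - 1, max_exp):
        for d_exp in range(0, max_exp):
            table_size, d_size = 1 << t_exp, 1 << d_exp
            total = table_size * entry_bytes + d_size    # 8-bit d entries
            candidates.append((total, table_size, d_size))
    for total, table_size, d_size in sorted(candidates):
        table = try_build(keys, table_size, d_size)
        if table is not None:
            return table_size, d_size, table
    raise ValueError("no layout found; increase max_exp")

Since doubling the table costs entry_bytes per slot while doubling d costs
one byte per entry, the sorted order grows d first and only grows the table
when it has to, which is the behaviour described above.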

Dag
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Robert Bradshaw
On Tue, Jun 5, 2012 at 2:41 PM, Dag Sverre Seljebotn
 wrote:
> On 06/05/2012 10:50 PM, Robert Bradshaw wrote:
>>
>> On Tue, Jun 5, 2012 at 1:10 PM, Dag Sverre Seljebotn
>>   wrote:
>>>
>>> On 06/04/2012 11:43 PM, Robert Bradshaw wrote:


 On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn
     wrote:
>
>
> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
>>
>>
>>
>> Me and Robert had a long discussion on the NumFOCUS list about this
>> already, but I figured it was better to continue it and provide more
>> in-depth benchmark results here.
>>
>> It's basically a new idea of how to provide a vtable based on perfect
>> hashing, which should be a lot simpler to implement than what I first
>> imagined.
>>
>> I'll write down some context first, if you're familiar with this
>> skip ahead a bit..
>>
>> This means that you can do fast dispatches *without* the messy
>> business of binding vtable slots at compile time. To be concrete, this
>> might e.g. take the form
>>
>> def f(obj):
>>     obj.method(3.4)  # try to find a vtable with "void method(double)" in it
>>
>> or, a more typed approach,
>>
>> # File A
>> cdef class MyImpl:
>>     def double method(double x): return x * x
>>
>> # File B
>> # Here we never know about MyImpl, hence "duck-typed"
>> @cython.interface
>> class MyIntf:
>>     def double method(double x): pass
>>
>> def f(MyIntf obj):
>>     # obj *can* be MyImpl instance, or whatever else that supports
>>     # that interface
>>     obj.method(3.4)
>>
>>
>> Now, the idea to implement this is:
>>
>> a) Both caller and callee pre-hash name/argument string
>> "mymethod:iidd" to 64 bits of hash data (probably lower 64 bits of
>> md5)
>>
>> b) Callee (MyImpl) generates a vtable of its methods by *perfect*
>> hashing. What you do is define a final hash fh as a function
>> of the pre-hash ph, for instance
>>
>> fh = ((ph >> vtable.r1) ^ (ph >> vtable.r2) ^ (ph >> vtable.r3)) & vtable.m
>>
>> (Me and Robert are benchmarking different functions to use here.) By
>> playing with r1, r2, r3, you have 64**3 choices of hash function, and
>> will be able to pick a combination which gives *no* (or very few)
>> collisions.
>>
>> c) Caller then combines the pre-hash generated at compile-time, with
>> r1, r2, r3, m stored in the vtable header, in order to find the
>> final location in the hash-table.
>>
>> The exciting thing is that in benchmark, the performance penalty is
>> actually very slight over a C++-style v-table. (Of course you can
>> cache a proper vtable, but the fact that you get so close without
>> caring about caching means that this can be done much faster.)
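
For illustration, steps a)-c) quoted above might look roughly like this in
Python (the byte order of the md5 truncation and the concrete r1/r2/r3/m
values are assumptions, not something the proposal fixes):

import hashlib, struct

def prehash(signature):
    # step a): lower 64 bits of md5("name:argcodes")
    digest = hashlib.md5(signature.encode('ascii')).digest()
    return struct.unpack('<Q', digest[:8])[0]

def final_slot(ph, r1, r2, r3, m):
    # step c): the caller combines its compile-time pre-hash with the
    # r1/r2/r3/m stored in the callee's vtable header
    return ((ph >> r1) ^ (ph >> r2) ^ (ph >> r3)) & m

ph = prehash("mymethod:iidd")
print(final_slot(ph, r1=3, r2=17, r3=41, m=63))   # illustrative parameters only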



 One advantage about caching a vtable is that one can possibly put in
 adapters for non-exact matches. It also opens up the possibility of
 putting in stubs to call def methods if they exist. This needs to be
 fleshed out more, (another CEP :) but could provide for a
 backwards-compatible easy first implementation.

>> Back to my and Robert's discussion on benchmarks:
>>
>> I've uploaded benchmarks here:
>>
>> https://github.com/dagss/hashvtable/tree/master/dispatchbench
>>
>> I've changed how the benchmarks are taken to give more robust numbers (at
>> least for me); you want to look at the 'min' column.
>>
>> I changed the benchmark a bit so that it benchmarks a *callsite*.
>> So we don't pass 'h' on the stack, but either a) look it up in a global
>> variable (default), or b) make it a compile-time constant (an immediate
>> in assembly; compile with -DIMHASH).
>>
>> Similarly, the ID is either an "interned" global variable, or an
>> immediate (-DIMID).
>>
>> The results are very different on my machine depending on this aspect.
>> My conclusions:
>>
>> - Both three shifts with masking, two shifts with a "fallback slot"
>> (allowing for a single collision), three shifts, two shifts with
>> two masks allow for constructing good vtables. In the case of only
>> two shifts, one colliding method gets the twoshift+fback
>> performance and the rest gets the twoshift performance.
>>
>> - Performance is really more affected by whether hashes are
>> immediates or global variables than the hash function. This is in
>> contrast to the interning vs. key benchmarks -- so I think that if
>> we looked up the vtable through PyTypeObject, rather than getting
>> the vtable directly, the loads of the global variables could
>> potentially be masked by that.
>>
>> - My conclusion: Just use lower bits of md5 *both* for the hashing
>> and the ID-ing (don't bother with any interning), and compile the
>> thing as a 

Re: [Cython] Hash-based vtables

2012-06-06 Thread Robert Bradshaw
On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn
 wrote:
> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:
>>
>> On 06/05/2012 12:30 AM, Robert Bradshaw wrote:
>>>
>>> I just found http://cmph.sourceforge.net/ which looks quite
>>> interesting. Though the resulting hash functions are supposedly cheap,
>>> I have the feeling that branching is considered cheap in this context.
>>
>>
>> Actually, this lead was *very* promising. I believe the very first
>> reference I actually read through and didn't eliminate after the
>> abstract totally swept away our home-grown solutions!
>>
>> "Hash & Displace" by Pagh (1999) is actually very simple, easy to
>> understand, and fast both for generation and (the branch-free) lookup:
>>
>>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf
>>
>>
>> The idea is:
>>
>> - Find a hash `g(x)` to partition the keys into `b` groups (the paper
>> requires b > 2n, though I think in practice you can often get away with
>> less)
>>
>> - Find a hash `f(x)` such that f is 1:1 within each group (which is
>> easily achieved since groups only has a few elements)
>>
>> - For each group, from largest to smallest: Find a displacement
>> `d[group]` so that `f(x) ^ d` doesn't cause collisions.
>>
>> It requires extra storage for the displacement table. However, I think 8
>> bits per element might suffice even for vtables of 512 or 1024 in size.
>> Even with 16 bits it's rather negligible compared to the minimum-128-bit
>> entries of the table.
>>
>> I benchmarked these hash functions:
>>
>> displace1: ((h >> r1) ^ d[h & 63]) & m1
>> displace2: ((h >> r1) ^ d[h & m2]) & m1
>> displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1
>>
>> Only the third one is truly in the spirit of the algorithm, but I think
>> the first two should work well too (and when h is known compile-time,
>> looking up d[h & 63] isn't harder than looking up r1 or m1).
>>
>> My computer is acting up and all my numbers today are slower than the
>> earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago,
>> and yes, I've pinned the CPU speed). But here's today's numbers,
>> compiled with -DIMHASH:
>>
>> direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=24.00
>> index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=18.00
>> twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=18.00
>> threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=18.00
>> displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=18.00
>> displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=18.00
>> displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=18.00
>>
>>
>> I did a dirty prototype of the table-finder as well and it works:
>>
>> https://github.com/dagss/hashvtable/blob/master/pagh99.py
>
>
> The paper obviously puts more effort on minimizing table size and not a fast
> lookup. My hunch is that our choice should be
>
> ((h >> table.r) ^ table.d[h & m2]) & m1
>
> and use 8-bits d (because even if you have 1024 methods, you'd rather double
> the number of bins than those 2 extra bits available for displacement
> options).
>
> Then keep incrementing the size of d and the number of table slots (in such
> an order that the total vtable size is minimized) until success. In practice
> this should almost always just increase the size of d, and keep the table
> size at the lowest 2**k that fits the slots (even for 64 methods or 128
> methods :-))
>
> Essentially we avoid the shift in the argument to d[] by making d larger.

Nice. I'm surprised that the indirection on d doesn't cost us much;
hopefully its size wouldn't be a big issue either. What kinds of
densities were you achieving?

Going back to the idea of linear probing on a cache miss, this has the
advantage that one can write a brain-dead provider that sets m=0 and
simply lists the methods instead of requiring a table optimizer. (Most
tools, of course, would do the table optimization.) It also lets you
get away with a "kind-of good" hash rather than requiring you to search
until you find a (larger?) perfect one.
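
For illustration, a rough sketch of such a probing lookup and the
degenerate m=0 provider (the entry layout and names are assumptions):

def find_method(entries, want_id, r=0, d=(0,), m1=0, m2=0):
    # entries: list of (id, funcptr) pairs -- layout assumed for illustration.
    # With m1 = 0 and a single zero displacement every lookup starts at slot
    # 0, so a "brain-dead" provider can just list its methods in order and
    # rely on the probe loop; an optimizing provider publishes real
    # parameters and the loop normally exits on the first iteration.
    n = len(entries)
    start = ((want_id >> r) ^ d[want_id & m2]) & m1
    for i in range(n):
        entry_id, func = entries[(start + i) % n]
        if entry_id == want_id:
            return func
    return None      # not provided; caller falls back to a slower path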

- Robert
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel


Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn

On 06/06/2012 11:16 PM, Robert Bradshaw wrote:

On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn
  wrote:

On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:


On 06/05/2012 12:30 AM, Robert Bradshaw wrote:


I just found http://cmph.sourceforge.net/ which looks quite
interesting. Though the resulting hash functions are supposedly cheap,
I have the feeling that branching is considered cheap in this context.



Actually, this lead was *very* promising. I believe the very first
reference I actually read through and didn't eliminate after the
abstract totally swept away our home-grown solutions!

"Hash&  Displace" by Pagh (1999) is actually very simple, easy to
understand, and fast both for generation and (the branch-free) lookup:


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf


The idea is:

- Find a hash `g(x)` to partition the keys into `b` groups (the paper
requires b>  2n, though I think in practice you can often get away with
less)

- Find a hash `f(x)` such that f is 1:1 within each group (which is
easily achieved since groups only has a few elements)

- For each group, from largest to smallest: Find a displacement
`d[group]` so that `f(x) ^ d` doesn't cause collisions.

It requires extra storage for the displacement table. However, I think 8
bits per element might suffice even for vtables of 512 or 1024 in size.
Even with 16 bits it's rather negligible compared to the minimum-128-bit
entries of the table.

I benchmarked these hash functions:

displace1: ((h >> r1) ^ d[h & 63]) & m1
displace2: ((h >> r1) ^ d[h & m2]) & m1
displace3: ((h >> r1) ^ d[(h >> r2) & m2]) & m1

Only the third one is truly in the spirit of the algorithm, but I think
the first two should work well too (and when h is known compile-time,
looking up d[h&  63] isn't harder than looking up r1 or m1).

My computer is acting up and all my numbers today are slower than the
earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago,
and yes, I've pinned the CPU speed). But here's today's numbers,
compiled with -DIMHASH:

direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=24.00
index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=18.00
twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=18.00
threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11 val=18.00
displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=18.00
displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=18.00
displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=18.00


I did a dirty prototype of the table-finder as well and it works:

https://github.com/dagss/hashvtable/blob/master/pagh99.py



The paper obviously puts more effort on minimizing table size and not a fast
lookup. My hunch is that our choice should be

((h >> table.r) ^ table.d[h & m2]) & m1

and use 8-bits d (because even if you have 1024 methods, you'd rather double
the number of bins than those 2 extra bits available for displacement
options).

Then keep incrementing the size of d and the number of table slots (in such
an order that the total vtable size is minimized) until success. In practice
this should almost always just increase the size of d, and keep the table
size at the lowest 2**k that fits the slots (even for 64 methods or 128
methods :-))

Essentially we avoid the shift in the argument to d[] by making d larger.


Nice. I'm surprised that the indirection on d doesn't cost us much;


Well, table->d[const & const] compiles down to the same kind of code as 
table->m1. I guess I'm surprised too that displace2 doesn't penalize.



hopefully its size wouldn't be a big issue either. What kinds of
densities were you achieving?


The algorithm is designed for 100% density in the table itself. (We can 
lift that to compensate for a small space of possible hash functions I 
guess.)


I haven't done proper simulations yet, but I just tried |vtable|=128, 
|d|=128 from the command line and I had 15 successes or so before the 
first failure. That's with a 100% density in the vtable itself! (And 
when it fails, you increase |d| to get your success).


The caveat is the space spent on d (it's small in comparison, but that's 
why this isn't too good to be true).


A disadvantage might be that we lose the option of a non-power-of-two 
table size (i.e. replacing the mask with "if (likely(slot < n))"). I think 
for that to work one would need to replace the xor group with addition 
on Z_d.



Going back to the idea of linear probing on a cache miss, this has the
advantage that one can write a brain-dead provider that sets m=0 and
simply lists the methods instead of requiring a table optimizer. (Most
tools, of course, would do the table optimization.) It also lets you
get away with a "kind-of good" hash rather than requiring you search
until you find a (larger?) perfect one.


Well, given that we can have 100% density, and generating the table is 
ligh

Re: [Cython] Hash-based vtables

2012-06-06 Thread Dag Sverre Seljebotn


Dag Sverre Seljebotn  wrote:

>On 06/06/2012 11:16 PM, Robert Bradshaw wrote:
>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn
>>   wrote:
>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:

 On 06/05/2012 12:30 AM, Robert Bradshaw wrote:
>
> I just found http://cmph.sourceforge.net/ which looks quite
> interesting. Though the resulting hash functions are supposedly
>cheap,
> I have the feeling that branching is considered cheap in this
>context.


 Actually, this lead was *very* promising. I believe the very first
 reference I actually read through and didn't eliminate after the
 abstract totally swept away our home-grown solutions!

 "Hash&  Displace" by Pagh (1999) is actually very simple, easy to
 understand, and fast both for generation and (the branch-free)
>lookup:



>http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf


 The idea is:

 - Find a hash `g(x)` to partition the keys into `b` groups (the
>paper
 requires b>  2n, though I think in practice you can often get away
>with
 less)

 - Find a hash `f(x)` such that f is 1:1 within each group (which is
 easily achieved since groups only has a few elements)

 - For each group, from largest to smallest: Find a displacement
 `d[group]` so that `f(x) ^ d` doesn't cause collisions.

 It requires extra storage for the displacement table. However, I
>think 8
 bits per element might suffice even for vtables of 512 or 1024 in
>size.
 Even with 16 bits it's rather negligible compared to the
>minimum-128-bit
 entries of the table.

 I benchmarked these hash functions:

 displace1: ((h>>  r1) ^ d[h&  63])&  m1
 displace2: ((h>>  r1) ^ d[h&  m2])&  m1
 displace3: ((h>>  r1) ^ d[(h>>  r2)&  m2])&  m1

 Only the third one is truly in the spirit of the algorithm, but I
>think
 the first two should work well too (and when h is known
>compile-time,
 looking up d[h&  63] isn't harder than looking up r1 or m1).

 My computer is acting up and all my numbers today are slower than
>the
 earlier ones (yes, I've disabled turbo-mode in the BIOS for a year
>ago,
 and yes, I've pinned the CPU speed). But here's today's numbers,
 compiled with -DIMHASH:

 direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11
>val=24.00
 index: min=6.45e-09 mean=6.46e-09 std=1.15e-11
>val=18.00
 twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11
>val=18.00
 threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11
>val=18.00
 displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11
>val=18.00
 displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11
>val=18.00
 displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11
>val=18.00


 I did a dirty prototype of the table-finder as well and it works:

 https://github.com/dagss/hashvtable/blob/master/pagh99.py
>>>
>>>
>>> The paper obviously puts more effort on minimizing table size and
>not a fast
>>> lookup. My hunch is that our choice should be
>>>
>>> ((h>>  table.r) ^ table.d[h&  m2])&  m1
>>>
>>> and use 8-bits d (because even if you have 1024 methods, you'd
>rather double
>>> the number of bins than those 2 extra bits available for
>displacement
>>> options).
>>>
>>> Then keep incrementing the size of d and the number of table slots
>(in such
>>> an order that the total vtable size is minimized) until success. In
>practice
>>> this should almost always just increase the size of d, and keep the
>table
>>> size at the lowest 2**k that fits the slots (even for 64 methods or
>128
>>> methods :-))
>>>
>>> Essentially we avoid the shift in the argument to d[] by making d
>larger.
>>
>> Nice. I'm surprised that the indirection on d doesn't cost us much;
>
>Well, table->d[const & const] compiles down to the same kind of code as
>
>table->m1. I guess I'm surprised too that displace2 doesn't penalize.
>
>> hopefully its size wouldn't be a big issue either. What kinds of
>> densities were you achieving?
>
>The algorithm is designed for 100% density in the table itself. (We can
>
>lift that to compensate for a small space of possible hash functions I 
>guess.)
>
>I haven't done proper simulations yet, but I just tried |vtable|=128, 
>|d|=128 from the command line and I had 15 successes or so before the 
>first failure. That's with a 100% density in the vtable itself! (And 
>when it fails, you increase |d| to get your success).
>
>The caveat is the space spent on d (it's small in comparison, but
>that's 
>why this isn't too good to be true).
>
>A disadvantage might be that we may no longer have the opportunity to 
>not make the table size a power of two (i.e. replace the mask with "if 
>(likely(slot < n))"). I think for that to work one would need to
>replace 
>the xor group with addition on Z_d.

Strike this p

Re: [Cython] Hash-based vtables

2012-06-06 Thread Robert Bradshaw
On Wed, Jun 6, 2012 at 2:36 PM, Dag Sverre Seljebotn
 wrote:
> On 06/06/2012 11:16 PM, Robert Bradshaw wrote:
>>
>> On Wed, Jun 6, 2012 at 1:57 PM, Dag Sverre Seljebotn
>>   wrote:
>>>
>>> On 06/06/2012 10:41 PM, Dag Sverre Seljebotn wrote:


 On 06/05/2012 12:30 AM, Robert Bradshaw wrote:
>
>
> I just found http://cmph.sourceforge.net/ which looks quite
> interesting. Though the resulting hash functions are supposedly cheap,
> I have the feeling that branching is considered cheap in this context.



 Actually, this lead was *very* promising. I believe the very first
 reference I actually read through and didn't eliminate after the
 abstract totally swept away our home-grown solutions!

 "Hash&  Displace" by Pagh (1999) is actually very simple, easy to

 understand, and fast both for generation and (the branch-free) lookup:



 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3753&rep=rep1&type=pdf


 The idea is:

 - Find a hash `g(x)` to partition the keys into `b` groups (the paper
 requires b>  2n, though I think in practice you can often get away with
 less)

 - Find a hash `f(x)` such that f is 1:1 within each group (which is
 easily achieved since groups only has a few elements)

 - For each group, from largest to smallest: Find a displacement
 `d[group]` so that `f(x) ^ d` doesn't cause collisions.

 It requires extra storage for the displacement table. However, I think 8
 bits per element might suffice even for vtables of 512 or 1024 in size.
 Even with 16 bits it's rather negligible compared to the minimum-128-bit
 entries of the table.

 I benchmarked these hash functions:

 displace1: ((h>>  r1) ^ d[h&  63])&  m1
 displace2: ((h>>  r1) ^ d[h&  m2])&  m1
 displace3: ((h>>  r1) ^ d[(h>>  r2)&  m2])&  m1


 Only the third one is truly in the spirit of the algorithm, but I think
 the first two should work well too (and when h is known compile-time,
 looking up d[h&  63] isn't harder than looking up r1 or m1).


 My computer is acting up and all my numbers today are slower than the
 earlier ones (yes, I've disabled turbo-mode in the BIOS for a year ago,
 and yes, I've pinned the CPU speed). But here's today's numbers,
 compiled with -DIMHASH:

 direct: min=5.37e-09 mean=5.39e-09 std=1.96e-11 val=24.00
 index: min=6.45e-09 mean=6.46e-09 std=1.15e-11 val=18.00
 twoshift: min=6.99e-09 mean=7.00e-09 std=1.35e-11 val=18.00
 threeshift: min=7.53e-09 mean=7.54e-09 std=1.63e-11
 val=18.00
 displace1: min=6.99e-09 mean=7.00e-09 std=1.66e-11 val=18.00
 displace2: min=6.99e-09 mean=7.02e-09 std=2.77e-11 val=18.00
 displace3: min=7.52e-09 mean=7.54e-09 std=1.19e-11 val=18.00


 I did a dirty prototype of the table-finder as well and it works:

 https://github.com/dagss/hashvtable/blob/master/pagh99.py
>>>
>>>
>>>
>>> The paper obviously puts more effort on minimizing table size and not a
>>> fast
>>> lookup. My hunch is that our choice should be
>>>
>>> ((h>>  table.r) ^ table.d[h&  m2])&  m1
>>>
>>>
>>> and use 8-bits d (because even if you have 1024 methods, you'd rather
>>> double
>>> the number of bins than those 2 extra bits available for displacement
>>> options).
>>>
>>> Then keep incrementing the size of d and the number of table slots (in
>>> such
>>> an order that the total vtable size is minimized) until success. In
>>> practice
>>> this should almost always just increase the size of d, and keep the table
>>> size at the lowest 2**k that fits the slots (even for 64 methods or 128
>>> methods :-))
>>>
>>> Essentially we avoid the shift in the argument to d[] by making d larger.
>>
>>
>> Nice. I'm surprised that the indirection on d doesn't cost us much;
>
>
> Well, table->d[const & const] compiles down to the same kind of code as
> table->m1. I guess I'm surprised too that displace2 doesn't penalize.
>
>
>> hopefully its size wouldn't be a big issue either. What kinds of
>> densities were you achieving?
>
>
> The algorithm is designed for 100% density in the table itself. (We can lift
> that to compensate for a small space of possible hash functions I guess.)
>
> I haven't done proper simulations yet, but I just tried |vtable|=128,
> |d|=128 from the command line and I had 15 successes or so before the first
> failure. That's with a 100% density in the vtable itself! (And when it
> fails, you increase |d| to get your success).
>
> The caveat is the space spent on d (it's small in comparison, but that's why
> this isn't too good to be true).
>
> A disadvantage might be that we may no longer have the opportunity to not
> make the table size a power of two (i.e. replace the mask with "if
> (likely(slot < n))"). I think for that to w